Jon Senior Project

Over the past couple of decades, research on Natural Language Processing (NLP) has grown significantly, motivated largely by its wide range of applications. Well-known systems that use NLP techniques include Apple's Siri, IBM Watson, and Wolfram|Alpha. Much work has also gone into building efficient Natural Language Interfaces to Databases (NLIDBs), which allow people without SQL backgrounds to query a SQL database in natural language. For instance, a research team at Salesforce developed WikiSQL with the aim of democratizing SQL so that databases can be queried in natural language. WikiSQL is a large crowd-sourced dataset for developing natural language interfaces for relational databases; the accompanying Seq2SQL model generates structured queries from natural language using reinforcement learning.

The purpose of my senior project is to address the inequitable distribution of a crowd’s resources. The goal is to build a large Natural Language Interface to Database system for the sharing and gig economies. In other words, I will build a database of our current resources and services that can be queried and modified in natural English.

Given the scope of this project, I will start with a small database for the Earlham student community. The application will connect students who have certain needs with students who can fulfill them. I will start with simple queries and sentences in the following contexts: homework, transportation, and shareable items.

Facilitating connections between crowd members requires communication between the users and the database. The application depends on a constant stream of information from users about their daily activities so that the algorithm can better connect them. I concluded that communicating with a chatbot in natural language is the best way to facilitate this constant input. For the database, I decided to use one of the most widely used relational database management systems, PostgreSQL. Hence, the goal of this project is to democratize SQL so that users can query the SQL database in natural language (for example: “Who graduated from Earlham College this year?”) and modify it in natural language (for example: “I graduated from Earlham College this year”). Such systems have huge potential because they make databases accessible to many people without SQL backgrounds.
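
To make the two example sentences concrete, here is a minimal sketch of the SQL each one might map to. The table name `graduates`, its columns, and the helper functions are hypothetical, and Python's built-in `sqlite3` module stands in for PostgreSQL only so that the snippet runs without a database server.

```python
import sqlite3

# Hypothetical schema for illustration; the project's real tables may differ.
# sqlite3 is used here as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE graduates (name TEXT, college TEXT, grad_year INTEGER)"
)


def record_graduation(conn, name, college, year):
    """'I graduated from Earlham College this year' -> an INSERT statement."""
    conn.execute(
        "INSERT INTO graduates (name, college, grad_year) VALUES (?, ?, ?)",
        (name, college, year),
    )


def who_graduated(conn, college, year):
    """'Who graduated from Earlham College this year?' -> a SELECT statement."""
    rows = conn.execute(
        "SELECT name FROM graduates "
        "WHERE college = ? AND grad_year = ? ORDER BY name",
        (college, year),
    )
    return [row[0] for row in rows]
```

For example, after `record_graduation(conn, "Alice", "Earlham College", 2024)`, a call to `who_graduated(conn, "Earlham College", 2024)` would return a list containing `"Alice"`. Resolving "this year" and the speaker's identity from the chat session is left out of this sketch.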

Description of Paper
The paper will include an outline of and an introduction to Natural Language Processing (NLP). I will base my final paper largely on my survey paper, so it will contain sections on aspects of NLP such as Natural Language Understanding, Knowledge Bases, Information Extraction, and part-of-speech tagging. However, the primary focus of the paper will be on comparing the techniques discussed in my survey paper.

Description of Software
The application will consist of the following components:

  • User Interface – The user interface will be a web application through which users can query and modify the SQL database. The frontend will be built using React and Bootstrap CSS. The plan is to build a chat area where a user can communicate with the chatbot by typing messages. The results of the queries and sentences will appear beside the chat area. Since this application is only for Earlham students at the moment, people will be able to log in only with Earlham email addresses.
  • Server – The server will be built using Python’s Django framework, since the main natural language processing component will be written in Python as well. It will serve the results of the queries made in the user interface and update the database based on the information users provide.
  • Natural Language Processor – This is the main component of the application and will require the most time and effort. Essentially, the goal of this component is to identify characteristic patterns in natural language queries and sentences and convert them into SQL statements. The natural language processor will be implemented according to the algorithm described in the paper.
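
As a rough illustration of what "identifying characteristic patterns and converting them into SQL statements" could look like, the sketch below uses plain regular expressions. It is not the algorithm described in the paper; it is a simplified stand-in. The table `graduates`, its columns, the `RULES` list, and the `translate` function are all hypothetical. The `%s` placeholders assume a PostgreSQL driver such as psycopg, which accepts parameters in that style.

```python
import re
from datetime import date

# Each rule pairs a sentence pattern with a parameterized SQL template and a
# function that builds the parameters. All names here are hypothetical.
RULES = [
    (
        re.compile(r"^who graduated from (?P<college>.+?) this year\?$", re.I),
        "SELECT name FROM graduates WHERE college = %s AND grad_year = %s",
        lambda m, user, year: (m.group("college"), year),
    ),
    (
        re.compile(r"^i graduated from (?P<college>.+?) this year\.?$", re.I),
        "INSERT INTO graduates (name, college, grad_year) VALUES (%s, %s, %s)",
        lambda m, user, year: (user, m.group("college"), year),
    ),
]


def translate(sentence, user, year=None):
    """Return (sql, params) for the first matching rule, or None if no rule matches."""
    if year is None:
        year = date.today().year  # resolve "this year" at call time
    text = sentence.strip()
    for pattern, sql, build_params in RULES:
        match = pattern.match(text)
        if match:
            return sql, build_params(match, user, year)
    return None
```

Returning the SQL template and its parameters separately, instead of interpolating user text into the statement, lets the database driver handle quoting. That matters here because every input comes from free-form chat messages. A fixed rule list like this only covers sentences it has a pattern for, which is why a more general approach is needed for the full system.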