Stock Market Trend Prediction Using Sentiment Analysis

with No Comments


For decades people have tried to predict the stock markets. Some have used historical price trends to predict future changes, while others rely on their gut feeling to make predictions. The prices of stocks reflect the overall confidence the market has on the stocks. The level of this confidence, according to the behavioral economics, is a collective of society’s emotions towards a particular stock, which to some extent influences their decision-making. However, is there a way to know the collective mood of society towards a stock? Can this mood be extracted from newspaper articles and magazines? To address this question, I turn to the field of Natural Language Processing. With the help of various sentiment dictionaries, I ran various types of sentiment analysis over 1.7million newspaper articles published in The Guardian between 2000 and 2016. I then chart the changing sentiments over a time period against the various stock market indices to see whether or not news sentiment is predictive of economic indicators such as stock prices.



Software Architecture :


GitHub Link

Research Paper


Final Project

with No Comments



My Senior Project will be based on improving the accuracy of machine learning algorithms by using various statistical methods to correctly pre-process a dataset and then feed it into a machine learning algorithm.

The software part of my project will be based on designing a platform that works with R studio to run these tests on a dataset and then feed it to a machine learning algorithm and then analyze the results. This software recommends the series of ‘knobs’ that can turned on the dataset to better it for the algorithm. My paper will be based on the results that I found and whether or not pre-processing made a difference in the performance of these algorithms.

Survey Paper

with No Comments

Topic 1 : Data Mining Analysis and Prediction 

Week 1: March 12- 18

Went into more detail on the annotated bibliographies, organized the order in which would will help my paper and best fit the flow of ideas.

Week 2:   March 19 – 25

Looked into what the data mining tools and application that are mentioned in the papers. Checked if they could fall within the scope of the work that I want to do. Created the overall outline for my paper including how the major topics and methodology would progress. Created the workflow diagram and looked at other previously done Survey paper. Latex seems like a good tool to use in addition to Zotero to create templates.

Watched some TED talk videos on the topic:

Aaron Koblin: Visualizing ourselves … with crowd-sourced data

Hans Rosling: The best stats you’ve ever seen

Mathias Lundø Nielsen :How to Monetize Big Data

Week 3: March 26 – 31

Started connecting major topics in terms of how they fit the block structures for my paper and compiled paragraphs on some topics. Looked into other previously done works mentioned in the papers regarding Data Mining and the tools used in those research.


Week 4: April 1- 7 

Building on the outline and creating diagrams mentioned. Mostly going through papers to build on the brief few sentences mentioned for each topic.

Week 5: April 7 – 14

Worked on the second draft. Added more content to the paper, removed a couple of subtopic.

Week 6: April 14 -21 

Finished up the survey paper with all necessary topics and figures and diagrams as well as the conclusion.

Annotated Bibliographies

with No Comments


T1-  Data Mining, analysis and prediction 

Topp, N., & Pawloski, B. (2002). Online Data Collection. Journal of Science Education and Technology, 11(2), 173-178.

This paper touches on the history online data collection, some brief review of the more recent progress and work that is being done as well as how a database connected to the Internet collects data. It also presents a brief insight into where these methods might head towards in the future. Overall, this is a short 7-page article to give a good insight and a starting point as well good references.


Hand, D., Blunt, G., Kelly, M., & Adams, N. (2000). Data Mining for Fun and Profit. Statistical Science, 15(2), 111-126.

This is a more detailed paper regarding the different tool, models, patterns and quality of data mining. Even though it was written in 2000 is very useful is terms of getting a broader idea of model building and pattern detection. It looks at statistical tools and their implementation as well as the challenges to data mining through well explained examples and graphs.


Edelman, B. (2012). Using Internet Data for Economic Research. The Journal of Economic Perspectives, 26(2), 189-206.

Economist have always been keen to collect and analyze data for their research and experimentation. This paper introduces how data scraping has been employed by companies and businesses to extract data for their use. It is an excellent paper that combines data scraping with data analysis and where and how it has been used. It sets the foundation for data analysis and lists various other good papers in the particular field.



Buhrmester, M., Kwang, T., & Gosling, S. (2011). Amazon’s Mechanical Turk: A New Source of Inexpensive, Yet High-Quality, Data? Perspectives on Psychological Science, 6(1), 3-5.

Amazon’s Mechanical Turk helps bring together a statistician’s dream of data collection and an economist’s love for data analysis. It has proved to be an excellent platform to conduct research in not only economics but also psychology and other social sciences. This is a very short 4 page paper that looks at the mechanical Turk, what it has helped research and conclude and how it has been used to obtain high quality inexpensive data. This paper is significant in a sense that it is an application of the above-mentioned tools of collection, analysis and possibly prediction.


T2- A more informed Earlham : Interactive Technology for Social change

1/ Vellido Alcacena, Alfredo et al. “Seeing Is Believing: The Importance of Visualization in Real-World Machine Learning Applications.” N.p., 2011. 219–226. Web. 20 Feb. 2017.

2/ “And What Do I Do Now? Using Data Visualization for Social Change.” Center for Artistic Activism. N.p., 23 Jan. 2016. Web. 20 Feb. 2017.

3/ Valkanova, Nina et al. “Reveal-It!: The Impact of a Social Visualization Projection on Public Awareness and Discourse.” Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. New York, NY, USA: ACM, 2013. 3461–3470. ACM Digital Library. Web. 20 Feb. 2017. CHI ’13.


T3– CS for all : Learning made easy.

1/  Muller, Catherine L., and Chris Kidd. “Debugging Geographers: Teaching Programming To Non-Computer Scientists.” Journal Of Geography In Higher Education 38.2 (2014): 175-192. Academic Search Premier. Web. 20 Feb. 2017

2/ Rowe, Glenn, and Gareth Thorburn. “VINCE–An On-Line Tutorial Tool For Teaching Introductory Programming.” British Journal Of Educational Technology 31.4 (2000): 359. Academic Search Premier. Web. 20 Feb. 2017.

3/  Cavus, Nadire. “Assessing The Success Rate Of Students Using A Learning Management System Together With A Collaborative Tool In Web-Based Teaching Of Programming Languages.” Journal Of Educational Computing Research 36.3 (2007): 301-321. Professional Development Collection. Web. 20 Feb. 2017.

Final Abstracts list

with No Comments

T1–  ::Data Mining, analysis and prediction 

This survey paper will first look at the tools used to gather and store data from user and other domains. It will then look at how, in the past, others have worked with data to make co-relations and predictions. It will then look attempt to look at publicly available data and try to find correlation with other market data. Our focus here will be to see the extent to which one data can be abstractly analyzed and linked to others and with what degree of certainty. It will involve working with a lot of data and analyzing it to find trends and patterns and possibly making predictions.


Topic 2 – CS for Social Change and Sustainability


Every year the different branches of campus such as Health Services, facilities, Public Safety, ITS and the registrar’s office send out emails to students that are lengthy reports which no one ever reads. Earlham facilities keep records on energy consumption that the students seldom look at and every now and then there are issues around campus that divides the student body but students rarely get to vote on.

To address these problems I suggest a mobile survey app that allows students to vote on issues as well as view various data from departments around the campus. These survey results and data will also be dynamically displayed on screens around the campus. It would involve learning and implementing graphic interface tools as well as visualization programs. If we link this through quadratics (as is done for student government voting), we can make sure that only Earlham students get to vote and each student gets to vote only once.

The ability to view data and trends on key statistics across from these departments would certainly help the students in a better-informed position and in a place to bring change.


T3 – CS for all

As I see my Econ professors struggle with STATA (a simple tool to work with data through commands), I cannot help but draw parallels on how it first felt to learn programming. Reality is that most people without a CS background have difficulty in learning these new tools and softwares. Softwares, most of which are outdated in their use, but, are still taught to students who usually resort to memorizing them to pass midterms. I think that it would be very helpful if we as CS students can help discover, learn, teach as well as document these softwares and help other departments. I propose an interactive interface like Code-academy where students are given tutorials that go progressively forward in complexity. Co-ordination from these departments would be essential to understand their needs and create an interface catered to help their students learn from scratch.


{ possible additions could be log-in mechanism via moodle to ensure students are spending the amount of time they should be taking these interactive courses”}



Capstone Abstracts- V1

with No Comments


T1 – One data predicts another

This survey paper will look at publicly available data and try to find correlation with other market data. For example, it would study how weather patterns or viral news stories could correlate to stock prices for certain stocks. It will try to see to what extent one data can be abstractly analyzed and linked to others with what degree of certainty. It will involve working with a lot of data and analyzing it to find trends and patterns.

Possible Ideas to build on: 

— > Will be looking into Behavioral economics and how certain events follow another. Using this, I will look for places to extract co-related data. 

— > Will involve a fair bit of learning STATA to work on data and derive co-relations. Some statistical modeling would be helpful. 

—> Stock market data is usually well kept however similar day to day data is rarely seen in other places. One possible topic being finding co-relations is to look in unusual places within the stock markets. for example: Boeings stocks might be brought down by President Trump’s tweets but what other markets have shown unusual reactions to his tweets. Perhaps a comparison of market changes with key words in tweets of with the most popular people on twitter on that area. 


T2- Computers, data and everything else.

This survey paper will look at how the trends and tools of data analysis have changed within the stock markets and particularly with the field of Economics. Infamously labelled “the dismal science”, economist are only now able to collect and manipulate data to prove their theories. It will look at how data analysis because of modern computing is affecting other fields.

Possible Ideas to build on: 

—> Databases used in the stock markets and how they have eased day to day operations. 

—> Other popular mass scale data collection tools and how development in computing has changed their workings. { This would be more of a history digging up, I would look up how and why the successors were picked over their predecessors.}} 

—> Some bits of this project could be used on the first idea. 


T3 – Data Mining

This survey paper looks at how and what data is being extracted from users and in what ways companies are storing and profiting from it. It looks at targeted advertisements, cyber security, the algorithms working in the background and the databases that sell our data.

Possible Ideas to build on: 

—> Look into tools of data mining. The use of cookies and pop up ads and data extraction from search bars. How are these companies getting smarter every day, what loopholes in are they employing.  How they create a virtual personal of you based on what they know about you so far. 

—> Learn how the government has in the past used data from social security and taxes to analyze various sociological aspects. Where else has such data analysis existed within the computer science. How can the two be related ?