Attached are the final versions of my paper and poster
This week, I spent time figuring out how to make the software publication-ready, and discussed the location of the server and database with Craig.
This week, I finalized my poster and prepared my paper for the evaluation draft submission. I also met with Craig to discuss taking the application live.
This week, I started working on my poster and revised my paper based on the feedback I received from Dave and Ajit.
This week I tested the reader with the student interface and checked that self check-in and check-out worked properly. I also met with Craig and discussed a plan to migrate the application to the server and perhaps have it ready for the EPIC expo.
This week I spent most of my time working on the paper, and finished the implementation and design sections. I also started working on the administrator interface and got a good portion of it done.
This week I finished implementing a rough version of the student user interface. I spent considerable time discussing the logic behind student check-out and check-in and what safeguards were necessary to put in place. I received feedback from Ajit on the design and modified my approach based on that.
This week I discussed the structure of my paper with Ajit, and received feedback on how to explain the design and implementation sections. I also started working on implementing the student user interface.
This week I finalized the schema for the database with Ajit and familiarized myself with the PostgreSQL commands after receiving the login information from Craig. I faced an unexpected challenge with ordering the RFID device from eBay. Instead, I researched for two days and found a few other cost-friendly options in the US, and I have proposed that the department purchase one of them.
I am planning to finish all the software end of the project by the time I get my hands on the device!
This week I met with Ajit and discussed some of the necessary features for the user interface that the administrator will be using. This involved seeing a list of recently checked out items, adding new objects, and adding new users. I also met with Craig and discussed the back-end work. We decided to use Django and PostgreSQL.
For the past week I did:
- Completed data collection from Alpha Vantage API
- Started working on neural network implementation
- Met with my advisor and went over my main argument for my project
This week I will do:
- Keep working on, and ideally complete, the neural network implementation
- Complete the outline
This is my first post for CS488.
- Met with Ajit to plan the next few weeks, and decided to start the data collection this week.
- Edited the timeline a little bit to include implementation of the project.
- I’ve shared my box folder with Dave and Charlie and will soon be contacting Andy Moore for access to volumetric data collected by the Geology students across campus.
My first idea is an application that allows the computer to be navigated with gesture control. My initial thought is to use the camera found on almost every laptop to map the mouse pointer to a point between, say, the thumb and the forefinger, and to emulate a mouse click when the thumb and forefinger touch. Further interfaces could also be implemented, such as a virtual keyboard or talk-to-text features, essentially attempting to replace the mouse and keyboard. Further research is needed.
My second idea is either standalone software or a Photoshop add-on for real-time pixel art animation editing. Given a sequence of images with a specified distance apart, a color palette, and a speed at which to move through the images, one could make a change and the animation would update in real time, while also allowing the color palette to be changed.
My third idea is a personal budget planning and expense tracking app. A person can track what they buy by inputting the cost of an item and the category that item falls into (possibly with further subcategories for more in-depth statistics), e.g. $16.69 on groceries on 1/21/19, $32.55 on clothes on 1/22/19, etc. One can input their salary and how much they want to save, and the app could keep track, suggest a budget, give statistics about spending patterns, etc.
This week I met with Ajit for an hour. We went over timeline and design of my project. I also met with Craig and ordered the RFID reader and tags after approving them.
For the past week I did:
- Obtained data from Alpha Vantage API
- Updated the project timeline for the semester.
- Prepared a topic presentation
- Met with my advisor and went over my updated project timeline.
This week I will do:
- Read, experiment, and implement neural network
This week, I was focused on making the poster and including the final results of my research.
During the past 2 weeks I have:
– Programmed the communication between the Chrome extension and the classifier, with some modification to the project design. Instead of using Native Messaging, I set up a Flask server so that the two programs communicate through HTTP POST requests.
– Worked on the front end of the Chrome extension which displays the percentage breakdown of political ideologies in a news article.
– Worked on my poster and submitted it to my advisor for feedback.
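The request/response flow between the extension and the classifier can be sketched roughly as follows. This is only an illustration under stated assumptions: to stay dependency-free it swaps Flask for Python's standard-library `http.server`, so it is not the project's actual server. The names `classify_text`, `handle_post_body`, `ClassifierHandler`, and `run_server`, the port, and the JSON field names are all hypothetical, and the classifier is a placeholder that labels everything neutral.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def classify_text(text):
    """Hypothetical placeholder for the real classifier.

    Always reports 100% neutral; a trained model would go here.
    """
    return {"liberal": 0.0, "conservative": 0.0, "neutral": 100.0}


def handle_post_body(body):
    """Decode a JSON request body and return the JSON response as bytes."""
    payload = json.loads(body.decode("utf-8"))
    breakdown = classify_text(payload.get("text", ""))
    return json.dumps(breakdown).encode("utf-8")


class ClassifierHandler(BaseHTTPRequestHandler):
    """Answers POST requests carrying article text with a percentage breakdown."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        response = handle_post_body(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        # Permissive CORS header so a browser extension can read the reply
        # (an assumption about the setup, not taken from the project).
        self.send_header("Access-Control-Allow-Origin", "*")
        self.end_headers()
        self.wfile.write(response)


def run_server(port=5000):
    """Serve on localhost until interrupted (not called automatically)."""
    HTTPServer(("127.0.0.1", port), ClassifierHandler).serve_forever()
```

Calling `run_server()` would start the local endpoint; the extension side would then send the article text as a JSON body in a POST request and render the returned percentages.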
In the coming week, I plan to work on the final draft of my paper based on the feedback I received on the 2nd draft. I will also finalize the project as well as the poster.
Submitted the CS second draft on Wednesday and am waiting for Xunfei’s feedback. The demo presentation went well, without many questions from the audience. Now working on the poster for the presentation on Dec 5.
- Met with Ajit to talk about the follow up steps on Monday.
- Based on the feedback, we decided to focus efforts on the following:
- Updating the design framework/diagram,
- Writing and explaining the design of the project,
- Reading other published papers to get an idea of the structure of the paper,
- Adding transition paragraphs in the paper.
During the last week I encountered a few problems with my project:
- While working on the demo, I noticed that the classifier is not doing a good job classifying liberal news sites. It classified a few articles on CNN and Slate as either conservative or neutral. This may be because these articles use many conservative-related terms to criticize conservative perspectives, which misled the classifier into labeling them as non-liberal.
- The native messaging module that communicates between the classifier and the Chrome extension refuses to connect when the classifier module is imported. I have discussed this with Ajit and we are both working on it. Once we figure out a solution, the extension can communicate with the classifier, so the remaining work should be minimal and mostly front-end related.
Worked on the demo presentation. Experimented with 2 datasets each taking 4 hours of run time. One observation I found is that changing the labels of the fake news changes the accuracy. It was found that detecting reliable news rather than fake news was statistically better in performance.
- Did second pass for more papers for this week, focusing on the design, and processes used,
- Met with Ajit on Monday to discuss the project and plan rest of the semester,
- Took Quiz 5 for Project update.
During the past week I:
- Improved the average accuracy, recall, and F1-score of my MLP classifier to 80%.
- Integrated goose3, a news-scraping library, to pull raw text from news articles, then handled the data, vectorized the sentences, and passed them to the classifier. However, the classifier currently fails to identify many liberal or conservative sentences on news sites, and usually outputs 100% neutral sentiment even for historically biased sites such as Breitbart and Huffington Post. This result is surprising given the test results attained on the given dataset. My hypothesis is that these news articles employ metaphors and perhaps subtly biased sentences that make it difficult for an MLP classifier to detect.
- During the next week, I plan to use Native Messaging to connect the Chrome extension I made with the classifier so that the extension has its most important functionality.
The front-end development part of my project is almost complete. The database setup and connection are complete. The integration of the back-end machine learning model with Flask works, and the Flask prediction model has also been linked with the front end. There are some issues with websites that are not blogs, which I am fixing. The next step is to make a retrainable model.
I have been reading more recent papers on the topic, and I came across a recently published paper (Oct 22nd, 2018).
The paper is called “Predicting Chinese Stock Market Price Trend Using Machine Learning Approach.”
- Did second pass for three papers for this week,
- Worked on First draft for proposal.
During the past week, I have been working mostly on the Google Chrome extension. I encountered some problems with scanning data from a given website. The extension listener does not seem to process DOM objects correctly, and it couldn’t query the HTML content of a page. I will have to look more into the Chrome extension documentation in order to get that to work. I have also experimented with different activation functions for my neural network, and so far the best result I got was 67% accuracy. I plan to look more into other settings of the MLP that can be modified in order to achieve at least 70% accuracy before integrating it into the Chrome extension.
Worked on the front-end component of the project. The sysadmin told me that they cannot host Flask for my project, so I started to look for alternatives. Heroku could not be used because it did not support scikit-learn the way I needed. Worked with Ajit to edit the first draft of my paper. Made some figures of the architecture.
I reviewed the papers I read for my literature review, and also I collected more papers to read for my topic proposal.
Since I would like to focus on the evaluation of existing machine learning predictive models, I do not plan to come up with new algorithms.
One thing that puzzles me about all the papers I read is that they all claim high prediction accuracy. But my question is: if they really work, then why don’t people use those algorithms to earn a profit?
I think there are problems with evaluation methods in these types of research and I would like to focus on that in my project.
Because the paper by Patel et al. is close to what I want to do, I am thinking of implementing the algorithms used in the paper (ANN, SVM, and random forest, among others) and focusing more on the evaluation methods.
The paper “Natural language based financial forecasting: a survey (2017)” surveys different evaluation methods for stock prediction models. Although the paper focuses on NLP on stock prediction, the evaluation methods can be applied for my project as well.
- Did second pass for three papers, posted on the box,
- Took Quiz 4 for CS388,
- Chose 3 papers for next week.
In the past week, I was finally able to get my MLP classifier to train and test on the IBC dataset without any errors. The average result so far is 66% with an MLP using standard parameters in scikit-learn. In the next week I hope to dig more into this to improve the result by using different, more suitable parameters and optimization functions for this task. I have also worked on the first draft of my paper and sent it to my adviser for feedback. The feedback is positive and the paper is submittable, but we will talk more about how to improve it during our one-on-one this Friday. Overall it has been a productive week in terms of both my project progress and my paper.
Finished the machine learning pipeline with actual experimentation. Having issues getting Flask set up and have been in touch with the SysAdmins. Halfway through the first draft of the paper. Made a new design for the system architecture.
- Finished literature review.
- Selected 1 topic from the remaining two for Quiz 3 (proposal)
During the past week I have:
– Cleaned up the data, converted it to a CSV file, and tried feeding it to the MLP classifier. However, I’m running into an error when scaling the data before feeding it to the MLP. I will have to look into another scikit-learn tutorial for text classification this week and try to use it for my project.
– Talked to Ajit regarding the structure of my paper, and continued working on the “Previous work” section of the paper
– Initialized a basic Chrome extension and will look more into how it can scan data on the current webpage and its URL
After reading more of the literature on stock prediction with sentiment analysis, I came to the conclusion that it is very difficult to obtain text data relevant to my research for free. I looked at the Twitter streaming API and considered collecting data from now until next semester (3 to 4 months’ worth of Tweets). But the problem with the Twitter streaming API is that it has request limits for free accounts, so I would only be able to collect a small fraction of the data.
Thus, I decided to work instead on the project evaluating machine learning models for stock prediction.
The problem with previous studies is that each researcher used a different evaluation method, and thus, it becomes a challenge to come to a definite conclusion about which approach is the best. Moreover, evaluating a strategy with one criterion does not give an accurate assessment of a strategy. For example, an algorithm may have high directional accuracy, but it does not mean that the strategy can make a greater profit.
Therefore, I will backtest each machine learning algorithm with three different backtesting metrics: accuracy, closeness, and trading simulation.
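As a rough illustration of what these three metrics could look like, here is a minimal sketch. The definitions are assumptions, since none of them are pinned down above: "accuracy" is taken as directional accuracy, "closeness" as mean absolute percentage error, and the trading simulation as a simple long-only rule. All function names are hypothetical.

```python
def directional_accuracy(actual, predicted):
    """Fraction of days where the predicted move (relative to the previous
    actual close) has the same direction as the actual move."""
    hits = 0
    for i in range(1, len(actual)):
        actual_up = actual[i] > actual[i - 1]
        predicted_up = predicted[i] > actual[i - 1]
        hits += actual_up == predicted_up
    return hits / (len(actual) - 1)


def mean_absolute_percentage_error(actual, predicted):
    """Assumed 'closeness' measure: average relative gap between the
    predicted and actual prices (lower is closer)."""
    gaps = [abs(a - p) / a for a, p in zip(actual, predicted)]
    return sum(gaps) / len(gaps)


def simulate_trading(actual, predicted, cash=1000.0):
    """Long-only simulation: hold the stock over day i whenever the model
    predicts a price above the previous close; otherwise stay in cash."""
    for i in range(1, len(actual)):
        if predicted[i] > actual[i - 1]:
            cash *= actual[i] / actual[i - 1]
    return cash


# Toy prices, invented purely for illustration.
actual_prices = [100.0, 110.0, 99.0, 108.9]
predicted_prices = [100.0, 105.0, 115.0, 105.0]
accuracy = directional_accuracy(actual_prices, predicted_prices)
closeness = mean_absolute_percentage_error(actual_prices, predicted_prices)
final_cash = simulate_trading(actual_prices, predicted_prices)
```

Reporting all three side by side makes the point from the paragraph above concrete: a model can score well on direction while still ending a simulation with little or no profit.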
I ran an experiment with scikit-learn. The run time for the program was more than 2 hours. Testing multiple datasets has been an issue lately.
Made progress on the draft of the paper. The related work section is almost complete.
For the past week, I worked on literature reviews on two topics: stock prediction with natural language processing and stock prediction with machine learning.
One of the most important findings is that each researcher used different datasets to predict different prediction targets. Thus, it becomes a challenge to come to a definite conclusion on which approach is the best. For future research, backtesting different approaches on the same datasets over multiple stocks or indexes will be necessary to fairly compare the performance of different algorithms.
- Started 2nd pass for the papers.
- Continued work on literature review.
During the past week I have:
– Gotten close to getting the data to work. Right now the data is divided into leaves and nodes. I’ve been able to get the node-level labels but not the leaf-level labels. Once that is finished, I can quickly start training the classifier with the leaves (which represent words)
– Discussed with Ajit the possibility of implementing cosine similarity between words for better training results, in case word2vec is not implementable. (There’s a chance that integrating word2vec into the classifier is out of the scope of this project)
– Started working on the Design Section of my paper
– Prepared an initial paper outline, ready to be finalized with Ajit.
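For reference, the cosine similarity mentioned in the list above can be sketched as follows. This is only a minimal illustration on plain Python lists; how words would be turned into vectors is not decided in these notes, so the example vectors are made up and the function name is hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors.

    Returns 0.0 if either vector has zero length, to avoid dividing by zero.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Invented word vectors, for illustration only.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Vectors pointing the same way score close to 1 and unrelated (orthogonal) vectors score 0, which is what makes the measure usable as a word-similarity signal.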
Worked on the first draft of the paper. Focusing on the related works and findings currently.
For the past week, I did the second reading for the 15 papers for the annotated bibliography and also prepared the presentation for the class.
I narrowed down my topics to two: stock prediction with sentiment analysis and stock prediction with machine learning. The major findings are as follows:
- Researchers use different sources for texts (Twitter, news articles, other social media, etc.)
- Researchers use different prediction targets (market movement, individual stocks, etc.)
I will be working on literature reviews this week.
- Finished bibliography.
- Selected two topics and prepared topics presentation.
- Started literature review.
In the past week, I have been working on the related work for my three project ideas. I wrote an annotated bibliography for each topic with five different sources. This week I dropped one of the ideas and decided to work on these two:
- Sleep Pattern Classification
- Modified method for HDI calculation
During the past week, I was able to query the data from the IBC and looked more into it. It has come to my attention that the IBC was labeled both on the word level (leaf) and the sentence level (tree), with the leaf-level labels being noisier. I will have to look more into which label level to use to train the neural net. I also implemented a simple classifier using an MLP in scikit-learn in a Jupyter notebook. I will have to look into how initializing a word vector matrix will help with this classification task (or maybe it won’t).
Built a rough machine learning pipeline for testing. Worked with Ajit to update the timeline. Started the first draft of the paper.
- Continued working on Annotated Bibliography:
- Found more papers on the topics,
- Did the first pass reading for all of them.
There is no big update on my work for the past week.
I will be working on the annotated bibliography due this Friday.
During the past week I have:
– Started writing the introduction of my thesis paper
– Started implementing the first module of the project (word matrices initialization)
– Looked into the IBC corpus data, whose sentences are implemented as a tree data structure. I have queried some data to check for its integrity.
– Met with Ajit to agree on a timeline specifically for writing my paper
– Started looking into the concept of a data pipeline, which shows potential as a way to transfer and process data in this project (going from a web page through a Google Chrome extension to a classifier, and back)
Created a smaller dataset using pySpark for training and testing the fake news model.
- Met with Andy Moore and talked about projects regarding natural disasters.
- Realized that many of the ideas were too big for a semester, and started researching Earthquake Early Warning systems
- Worked on Quiz 2, and collected papers for each topic.
- Worked on Annotated bibliography.
Over the past week, I was able to make the following progress with my project:
– Came up with a detailed design of my project, which lists all the modular components and their functions. These components might change as the implementation phase takes place, but so far it will be a guideline to stick to.
– Discussed with my project advisor (Ajit) and agreed on a timeline, at least for the next 3 – 4 weeks, which divides the tasks of writing the paper, implementing the neural network and the Google Chrome extensions.
– Obtained the IBC dataset with 4,000+ labeled sentences from the authors themselves. This is the most crucial part of implementing, training and testing the neural network. I will spend the next week to learn about the dataset, do some data cleaning if necessary and start implementing the CNN in SciKit Learn.
– Installed SciKit learn and started getting myself familiar with the tool by taking a course on Udacity. Also read about CNN and MLP implementation in SciKit learn
After skimming through 15 papers across three different topics, I am still most interested in the topic of “generate sentiment-based stock trading signals through NLP.” Last week, I started taking a Coursera course called Applied Text Mining in Python. I believe the skills I will acquire in this course will be helpful for my senior capstone project next semester if I decide to pursue the topic above. I also contacted a CS senior from last year who did his senior capstone project in NLP. He kindly shared his senior capstone paper as well as some helpful resources for my research.
Worked on setting up scikit-learn and the testing environment. Got Craig to give me access to Pollock and Bronte.
- Read the papers about how to read a paper.
- Met with Michael Lerner regarding one of the strategies talked about in his research last year.
- Found 5-7 papers related to each of the three topic areas.
- Attempted the Quiz 1 for CS388.
Met with my advisor twice, worked on an updated timeline. Worked out a design framework and prepared the presentation slides.
For the past week, I did the following:
– Found research papers on my topics
– Skimmed through several papers and gained a better understanding of the topics
– After skimming through papers on my possible three topics, I am most interested in the topic “generate sentiment-based stock trading signals through NLP.”
For this week, I plan to do the following:
– Start reading papers with greater care
– Start building an annotated bibliography
For this week, I plan to:
- Finalize the project timeline
- Look into how to plug the sample data into the classifier and link this functionality to the Chrome extension
- Take the Udacity crash course on Supervised learning
- Set up PyTorch and look for packages that implement CNN
In the last week I have:
- Reached out to Ajit and secured him as my project adviser
- Contacted the IBC author to request the dataset
- Started reading about PyTorch and how to set it up
- Started looking into online crash courses about CNN
- Talked to my adviser about project design and next week’s plan
I talked with Dave about my senior project ideas as I had a concern that former students have done similar research previously.
The following are some of the takeaways from the discussion:
- Finding a good niche within a field is key. I should utilize my background in finance to do so.
- Having broad ideas about where I could differentiate my project before delving into research is helpful.
- But I do not need to stick to the initial idea if I find something better along the way.
For this week, I plan to do the following:
- Read several papers on my three ideas and get a better understanding of what has been done in the field.
- Brainstorm some niche fields where I could apply ML to stock prediction
- Met with Ajit to filter ideas regarding parallel computing, and machine learning.
- Emailed Andy Moore in Geology to talk about Earthquake and Tsunami predictions.
- Emailed Charlie for suggestions regarding my Structure From Motion idea.
- Searched for more specific details on work done in similar areas.
Started the project pipeline for Fake News Detection.
For next week, I plan to:
- Set up the environment on my computer for SciKit learn (or potentially PyTorch)
- Collect data (the IBC) by emailing the authors
- Read through documentation and familiarize myself with SciKit learn and supervised learning.
- Reach out to potential project advisers
- Looked for three general areas that I want to do my research in, namely:
- Structure from Motion
- Disaster prep and management
- Parallel Computing
- Searched for some related work that has happened in these areas.
- Pillow AI: I am thinking of building an Arduino device into a pillow, which can be charged and has heart-rate sensors to record the heart rate while the person is sleeping. With this data, I could determine sleeping patterns and find the light sleep phase. After detecting a light sleep phase, I can send an alarm signal to the phone to wake the person up as close as possible to the time they set for the alarm.
- Signature Recognition: I can use some deep learning algorithms, create a testing dataset, and collect signatures from a lot of people. After that, I want to determine whether a given signature is fake or the person’s real one.
- Human Development Index: I’ve been working on this research with my Econ professor this summer, and the project turned out to be so exciting that I might use it for my senior research project. The basic idea is to have a platform (website) where people can choose their indicators (whatever they think is essential for a country’s development) for the Human Development Index and get a new ranking of those countries. By keeping track of all the inputs users give, I can do some interesting data analysis with it. The back end will be Python with the pandas library, and the dataset will come live from the World Bank database through a RESTful API.
Idea 1: A web application that lets users test the performance of systematic trading strategies with user selected parameters.
The purpose of this application is to let users try out different inputs and test how their trading strategies would have worked on historical data. For trade signals, I am thinking of using a simple moving average and several machine learning methods. The results will be shown visually in graphs and tables.
Through this project, I will be able to learn the technical aspects of machine learning as well as how to build an interactive web application.
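To make the simple moving average signal in Idea 1 a little more concrete, here is a minimal sketch. The rule used (buy when the price is above its moving average, sell otherwise) is just one common convention chosen for illustration, not a decision made for this project, and the function names and prices are hypothetical.

```python
def simple_moving_average(prices, window):
    """Average of the last `window` prices, for each day with a full window."""
    return [
        sum(prices[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(prices))
    ]


def sma_signals(prices, window):
    """Return 'buy' when the price is above its moving average, else 'sell'.

    Signals start on the first day that has a full window of prices.
    """
    averages = simple_moving_average(prices, window)
    recent_prices = prices[window - 1:]
    return ["buy" if p > avg else "sell" for p, avg in zip(recent_prices, averages)]


# Invented closing prices, for illustration only.
signals = sma_signals([1.0, 2.0, 3.0, 4.0, 3.0], window=2)
```

In the web application, `window` would be one of the user-selected parameters, and the resulting signals would feed the historical performance test.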
Idea 2: A Study of different machine learning methods in stock price predictions.
In this project, I aim to learn several machine learning methods (probably several different neural network methods) and analyze the performance of different strategies. Unlike idea 1 above, I do not intend to make a web application for this project. Thus, I plan to go deeper in learning the technical parts of machine learning methods.
Idea 3: Stock trend prediction through sentiment analysis using Natural Language Processing.
In this project, I aim to learn the fundamentals of NLP, generate sentiment-based trade signals, and test the performance of the strategy. Similar to idea 2 above, I do not intend to make a web application for this project. Thus, I will go deeper into learning the technical side of NLP.
A Data Science and Machine Learning Project to explore the stock data of a particular stock exchange. The exploration will focus on observing repetitive trends in stock markets and relating them to business cycles. Some questions that can be asked in this project are as follows:
- Is there any particular pattern that stock markets follow between the end of December and the start of January? This time period is said to be a speculative time for investors and traders. In particular, it is observed that traders can benefit by buying in December and selling in January because of the closure of firms’ accounting books.
- Another interesting question is whether there is a trend between bull markets and bear markets. That is, does a bull market always have to be followed by a bear market, and vice versa?
The main resource for this project would be “Python for Finance: Analyze Big Financial Data” by O’Reilly Media. Some other resources are as follows:
A portfolio tracker that keeps track of investments in stocks in a particular market. Keeping in mind the time limitation, it would be better to focus on small markets for this project. The web-based application will provide different portfolios for users to keep track of their investments and to easily look at their best and worst investments.
In this project, the major research component would be figuring out how to structure the database design for such a system, as well as enforcing multiple levels of database transaction logging. A further investigation might be into mirroring the data for backup. Along with this, the project can have a data analysis research segment for any market that meets the needs of this project.
The research component of this project will also lie in using the Model View Controller design pattern to develop such a system. This project essentially has two parts: the software design and the data analysis research. If this project is taken, a serious amount of planning has to be done to ensure that both components of the project are completed.
The project is about creating software that can determine an optimal value for a company by looking at its past balance sheet records to predict future cash flows. Financial analysis methods such as DCF, DDM and FCE can be implemented in this approach (only one). This system would be automated using machine learning and data analysis.
The main research for this project is coming up with a model that can predict the future cash flows of a company by looking at past trends. Regression will be one of the core machine learning techniques applied in this research. One resource for this project will be “Python for Finance: Analyze Big Financial Data” by O’Reilly Media.
The valuation of the company is done using what finance people call the time value of money adjustment. Basically, this means that getting $100 today is better than getting it tomorrow or at any time in the future. Thus, all future cash flows that the company generates need to be discounted to today’s value. To do this, we need to figure out the discount rate. There are different approaches we can take. For instance, we can use the interest rate provided by the Federal Reserve, or we can build our own rate that better reflects the real financial scenario. The Capital Asset Pricing Model can be used here, but quantities such as beta and the risk-free interest rate need to be estimated. This estimation can be the second part of the research.
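The discounting step and the CAPM discount rate can be sketched in a few lines. This is a minimal illustration using the standard textbook formulas, with cash flows assumed to arrive at the end of each year; the function names and all input numbers are hypothetical and not estimates for any real company.

```python
def present_value(cash_flows, discount_rate):
    """Discount a list of future yearly cash flows back to today's value.

    cash_flows[0] is assumed to arrive one year from now, cash_flows[1]
    two years from now, and so on.
    """
    return sum(
        cf / (1 + discount_rate) ** year
        for year, cf in enumerate(cash_flows, start=1)
    )


def capm_discount_rate(risk_free_rate, beta, market_return):
    """CAPM: expected return = risk-free rate + beta * market risk premium."""
    return risk_free_rate + beta * (market_return - risk_free_rate)


# Invented inputs, for illustration only.
rate = capm_discount_rate(risk_free_rate=0.02, beta=1.5, market_return=0.08)
value_today = present_value([110.0], discount_rate=0.10)
```

With a 10% discount rate, $110 received a year from now is worth about $100 today, which is the time value of money idea described above. The estimated beta and risk-free rate would enter through `capm_discount_rate`.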