Final deliveries for CS488
Poster: CS488 Poster – Minh Vu
Github directory: https://github.com/mdvu15/CS488-Senior-Capstone
During the past 2 weeks I have:
– Been able to program the communication between the Chrome extension and the classifier, with a modification to the project design: instead of using Native Messaging, I set up a Flask server and had the two programs communicate over HTTP POST.
– Worked on the front end of the Chrome extension which displays the percentage breakdown of political ideologies in a news article.
– Worked on my poster and submitted it to my advisor for feedback.
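The Flask bridge between the extension and the classifier could be sketched roughly as follows; the `/classify` endpoint name, port, and score fields are illustrative assumptions, not the project's actual code.

```python
# Sketch of a Flask server receiving article text from the extension
# over HTTP POST. Endpoint name, port, and score fields are assumptions.
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/classify", methods=["POST"])
def classify():
    # The extension POSTs the article text as JSON: {"text": "..."}
    text = request.get_json().get("text", "")
    # Stand-in for the real neural-net prediction.
    scores = {"liberal": 0.4, "conservative": 0.35, "neutral": 0.25}
    return jsonify(scores)

if __name__ == "__main__":
    app.run(port=5000)
```

On the extension side, the matching request would be a `fetch` POST from JavaScript.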
In the coming week, I plan to work on the final draft of my paper, incorporating the feedback I received on the second draft. I will also finalize the project as well as the poster.
During the past week, I have been working mostly on the Google Chrome extension. I encountered some problems with scanning data from a given website: the extension listener does not seem to process DOM objects correctly, and it couldn't query the HTML content of a page. I will have to look deeper into the Chrome extension documentation to get that to work. I have also experimented with different activation functions for my neural network, and so far the best result I got was 67% accuracy. I plan to look into other parameters of the MLP that can be tuned in order to reach at least 70% accuracy before integrating it into the Chrome extension.
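The activation-function experiments can be sketched like this in scikit-learn; random synthetic data stands in for the IBC sentences here, so the printed scores are not the project's 67% result.

```python
# Sketch of comparing MLP activation functions in scikit-learn.
# Synthetic data is a stand-in for the real IBC features.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

results = {}
for act in ["relu", "tanh", "logistic"]:
    clf = MLPClassifier(activation=act, max_iter=500, random_state=0)
    clf.fit(X_tr, y_tr)
    results[act] = clf.score(X_te, y_te)  # test-set accuracy
    print(act, round(results[act], 3))
```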
In the past week, I was finally able to get my MLP classifier to train and test on the IBC dataset without any errors. The average result so far is 66% with a default-parameterized MLP in scikit-learn. In the next week I hope to dig into this further and improve the result by using different, more suitable parameters and optimization functions for this task. I have also worked on the first draft of my paper and sent it to my advisor for feedback. The feedback was positive: the paper is submittable, but we will talk more about how to improve it during our one-on-one this Friday. Overall it has been a productive week in terms of both my project's progress and my paper.
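A minimal sketch of a default-parameterized MLP text classifier in scikit-learn, assuming TF-IDF features; the sentences and labels below are invented stand-ins for the IBC data.

```python
# Sketch: TF-IDF features piped into a mostly-default MLPClassifier.
# Sentences and labels are made up; the real data is the IBC.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

sentences = ["raise the minimum wage now",
             "cut taxes to grow business",
             "expand access to healthcare",
             "shrink the federal government"]
labels = ["liberal", "conservative", "liberal", "conservative"]

clf = make_pipeline(TfidfVectorizer(),
                    MLPClassifier(max_iter=500, random_state=0))
clf.fit(sentences, labels)
print(clf.predict(["lower taxes for businesses"]))
```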
During the past week I have:
– Cleaned up the data, converted it to a CSV file, and tried feeding it to the MLP classifier. However, I'm running into errors when scaling the data before feeding it to the MLP. I will look into another scikit-learn tutorial on text classification this week and try to apply it to my project.
– Talked to Ajit regarding the structure of my paper and continued working on its "Previous Work" section
– Initialized a basic Chrome extension and will look into how it can scan data on the current webpage and read its URL
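One common cause of a scaling error on text features, offered only as a guess at the error above: StandardScaler cannot center a sparse TF-IDF matrix unless `with_mean=False` is passed. A small sketch:

```python
# Sketch of the sparse-scaling gotcha: centering a sparse matrix fails,
# so scaling must skip the mean. Example documents are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import StandardScaler

docs = ["tax cuts for the rich", "expand public healthcare"]
X = TfidfVectorizer().fit_transform(docs)   # sparse matrix

raised = False
try:
    StandardScaler().fit(X)                 # with_mean=True by default
except (TypeError, ValueError) as e:
    raised = True
    print("centering sparse data fails:", e)

Xs = StandardScaler(with_mean=False).fit_transform(X)  # works
```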
During the past week I have:
– Gotten close to getting the data to work. Right now the data is divided into leaves and nodes; I've been able to get the node-level labels but not the leaf-level labels. Once that is finished, I can quickly start training the classifier on the leaves (which represent words)
– Discussed with Ajit the possibility of implementing cosine similarity between words for better training results, in case word2vec is not feasible. (There's a chance that integrating word2vec into the classifier is out of scope for this project)
– Started working on the Design Section of my paper
– Gotten an initial paper outline ready to be finalized with Ajit.
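Leaf-level label extraction might look like the sketch below; the nested-tuple tree is a made-up stand-in, since the IBC ships its own tree classes.

```python
# Sketch of pulling leaf-level (word) labels out of a labeled tree.
# Each node is (label, children); a leaf is (label, word). The tree and
# its labels are invented, not taken from the IBC.
tree = ("liberal",
        [("neutral", "taxes"),
         ("liberal", [("liberal", "should"), ("liberal", "rise")])])

def leaf_labels(node):
    label, rest = node
    if isinstance(rest, str):          # leaf: rest is the word itself
        return [(rest, label)]
    out = []
    for child in rest:                 # internal node: recurse
        out.extend(leaf_labels(child))
    return out

print(leaf_labels(tree))
# → [('taxes', 'neutral'), ('should', 'liberal'), ('rise', 'liberal')]
```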
During the past week, I was able to query the data from the IBC and looked into it more closely. It has come to my attention that the IBC is labeled both at the word level (leaves) and at the sentence level (trees), with the leaf-level labels being noisier. I will have to look into which label level to use to train the neural net. I also implemented a simple MLP classifier in scikit-learn using a Jupyter notebook. I will have to look into how initializing a word vector matrix might help with this classification task (or maybe it won't).
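If word vectors do come into play, cosine similarity between them is straightforward to compute with NumPy; the vectors below are invented for illustration, not real embeddings.

```python
# Sketch of cosine similarity between word vectors. The 3-dimensional
# vectors are made up; real ones would come from word2vec or similar.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

tax  = np.array([0.20, 0.90, 0.10])
levy = np.array([0.25, 0.85, 0.05])
cat  = np.array([0.90, 0.10, 0.40])

print(cosine(tax, levy))  # near 1: similar words
print(cosine(tax, cat))   # smaller: dissimilar words
```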
During the past week I have:
– Started writing the introduction of my thesis paper
– Started implementing the first module of the project (word matrix initialization)
– Looked into the IBC corpus data, whose sentences are stored as tree data structures. I have queried some data to check its integrity.
– Discussed with Ajit and agreed on a timeline specifically for writing my paper
– Started looking into the concept of a data pipeline, which shows potential as a way to transfer and process data in this project (going from a web page, through a Google Chrome extension, to a classifier, and back)
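The page → extension → classifier → page flow can be pictured as composed stages; every function below is a toy stand-in for the real component.

```python
# Toy sketch of the data pipeline as composed functions.
# Each stage is a placeholder for the real component.
def extract_text(html):
    # stand-in for the extension scraping the article body
    return html.replace("<p>", " ").replace("</p>", " ").strip()

def classify(text):
    # stand-in for the neural-net classifier
    return {"liberal": 0.5, "conservative": 0.5}

def render(scores):
    # stand-in for the extension's percentage display
    return ", ".join(f"{k}: {v:.0%}" for k, v in scores.items())

page = "<p>Congress debates the new budget.</p>"
print(render(classify(extract_text(page))))
# → liberal: 50%, conservative: 50%
```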
Over the past week, I was able to make the following progress with my project:
– Came up with a detailed design of my project, which lists all the modular components and their functions. These components might change as implementation takes place, but for now it serves as a guideline to stick to.
– Discussed with my project advisor (Ajit) and agreed on a timeline, at least for the next 3 to 4 weeks, which divides up the tasks of writing the paper, implementing the neural network, and building the Google Chrome extension.
– Obtained the IBC dataset, with 4,000+ labeled sentences, from the authors themselves. This is the most crucial part of implementing, training, and testing the neural network. I will spend the next week learning about the dataset, doing some data cleaning if necessary, and starting to implement the network in scikit-learn.
– Installed scikit-learn and started getting familiar with the tool by taking a course on Udacity. Also read about CNN and MLP implementations in scikit-learn.