Weekly update

During the past 2 weeks I have:

– Been able to program the communication between the Chrome extension and the classifier, with a modification to the project design: instead of using Native Messaging, I set up a Flask server, and the two programs now communicate through HTTP POST requests.

– Worked on the front end of the Chrome extension, which displays the percentage breakdown of political ideologies in a news article.

– Worked on my poster and submitted it to my advisor for feedback.
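
The Flask setup described above could look roughly like the sketch below. It is a minimal illustration, not the project's code: the /classify route, the JSON field "text", the naive sentence splitting, and the keyword-based classify_sentence stub (standing in for the trained MLP) are all hypothetical. Computing the percentage breakdown by counting per-sentence labels is also an assumption.

```python
import re
from collections import Counter

from flask import Flask, jsonify, request

app = Flask(__name__)

LABELS = ("liberal", "conservative", "neutral")


def classify_sentence(sentence: str) -> str:
    """Hypothetical stand-in for the trained classifier's predict() call."""
    lowered = sentence.lower()
    if "tax cut" in lowered:
        return "conservative"
    if "universal healthcare" in lowered:
        return "liberal"
    return "neutral"


def percentage_breakdown(text: str) -> dict:
    """Split text into sentences, classify each one, and return label percentages."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not sentences:
        return {label: 0.0 for label in LABELS}
    counts = Counter(classify_sentence(s) for s in sentences)
    return {label: 100.0 * counts[label] / len(sentences) for label in LABELS}


@app.route("/classify", methods=["POST"])
def classify():
    # The extension POSTs a JSON body such as {"text": "..."}.
    payload = request.get_json(silent=True) or {}
    return jsonify(percentage_breakdown(payload.get("text", "")))

# Start locally with app.run(port=5000) or the "flask run" command.
```

On the extension side, this would be a POST with a JSON body and a Content-Type of application/json; the JSON response is what the front end renders as percentages.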

In the coming week, I plan to work on the final draft of my paper based on the feedback I received on the 2nd draft. I will also finalize the project and the poster.

Weekly update (11/14)

During the last week I encountered a few problems with my project:

  • While working on the demo, I noticed that the classifier does a poor job on liberal news sites: it classified several articles from CNN and Slate as either conservative or neutral. This may be because these articles use many conservative-associated terms in order to criticize conservative perspectives, which misleads the classifier into labeling them as non-liberal.
  • The Native Messaging module that connects the classifier and the Chrome extension refuses to connect when the classifier module is imported. I have discussed this with Ajit, and we are both working on it. Once we find a solution, the extension will be able to communicate with the classifier, so the remaining work should be minimal and mostly front-end related.
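
For reference, Chrome's native messaging protocol frames each message as UTF-8 encoded JSON preceded by a 32-bit message length in native byte order, exchanged over the host's stdin and stdout. The sketch below shows host-side framing only; the function names are illustrative, and in-memory streams can stand in for stdin/stdout when testing. One general pitfall, offered only as a possibility to rule out: stdout is reserved for these framed messages, so anything printed to stdout while the classifier module is imported would corrupt the stream and can make Chrome drop the connection.

```python
import json
import struct
from typing import BinaryIO, Optional

# "=" means native byte order with a standard 4-byte unsigned int.
_LENGTH = struct.Struct("=I")


def send_message(stream: BinaryIO, message: dict) -> None:
    """Write one length-prefixed JSON message to a binary stream."""
    body = json.dumps(message).encode("utf-8")
    stream.write(_LENGTH.pack(len(body)))
    stream.write(body)
    stream.flush()


def read_message(stream: BinaryIO) -> Optional[dict]:
    """Read one length-prefixed JSON message; return None at end of stream."""
    header = stream.read(_LENGTH.size)
    if len(header) < _LENGTH.size:
        return None
    (length,) = _LENGTH.unpack(header)
    return json.loads(stream.read(length).decode("utf-8"))
```

In an actual host, the streams would be sys.stdin.buffer and sys.stdout.buffer, and any diagnostic output should go to stderr instead.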

Weekly update (11/7/18)

During the past week I:

  • Improved the average accuracy, recall, and F1-score of my MLP classifier to 80%
  • Integrated goose3, a news scraping library, to retrieve raw text from news articles; processed the data, vectorized the sentences, and passed them to the classifier. However, the classifier currently fails to identify many liberal or conservative sentences on news sites, and usually outputs 100% neutral sentiment even for historically biased sites such as Breitbart and Huffington Post. This result is surprising given the test results on the dataset. My hypothesis is that these news articles employ metaphors and subtly biased sentences that are difficult for an MLP classifier to detect.
  • During the next week, I plan to use Native Messaging to connect the Chrome extension I made with the classifier, giving the extension its most important functionality.
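
Averaged metrics like the ones above can be computed with scikit-learn's accuracy_score and precision_recall_fscore_support. The labels below are made up purely to show the calls: a classifier that predicts "neutral" for every sentence, mirroring the behavior described above. With average="macro", each class counts equally, so this classifier reaches 60% accuracy but only one-third macro recall.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Hypothetical labels: the classifier predicts "neutral" for every sentence.
y_true = ["neutral"] * 6 + ["liberal"] * 2 + ["conservative"] * 2
y_pred = ["neutral"] * 10

accuracy = accuracy_score(y_true, y_pred)
# zero_division=0 scores classes that are never predicted as 0 instead of warning.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# prints "accuracy=0.60 precision=0.20 recall=0.33 f1=0.25"
```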

Weekly update

During the past week, I have been working mostly on the Google Chrome extension. I encountered some problems with scanning data from a given website: the extension listener does not seem to process DOM objects correctly, and it could not query the HTML content of a page. I will have to look further into the Chrome extension documentation to get that working. I have also experimented with different activation functions for my neural network, and so far the best result I got was 67% accuracy. I plan to look into other hyperparameters of the MLP that can be tuned to reach at least 70% accuracy before integrating it into the Chrome extension.
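
An activation-function comparison like this can be run with cross_val_score, changing only the activation parameter of scikit-learn's MLPClassifier, which accepts "identity", "logistic", "tanh", and "relu". The sentences and labels below are made-up placeholders for the IBC data, and TF-IDF features are an assumption about how sentences are vectorized.

```python
import warnings

from sklearn.exceptions import ConvergenceWarning
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

# Placeholder sentences; in practice these would come from the IBC dataset.
sentences = [
    "cut taxes and shrink government", "protect gun rights",
    "secure the border now", "lower the corporate tax rate",
    "expand access to healthcare", "raise the minimum wage",
    "fight climate change", "protect voting rights",
]
labels = ["conservative"] * 4 + ["liberal"] * 4

results = {}
with warnings.catch_warnings():
    # Tiny data sets trigger convergence warnings; silence them for the demo.
    warnings.simplefilter("ignore", ConvergenceWarning)
    for activation in ("identity", "logistic", "tanh", "relu"):
        # The vectorizer sits inside the pipeline so it is refit on each training fold.
        model = make_pipeline(
            TfidfVectorizer(),
            MLPClassifier(activation=activation, max_iter=300, random_state=0),
        )
        results[activation] = cross_val_score(model, sentences, labels, cv=2).mean()

for activation, score in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{activation:>8}: {score:.2f}")
```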

Weekly update (10/21 – 10/27)

In the past week, I was finally able to get my MLP classifier to train and test on the IBC dataset without errors. The average result so far is 66% with an MLP using scikit-learn's default parameters. Next week I hope to dig deeper and improve the result by using different parameters and optimization functions better suited to this task. I have also worked on the first draft of my paper and sent it to my adviser for feedback. The feedback was positive: the paper is submittable, but we will talk more about how to improve it during our one-on-one this Friday. Overall, it has been a productive week for both my project and my paper.
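
One way to search over parameters and optimization functions is GridSearchCV, where the optimizer corresponds to MLPClassifier's solver parameter ("lbfgs", "sgd", or "adam"). The grid values, the macro-F1 scoring choice, and the toy sentences below are illustrative assumptions, not tuned settings for the IBC.

```python
import warnings

from sklearn.exceptions import ConvergenceWarning
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline

# Placeholder data standing in for the IBC sentences and labels.
sentences = [
    "repeal the estate tax", "reduce federal regulation", "defend the second amendment",
    "invest in public schools", "protect workers' unions", "expand renewable energy",
]
labels = ["conservative"] * 3 + ["liberal"] * 3

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("mlp", MLPClassifier(max_iter=300, random_state=0)),
])

# Parameters of a pipeline step are addressed as "<step>__<parameter>".
param_grid = {
    "mlp__solver": ["lbfgs", "sgd", "adam"],
    "mlp__alpha": [1e-4, 1e-2],
    "mlp__hidden_layer_sizes": [(50,), (100,)],
}

search = GridSearchCV(pipeline, param_grid, cv=2, scoring="f1_macro")
with warnings.catch_warnings():
    warnings.simplefilter("ignore", ConvergenceWarning)
    search.fit(sentences, labels)

print(search.best_params_)
print(f"best macro F1: {search.best_score_:.2f}")
```

After fitting, search.predict uses the best configuration refit on all the data.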

Weekly update

During the past week I have:

– Cleaned up the data, converted it to a CSV file, and tried feeding it to the MLP classifier. However, I'm running into an error when scaling the data before feeding it to the MLP. This week I will look into another scikit-learn tutorial on text classification and try to apply it to my project.

– Talked to Ajit about the structure of my paper, and continued working on the paper’s “Previous Work” section.

– Initialized a basic Chrome extension, and will look more into how it can scan data on the current webpage and its URL.
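
The scaling error above was not recorded, so this is only a guess at one common cause: text vectorizers such as TfidfVectorizer return sparse matrices, and StandardScaler refuses to center sparse input unless it is created with with_mean=False. A minimal sketch, assuming a CSV with hypothetical sentence and label columns, that loads the rows with Python's csv module and scales the sparse features before the MLP:

```python
import csv
import io

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in for open("data.csv", newline="", encoding="utf-8"); file and columns are hypothetical.
CSV_TEXT = """sentence,label
lower taxes help families,conservative
the border must be secured,conservative
healthcare is a human right,liberal
we must address climate change,liberal
"""

with io.StringIO(CSV_TEXT) as handle:
    rows = list(csv.DictReader(handle))
sentences = [row["sentence"] for row in rows]
labels = [row["label"] for row in rows]

# StandardScaler() with the default with_mean=True raises a ValueError on the
# sparse matrix produced by TfidfVectorizer; with_mean=False scales without centering.
model = make_pipeline(
    TfidfVectorizer(),
    StandardScaler(with_mean=False),
    MLPClassifier(max_iter=500, random_state=0),
)
model.fit(sentences, labels)
print(model.predict(["taxes should be lower"]))
```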

Weekly update – (10/7 – 10/13)

During the past week I have:

– Gotten close to getting the data to work. Right now the data is divided into leaves and nodes. I’ve been able to get the node-level labels but not the leaf-level labels. Once that is finished, I can quickly start training the classifier on the leaves (which represent words).

– Discussed with Ajit the possibility of using cosine similarity between words for better training results, in case word2vec cannot be implemented. (There’s a chance that integrating word2vec into the classifier is out of the scope of this project.)

– Started working on the Design Section of my paper

– Prepared an initial paper outline, ready to be finalized with Ajit.
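
Cosine similarity between two word vectors is their dot product divided by the product of their lengths, giving a value from -1 (opposite) to 1 (same direction). A dependency-free sketch, with made-up three-dimensional vectors standing in for real word embeddings:

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Return dot(u, v) / (|u| * |v|); raises ValueError for mismatched or zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors, not trained embeddings.
vectors = {
    "tax": [0.9, 0.1, 0.0],
    "taxes": [0.8, 0.2, 0.0],
    "climate": [0.0, 0.3, 0.9],
}
print(round(cosine_similarity(vectors["tax"], vectors["taxes"]), 3))
print(round(cosine_similarity(vectors["tax"], vectors["climate"]), 3))
```

Related words such as "tax" and "taxes" score close to 1, while unrelated ones score near 0.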

Weekly update (9/30 – 10/6)

During the past week, I was able to query the data from the IBC and look into it more closely. I found that the IBC is labeled at both the word level (leaves) and the sentence level (trees), with the leaf-level labels being noisier. I will have to look further into which label level to use to train the neural net. I also implemented a simple MLP classifier with scikit-learn in a Jupyter notebook. Next, I will look into how initializing a word vector matrix might help with this classification task (or whether it will at all).
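
To make the leaf-level versus sentence-level distinction concrete, here is a small tree sketch. The Node class, its fields, and the label strings are hypothetical and do not reproduce the IBC's actual file format; the only assumptions are that leaves hold words, any node may carry a label, and the root's label is the sentence-level annotation.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical parse-tree node: leaves carry a word, any node may carry a label."""
    label: Optional[str] = None
    word: Optional[str] = None
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def leaves(node: Node) -> Iterator[Node]:
    """Yield leaf nodes from left to right (depth-first)."""
    if node.is_leaf():
        yield node
    else:
        for child in node.children:
            yield from leaves(child)


def sentence_example(root: Node) -> Tuple[str, Optional[str]]:
    """Return (sentence text, sentence-level label taken from the root)."""
    return " ".join(leaf.word for leaf in leaves(root) if leaf.word), root.label


def leaf_examples(root: Node) -> List[Tuple[str, Optional[str]]]:
    """Return (word, leaf-level label) pairs; unlabeled words get None."""
    return [(leaf.word, leaf.label) for leaf in leaves(root) if leaf.word]


tree = Node(label="Liberal", children=[
    Node(label="Liberal", children=[Node(word="expand"), Node(word="healthcare", label="Liberal")]),
    Node(word="now"),
])
print(sentence_example(tree))
print(leaf_examples(tree))
```

The None entries show one reason word-level training data can be sparser and noisier than sentence-level data.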

Weekly update (9/23 – 9/29)

During the past week I have:

– Started writing the introduction of my thesis paper

– Started implementing the first module of the project (word matrices initialization)

– Looked into the IBC corpus data, whose sentences are stored as tree data structures. I have queried some of the data to check its integrity.

– Discussed with Ajit and agreed on a timeline specifically for writing my paper

– Started looking into the concept of a data pipeline, which could be used to transfer and process data in this project (going from a web page through a Google Chrome extension to a classifier, and back)
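
A word-matrix initialization module could look roughly like the sketch below: build a vocabulary index from tokenized sentences, then create one matrix row per word. The random uniform initialization, the 50-dimension size, the reserved "<unk>" row, and sentence averaging are assumptions for illustration; pretrained vectors such as word2vec could be copied into the same matrix instead.

```python
from typing import Dict, Iterable, List

import numpy as np


def build_vocab(sentences: Iterable[List[str]]) -> Dict[str, int]:
    """Map each distinct token to a row index; row 0 is reserved for unknown words."""
    vocab = {"<unk>": 0}
    for tokens in sentences:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


def init_word_matrix(vocab_size: int, dim: int = 50, seed: int = 0) -> np.ndarray:
    """Return a (vocab_size, dim) matrix of small random values."""
    rng = np.random.default_rng(seed)
    return rng.uniform(-0.1, 0.1, size=(vocab_size, dim))


def sentence_vector(tokens: List[str], vocab: Dict[str, int], matrix: np.ndarray) -> np.ndarray:
    """Average the rows for a non-empty token list (unknown words map to row 0)."""
    rows = [vocab.get(token, 0) for token in tokens]
    return matrix[rows].mean(axis=0)


corpus = [["lower", "taxes", "now"], ["expand", "healthcare", "now"]]
vocab = build_vocab(corpus)
matrix = init_word_matrix(len(vocab))
print(len(vocab), matrix.shape)
print(sentence_vector(["lower", "healthcare", "costs"], vocab, matrix).shape)
```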

Weekly update (9/16 – 9/22)

Over the past week, I was able to make the following progress with my project:

– Came up with a detailed design of my project, which lists all the modular components and their functions. These components might change during the implementation phase, but for now the design serves as a guideline.

– Discussed with my project advisor (Ajit) and agreed on a timeline, at least for the next 3–4 weeks, that divides the work among writing the paper, implementing the neural network, and implementing the Google Chrome extension.

– Obtained the IBC dataset, with 4,000+ labeled sentences, directly from the authors. This dataset is the most crucial part of implementing, training, and testing the neural network. I will spend next week learning about the dataset, doing some data cleaning if necessary, and starting to implement the CNN in scikit-learn.

– Installed scikit-learn and started familiarizing myself with the tool by taking a course on Udacity. Also read about CNN and MLP implementations in scikit-learn.

Weekly update (9/9 – 9/15)

For this week, I plan to:

  • Finalize the project timeline
  • Look into how to plug the sample data into the classifier and link this functionality to the Chrome extension
  • Take the Udacity crash course on supervised learning
  • Set up PyTorch and look for packages that implement CNNs

Weekly update (9/2 – 9/8)

In the last week I have:

  • Reached out to Ajit, who is now my project adviser
  • Contacted the IBC author to request the dataset
  • Started reading about PyTorch and how to set it up
  • Started looking into online crash courses on CNNs
  • Talked to my adviser about project design and next week’s plan

Plan for the week starting 9/2

For next week, I plan to:

  • Set up the environment on my computer for scikit-learn (or potentially PyTorch)
  • Collect data (the IBC) by emailing the authors
  • Read through the documentation and familiarize myself with scikit-learn and supervised learning.
  • Reach out to potential project advisers