Weekly update – Rei

During the past week, my focus has been on:

  • Preparing for the demo.
  • Working on the second draft of the paper.
  • Manually labeling the corpus.
  • Working on getting the overlapping tests working.
  • Outlining the poster.

 

Weekly update – Rei

During the last week, I mostly continued working on getting the exhaustive parameter search working for the classifiers. At first I ran into a few errors, but in the end I got it working and obtained results.

Next, I worked with the Open American National Corpus. I extracted all the text files into one folder and converted them into a CSV file in which each row contains one sentence. After that, I ran a script that created two CSV files: one containing sentences that potentially contain analogies and one containing sentences that likely do not. I have started labeling them manually.
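The two steps above can be sketched roughly as follows. This is only an illustration, not the script actually used: the naive sentence splitter, the function names, and especially the ANALOGY_CUES pattern are assumptions, since the update does not say how candidate sentences were identified.

```python
import csv
import re
from pathlib import Path

# Hypothetical cue phrases; the real filtering criteria are not stated in the update.
ANALOGY_CUES = re.compile(r"\b(like|as if|similar to|is to|akin to)\b", re.IGNORECASE)


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    normalized = " ".join(text.split())  # collapse newlines and repeated spaces
    parts = re.split(r"(?<=[.!?])\s+", normalized)
    return [p for p in parts if p]


def corpus_to_csv(folder, out_csv):
    """Write every sentence from the .txt files in `folder` as one CSV row."""
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["sentence"])
        for path in sorted(Path(folder).glob("*.txt")):
            text = path.read_text(encoding="utf-8", errors="replace")
            for sentence in split_sentences(text):
                writer.writerow([sentence])


def split_by_cue(in_csv, candidates_csv, others_csv):
    """Route each sentence to one of two CSVs; return (n_candidates, n_others)."""
    n_candidates = n_others = 0
    with open(in_csv, newline="", encoding="utf-8") as src, \
            open(candidates_csv, "w", newline="", encoding="utf-8") as cand, \
            open(others_csv, "w", newline="", encoding="utf-8") as other:
        reader = csv.DictReader(src)
        cand_writer = csv.writer(cand)
        other_writer = csv.writer(other)
        cand_writer.writerow(["sentence"])
        other_writer.writerow(["sentence"])
        for row in reader:
            if ANALOGY_CUES.search(row["sentence"]):
                cand_writer.writerow([row["sentence"]])
                n_candidates += 1
            else:
                other_writer.writerow([row["sentence"]])
                n_others += 1
    return n_candidates, n_others
```

A cue-based filter like this only narrows down what to look at first; the manual labels remain the ground truth.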

I have also started preparing for the demo and the second draft of the paper.

Weekly update – Rei

For the first half of last week, I was working on the first draft of the paper. Since then, I have been working on getting the exhaustive search working. It has been a struggle. I was trying to implement a pipeline between the classifier and the feature extraction tools. However, this seems to be incompatible with the data structures I have been using. As a result, I have decided not to use the pipeline and instead run the exhaustive searches separately for the classifiers.
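As a rough illustration of what an exhaustive search without a pipeline looks like, here is a minimal, library-free sketch. The function exhaustive_search, the toy_score function, and the parameter names C and gamma are all hypothetical; in practice the scoring function would train the classifier with the given settings and return a validation score.

```python
from itertools import product


def exhaustive_search(param_grid, score_fn):
    """Try every combination in param_grid; return (best_params, best_score)."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical stand-in for "fit the classifier with these settings and
# return a validation score"; it simply peaks at C=1.0, gamma=0.1.
def toy_score(params):
    return -(params["C"] - 1.0) ** 2 - (params["gamma"] - 0.1) ** 2


grid = {"C": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1, 1.0]}
best_params, best_score = exhaustive_search(grid, toy_score)
```

Running the search separately means calling it once per component, each with its own grid and scoring function. That is simpler than a joint search but cannot find interactions between feature-extraction and classifier settings.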

 

I have also started working with the Open American National Corpus. I should have labeled data within the next two weeks at most.

 

Finally, I started working on an outline for the poster.

Weekly update – Rei

During the past week, I have mostly worked on the first draft of the paper. I did a quick review with Dave on Monday and then reworked the paper based on his suggestions.

I have also worked on the coding aspect. I am almost done implementing the exhaustive grid search for hyperparameter tuning of the classifier and the feature extraction tools. The results of the grid search should improve the overall performance of the system.

I am also close to deciding on the corpus that will be used for the experiments. Once I have that, getting the results should be relatively straightforward.

Weekly update – Rei

For the past week, I have been looking at some available corpora found online. I was able to get one for free, namely the Open American National Corpus. It seems like it might be a better fit than the Blogs corpora. In the meantime, the TSVM is working and I have been getting some basic results. I have started working on implementing an exhaustive grid search to find the best parameters for the TSVM. I have also worked on the first draft of the paper, which is due next week. I am hoping to get the results from the grid search before Wednesday, so I can add those to the paper.

Weekly progress – Rei

Last week I was having a problem with a deprecation warning. I looked into it and thought I had fixed it. It seemed to be working fine, but then I got the same warning at a different point, so I am just ignoring it for now. This led to getting the TSVMs working and getting some results. As a next step, I intend to get bigger datasets and look at the available corpora. I am currently considering the Corpus of Contemporary American English.

Moreover, I reread last year’s proposal and started preparing to work on the first draft of the paper.

Weekly update – Rei

Continuing from last week, I was able to port the implementation of the scikit-learn wrapper for TSVM from Python 2 to Python 3. I am currently getting a new deprecation warning caused by the TSVM code. It seems that the code assumes I am using an older version of NumPy. This week, I plan to look into that further so I can figure out how to solve the issue.

I have randomly generated a sample from the Blogs corpora, and during this week (as well as the next one) I will manually label it. Moreover, I plan on rereading my proposal to determine which parts will be useful in my final paper.

Weekly update – Rei

During the past week, I ran into a few problems. I kept getting a PendingDeprecationWarning coming from one of the scikit-learn methods I use. It seems that something has changed or is currently changing in NumPy that affects scikit-learn modules. For now, I was able to ignore the warning. However, I will look into it further to make sure that the warning won’t become a bigger problem in the future.
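One way to ignore such a warning without hiding everything else is to filter only that category, and only around the call that triggers it. The sketch below uses Python's standard warnings module; noisy_call is a hypothetical stand-in for the library method that emits the warning.

```python
import warnings


def noisy_call():
    """Hypothetical stand-in for a library call that emits the warning."""
    warnings.warn("this behaviour will change", PendingDeprecationWarning)
    return 42


# Suppress only PendingDeprecationWarning, and only inside this block;
# the previous filter settings are restored when the block exits.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=PendingDeprecationWarning)
    result = noisy_call()
```

Scoping the filter this way keeps other warnings visible, which matters if the underlying change later turns into an error.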

I was able to find a scikit-learn wrapper for TSVM, and I tried using it. I am currently getting an error coming from the implementation of the TSVM. I suspect this is because it was written for Python 2; for example, it uses xrange() instead of range(). I will make the necessary changes so I can use it with the current version of Python. I believe that once I make these changes, the TSVM should work as expected.
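For the xrange() case specifically, a small compatibility shim is one possible stopgap while the wrapper is being ported. This is a generic sketch, not code from the wrapper itself; the cleaner long-term fix is to replace xrange with range in the source.

```python
try:
    xrange  # defined on Python 2 only
except NameError:
    xrange = range  # on Python 3, range is already lazy like xrange was

# Code written against Python 2 can then run unchanged:
total = sum(i * i for i in xrange(5))
```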

Moreover, the Blogs corpus was too big to be imported as a CSV file into Google Sheets, so I am currently writing a simple script that will randomly create a smaller sample, which I will manually label afterward.
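A sampling script of this kind could look roughly like the following sketch, which uses reservoir sampling so the full file never has to fit in memory. The function name, the fixed seed, and the assumption that the CSV has a header row are all illustrative choices rather than details from the actual script.

```python
import csv
import random


def sample_csv_rows(in_csv, out_csv, k, seed=0):
    """Reservoir-sample k data rows from in_csv into out_csv, keeping the header."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    reservoir = []
    with open(in_csv, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)  # assumes the first row is a header
        for i, row in enumerate(reader):
            if i < k:
                reservoir.append(row)
            else:
                j = rng.randint(0, i)  # inclusive on both ends
                if j < k:
                    reservoir[j] = row
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(reservoir)
    return len(reservoir)
```

Each data row ends up in the sample with equal probability, and the output is small enough to open in a spreadsheet for labeling.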

Weekly update – Rei

  • This past week:
    • Looked into the SKL issue; I now understand what’s happening.
    • Had to change a few things – LabelPropagation only accepts numpy arrays
    • When making the pipeline as a structure, I need to do it differently than what has been done in the past.
    • Got results for label propagation
  • This coming week:
    • Get things working with TSVMs.
    • Start manually labeling the blogs corpus

This week’s update – CS488 – Rei

  • This past week
    • Have run into difficulties this week:
      • Blogs corpus compiler isn’t working.
      • Looking into making a small implementation of semi-supervised learning
        • This seems to be working
        • Trying to do the same with the SVM one from GitHub
          • We think we may be getting close to this. We have found a Scikit-learn wrapper.
  • This coming week:
    • First: Looking at the scikit-learn issue
    • Do the same implementation
      • Get some basic results – confusion matrices, etc.
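For the basic results mentioned in the plan above, a confusion matrix can be computed in a few lines. This is a library-free sketch; the label names "analogy" and "other" and the example predictions are made up for illustration and are not actual results.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels, in `labels` order."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


# Made-up labels purely to show the call; not real experimental output.
y_true = ["analogy", "analogy", "other", "other", "other"]
y_pred = ["analogy", "other", "other", "other", "analogy"]
matrix = confusion_matrix(y_true, y_pred, labels=["analogy", "other"])
```

The diagonal holds the correctly classified counts, so accuracy, precision, and recall can all be derived from this one table.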

Last week’s update – CS488 – Rei

  • This past week:
    • Got blogs extraction going
    • Looked at algorithms
      • Have found ones that seem to work in a similar way to scikit-learn modules.
      • Generative adversarial networks?
      • Label propagation from scikit-learn.
      • Found the representations to use.
  • This coming week:
    • Get TSVM’s working with the data we have
      • The blogs corpus is currently unlabeled.
      • We have a labeled corpus, but it’s smaller.