CS488 – Week 5

with No Comments

This week I first worked on creating an outline for my final paper, which was useful as it sharpened my current understanding of my project and where it is headed. I also was working with a new model and was able to successfully train it, save checkpoints and load them. I also created basic pre-processing functions for my data to match the format of input sets.

Loading the weights did not seem to work with this model. When I reached a checkpoint with 80%+ accuracy, I saved the weights, then loaded them and fed in test data from the dataset, but accuracy dropped to 5%. This was extremely confusing, and understanding it is my priority this week; otherwise I will have to find another model.

CS 488 – Week 4 – Updates

TensorFlow is working fine on Lovelace now, but I just found that the demo uses TensorFlow 1 while the latest version installed on Lovelace is TF2. The demo has a lot of code, and I am not sure whether I should work on this one and update all of the code to TF2, or just find another resource. I talked to Xunfei, and she told me to briefly try other resources, because updating that demo is not a small task.

CS488 – Week 3 – Update

I am still having issues running the demo code from GitHub. I requested an installation of TensorFlow for Python 3 on Lovelace, but there still seems to be an error. It is probably an environment-settings issue. I will communicate with the admins.

CS488 – Week 4

This week I did a lot of research and work on the more anthropological side of my project. I emailed Tom Hamm and Greg Vaughn and got some great information about where I could find the foundations of old buildings around campus that I could use for my project. This information will hopefully be detailed enough for me to create some labeled training images.

I also spent some time this week learning fast.ai, which I have settled on for now as the best option for identifying images. The library is extensively documented, and extremely robust. As soon as Layout or a similar machine is back up, I will be able to start testing code, but for now, learning the library is just as important.

CS 488 – Week 4

Last week I worked on collecting and preprocessing the data using the Groupon API. I also started learning about and implementing my autoencoder model. So far the main obstacle has been the learning curve, but I have been reading extensively about neural networks and Keras and should be able to continue working on the project without any hiccups. Next week I plan to start the first draft of my paper and to have a somewhat working version of the autoencoder model.

CS 488 – Week 4

This week, I started preprocessing my image data. The steps are resizing, cropping, normalizing, and finally converting the images to tensors so that they can be fed into a neural network. For the numerical data set, I started looking into algorithms that are less computationally expensive than neural networks, such as k-nearest neighbors and support vector machines. That way, I can keep testing while the Layout server is still unavailable.
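
As a rough illustration of why k-nearest neighbors is cheap to try without a GPU, here is a minimal pure-Python sketch. The function name `knn_predict` and the toy data are made up for this example and are not taken from the project.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours and count their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy, made-up data: two well-separated clusters.
points = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
labels = ["low", "low", "low", "high", "high", "high"]
prediction = knn_predict(points, labels, (4.9, 5.1), k=3)
```

There is no training step here: all the work happens at prediction time, which is why this kind of model can run on a laptop for a small numerical data set.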

CS 488 – Updates – Week 4

  • I have spent this week analyzing the data sets that I have to see if there are any outside things that I need for these data sets to be able to be tested using Weka.
  • I have found that some required me to have my app registered with Facebook Developers and Disqus and some were not actually in proper .csv format and so Weka (the tool that I am using to test classification methods) could not read it.
    • This meant that I have a much smaller pool of articles that I am able to replicate.
    • I have found 27 different data sets but I haven’t read all the papers those data sets are used in and some of the papers that mention the data sets are just explaining how they created the data sets and not how to use them in this context.
  • Because of all of these little setbacks, I am working on just finding smaller sample data to test Weka with, so that I can make sure Weka is working and I am focusing on recreating the results from Castello et al.’s work for the moment.
  • Castello et al.’s data format is different than what I have used for Weka before and I have to do some more digging to see if I need to combine the fake news data set with the credible news data set for each year first before sending it through Weka, or if I can just open both within Weka and tell it how to find what it needs.
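
One way the "combine first" option could look is sketched below, assuming the fake and credible files for a year share the same columns. The column names, the `label` column, and the sample rows are invented for illustration; they are not Castello et al.'s actual format.

```python
import csv
import io


def combine_labeled(fake_file, credible_file, out_file, label_column="label"):
    """Merge two CSV files with identical headers and add a class column."""
    fake_reader = csv.DictReader(fake_file)
    credible_reader = csv.DictReader(credible_file)
    # Assumption: both files have the same header; a mismatch raises ValueError
    # when a row with unexpected columns is written.
    fieldnames = list(fake_reader.fieldnames) + [label_column]
    writer = csv.DictWriter(out_file, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    count = 0
    for reader, label in ((fake_reader, "fake"), (credible_reader, "credible")):
        for row in reader:
            writer.writerow({**row, label_column: label})
            count += 1
    return count


# Made-up sample data standing in for one year's two files.
fake_csv = io.StringIO("title,source\nAliens land,example-a\n")
credible_csv = io.StringIO("title,source\nCouncil votes,example-b\n")
combined = io.StringIO()
row_count = combine_labeled(fake_csv, credible_csv, combined)
```

With real files, the `io.StringIO` objects would be replaced by files opened with `open(path, newline="")`, and the resulting single CSV would have one class column for a classifier to predict.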

CS 488 – Week 4

In the past week, I worked on generating five recommendations from each of the six product categories. I am still confused about the cosine similarity formula, so I plan to meet with other faculty members next week while I keep working on the next task. Other than that, there weren’t any obstacles, and I just need to make the function return the results in a clean way.
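
A minimal sketch of the "top N per category" step, assuming each product is represented by a numeric ingredient vector and ranked with the standard cosine similarity. The product names, categories, and vectors below are placeholders, not the real data.

```python
import math


def cosine_similarity(a, b):
    """Standard cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_n_per_category(query_vector, products, n=5):
    """Return {category: [names of the n most similar products]}."""
    by_category = {}
    for product in products:
        score = cosine_similarity(query_vector, product["vector"])
        by_category.setdefault(product["category"], []).append((score, product["name"]))
    return {
        category: [
            name
            for _, name in sorted(scored, key=lambda pair: pair[0], reverse=True)[:n]
        ]
        for category, scored in by_category.items()
    }


# Hypothetical products with binary ingredient vectors.
products = [
    {"name": "A", "category": "cleanser", "vector": [1, 1, 0, 0]},
    {"name": "B", "category": "cleanser", "vector": [0, 0, 1, 1]},
    {"name": "C", "category": "moisturizer", "vector": [1, 0, 1, 0]},
]
recommendations = top_n_per_category([1, 1, 0, 0], products, n=1)
```

Returning a plain dictionary keyed by category keeps the output easy to print or pass to a plotting function later.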

CS488 – Week 4

This week I made efforts to get predictions from my model that was trained last week. However, after some hours spent understanding the code, I realized that this model is not for practical use but rather theoretical predictions, as each query set requires a supporting set.

Following this setback, I have now found some models to train from a smaller dataset in comparison to FewRel. I believe these models are able to be used practically on random query sets. With the smaller training time required for them, I should be able to verify which is best for my project this week.

CS 488 – Week 4 – Update

This week, I continued my testing for the physical aspect of my project. During this testing, I tried to focus on ECOpen since it says there is no encryption associated with the network. Come to find out, there is still an authentication process that one must go through when trying to connect to ECOpen. So when I ran a packet sniffer on a device that was on an ECOpen channel, I could not see any data. (This is a good thing and was noted). I also finished preparing my social engineering test which will begin tomorrow, February 12th. This next week will consist of my social engineering test, processing results from physical test, and working on my paper.  

CS 488 – Week 3 – Updates

Last weekend, I spent time with a small group of friends filling out a spreadsheet of information for 2020 Senate candidates. So far, 154/348 filed candidates have been added to the sheet. During that time, we discovered that a few candidates operate their campaign on a public Facebook profile instead of a Facebook page. In talking with Charlie, he guessed that the process in the API to collect profile data shouldn’t be too different from page data. Therefore, I am planning to collect this data as well, while noting the names with profiles in case their results are drastically different from overall results. Next, I plan to develop the scripts to start collecting and analyzing small amounts of data, planning to scale and automate them later.

CS488 – Week 3

This week was a big planning week for me. I spent a lot of time writing down notes and ideas, as well as researching the details of what I need for my project. I also spent some time gathering resources for my project in the form of data from Iceland. A combination of 2018 and 2019 data will provide me a much-needed training/testing case.

I have progressed in my implementation, further streamlining the process of creating various edge detections of original images. This week I added the Prewitt edge detection algorithm and improved my Canny edge implementation to have tight, wide, and auto modes.
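
For reference, the Prewitt operator can be sketched in a few lines of pure Python. This is a generic illustration on a tiny made-up grayscale grid, not the project's implementation; border pixels are simply left at zero.

```python
import math

# The two 3x3 Prewitt kernels: horizontal and vertical intensity change.
PREWITT_X = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
PREWITT_Y = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]


def prewitt(image):
    """Return the gradient magnitude of a 2D grayscale image (list of rows)."""
    height, width = len(image), len(image[0])
    edges = [[0.0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            gx = gy = 0.0
            for dr in range(3):
                for dc in range(3):
                    pixel = image[r + dr - 1][c + dc - 1]
                    gx += PREWITT_X[dr][dc] * pixel
                    gy += PREWITT_Y[dr][dc] * pixel
            # Combine both directions into one edge strength.
            edges[r][c] = math.hypot(gx, gy)
    return edges


# Made-up 4x4 image with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 10, 10],
    [0, 0, 10, 10],
    [0, 0, 10, 10],
    [0, 0, 10, 10],
]
edges = prewitt(image)
```

Interior pixels next to the vertical edge get a large magnitude, while the untouched border stays at zero.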

I have also been researching technologies for image recognition via machine learning with multiple channels. This is the idea that a single “object” in the AI can have multiple images associated with it, and it is necessary for my project.

CS488 – Week 3

This week I was able to create a saved checkpoint of my learning model for semantic relation extraction. This hopefully means I won’t need to train it further and can now focus on feeding it my data, which now needs to be pre-processed before being fed into the model. A basic GUI window was also up and running this week with PyQt5 which was great to see! I will be writing more code in the coming weeks now so I need to ensure that my project files are organized.

CS 488 – Week 3

This week, I tried to implement some models and was hoping to run them on our Layout server with GPUs. However, the system admins were still working on the server and I could not ssh into it. Therefore, I created a Google Cloud free trial account and started writing and testing my model there.

CS 488 – Week 3

Since a significant part of my project involves marketing, my instructor advised me to talk to Seth and other professors about how I should approach a dataset. After talking to them, I have decided that a good approach would be to create a dataset using readability formulas. First I will calculate the average readability, and then I will filter the dataset using that average. A marketing dataset has been extremely hard to find, but asking around has led me to the Groupon API. It lets me get 100 deals per second, which should let me scrape millions of deals in a few days. I plan to run a script in the background that does this. Since last week, I have also successfully implemented word2vec using Gensim, a Python library.
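
The "score, average, then filter" idea could be sketched as follows. The post does not say which readability formula will be used, so this example assumes the Automated Readability Index (ARI) because it needs no syllable counting; keeping texts at or below the average score (easier to read) is also an assumption, and the sample deals are made up.

```python
import re


def automated_readability_index(text):
    """ARI = 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z0-9]+", text)
    if not sentences or not words:
        return 0.0
    characters = sum(len(word) for word in words)
    return 4.71 * characters / len(words) + 0.5 * len(words) / len(sentences) - 21.43


def filter_by_average(texts):
    """Return (average score, texts scoring at or below the average)."""
    scores = [automated_readability_index(text) for text in texts]
    average = sum(scores) / len(scores)
    kept = [text for text, score in zip(texts, scores) if score <= average]
    return average, kept


# Made-up deal descriptions.
deals = [
    "Big sale today. Save now.",
    "Extraordinary promotional opportunities notwithstanding, participation necessitates registration.",
]
average, kept = filter_by_average(deals)
```

A lower ARI means easier text, so flipping the comparison to `>=` would instead keep the harder half of the dataset.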

CS 488 – Week 3 – Update

In the past week, I worked on calculating the cosine similarity between the ingredient composition of an input item and that of the rest of the items in the data. I am struggling to decide which formula to use, since the related project used an equation different from the “typical” formula for cosine similarity. I will need to look into this more next week.
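
For reference, the “typical” cosine similarity between two ingredient vectors a and b of length n is usually written as below. The symbols are generic; the related project's alternative equation is not reproduced here because it is not given in this post.

```latex
\[
\cos(\mathbf{a}, \mathbf{b})
  = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}
  = \frac{\sum_{i=1}^{n} a_i b_i}
         {\sqrt{\sum_{i=1}^{n} a_i^{2}} \, \sqrt{\sum_{i=1}^{n} b_i^{2}}}
\]
```

For vectors with non-negative entries, such as ingredient counts or presence flags, the value falls between 0 (no shared ingredients) and 1 (identical direction).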

CS 488 – Update – Week 3

  • I have started to keep a log of what I do every day for this project so if something goes wrong I know where to back up and begin again. This will also help later when writing about my process for the poster/paper
  • I have started mapping all the datasets I found to what papers used them so that I could figure out which papers I could replicate
  • I have started trying to replicate papers as well using Weka just to make sure I’ve set up everything correctly so that I can properly set up my own tools
  • I’m having issues with how vague all the research papers are, however. To fix that, I think I’ll need to email the researchers with more questions so I can actually replicate their work and know what tools they used.

CS 488 – Week 3 – Update

This past week I have really dug into my physical testing. Using Kali Linux and a wireless adapter that supports monitor mode, I was able to see which networks were available and, from there, all of the clients connected to each network. However, I could only see the BSSID (MAC address) of each device, nothing more. I then went into Wireshark, which showed me a little more data: I could potentially see what type of device it was. However, all data on ECSecure was encrypted. Trying to break the encryption was hard because we have hundreds of users with different passwords; there is not just a single password for the ECSecure network (that would be too easy to break). I plan to continue this testing and see what else I can find through ECOpen.

I have also started to set up my social engineering experiment so that everything is ready when the start date arrives.

CS 488 – Week 2

This week I looked at some papers on the most recent image classification models to decide what to build for my dataset. I encountered some challenges while reading those papers, since there were terms that were hard to understand. Next week, I will continue to work on the image dataset and model.

CS 488 – Week 2

Due to a lack of usable datasets, after talking to my advisor and instructor I decided to modify my project to focus on readability and sentiment instead. I researched papers on readability and sentiment this last week and have started writing code using Python (Keras). My goal for next week is to have some working code for a trained network that produces more readable text. I still need to look a bit more into what counts as readable when it comes to marketing material.

CS 488 – Update – Week 2

  • This week I recovered all of the data sets I found last semester that were on my other computer. I then downloaded and extracted the data.
  • I also chose to set up my own Developer SQL database on my laptop so that I can keep my training data and the user data in one accessible place.
  • Because I wasn’t able to have my mentor meeting last week, I wasn’t sure where to begin with all the work I’ve set up. So I’ve decided to go back through all of my notes on the research papers I have read and create a giant spreadsheet detailing the tools used, features used, classification methods used, whether the dataset or the code was available, and if I’ve contacted the authors of these papers for more info.
    • This will help me figure out how I’ll need to create the learning loop to not forget any feature or method.
    • This will also help me show my advisor exactly what was in previous work and what I have to build off of

CS 488 – Week 2 – Update

This past week, I started phase 1 of my project, testing the physical security of the network. Along with starting this phase, I started to write the Google survey that will be used with the social engineering experiment. I also ordered the hardware needed for the social engineering test. I have not encountered any obstacles. This next week I will continue to use Wireshark to test the physical network using both a wireless and an Ethernet adapter.

CS488 – Week 2

In the past week, I have spent most of my capstone time organizing my project and testing some options for the machine learning component. I have been working with fast.ai and ImageAI python packages, trying to set up some groundwork for when I have data ready.

I have also organized all the algorithms that I want to try, at least until after I can compare some results (after I see the results, I may opt to implement more)

My hope for the next week is to make progress on acquiring training data with drones, or at least narrow down where I might want to survey.

CS488 – Week 2

I forked MLMAN, a PyTorch model that achieved the second-highest validation accuracy on the FewRel dataset for semantic relation extraction. Running locally with a useful number of iterations, it took too long to train, so I will be training the model on hopper and saving it there to fetch for local use. With this saved model, I hope to start pre-processing and feeding sentences into it for validation.

CS 488 – Week 2 – Updates

My project is to collect and study the Facebook Reactions and comments on posts by U.S. politicians to see if bias exists based on the gender of the politician. I have decided with Charlie’s advice to focus my project on the 2020 Senate races. The 2020 Presidential election doesn’t have enough candidates to be a good sample size. The 2020 House races would likely have a wide variety of candidate strategies based on the district, many districts with no competition, and fewer voters per race. By contrast, the Senate races have enough candidates to be a good sample size, while also having more voters per race, meaning there should be more Facebook Pages with enough user activity to be used in my dataset.

This week I found sources for the Senate races, created a spreadsheet for candidates, and decided which relevant columns should be in the spreadsheet. I am filling out the sheet first for races where the primary filing deadline has passed. Next, I plan to learn how to access the Facebook API using the Facebook SDK Python library, and to collect sample data for candidates I have already added to the spreadsheet.

CS 488 – Week 2 – Updates

I decided to change my modeling method to neural networks. I have read a paper called Text-Independent Speaker Verification Using 3D Convolutional Neural Networks and checked the authors’ resources on GitHub. I tried to run their demo, but the required packages couldn’t be installed on my laptop. I probably need to request a place to run it on CS/Cluster from the SysAdmins. I also found other similar resources on GitHub. My next step is to run them with testing files. I also had my first weekly meeting with my advisor Xunfei to discuss the timeline and future plans.

CS 488 – Week 2

I made a visualization (plot) displaying ingredient composition similarity between different products and skin types. I attached two drop-down options for users to select from product categories and skin types. I also attached labels to the graph so that it displays the product’s name, brand, price, and rank. 

CS 488 – Week 1

I went through the project again because it has been a while since I had CS 388 last spring. I downloaded the data set and started doing some data manipulation and preprocessing. I will start looking at models for the image data set next week.

CS 488 – Week 1 Update

In the past week, I loaded the data, extracted ingredients from products, and made a document-term matrix containing product names and ingredient composition. I plan to visualize ingredient similarity between products this week. I haven’t faced many obstacles yet, but I want to finish things earlier than planned to allow some time for future obstacles.
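
A document-term matrix of this kind can be sketched in plain Python, assuming each product comes with a comma-separated ingredient string. The product names and ingredient lists below are invented for the example.

```python
def document_term_matrix(products):
    """Map {name: "ingredient, ingredient, ..."} to (names, vocabulary, rows)."""
    # Split each ingredient string into clean, lower-cased ingredient names.
    parsed = {
        name: [item.strip().lower() for item in ingredients.split(",") if item.strip()]
        for name, ingredients in products.items()
    }
    # One column per distinct ingredient, in alphabetical order.
    vocabulary = sorted({item for items in parsed.values() for item in items})
    column = {ingredient: index for index, ingredient in enumerate(vocabulary)}
    names = list(parsed)
    rows = []
    for name in names:
        row = [0] * len(vocabulary)
        for ingredient in parsed[name]:
            row[column[ingredient]] = 1  # 1 means the product contains it
        rows.append(row)
    return names, vocabulary, rows


# Hypothetical products.
products = {
    "Product A": "Water, Glycerin, Niacinamide",
    "Product B": "Water, Shea Butter",
}
names, vocabulary, matrix = document_term_matrix(products)
```

Each row is then a binary ingredient vector for one product, which is the form a similarity measure between products would take as input.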

CS 488 – Week 1 – Update

  • I bought a new computer over the break because my older one was unreliable and crashed unexpectedly from time to time. So I spent this first week setting up the computer and downloading the tools that I believe I’ll be using.
  • I also have spent a lot of time hunting down the data sets from the research papers that I have read and have a collection of over 22 different fake news data sets.
  • I created my presentation slides which helped me think about the project in a different way since I need to think about how to explain things in a way that will make sense to everyone and not just myself.
  • Finally, I chose my adviser and set up a meeting time and shared notes space but we were unable to meet this week since she will be at a conference.

CS488 – Week 1 Update

This week I created the presentation for Wednesday, which helped clarify my new goal after the work done over break. I have found some new datasets and repositories for models online, which I will be presenting to my advisor to figure out which best suits my project. I have also tried to better break down my timeline following the selection of a module for the following month, and I have personal project goals. I researched some libraries for GUI implementations, and I am currently leaning towards Electron (JavaScript) or PyQt5 (Python).

CS 488 – Week 1 – Update

This week was mainly for refreshing myself on the details of my project. I finalized Charlie to be my advisor for 488 and set up a weekly meeting time with him. I also completed the 3 slide powerpoint in preparation for the presentation in the joint class of 388/488. I adjusted my timeline and plan to start the first phase of my project on Monday. I did not have any obstacles this week. Within this next week I plan to start the physical testing phase of my project. 

CS488 – Week 1 Update

This week has been mostly organizational for me. I found some more resources on GitHub that I want to try and make use of, and I worked on my design plan for implementation. I talked with Igor about technologies I can use, and what I might need to use them effectively.

The main obstacle right now is the amount of structure that my project requires, which is why I am taking my time to create a solid plan for how things will connect to one another.

Next week, as my design becomes concrete, I will start coding different segments of my project, using some of the preliminary work I have done as a guide.

CS 488 – Week 1 – Updates

First of all, I chose Xunfei as my advisor; she was also my advisor last semester. We decided on our weekly meeting time. I have read some new papers and decided to change my modeling method from GMM-UBM to neural networks, combined with i-vectors or x-vectors. I have found related code on GitHub for speaker verification with deep neural networks and convolutional neural networks. GMM-UBM is one of the most classical and dominant methods for speaker verification, but its accuracy decreases as the number of users increases. Nowadays, there are newer methods that perform better, such as deep neural networks and convolutional neural networks. This change might be more challenging because I am using a new method that probably has fewer resources. But I really want to make the speaker verification accuracy higher than 90%.

CS388 – Week 13 Update

In the past week, I have been working mostly on my presentation and my proposal. My proposal is close to a finished state, but I am still working on collecting preliminary results. I have also been trying to create new figures (images and charts) which are easier to read on printed copies of my proposal.

For the implementation itself, I am still working on the things I outlined in the first section of my project timeline (setting up the pipeline of the project without adding all the features at each stage), to try and get a minimum version working. I think that this will take a couple more weeks, but I am hopeful that it will lead to me having some buffer time next semester during my implementation of the project.

CS388 – Week 13 – Update

I finished my presentation. My next step is to add an abstract and more introduction to my proposal paper and finish the final version. I did more research in the past week and planned to change my modeling method from GMM-UBM to a convolutional neural network or deep neural network. GMM-UBM is very classical but also “old-fashioned”. CNNs and DNNs are newer and perform better. GMM-UBM’s performance drops as the number of speakers increases. But I do not have enough time to change methods this semester. I will do more research during winter break and probably change next semester.

CS 388 – Week 15 – Updates

I have researched and read a few more papers in the last week. I have expanded upon my analyze -> split -> replace modules with actual implementation details using an encoder-decoder model to swap less engaging text with more engaging text. In order to do this, the text needs to be vectorized and then trained. I have also found a module that can help me achieve that. I have also extensively worked on my proposal presentation.  I also met with my advisor and went over the presentation and was advised to explain the slides in a way that a person with no understanding of neural networks can understand what is being communicated.

CS 388 – Week 14 – Updates

  • I spent the vast majority of this week looking for projects that have specifically detailed how they implemented a fake news detector and reading through the articles I’ve already found.
  • While some have given a lot more detail on their process, unfortunately, I can’t understand some of the details.
    • A lot of the details go into the mathematical aspects of machine learning and convolutional neural networks. That’s very difficult for me because math is not my strong suit.
    • I will either have to find a tutorial that will actually explain it well or I might have to compromise my big goals for this project. I need help finding papers or tutorials that clearly explain their processes so I can move forward in the way that I want to.

CS 388 – Week 13 – Updates

  • I focused this week on fixing my first proposal.
    • I re-did all of my diagrams so that they would use the proper shapes
    • I re-wrote my design section
    • I added more to my introduction to better explain the importance and the gaps
    • I elaborated about the timeline and gave a high level overview by month
  • I also did research into the postgres database using SQL because that seems like the best tool for my project.
  • Next week over the break, I hope to go more in depth into my readings and start to finalize the tools I want to use

CS388 – Week 12 – Update

I read some new papers and research about different modeling algorithms and started to worry about the accuracy of my system. The accuracy depends not only on the modeling but also on the training dataset and the quality of the acoustic input (the speaking environment). Still, selecting a suitable modeling algorithm is important. The popular models now are HMM, VQ, DTW, GMM, UBM, and i-vector. I temporarily chose a hybrid GMM-UBM. I might change this in the future or mix in other models to improve the accuracy. My goal is to reach an accuracy of at least 90%.

CS 388 – Week 13 – Updates

This week, I continued working on my project proposal, submitting my second draft after some much-needed updates. I still need to work further on the Related Works section. I additionally continued working on early implementation of the project. Lastly, I prepared a first draft of my presentation slides.

CS388 – Week 13 – Update

This week, I worked on a similar project posted online. While working on it, I found some challenges in modifying the content-based data set to fit the collaborative-filtering method. I might end up modifying my project from a hybrid recommender to a content-based recommender. But I will keep looking for alternatives to make it possible.

CS388 – Week 11 – Update

I discussed my proposal draft with my advisor. I got her feedback and suggestions and now know how to revise and improve my proposal. In the past week, I read more papers about the GMM-UBM modeling method that I plan to use for my project. I understand the specific procedure now, but it is still hard to fully understand the principle behind it. My other problem now is to find a suitable dataset and decide whether my system is text-dependent. There are three primary approaches to speaker verification: text-dependent, mixed, and text-independent. The text-independent approach is very difficult and complicated because the user can say anything to pass the verification. The text-dependent approach is restricted and vulnerable to spoofing attacks; for example, people can replay a pre-recorded voice to pass the verification. Therefore, the mixed approach is better. It restricts the text but is still safe against spoofing attacks. For example, the user can only speak the numbers one to ten, but the text is random every time. However, it is hard to find a dataset of audio files of spoken numbers in English. Now I need to decide which approach my system will use.
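
The prompt side of the mixed approach could be sketched like this: each verification attempt gets a fresh random sequence of the numbers one to ten, so a recording of an earlier attempt is unlikely to match. The function name and the sequence length of six are assumptions made for illustration.

```python
import secrets


def random_digit_prompt(length=6):
    """Return a fresh random sequence of numbers from 1 to 10 to read aloud."""
    # secrets draws from the OS random source, so prompts are hard to predict.
    return [secrets.randbelow(10) + 1 for _ in range(length)]


prompt = random_digit_prompt(6)
```

The verifier would then have to check both that the spoken numbers match the prompt and that the voice matches the enrolled speaker.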

CS388 – Week 12 – Update

This previous week, the work I’ve done has been two-pronged, as has become the norm and will continue to be for the rest of this semester. First, I continued work on the basic implementation of the game. I currently have the control module working, as well as a looping stage that I created in order to test the controls. On the proposal side, I’ve been making edits based on the in-class peer review that we did, as well as working more recently based on the feedback given by Xunfei. I also met with Xunfei to go over her feedback of my first draft, and updated her on my progress.

CS 388 – Week 12 – Updates

This week, I investigated the technologies being used in my found papers more closely to find which technologies would be more feasible for my project. For data collection, I have found that the facebook-sdk python library (https://pypi.org/project/facebook-sdk/) used by Pool and Nissim is the best option to connect to the Facebook Graph API, since it looks well documented and has all the options I might need. I also decided to use the Facebook Pages of politicians as my dataset. I reread As the Tweet, So the Reply?: Gender Bias in Digital Communication with Politicians by Mertens et al. to see if their methods could be adapted to my project. I will need to look at their references for methods in more detail to see if I can feasibly apply them to my project.

CS 388 – Week 12 – Updates

This past week, I worked mainly on reading my new papers. I did a third-pass reading of all my old papers and at least a second-pass reading of the new ones. I tried finishing more than half of the existing project in a Python notebook and played with the data set. I now have a better sense of how to start my project next semester. I also met with Xunfei and updated her on my progress. As soon as the feedback for proposal draft 1 comes out, I will be revising my writing and finishing the existing project I have been working on. I also plan to test the existing project on collaborative filtering to make sure it works with a different data set.

CS 388 – Week 12 – Updates

In the past week, I have spent most of my time working on the first draft of the proposal. I decided to research and include a new category of papers in my proposal that I had not spent much time on before: “Sentiment Analysis.” While working on the proposal and refining the design of my framework, I realized that sentiment analysis, something that has been thoroughly covered by researchers of neural networks, is very close to my research, since I also need to know the sentiment behind the email or piece of text that is to be improved.

CS388 – Week 12 – Update

Project Repo

In the past week, I have used my peer review from Jordan, as well as my own proof-reading of a physical copy of my draft to fix a lot of errors. I wrote my draft in a bit of a rush, and as a result, there were a lot of formatting errors, most of which I have now fixed. I have also updated some of my diagrams in accordance with feedback I have received and expanded some content in my draft that needed to be clarified.

In addition to working on my draft, I have been working on my project itself (preliminary work can be found in the git repo). I created a mockup GUI to give me some ideas about how I want to design the actual version next semester, as well as testing some implementations of different filters, operators, and edge detectors. Some of these results will hopefully be represented in the next version of my draft.

CS388 – Week 12 – Update

During the past week, I finished the first draft of my proposal and started to make changes for the second draft. I have also continued reading some papers for their next pass. I continued to watch videos and read content related to the USB Rubber Ducky. I have started to put together some scripts that I would like to use for the attack. I also spoke with Charlie to refine my methods for the physical attacks I am going to implement. I now have a better, more CS-related implementation for this attack than what I previously had. During this next week, I am going to be working more with Metasploit on Kali Linux.

CS 388 – Week 12 – Updates

with No Comments
  • I spent a lot of time this week closely reading the texts I’ve already found to look for any mention of the data set they use. This was very difficult because the research articles usually don’t name their data set or explain where to find it. There are not a lot of details in these papers about the researchers’ process and methodology that would allow me to replicate their results. This made finding fake news data sets extremely difficult. However, through the close reading and intense web searches, I have found 21 fake news related data sets.
  • I also spent a lot of time researching what would perhaps be the best machine learning tool to use for my project. I’ve narrowed it down to these possibilities: Oryx 2, TensorFlow, Azure ML Studio, Weka, Shogun, AWS CLI, TensorBoard, Keras, Caffe2. I think that I might be able to use more than one for my project to get the best results, but more research still needs to be done about which tool is better for the type of data set I have (which is not a time-series data set).

CS 388 – Week 11 – Updates

with No Comments
  • Started reviewing newly found research that is more relevant to a fake news detector application
  • Finished my first draft for my project proposal
    • I will need to update my related works section because of the new research I’ve found
    • After the peer review session, I will need to go back and redesign my figures so that they are easier to read
  • Started finding datasets, which is very difficult because the research papers rarely tell you where to find the data set they use and often never name the dataset either.
    • However, I was able to find some github repositories with datasets and some websites of the authors in the research papers that actually linked to the dataset of their works

CS388 – Week 10 – Update

with No Comments

Finished the first draft of my proposal. I read some blogs about speaker verification tech and found out that I was wrong on some aspects (actually I was confused). Those blogs helped me understand speaker verification more deeply. So I revised my framework and flowcharts: take voice input -> feature extraction -> modeling -> database. The modeling part is the most difficult part of speaker verification. The most popular models are: Hidden Markov Model, Gaussian Mixture Model, Vector Quantization, etc. I am not sure yet which one I will use. It all depends on my dataset and customer needs. I need to experiment with several models to know which one works best. For now, I chose GMM in my proposal.

CS388 – Week 11 – Updates

with No Comments

This past week, I’ve finalized the basic design for the game I will be implementing. It will be a horizontal auto-runner, where the player ducks/jumps to avoid obstacles to the beat of the music in order to keep playing. I continued familiarizing myself with Unity2D, and plan on starting work on the game this upcoming week. Additionally, I wrote up the first draft of my project proposal.

CS388 – Week 10 – Updates

with No Comments

This past week, my work has been split in two directions: First, I’ve been refamiliarizing myself with Unity, by means of going through my Game Design second project. Further than that, I’ve been familiarizing myself with Unity2D for the first time, which I plan on using for the senior project due to the simplicity as compared to Unity3D. Besides getting used to the main software engine I will be using, I also continued reflection on my proposal outline; I’ve been looking more into different PCG-G algorithms and have decided on using the chunk paradigm as my second stage generation algorithm. Its stages won’t be as directly aligned to the music, but it should improve efficiency.

CS 388 – Week 11 – Updates

with No Comments

I worked extensively on the proposal last week. Reading more papers and writing out what I plan to do helped me figure out the scope of the proposed project. I also experimented a bit more with TensorFlow. I made some changes to my initial framework design to now include a frontend and backend for the end-user to interact with.

Week 11

with No Comments

I finally received the Gourmet Dataset. In fact, I received a devised version that has twice as many images as the original one. I also have the Yelp dataset. Although that dataset has not been curated by humans, I am hoping to use it for training my algorithm in addition to or instead of ImageNet or AVA.

Since I already have gotten access to the datasets, I have been reading about ResNet/AlexNet implementations, which was my goal for next week.

CS 388 – Week 10 – Updates

with No Comments

I have narrowed my project to studying Facebook Reactions and how reactions may differ based on the gender of the post creator. I have also found papers that focus on Facebook Reactions. Because Facebook Reactions were released as a feature in 2016, the papers on the subject are limited, and I haven’t found any relating to gender. However, some papers I found analyze Facebook Reactions in a way that would be interesting to compare across the gender of the post creator. For example, one paper uses the reactions to measure the controversy of a post, so I could measure whether posts by women are more controversial in general than those by men. I also found tools from some of these papers that I could use in my project.

CS 388 – Week 10 – Updates

with No Comments

After meeting with Xunfei, I decided to modify my diagram a bit so I redesigned it from my practice proposal. I also collected more papers that could be used in my proposal and read more articles and research papers. I also found an online tutorial of a project that is closely related to mine, so I enrolled in the course for free and downloaded the jupyter file to play with it on my own. I think doing this now will help me figure out some possible options and directions for next semester. I also made a timeline of my work for this semester as well as next semester. I asked Xunfei some remaining questions about the proposal and my project in general to clarify my thoughts. I also checked out the rubric for project proposal and brainstormed ideas for my first draft of proposal.

CS 388 – Week 10 – Updates

with No Comments

In order to get more familiar with neural networks, I decided to use a program that lets you create neural networks. To do this, I started reading about TensorFlow and TensorFlow graphs and their inner workings, like variables, constants and operations. I read some tutorials on TensorFlow and also studied the Keras model subclassing API, which is one of the building blocks of TensorFlow, to start building a simple neural network. I also searched for more papers that are similar to my research and read “Semantic expansion using word embedding clustering and convolutional neural network for improving short text classification” and “Semantic Clustering and Convolutional Neural Network for Short Text Categorization” in order to familiarize myself more with neural networks that are used for text classification.

CS388 – Week 10 – Updates

with No Comments
  • This past week I have been trying to find more research papers that discuss how to create a prototype fake news detector instead of papers that just talk about how to discern fake news from media.
    • The hunt for those kinds of papers is more difficult than the one for methods of fake news detection which tells me that my work in creating a user application for fake news detection is very much needed.
    • It also seems like a lot of these papers are geared specifically towards Twitter, so hopefully my research can fill a gap.
  • I have also been trying to figure out a good timeline for myself both for this semester and the next. I do believe this project is feasible if I can just find some data sets in a timely manner.

Week 10

with No Comments


I have been looking more into the image processing part. I have created my first draft of the code to alter the colors of an image. I have also looked into rotating the food in the image. This does not seem doable (in the way and scope I wanted), so I changed my framework to take a video as input instead of an image. The video can then be split into images, and the images from the better angles will be picked. I have also written to the Gourmet Food Dataset researchers to ask for their dataset, but have not received a reply yet. I have been looking at the Yelp dataset. I have found an online project that assumed all images taken with DSLR cameras were good and the rest weren’t. This seems to have worked pretty well for the classification. I will look into that.

CS388 – Week 10 – Update

with No Comments

During this past week, I have revised how I want to implement my social engineering attack. I want to use what is called a USB Rubber Ducky, a USB device that holds a MicroSD card. The card carries payloads; when the device is plugged into the victim’s computer, the payload is executed. Many different types of payloads can be written. These scripts are written in a language called Ducky Script.

Charlie and I also discussed how to better implement my physical attack. This includes using a wireless adapter as well as Ethernet cables to jack into ports around campus and see how easily I can get in.

CS388 – Week 9 – Update

with No Comments

I found an MFCC library on GitHub and explored it a little bit. It directly takes a wav file as input and returns one N*1 array (a sequence of acoustic vectors). I recorded my voice and converted it to a wav file. I briefly tested the code. It took my wav file and returned an array containing a sequence of vectors. I will use this library in my project, but there are many related factors that I need to study. I also wrote the timeline for the rest of this semester and next semester. My next step is to keep working on this MFCC library and explore the Dynamic Time Warping library on GitHub.
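To make the Dynamic Time Warping step more concrete, here is a minimal sketch in plain Python. It is not the code of the GitHub library mentioned above; the function name `dtw_distance` and the use of Euclidean distance between frames are assumptions for illustration. Each input is a sequence of feature vectors, such as the MFCC frames extracted from a wav file.

```python
from math import inf


def dtw_distance(a, b):
    """Cumulative DTW cost between two sequences of feature vectors.

    `a` and `b` are lists of equal-length numeric vectors (hypothetical
    stand-ins for per-frame MFCC features).
    """
    def frame_dist(x, y):
        # Euclidean distance between two feature vectors (an assumed choice).
        return sum((p - q) ** 2 for p, q in zip(x, y)) ** 0.5

    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]
```

Because frames may be repeated during alignment, two recordings of the same phrase spoken at different speeds can still get a low cost, which is why DTW is a common fit for comparing voice features of different lengths.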

CS388-Week10-Update

with No Comments

I finished a first pass of all but one of the papers in my reading list. I also gave the papers most relevant to my project a second or third pass, depending on how relevant the content seemed. I have also outlined the introduction and motivation for my project, and am working through the related works section. Apart from the project proposal, I have spent some time trying to find some ‘gadgets’ that will assist with the proof for a simpler variant of Parks Puzzle, an effort that has not yet borne much fruit.

CS 388 – Week 9 – Updates

with No Comments

This week, I worked more on my proposal outline. Tuesday morning, I met with somebody from EPIC to go over my grant application to go to GDC, where I may try to get some playtesting data on my project from professionals. This morning, I met with Xunfei to look at my proposal outline before revising and finalizing it.

CS 388 – Week 8 – Updates

with No Comments

During this week, I continued work on my literature review after meeting with Xunfei. After finishing the review, I also started work on my proposal outline, and continued looking for more resources to use in my project. I specifically need to find more procedural generation source code for game stages; I’m fairly happy with my two music generation methods.

CS388 – Week 9 – Updates

with No Comments
  • I’ve been working on the Lit Review and Proposal Outline this week and I finally finished the Lit Review fully.
  • My Lit Review included sections on:
    • Data sets: what kind of data has been tested and what do they extract from the data to use as a way of identifying misinformation
    • Identification/Classification Methods: what approaches did people take to test the data and have it respond with whether or not it was fake news
    • Prototype Design Consideration: some papers outlined what a good prototype detector should have or be used for and that is something I want to deliver on so it was important I note what I found
  • The Proposal was a bit challenging
    • While drawing designs, I realized that there are a lot of parts to my idea (not that it’s unfeasible though) so I’ll have to sit down and not only figure out a good overall framework but good designs for all the smaller parts
    • I also struggled with the methodology, budget, and timeline section because my framework is very much in flux since I actually need to figure out what works before doing the meat of my project.

CS 388 – Week 9 – Updates

with No Comments

I am working on finding more papers that study gender bias in social media posts and narrowing my idea further. One challenge is finding a feasible way to collect data from a site (since APIs have limits), or finding an existing data set or web scraper that fits my needs. I am also looking for authors that have published their code for their work and/or who have described their methods in detail.

CS 388 – Week 9 – Updates

with No Comments

This past week I worked on revising my literature review as well as writing my proposal outline, which will serve as a starting point for my first proposal draft. I met with my advisor, who helped me come up with a good starting dataset for my initial neural network. I will continue to read about neural networks and maybe try to implement a simple one in the upcoming weeks.

CS388-Week9-Update

with No Comments

I went over the broad categories that need to be addressed by the project proposal, and created a proposal outline for Assignment 7. I also worked with my advisor, Igor, to find a good candidate for the reduction and start working on the proof. We found that there was a natural way of reducing a subset of 3SAT, namely 2SAT with distinct variables, to an instance of Parks Puzzle. For next week I hope to generalize the technique to a larger subset of 3SAT.

CS388 – Week 9 – Updates

with No Comments

This week, I changed my method from hybrid to content-based filtering because there isn’t much research done in the hybrid method. So I chose to improve the content-based filtering instead. I also wrote my proposal outline and revised my diagram with the help of Xunfei. I might explore more ways to improve the existing method and see if there is anything else I can add.

CS388 – Week 9 – Updates

with No Comments

For this week, I wrote my proposal outline. During the next two weeks I will use this to construct the 1st draft of my proposal. I also spoke with Charlie about taking a different approach to the social engineering aspect of my project. For most of these approaches I have found YouTube videos that demonstrate and describe the process, but I have yet to find any hard research.

I also constructed more details for implementing the technical part of my project. This will also be discussed with Craig.

CS388 – Week 8 – Update

with No Comments

I finished my proposal outline. The next step is to write my proposal draft. I also discussed it with Xunfei and she helped me draw a better flowchart. I gained a clearer understanding of the flow of my project. I downloaded an SDK of the iFlytek company’s voiceprint recognizer product for reference.

Week 9

with No Comments

During my weekly meeting with Igor, he brought to my attention a better way to build up the ranking algorithm in rounds. In the first round, a certain number of image processing techniques will be applied to the original image and the top 10 or so images will be passed on to the next round. For each round after, permutations of the image processing techniques will be applied to the images, and the next 10 winners will be promoted to the next round. This way, we can keep applying several techniques to the images and find the best combination. The process would go on until either the machine’s confidence exceeds a certain threshold, or a certain number of rounds have passed. The latter is important so the machine does not keep going forever (or for too long) if the image is simply too bad to be made decent. This brings me to the question of what to do if no food is found in the image. Should it return an error, or maybe apply the process to the image and see how it turns out? It is possible that the user submitted an image that contains an unusually morphed food, which the AI might not recognize as food, but might still be able to make look good.
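The round-based idea above can be sketched roughly as follows. This is only an illustration under assumptions, not the actual implementation: `select_best`, `techniques` (a dict of name to function) and `score` (standing in for the classifier’s confidence) are hypothetical names, and the defaults for `beam_width`, `max_rounds` and `threshold` are placeholders.

```python
def select_best(image, techniques, score,
                beam_width=10, max_rounds=5, threshold=0.9):
    """Apply techniques in rounds, keeping only the top candidates each round.

    Returns a tuple (best_score, best_image, names_of_techniques_applied).
    """
    # Each candidate is (score, image, list of technique names applied so far).
    beam = [(score(image), image, [])]
    best = beam[0]
    for _ in range(max_rounds):
        # Stop early once the best candidate is good enough.
        if best[0] >= threshold:
            break
        candidates = []
        for _, img, history in beam:
            for name, apply_technique in techniques.items():
                new_img = apply_technique(img)
                candidates.append((score(new_img), new_img, history + [name]))
        if not candidates:
            break  # no techniques to apply
        # Promote only the highest-scoring candidates to the next round.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_width]
        if beam[0][0] > best[0]:
            best = beam[0]
    return best
```

The `max_rounds` cap is what keeps the search from running forever on an image that cannot be made decent, and tracking `best` separately means a later, worse round never replaces an earlier winner.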

I also have heard about genetic algorithms, and will look into those as a safety net/supplement.

CS388 – Week 8 – Updates

with No Comments
  • I wrote my project proposal this week and I already had most of the info ready.
  • The sections that I didn’t feel really prepared for were the design section and the “what software/hardware do you need” section.
    • I didn’t have a good idea of what should go into my design and what counts as a component
    • I’m also not sure what kind of software/hardware I need because I’m not completely sure what my own unique approach will be so I don’t know which software/hardware is the best for my approach yet

CS388 – Week 7 – Updates

with No Comments
  • After reading articles for my literature review, I see that there are a lot of different ways to identify and classify media with misinformation.
    • I will need to do a bit of work to be able to combine all the methods in a way that it will be able to identify and classify media of all different topics and types
  • I’ve also split up my project into smaller more manageable goals to accomplish
    • First Level Basic Goals:
      • Find a large enough dataset that is properly vetted as credible and one that is properly vetted as not credible
        • I want this dataset to be on a variety of topics
      • Find an algorithm/classifier that is accurate 80%-100% of the time on the dataset with a variety of topics
      • Find the key features that are the most reliable for classifying
    • Second Level Goals:
      • Expand the dataset to include pictures where the text was extracted from it
      • Re-test the algorithm/classifier to make sure a drop in accuracy hasn’t occurred
      • Re-test that the key features for articles can apply to the text within a picture
    • Third Level Goals:
      • Expand the dataset to include videos where they are transcribed as accurately as possible.
      • Re-test the algorithm/classifier to make sure a drop in accuracy hasn’t occurred
      • Re-test that the key features for articles can apply to the transcriptions
    • Fourth Level Goals:
      • Create an app/website where you can upload a piece of media and the app will use the algorithm/classifier and tell you if it is credible or not
    • Fifth Level Goal:
      • The app/website will keep a record of things that have been deemed credible or not 
      • Create a browser extension that will take the media from the current tab and check if it is credible
    • Sixth Level Goal:
      • The app/website and browser extension will scan and search for certain keywords set by the user and check new content that’s been uploaded

CS 388 – Week 6 – Updates

with No Comments
  • I’ve chosen Charlie as my adviser for my project
  • The idea that I picked for my Capstone is my Fake News Detection Idea
    • The basic idea is that I would create a website/application and a browser extension that takes media as input and tells the user whether it is factual or not

Week 8

with No Comments

In the past few weeks I have settled on my topic being exploring gender bias on a website using a combination of computational linguistics and quantitative analysis. After writing my literature review on work that explored a variety of sites, I decided this week to focus on a social media site for my project. My next step is to explore more papers focused on analyzing social media and the APIs available for different sites in order to choose which site I want to focus on.

CS 388 – Week 8 – Updates

with No Comments

I read over 20 papers in the last two weeks to work on my literature review. I met with my advisor and talked to him about my proposal and what could be improved in my literature review. I have found some projects/papers that touch upon what my project aims to be. This will help me find and establish a starting point once I start working on my project. One of the challenges that I am currently facing is finding a dataset. I have come across a few datasets that I can use from kaggle.com.

This is what my initial model looks like (This does not delve deeper into how the neural networks are configured.)

Week 8

with No Comments

While preparing my diagram for the quiz, I got a much better conceptual understanding of what I want my project to look like. I have also found nice papers this week. I started thinking of a few different image processing techniques that might help with improving the image, and picked a computer vision algorithm for my AI (AlexNet). I also decided on an image ranking algorithm to decide which image to return: a binary comparison. I feel significantly more comfortable about my project now that I have a more concrete idea for my software architecture, even though I am still fuzzy on the implementation details.

CS388 – Week 7 – Update

with No Comments

This week, I wrapped up my first draft of the literature review. I’ll be meeting with the writing center as I embark on my final draft. I’ve continued working on nailing down my exact idea that I’ll be proposing, as well as looking into the available resources found throughout the papers I’ve read, from algorithms to source codes. I’ve done some basic work on a prototype game, but have been too busy to make much progress yet. I think a good portion of my project may include comparisons between different methods and combinations of methods between the PCG-G (different algorithms, mostly) and music generation (mostly grammar-based versus machine learning).

CS388-Week8-Update

with No Comments

This week I worked from two directions on improving my understanding of the Parks Puzzle and exploring possible proof techniques to show that it is NP-complete. I continued working through the Time Complexity chapters of “Introduction to the Theory of Computation” by Michael Sipser to round out my theoretical understanding, while also solving many instances of the puzzle using an app on my phone. I came up with one general idea for the proof involving only ‘AND’ and ‘OR’ gadgets, which I discussed with my advisor, who made some suggestions involving an ‘IFF’ gadget that I am going to continue working on. I also received feedback on my literature review, which showed some significant problems that I corrected according to the grading rubric.

CS388 – Week 7 – Update

with No Comments

I finalized my proposal topic as “Applying Voiceprint Recognition Technology to Identity Verification”. The keywords are voice recognition, voiceprint, feature extraction, voice detection, voice verification. The difficulty I might encounter is that there may be background noise in the voice input. If the noise is loud, it may affect the feature extraction and voice recognition. I probably need to explore methods for removing noise.

CS388 – Week 7 – Update

with No Comments

I did not add an update for week 6 due to the long weekend, but I had been working on my literature review, which I finished today. It was very useful to read and re-read certain articles and realise that some are useful and some are not. I now have further inspiration for where I can take my idea and am happy with its progress. In the next week or two I will start looking into potential technologies to use for my project, which currently seems to be leaning on public Python libraries.

CS 388 – Week 7 – Updates

with No Comments

I now know what idea I am going to go with. It’s a new idea and is not related to any of my old ideas. My new idea is about using neural networks and natural language processing to predict a better way to write emails or other forms of text in order to better engage the reader. This will be focused on business emails and other forms of business-related texts. I have read a lot of papers on neural networks in the past week and have spent most of my time writing my literature review on them.

CS 388 – Week 6 – Updates

with No Comments

I’ve finalized the base project idea – I’m going with the one involving music generating AI. Additionally, I’ve officially gotten a proposal adviser, Xunfei, who I’ll be meeting with every Wednesday morning. I’m having some issues in limiting the scope and application of my project, which I’ll be focusing on while I finish up my literature review. In terms of the review, I’ve read 9 of the 10 papers, so I only need to read the last one and put the notes I have into the literature review format.

CS 388 – Week 5 – Updates

with No Comments
  • The First Responders Tech idea is not panning out in terms of finding any relevant research articles, so that may be way too difficult for me to achieve in this time frame
  • The Secure Paperless Voting Machine is working out in terms of finding research articles, but upon reading a couple of them, I’ve realized how complex the issues are. Both the hardware itself and the software need to be more secure. This might be too big of a project.
  • There are a lot of articles and info about my Fake News Detector idea and my Ancient Sites in VR idea so that bodes really well for the feasibility of those two ideas as my official capstone idea

CS 388 – Week 4 – Update

with No Comments

Updates in Ideas

  • My first “Fake News Detector” idea has remained mostly the same
  • I still have not come up with a more doable idea for the secure paperless voting machines because there’s so much oversight with voting regulations and laws and because there’s an added obstacle of different voting systems/mechanisms.
  • My “911 Tech” idea has hit a wall in terms of finding research articles on it

Updates in Process

  • I talked to Charlie about my three ideas and he suggested a change from 911 specific tech to First Responders/Disaster Relief tech because they actually use open source technology and it might be more feasible to create something and find research on it.
  • I also created a fourth idea in case the First Responders idea doesn’t pan out. My idea was creating very realistic recreations of ancient archaeological sites that you can interact with in VR.

CS388 – Week 6 – Update

with No Comments

My project uses Voice Print Recognition technology to check if the voiceprint of the input matches the corresponding one in the database. This technology can be used in many identity verification scenarios, such as customer service for banks, door locks, and business transactions. The main steps will be approximately: take voice input -> (remove noise ->) extract voice features -> build models with selected algorithms -> compare voice features -> check if the voiceprints match. The possible algorithms might be: VQ, MFCC, DTW.

CS388 – Week 6 – Update

with No Comments

I read more papers on my ideas, met with my advisor to schedule a weekly meeting and discussed a new idea proposed by the company I interned with this summer.

The new idea is a Neural Network driven A.I. that can learn and predict better ways to write emails, marketing campaigns and other forms of communication with the customer.

CS388 – Week 6 – Update

with No Comments

This week, I finalized who my advisor will be (Charlie). I also decided that I will be working on my security testing idea as my main project. To start this, I spoke with Brendan Post (IT) to discuss my ideas. He was happy to help me and I will be in further discussion with him as I move forward with my proposal.

I also found a couple of other papers related to my idea. There was one that had much more low-level detail and actually described the implementation of their testing. The researchers used Kali Linux to hack into a router in different ways, such as SSH, Telnet, and SNMP. There were images that showed the commands they used. It was the first article I found to have a lot of low-level detail.
