- This week I focused on getting part of the PolicialNews Data set from Castello et al. to work with Weka to be able to see if I can recreate the results used by their classification methods
- Downloaded a tool to combine excel files into one sheet without data loss, manually added headers and an extra column denoting which was fake and which was real
- But Weka still won’t load the data so that I can test it
- Next week I will focus on making smaller versions of the data set to see what features are the issue for Weka and testing features individually; I will also look into Keras as a machine learning tool and see what kind of testing can be done
CS488 – Week 6
This week I was trying to run two different TensorFlow models with checkpoints, however, I could get the checkpoints to work which is key to my project. After a discussion with Dave, I have decided to implement a simple model myself using the Keras library since it is more abstracted and well documented, so it shouldn’t take too much time. I will be aiming for a minimum of 50% accuracy with my model and the SemVal-2010 Task 8 Dataset, which I think is the best dataset choice for this task.
Following this implementation, I want then start testing my model outside its dataset.
Week 6
I did not manage to get the foreground (the food) with zero user interaction. I have achieved pretty good success at picking just the food with minimal user interaction. I will try it with a pure white background next.
I can now successfully manipulating individual contour areas.
For next week, I will (finally) work on learning more about food photography.
I I have most of the standard functionality working. Pretty much the only untouched coding is the resizing.
CS488-Week6-Update
This week I worked on finalizing my proof for the IFF and OR gadgets, and rewriting the final proof for the paper draft. I am now planning on working on an explicit algorithm for the reduction for use in the program.
CS 488 – Week 6
This week I started working on writing the first draft of the paper. The models seemed to run probably when I tried to test with the small data set. However the accuracy was not what I wanted since there were not enough data. I will tried to implement these models on an online cloud computing system while waiting for Layout to be available.
CS 488 – Week 6
Completed the paper outline. Working on the first draft as per the feedback received on the outline. Plan to complete the draft this week and present the first version of the project next week. I am still working on the QR code generation and scanning.
CS 488 – Week 5 – Updates
I wrote the outline of the senior project paper. It is similar to my proposal but i changed my modeling method. i need to rewrite the modeling from GMM-UBM to Convolutional Neural Networks. But I am still having bugs on the CNN resource I gained from GitHub. I am pretty sure I will use MFCC as feature extraction. But if i cannot get proper resource of CNN I probably need to change to DNN or others. But I will try my best cuz I want to use CNN.
CS488 Week 5
This week, I focused on writing the outline of the paper. I also have been using NLTK to extract features from the text reviews. I need to figure out how to handle a review that has multiple sentences since my feature extraction is currently only applied to one sentence.
CS488 Week 3
For this week, I followed a tutorial from Educative about Natural Language Processing and followed for data preprocessing. I was able to load the dataset and I am in the process of preprocessing it. I ran into a problem of sentences that do not contain any emotions at all, therefore need to be eliminated. Also, I need to extract the word ‘not’ from words such as ‘don’t’, haven’t’ for better indication of a negation. In the following week, I shall try to do so
CS488 Week 2
For the past week, I was able to skim through the Amazon Review datasets and chose Clothing, Shoes and Jewelry, on the criteria that its size is not too much. I then researched and learnt how to use pandas to import the dataset from its json type into a DataFrame. In the following week, I will continue to research and find out the way to extract the review of the data into multiple feature of interest that will be use to train our ML models with.
CS388 – Week 5 – Update
I read six papers for my three ideas. It was interesting that many research ideas have the same goal, but the ways they approach the problem are very different. For example, in the fall detection problem, some research ideas apply deep learning on images and videos, but others work on radio frequency instead of using the images. Another example of this is improving the performance of the Optical Character Recognition for Chinese text. My initial thought about how to solve this problem is to apply image processing techniques to improve the image quality and then deep neural networks. However, there was one paper approaching this problem from a different angle. They apply statistical natural language processing models such as N-gram in order to improve the accuracy of OCR. These ideas might help me come up with an approach that is different from the one I was thinking of doing.
CS488 – week 5 – Update
In the last week, I have spent more time learning about using fastai with a convolutional neural network, specifically the resnet34 and resnet50 models, which I think will be ideal for my purposes. I have also been working with Jordan to get the modules for this set up on Lovelace.
I have also been scavenger hunting for more data. I have data from Iceland, and a few confirmed spots on campus which I can use for both positive and negative training cases, but more data is better for this kind of AI. My search has lead me to reach out to Tom Hamm, Greg Vaughn, The Earlham Library, and the Geology department.
Over the next week, I will continue learning about fastai and start implementing my model to be trained over the data that I have already.
CS488-Week5-Update
The proof has been completed. This week I will work on writing it out rigorously, as well as designing the program.
CS 488 – Week 5
This week, I focus on the writing parts including the outline for the capstone paper. I also start implementing some neural network models for the image data set such as MobileNet and EfficientNet. I will try to test the model using a small sample training data set while waiting for Layout to be available so I can train the whole large data set.
CS 488 – Week 5
I finished writing my initial model and have trained it using a small sample dataset. The accuracy of my initial model is quite bad. I am currently researching how I can modify and improve the accuracy of my model as well as downloading a larger dataset to increase the accuracy. I also worked on the outline of the first draft of the final paper. Currently, the obstacle is the lack of accuracy of my model. This next week I will work on finishing up the first draft of the final paper as well as increasing the accuracy of my model.
CS 488 – Updates – Week 5
- I found a data set that would be the easiest to recreate results with
- I just need to merge the data set of credible news and the data set of non-credible news with an added column denoting whether it was real or fake to be able to test
- However, I ran into a major hiccup because Weka crashed and I can no longer open it.
- I am documenting the errors and trying to reinstall it and fix but because for some reason I can’ t delete some of the old folders and so I still have not gotten Weka to start back up again
- My hypothesis is that one of the packages is failing the whole opening process because I don’t have R on this machine
- I am also afraid of force deleting the folder because I have no idea how it will affect Weka or my computer
- I did successfully complete my outline but I do not have enough information to fill out the Results section and anything regarding my exact methodology.
CS488 – Week 5 – Updates
In the past week, I found and fixed a bug for cosine similarity calculation in the code I was referencing. I was able to obtain more accurate recommendations across different product types. I also thought about and planned for next week’s task.
Week 5
I am trying to succesfully detect the foreground object in the image, which will be food. While the initial idea was to use AI to detect the food in the image, this is not a completely solved problem yet. Since I can choose my own input, using the foreground image makes more sense. Charlie recommended a software called Image Magick, and I will look at that.
For next week, I want to finish the foreground, and start resizing the images, and passing them to the ranking AI.
CS 488 – Week 5 – Update
This week I spoke with Charlie and gained clarity into how Earlham handles guest connections through ECOpen and from there does not allow access to the network within Earlham but simply gives access to the outside internet. I also started my Social Engineering test on Thursday. I ran into some speed bumps with that and am currently resolving. For this next week, I plan to talk to Aaron in ITS about NMAP and its use to possibly check for any ports that are left open and could be vulnerable. I plan to meet with him sometime early next week.
CS488 – Week 5
This week I first worked on creating an outline for my final paper, which was useful as it sharpened my current understanding of my project and where it is headed. I also was working with a new model and was able to successfully train it, save checkpoints and load them. I also created basic pre-processing functions for my data to match the format of input sets.
Loading of weights did seem to not work with this model. When I reached a checkpoint with 80%+ accuracy and saved the weights, I followed up with loading the weights and feeding in test data from the dataset, but accuracy dropped to 5%. This was extremely confusing and is my priority to understand this week otherwise I will have to find another model.
CS 488 – Week 5
User login and authentication is complete. New user is able to sign and save details in the database. Sample set of 20 books to be used as testing data. Currently I am figuring out QR code scanning and generating new QR code’s. Next week scanning will work and books can be scanned and retrieved from DB.
CS 488 – Week 4 – Updates
TensorFlow is working fine on Lovelace now. But I just found that the demo uses TensorFlow 1 while the latest version installed on Lovelace is TF2………. The demo has a lot of code. I am not sure if i should work on this one and update all codes to TF 2, or just find another resource…… I talked to Xunfei, she told me to try other resource briefly. Because update that demo is not a small work.
CS488 – Week 3 – Update
I am still having issue on running the demo code from GitHub. I requested installation of TensorFlow in python3 on lovelace but it seems there’s still error. It is probably the issue of environment setting. I will communicate with the admins.
TensorFlow is working fine on Lovelace now. But I just found that the demo uses TensorFlow 1 while the latest version installed on Lovelace is TF2………. The demo has a lot of code. I am not sure if i should work on this one and update all codes to TF 2, or just find another resource…… I talked to Xunfei, she told me to try other resource briefly. Because update that demo is not a small work.
CS388 – Week 4 – Update
I am reading papers for my first idea, which is “Detect and Translate Chinese text in images”. One research that I read was about improving the performance of the Optical Character Recognition for Chinese books that are in precarious conditions. Instead of trying to enhance the image quality, their research applies N-gram, long short-term memory, and backward and forward N-gram statistics text model to develop a more accurate OCR model.
CS488 – Week 4
This week I did a lot of research and work on the more anthropological side of my project. I emailed Tom Hamm and Greg Vaughn and got some great information about where I could find the foundations of old buildings around campus that I could use for my project. This information will hopefully be detailed enough for me to create some labeled training images.
I also spent some time this week learning fast.ai, which I have settled on for now as the best option for identifying images. The library is extensively documented, and extremely robust. As soon as Layout or a similar machine is back up, I will be able to start testing code, but for now, learning the library is just as important.
CS 488 – Week 4
Last week I worked on collecting and preprocessing the data using Groupon API. I also started learning about and implementing my autoencoder model. So far the obstacle has been the learning curve but I have been extensively reading about neural networks and Keras and should be able to continue working on the project without any hiccups. Next week I plan to start my first draft of the paper as well as have a somewhat working version of the autoencoder model.
CS 488 – Week 4
This week, I started the data preprocess for my image data. The steps include, resizing, cropping, normalizing and lastly change to tensor value so that it can be fit in a neural network. For the numerical data set, I started looking into different algorithms which are not as computationally expensive as neural network such as k-nearest neighbor, support vector machine. In that way, I can test it when Layout server is still not available.
Week 4
I finished the ranking module. It take a folder of images, converts them to an array, passes the n best images to another function, which keeps processing the images, and then picking the best n again to be processed. There is no processing yet.
For the processing, I have started working on the genetic pixel changing for the image processing. I am reading the pixels into an array, and am changing each individual pixel. While it is (sort of) a genetic algorithm, I want the changes to be a little more intentional.
CS488-Week4-Update
This week I took a short break from working on the proof to start working on the app. I am currently trying to figure out whether it is worth designing the Parks app as a webapp, while also starting work on some of the basic modules.
CS 488 – Updates – Week 4
- I have spent this week analyzing the data sets that I have to see if there are any outside things that I need for these data sets to be able to be tested using Weka.
- I have found that some required me to have my app registered with Facebook Developers and Disqus and some were not actually in proper .csv format and so Weka (the tool that I am using to test classification methods) could not read it.
- This meant that I have a lot smaller pool of articles that I am able to replicate.
- I have found 27 different data sets but I haven’t read all the papers those data sets are used in and some of the papers that mention the data sets are just explaining how they created the data sets and not how to use them in this context.
- Because of all of these little setbacks, I am working on just finding smaller sample data to test Weka with, so that I can make sure Weka is working and I am focusing on recreating the results from Castello et al.’s work for the moment.
- Castello et al.’s data format is different than what I have used for Weka before and I have to do some more digging to see if I need to combine the fake news data set with the credible news data set for each year first before sending it through Weka, or if I can just open both within Weka and tell it how to find what it needs.
CS 488 – Week 4
In the past week, I worked on generating five recommendations from each of the the six product categories. I still have a confusion about the cosine similarity formula so I’m planning to meet with other faculties in the following week while keep working on the next task. Other than that, there wasn’t any obstacle and I just need to make the function return the results in a nice and clean way.
CS488 – Week 4
This week I made efforts to get predictions from my model that was trained last week. However, after some hours spent understanding the code, I realized that this model is not for practical use but rather theoretical predictions, as each query set requires a supporting set.
Following this setback, I have now found some models to train from a smaller dataset in comparison to FewRel. I believe these models are able to be used practically on random query sets. With the smaller training time required for them, I should be able to verify which is best for my project this week.
CS 488 – Week 4 – Update
This week, I continued my testing for the physical aspect of my project. During this testing, I tried to focus on ECOpen since it says there is no encryption associated with the network. Come to find out, there is still an authentication process that one must go through when trying to connect to ECOpen. So when I ran a packet sniffer on a device that was on an ECOpen channel, I could not see any data. (This is a good thing and was noted). I also finished preparing my social engineering test which will begin tomorrow, February 12th. This next week will consist of my social engineering test, processing results from physical test, and working on my paper.
CS 488 – Week 4
Working on the first draft of the paper.
Week 3
I have started working on passing images to the ranking algorithm.
I also have found some online food-photography courses I want to look at. Learning that will be helpful in knowing how to improve my images.
Update up to 2/5/2020
This past month i have been mostly working with getting everything for my project working and fighting some major issues. The first issue that has been almost solved is that the NorthStar uses DisplayPort out while my computer only has an HDMI port and a MiniDisplayPort in. Turns out HDMI outputs are not compatible with DisplayPort and so the adapter i got to do that does not appear to work. I am investigating getting the proper port by the end of this current week.
The other issue i had to fight was the fact that for a week and a half, i did not have my primary computer since it was broken and needed to be repaired. I had a much less good backup computer that i used to test the hypothesis above, so i was not entirely useless during that time.
This coming week and weekend, i am hoping to have a fully functioning environment running and have the ability to display tracked hands in the headset, depending on the availability of an adapter.
CS 488 – Week 3 – Updates
Last weekend, I spent time with a small group of friends filling out a spreadsheet of information for 2020 Senate candidates. So far, 154/348 filed candidates have been added to the sheet. During that time, we discovered that a few candidates operate their campaign on a public Facebook profile instead of a Facebook page. In talking with Charlie, he guessed that the process in the API to collect profile data shouldn’t be too different from page data. Therefore, I am planning to collect this data as well, while noting the names with profiles in case their results are drastically different from overall results. Next, I plan to develop the scripts to start collecting and analyzing small amounts of data, planning to scale and automize them later.
cs488 – Week 3
This week was a big planning week for me. I spent a lot of time writing down notes and ideas, as well as researching the details of what I need for my project. I also spent some time gathering resources for my project in the form of data from Iceland. A combination of 2018 and 2019 data will provide me a much-needed training/testing case.
I have progressed in my implementation, further streamlining the process of creating various edge detections of original images. This week I added the Prewitt edge detection algorithm and improved my Caney edge implementation to have a tight, wide, and auto mode.
I have also been researching technologies for image recognition via machine learning with multiple channels. This is the idea that a single “object” in the AI can have multiple images associated with it, and it is necessary for my project.
CS488 – Week 3
This week I was able to create a saved checkpoint of my learning model for semantic relation extraction. This hopefully means I won’t need to train it further and can now focus on feeding it my data, which now needs to be pre-processed before being fed into the model. A basic GUI window was also up and running this week with PyQt5 which was great to see! I will be writing more code in the coming weeks now so I need to ensure that my project files are organized.
CS 488 – Week 3
This week, I tried to implement some models and was hoping to get it on our Layout server with GPUs. However, the system admins were still working on that and I could not ssh to the server. Therefore, I created a google cloud free trial account and started writing and testing my model on their server.
CS 488 – Week 3
Since my project involves a significant part that’s marketing, I was advised by my instructor to talk to Seth and other professors about how I should approach a dataset. After talking to them, I have decided that a good approach would be creating a dataset using the readability formulas. First I will calculate the average readability and then filter the dataset using that average readability. A marketing dataset has been extremely hard to find, but asking around has led me to the Groupon API – it lets me get 100 deals per second which will help me easily scrape millions of deals in a few days. I plan to run a script in the background that does it. Since last week, I have also successfully implemented word2vec using Genism – a python library.
CS 488 – Week 3 – Update
In the past week, I worked on calculating the cosine similarity between the ingredient composition of an inputted item and that of the rest of the items in the data. I am struggling to decide on which formula to use for this, since the related project used the equation different from the “typical” formula used to compute cosine similarity. I will need to look into this more next week.
CS388 – Week 3 – Third Idea
- Name of Your Project
A Real Time Fall Detection System to Assist the Elderly Using Deep Neural Networks
- What research topic/question your project is going to address?
The elderly have a high chance of falling and get injured or faint. This might put them to danger if they are alone. One way that can help the elder people is having a system that can monitor their actions, detect the falling action and other behaviors after falling down, classify the levels of severity and send an alert to their emergency contacts or the emergency room if the level is serious.
- What technology will be used in your project?
Deep learning, pattern recognition, image processing
- What software and hardware will be needed for your project?
Python, PyTorch (or Keras)
I might also need a CCTV camera if I decide to build the actual device.
- How are you planning to implement?
First I will apply some image processing techniques to enhance the images and videos quality. If the dataset is small, I will use of image data augmentation techniques to produce more data. Then train the model that detect the person falling in the photo frame using deep neural networks, then use the people falling photos and videos to train a model that classify the level of severity. When the index of severity passes a threshold, send out the alert.
- How is your project different from others? What’s new in your project?
There are several projects that work on the similar problem. Most of them work on detecting the falling action only. In this project, I hope to build a system that is more detail and can decide whether it is an emergency case.
- What’s the difficulties of your project? What problems you might encounter during your project?
I might not be able to find a big enough dataset to train the model.
CS488-Week3-Update
I worked with possible ways of proving that non-contiguous Parks is NP-Complete, and found one good avenue for exploration. Over the week I produced a general technique to convert any instance of 3-SAT to an instance of the non-contiguous Parks Puzzle, thus proving that it is NP-Complete, our first major result. I am working to modify the proof, or try similar techniques for the contiguous case this week.
CS 488 – Update – Week 3
- I have started to keep a log of what I do every day for this project so if something goes wrong I know where to back up and begin again. This will also help later when writing about my process for the poster/paper
- I have started mapping all the datasets I found to what papers used them so that I could figure out which papers I could replicate
- I have started trying to replicate papers as well using Weka just to make sure I’ve set up everything correctly so that I can properly set up my own tools
- I’m having issues with how vague all the research papers are, however. So I think to fix that issue, I’ll need to email the researchers which more questions so I can actually replicate them and know what tools they used.
Idea 3
- Name of Project
Automating laptop checkouts from CST front desk using image recognition
- What research topic/question your project is going to address?
Although we need a human to address the needs of guests in the welcome desk of CST, it would be ideal for the worker and students if we can automate the process of MACs’ checkout. Humans are prone to error and we do not want any student worker to be liable of errors that could cost them thousands of dollars. So this project would allow a machine to handle the checkout using a camera to identify the laptop and the student wishing to check out the laptop and remove the process from the desk worker completely.
- What technology will be used in your project?
Image recognition, machine learning models,
- What software and hardware will be needed for your project?
This would need a good quality camera, python, and some database management software
- How are you planning to implement?
Have a camera stationed above the cst desk. Also I think it would be beneficial to change the barcodes in the laptops to bigger QR codes for easy recognition and better visuals for the camera. Use various machine learning models to train the software to recognize students and identify unique laptops. This product should also send out emails regarding reservation details to the students like the current system does.
- How is your project different from others? What’s new in your project?
This is different implementation from the current process we have in that we are removing human responsibility from this procedure. This will hopefully reduce human error in the process and decrease financial liability to student worker and the institution. It is also scalable to other use cases (like Runyan desk for example) to increase automation and improve efiiciency.
- What’s the difficulties of your project? What problems you might encounter during your project?
The problem I anticipate is making sure the model I have does not mis-identify students checking out the laptop or mistaking someone walking by the cst desk as someone checking out a product. Lighting might also be some issue as the desk is besides huge windows and so lighting is very different in night vs day, or summer vs winter. Another issue to consider is the camera quality (need to get good camera under reasonable budget)
Idea Number 3
- Name of Your Project
Ans: SARS
- What research topic/question your project is going to address?
Ans: Using trained neural nets to be able to tell when a statement/sentence is sarcasm
- What technology will be used in your project?
Ans: NLTK and
- What software and hardware will be needed for your project?
Ans: Botmock is the only software that will be needed for this project
- How are you planning to implement?
Ans: I plan on making this an extension of Botmock
- How is your project different from others? What’s new in your project?
Ans: With my project, I am using the same method of using CNN model hierarchy when it comes sentiment analysis to learn the context and space in which the sentence exists
- What’s the difficulties of your project? What problems you might encounter during your project?
Ans: Every sarcasm exist in a defined space one difficulty of this project is trying to build a barrier for that space. Another problem would be getting access to Botmock’s API to make this application compatible.
CS 488 – Week 3 – Update
This past week I have really dug into my physical testing. Using Kali Linux and a wireless adapter (supports monitor mode), I was able to use commands to see which networks were available and from there, I could see all of the clients connected to each network. However, I only could see the BSSID (MAC Address) of each device, nothing more. I then went in to WireShark which showed me a little more data. I could potentially see what type of device it was. However, all data was encrypted in ECSecure. Trying to break the encryption was hard as we have hundred of users with different passwords. It’s not just a single password for the ECSecure network (that would be too easy to break). I plan to continue this testing and see what else I can find through ECOpen.
I have also started to set-up my Social Engineering experiment that way everything is ready when the start date arrives.
CS 488 – Week 3
I worked on the login page and setup and almost done with the forget password setup. I have to decide on the database for the login, whether it will be single data base or sql for the whole application. I plan on working on the User interface features in the coming week. This involves setting up a db to store books, setting storage attributes etc. This is the main portion of the project and shall take the most time.