My project aims to develop models that predict the risk of cancer from both numerical data and image data. After training all the models, I will compare them to see which has the best accuracy and look for ways to improve that accuracy. From there, I can decide which model is best to use for predicting cancer risk.
This week I started writing the first draft of the paper. The models seemed to run properly when I tested them on the small data set. However, the accuracy was not what I wanted because there was not enough data. I will try to run these models on an online cloud computing system while waiting for Layout to be available.
This week, I focused on writing, including the outline for the capstone paper. I also started implementing some neural network models for the image data set, such as MobileNet and EfficientNet. I will try to test the models on a small sample of the training data while waiting for Layout to be available, so that I can then train on the whole large data set.
This week, I started preprocessing my image data. The steps are resizing, cropping, normalizing, and finally converting the images to tensors so that they can be fed into a neural network. For the numerical data set, I started looking into algorithms that are less computationally expensive than neural networks, such as k-nearest neighbors and support vector machines. That way, I can test them while the Layout server is still unavailable.
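The preprocessing steps above can be sketched in plain Python. This is only a minimal illustration under stated assumptions, not the code used in the project: the function names, the nearest-neighbor resizing, the target sizes, and the mean/std values of 0.5 are all hypothetical, and a grayscale image is represented as a list of rows of 0-255 integers. The nested list of floats at the end stands in for the tensor; in practice a library such as torchvision would probably handle these steps.

```python
# Minimal sketch of resize -> center crop -> normalize for one grayscale image.
# All names and default values here are hypothetical, chosen for illustration.

def resize_nearest(img, out_h, out_w):
    """Resize with nearest-neighbor sampling (assumed method, simplest to show)."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]

def center_crop(img, size):
    """Keep the central size x size window of the image."""
    h, w = len(img), len(img[0])
    top, left = (h - size) // 2, (w - size) // 2
    return [row[left:left + size] for row in img[top:top + size]]

def normalize(img, mean=0.5, std=0.5):
    """Scale pixels to [0, 1], then shift and scale by the assumed mean and std."""
    return [[(p / 255.0 - mean) / std for p in row] for row in img]

def preprocess(img, resize_to=8, crop_to=6):
    """Run the three steps in order and return a nested list of floats."""
    resized = resize_nearest(img, resize_to, resize_to)
    cropped = center_crop(resized, crop_to)
    return normalize(cropped)

if __name__ == "__main__":
    tiny = [[0, 255], [255, 0]]  # made-up 2x2 image
    for row in preprocess(tiny, resize_to=4, crop_to=2):
        print(row)
```

With a mean and standard deviation of 0.5, pixel values end up in the range from -1 to 1, which is a common input range for neural networks.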
This week, I tried to implement some models and was hoping to run them on our Layout server with GPUs. However, the system admins were still working on it and I could not ssh into the server. Therefore, I created a Google Cloud free trial account and started writing and testing my models on their servers.
This week I looked at some papers on the most recent image classification models that I could build for my dataset. I ran into some challenges while reading those papers because some of the terms were hard to understand. Next week, I will continue to work on the image dataset and model.
I went through the project again because it has been a while since I took CS 388 last spring. I downloaded the data set and started doing some data manipulation and preprocessing. I will start looking at models for the image data set next week.
- Found more research papers related to the topic that I chose.
- Worked on the literature review.
In the past week, I spent time doing research to find interesting papers that I think are related to my ideas. For each of my ideas, there are at least five papers to look at at the moment. However, for the last idea, a mobile application for time scheduling, I could only find some papers on effective time scheduling, and most of the rest are about general mobile application development. I have also done a first pass on some of the papers to get a brief sense of what the authors are proposing and how well those papers fit my topics. I am also looking for available data sets for my first two ideas.
I have talked with Ajit about all three ideas and received some feedback and suggestions. I have also started finding research papers related to my ideas to read for next week.
Idea 1: Optical Character Recognition using Machine Learning
Optical character recognition (OCR) is not a completely new field of research. However, with the advances in today’s technology, we can apply machine learning to OCR and recognize not only printed characters but also the more difficult handwritten characters, or characters from different types of documents such as certificates or receipts. With the help of OCR, a lot of human work can be reduced. I would like to research this specific use of machine learning in OCR to handle those harder cases.
Idea 2: Disease prediction
People have always wanted to know their health status so that they can prepare and take better care of their bodies. It is not difficult to find out a person’s current health status, since he/she can just go to a hospital for a general medical check. However, people also want to know what types of disease they may have in the future, and this is where disease prediction using machine learning will be useful. I would like to research this topic, but the first difficulty I expect is how to get an appropriate training data set.
Idea 3: Time management application
Since students sometimes need help with planning their classes and studying for exams, I want to create an application that at first suggests the best times for students to attend class or to study. After that, each term, the student can input how well he/she performed in those classes and exams, and from that, in the next term/semester, the application may suggest a more tailored timetable to help that specific student perform better.