Capstone Poster – Who’s Fake News
Here is the poster I created about my capstone project. I plan to continue this project, so I will update the poster as it develops.
Capstone Paper – A Functional and Scale-able User Platform for Automatic Fake News Detection
Here is my paper for the Capstone Project that I turned in on May 3rd, 2020. Hopefully, in the future, I will have an updated version.
CS488 – Elevator Pitch
Information informs our entire lives. Information shapes public opinion which shapes things like public policy, elections, the health and safety of the public, and more. No one is above the harm that can come from misinformation, which is why we need to fight against its spread.
Fake news is a relatively new area of research, so some of its aspects are not yet well researched and new aspects keep emerging. Two problems stand out in the existing work. First, the solutions to these aspects are developed in isolation, so no one solution can be used to find all instances of fake news. Second, most solutions do not have an accessible, comprehensive user platform to disseminate them to the public.
The solution I will provide is a functional model of a user platform that demonstrates how an engaging and accessible one-stop shop for fake news detection can work. It lets the user interact in several ways that require different levels of effort, and it can scale to include many different automatic detection methods.
CS 488 – Update – Week 6
- This week I focused on getting part of the PoliticalNews data set from Castello et al. to work with Weka, so that I can see whether I can recreate the results of their classification methods
- Downloaded a tool to combine Excel files into one sheet without data loss, then manually added headers and an extra column denoting which articles were fake and which were real
- But Weka still won’t load the data so that I can test it
- Next week I will focus on making smaller versions of the data set to see what features are the issue for Weka and testing features individually; I will also look into Keras as a machine learning tool and see what kind of testing can be done
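One way to test features individually is to write out one small file per feature, each holding just that feature and the label column. The following is a minimal sketch using only the Python standard library; the column names (`title`, `word_count`, `label`) and the two sample rows are hypothetical and not taken from the actual data set.

```python
import csv
import io


def split_by_feature(rows, label_column="label"):
    """Group values per feature so each feature can be loaded on its own."""
    subsets = {}
    for row in rows:
        for name, value in row.items():
            if name == label_column:
                continue
            subsets.setdefault(name, []).append((value, row[label_column]))
    return subsets


def write_subset(feature, pairs, label_column="label"):
    """Serialize a single feature plus the label as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([feature, label_column])
    writer.writerows(pairs)
    return buffer.getvalue()


# Hypothetical two-row sample; the real data set has many more columns.
sample = "title,word_count,label\nSome headline,120,real\nOther headline,80,fake\n"
rows = list(csv.DictReader(io.StringIO(sample)))
subsets = split_by_feature(rows)
word_count_csv = write_subset("word_count", subsets["word_count"])
```

Loading each small file separately should show which column, if any, is the one that stops the full file from loading.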
CS 488 – Updates – Week 5
- I found a data set that would be the easiest to recreate results with
- I just need to merge the data set of credible news and the data set of non-credible news with an added column denoting whether it was real or fake to be able to test
- However, I ran into a major hiccup because Weka crashed and I can no longer open it.
- I am documenting the errors and trying to reinstall and fix it, but for some reason I can’t delete some of the old folders, so I still have not gotten Weka to start back up again
- My hypothesis is that one of the packages is failing the whole opening process because I don’t have R on this machine
- I am also afraid of force deleting the folder because I have no idea how it will affect Weka or my computer
- I did successfully complete my outline but I do not have enough information to fill out the Results section and anything regarding my exact methodology.
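The merge step mentioned above (combining the credible and non-credible files and adding a column that says which is which) could be sketched as follows. This is only an illustration with the standard library: the function names, the `label` column name, and the sample rows are assumptions, and it expects both files to share the same header row.

```python
import csv
import io


def read_table(text):
    """Parse CSV text into (header, list of row dicts)."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    return list(reader.fieldnames or []), rows


def merge_with_label(real_csv, fake_csv, label_column="label"):
    """Concatenate both tables, tagging each row as 'real' or 'fake'."""
    real_fields, real_rows = read_table(real_csv)
    fake_fields, fake_rows = read_table(fake_csv)
    if real_fields != fake_fields:
        raise ValueError("the two files must share the same header row")
    merged = []
    for label, rows in (("real", real_rows), ("fake", fake_rows)):
        for row in rows:
            merged.append({**row, label_column: label})
    return real_fields + [label_column], merged


def to_csv(fieldnames, rows):
    """Write the merged rows back out as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample rows standing in for the two source files.
real_csv = "title,text\nCity council passes budget,Full text A\n"
fake_csv = "title,text\nAliens endorse candidate,Full text B\nMoon made of cheese,Full text C\n"
fields, merged = merge_with_label(real_csv, fake_csv)
merged_csv = to_csv(fields, merged)
```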
CS 488 – Updates – Week 4
- I have spent this week analyzing the data sets that I have to see if there are any outside things that I need for these data sets to be able to be tested using Weka.
- I have found that some required me to have my app registered with Facebook Developers and Disqus and some were not actually in proper .csv format and so Weka (the tool that I am using to test classification methods) could not read it.
- This means that I have a much smaller pool of articles whose results I am able to replicate.
- I have found 27 different data sets but I haven’t read all the papers those data sets are used in and some of the papers that mention the data sets are just explaining how they created the data sets and not how to use them in this context.
- Because of all of these little setbacks, I am working on just finding smaller sample data to test Weka with, so that I can make sure Weka is working and I am focusing on recreating the results from Castello et al.’s work for the moment.
- Castello et al.’s data format is different than what I have used for Weka before and I have to do some more digging to see if I need to combine the fake news data set with the credible news data set for each year first before sending it through Weka, or if I can just open both within Weka and tell it how to find what it needs.
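Since some of the files were not in proper .csv format, a quick check before opening them in Weka can save time. The sketch below flags rows whose field count does not match the header. It is an assumption about what "not proper" might mean (other problems, such as encoding issues, would need different checks), and the sample text is made up.

```python
import csv
import io


def find_malformed_rows(text):
    """Return (row_number, field_count) for rows that do not match the header width."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header is None:
        return []
    expected = len(header)
    problems = []
    # Row 1 is the header, so data rows start at 2.
    for row_number, row in enumerate(reader, start=2):
        if len(row) != expected:
            problems.append((row_number, len(row)))
    return problems


# Hypothetical sample: the second data row is missing its label field,
# while the quoted comma in the third data row is valid CSV.
sample = 'url,title,label\na.com,Hello,real\nb.com,Missing label\nc.com,"Quoted, comma",fake\n'
problems = find_malformed_rows(sample)
```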
CS 488 – Update – Week 3
- I have started to keep a log of what I do every day for this project so if something goes wrong I know where to back up and begin again. This will also help later when writing about my process for the poster/paper
- I have started mapping all the datasets I found to what papers used them so that I could figure out which papers I could replicate
- I have started trying to replicate papers as well using Weka just to make sure I’ve set up everything correctly so that I can properly set up my own tools
- I’m having issues with how vague all the research papers are, however. So I think to fix that issue, I’ll need to email the researchers with more questions so I can actually replicate them and know what tools they used.
CS 488 – Update – Week 2
- This week I recovered all of the data sets I found last semester that were on my other computer. I then downloaded and extracted the data.
- I also chose to set up my own Developer SQL database on my laptop so that I can keep my training data and the user data in one accessible place.
- Because I wasn’t able to have my mentor meeting last week, I wasn’t sure where to begin with all the work I’ve set up. So I’ve decided to go back through all of my notes on the research papers I have read and create a giant spreadsheet detailing the tools used, features used, classification methods used, whether the dataset or the code was available, and if I’ve contacted the authors of these papers for more info.
- This will help me figure out how I’ll need to create the learning loop to not forget any feature or method.
- This will also help me show my advisor exactly what was in previous work and what I have to build off of
CS 488 – Week 1 – Update
- I bought a new computer over the break because my older one was unreliable and crashed unexpectedly from time to time. So I spent this first week setting up the computer and downloading the tools that I believe I’ll be using.
- I also have spent a lot of time hunting down the data sets from the research papers that I have read and have a collection of over 22 different fake news data sets.
- I created my presentation slides which helped me think about the project in a different way since I need to think about how to explain things in a way that will make sense to everyone and not just myself.
- Finally, I chose my adviser and set up a meeting time and shared notes space but we were unable to meet this week since she will be at a conference.
CS 388 – Week 14 – Updates
- I spent the vast majority of this week looking for projects that have specifically detailed how they implemented a fake news detector and reading through the articles I’ve already found.
- While some have given a lot more detail on their process, unfortunately, I can’t understand some of the details.
- A lot of the details go into the mathematical aspects of machine learning and convolutional neural networks. That’s very difficult for me because math is not my strong suit.
- I will either have to find a tutorial that will actually explain it well or I might have to compromise my big goals for this project. I need help finding papers or tutorials that clearly explain their processes so I can move forward in the way that I want to.
CS 388 – Week 13 – Updates
- I focused this week on fixing my first proposal.
- I re-did all of my diagrams so that they would use the proper shapes
- I re-wrote my design section
- I added more to my introduction to better explain the importance and the gaps
- I elaborated about the timeline and gave a high level overview by month
- I also did research into the PostgreSQL database using SQL because that seems like the best tool for my project.
- Next week over the break, I hope to go more in depth into my readings and start to finalize the tools I want to use
CS 388 – Week 12 – Updates
- I spent a lot of time this week closely reading the texts I’ve already found to look for any mention of the data set they use. This was very difficult because the research articles usually don’t name their data set or explain where to find it. The papers do not give enough detail about the researchers’ process and methodology to allow me to replicate their results. This made finding fake news data sets extremely difficult. However, through close reading and intense web searches, I have found 21 fake-news-related data sets.
- I also spent a lot of time researching what would perhaps be the best machine learning tool to use for my project. I’ve narrowed it down to these possibilities: Oryx 2, TensorFlow, Azure ML Studio, Weka, Shogun, AWS CLI, TensorBoard, Keras, Caffe2. I think that I might be able to use more than one for my project to get the best results, but more research still needs to be done about which tool is better for the type of data set I have (which is not a time-series data set).
CS 388 – Week 11 – Updates
- Started reviewing newly found research that is more relevant to a fake news detector application
- Finished my first draft for my project proposal
- I will need to update my related works section because of the new research I’ve found
- After the peer review session, I will need to go back and redesign my figures so that they are easier to read
- Started finding datasets which is very difficult because the research papers never tell you where to find the data set they use and often they never name the dataset either.
- However, I was able to find some GitHub repositories with datasets and some websites of the authors in the research papers that actually linked to the dataset of their works
CS388 – Week 10 – Updates
- This past week I have been trying to find more research papers that discuss how to create a prototype fake news detector instead of papers that just talk about how to discern fake news from media.
- The hunt for those kinds of papers is more difficult than the one for methods of fake news detection which tells me that my work in creating a user application for fake news detection is very much needed.
- It also seems like a lot of these papers are geared specifically towards Twitter and so hopefully my research can fill a gap.
- I have also been trying to figure out a good timeline for myself both for this semester and the next. I do believe this project is feasible if I can just find some data sets in a timely manner.
CS388 – Week 9 – Updates
- I’ve been working on the Lit Review and Proposal Outline this week and I finally finished the Lit Review fully.
- My Lit Review included sections on:
- Data sets: what kind of data has been tested and what do they extract from the data to use as a way of identifying misinformation
- Identification/Classification Methods: what approaches did people take to test the data and have it respond with whether or not it was fake news
- Prototype Design Consideration: some papers outlined what a good prototype detector should have or be used for and that is something I want to deliver on so it was important I note what I found
- The Proposal was a bit challenging
- While drawing designs, I realized that there are a lot of parts to my idea (not that it’s unfeasible though) so I’ll have to sit down and not only figure out a good overall framework but good designs for all the smaller parts
- I also struggled with the methodology, budget, and timeline section because my framework is very much in flux since I actually need to figure out what works before doing the meat of my project.
CS388 – Week 8 – Updates
- I wrote my project proposal this week and I already had most of the info ready.
- The sections that I didn’t feel really prepared for were the design section and the section on what software/hardware is needed.
- I didn’t have a good idea of what should go into my design and what counts as a component
- I’m also not sure what kind of software/hardware I need because I’m not completely sure what my own unique approach will be so I don’t know which software/hardware is the best for my approach yet
CS388 – Week 7 – Updates
- After reading articles for my literature review, I see that there are a lot of different ways to identify and classify media with misinformation.
- I will need to do a bit of work to be able to combine all the methods in a way that it will be able to identify and classify media of all different topics and types
- I’ve also split up my project into smaller more manageable goals to accomplish
- First Level Basic Goals:
- Find a large enough dataset that is properly vetted as credible and one that is properly vetted as not credible
- I want this dataset to be on a variety of topics
- Find an algorithm/classifier that is accurate 80%-100% of the time on the dataset with a variety of topics
- Find the key features that are the most reliable for classifying
- Second Level Goals:
- Expand the dataset to include pictures where the text was extracted from it
- Re-test the algorithm/classifier to make sure a drop in accuracy hasn’t occurred
- Re-test that the key features for articles can apply to the text within a picture
- Third Level Goals:
- Expand the dataset to include videos where they are transcribed as accurately as possible.
- Re-test the algorithm/classifier to make sure a drop in accuracy hasn’t occurred
- Re-test that the key features for articles can apply to the transcriptions
- Fourth Level Goals:
- Create an app/website where you can upload a piece of media and the app will use the algorithm/classifier and tell you if it is credible or not
- Fifth Level Goal:
- The app/website will keep a record of things that have been deemed credible or not
- Create a browser extension that will take the media from the current tab and check if it is credible
- Sixth Level Goal:
- The app/website and browser extension will scan and search for certain keywords set by the user and check new content that’s been uploaded
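The "re-test to make sure a drop in accuracy hasn’t occurred" steps in the second and third level goals could be checked with something like the sketch below. The labels and predictions are invented for illustration, and the 0.02 tolerance is an arbitrary assumption, not a value from any paper.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and the same length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def accuracy_dropped(baseline, expanded, tolerance=0.02):
    """True when accuracy on the expanded data falls below the baseline by more than the tolerance."""
    return (baseline - expanded) > tolerance


# Hypothetical results on article text only.
article_labels = ["fake", "real", "fake", "real", "real"]
article_predictions = ["fake", "real", "fake", "real", "fake"]
baseline = accuracy(article_predictions, article_labels)

# Hypothetical results after adding text extracted from pictures.
picture_labels = ["fake", "real", "fake", "real"]
picture_predictions = ["fake", "real", "real", "real"]
expanded = accuracy(picture_predictions, picture_labels)

needs_attention = accuracy_dropped(baseline, expanded)
```

The same comparison could be repeated at the third level with transcribed video text in place of the picture text.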
CS 388 – Week 6 – Updates
- I’ve chosen Charlie as my adviser for my project
- The idea that I picked for my Capstone is my Fake News Detection Idea
- The basic idea is that I would create a website/application and a website extension that takes mediums as input and will tell the user if it is factual or not
CS 388 – Week 5 – Updates
- The First Responders Tech idea is not panning out in terms of finding any relevant research articles, so that may be way too difficult for me to achieve in this time frame
- The Secure Paperless Voting Machine is working out in terms of finding research articles but upon reading a couple of them, I’ve realized how complex the issues are. Both the hardware itself and the software need to be more secure. This might be too big of a project.
- There are a lot of articles and info about my Fake News Detector idea and my Ancient Sites in VR idea so that bodes really well for the feasibility of those two ideas as my official capstone idea
CS 388 – Week 4 – Update
Updates in Ideas
- My first “Fake News Detector” idea has remained mostly the same
- I still have not come up with a more doable idea for the secure paperless voting machines because there’s so much oversight with voting regulations and laws and because there’s an added obstacle of different voting systems/mechanisms.
- My “911 Tech” idea has hit a wall in terms of finding research articles on it
Updates in Process
- I talked to Charlie about my three ideas and he suggested a change from 911 specific tech to First Responders/Disaster Relief tech because they actually use open source technology and it might be more feasible to create something and find research on it.
- I also created a fourth idea in case the First Responders idea doesn’t pan out. My idea was creating very realistic recreations of ancient archaeological sites that you can interact with in VR.
CS 388 – Week 3 – Updates
Summary of Updates:
This week I spent a lot of time trying to refine my last two ideas to make them more suitable for a capstone and tried finding enough research articles about all three ideas. I struggled with finding articles related to my second idea so I came up with a fourth idea that was easy to research. Below I’ll put all four of my ideas that I have right now.
First Idea: Automatically Detecting Fake News
- Create a database of media of different opinions that are credible with truthful information.
- Using Machine Learning, find/develop an algorithm that when given a medium could accurately determine whether it is credible and truthful.
- Create an app/website/browser extension as a user interface where you can manually input the medium you want to check, or follow certain keywords and be alerted when new credible content with those keywords has been created, or tell you when a medium you are currently looking at is not credible and link you to credible sources of both opinions.
Second Idea: First Responder’s Tech
- Disaster relief first responders, like 911 operators, would benefit from a tech upgrade as well and they actually use open source software.
- I would like to set up a text system for those responders because you can handle more texts at once than you can calls and sometimes it’s easier to text for help.
- With texts, it would also be easier to automatically send location data to the responders
- I would like to create a technology that uses the phone’s barometric pressure sensor and gps location tech to be able to tell first responders exactly where and what floor a person is on.
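To illustrate how a barometric pressure reading could translate into a floor number, here is a rough sketch based on the standard international barometric formula. It assumes a ground-level pressure reading is available from somewhere nearby, that floors are about 3 m tall, and that the ground floor counts as floor 0; all of these are assumptions for illustration, and real readings drift with weather and indoor air handling.

```python
def pressure_to_altitude_m(pressure_hpa, sea_level_hpa=1013.25):
    """International barometric formula: approximate altitude in meters from pressure in hPa."""
    return 44330.0 * (1.0 - (pressure_hpa / sea_level_hpa) ** (1.0 / 5.255))


def estimate_floor(phone_hpa, ground_hpa, floor_height_m=3.0):
    """Estimate the floor from the height difference between the phone and ground level."""
    height_m = pressure_to_altitude_m(phone_hpa) - pressure_to_altitude_m(ground_hpa)
    return round(height_m / floor_height_m)


# Hypothetical readings: the phone reports slightly lower pressure than ground level.
floor = estimate_floor(phone_hpa=1011.8, ground_hpa=1013.25)
```

Using the difference between two readings, instead of a single absolute altitude, keeps the estimate less sensitive to the day’s weather.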
Third Idea: Secure Paperless Voting Machines
- The paperless voting machines that we have at this very moment have been proven to have many flaws and vulnerabilities and are easy to hack into. Given Russian interference in the last presidential election, this is extremely concerning.
- I would like to create a paperless voting method that would allow people to securely cast their ballot and that could support a secure virtual caucus for those unable to make it to the physical location.
- I would make sure all data is encrypted to the highest standards we have at the moment and that all vulnerabilities noted in our current methods and efforts are addressed.
Fourth Idea: Virtual Recreation of Ancient Sites
- world in an immersive environment without having to damage the sites and spend money building on top of the actual ancient stones and what not
- People should be able to walk into places and into rooms
- People should be able to hear sounds as well
- Perhaps incorporate smell???
- People should be able to walk up stairs!
- There should be an interactive element that will give you facts and info about a specific thing that you touched or selected
- This could be used as an educational tool in classes and could be used by museums so that they could give back the original pieces and works to the countries that can safely house them and use the VR with recreations
CS388 – Week 2 – Three Ideas
First Idea: Credibility Analyzer
- Create a database of media of different opinions that are credible with truthful information.
- Using Machine Learning, find/develop an algorithm that when given a medium could accurately determine whether it is credible and truthful.
- Create an app/website/browser extension as a user interface where you can manually input the medium you want to check, or follow certain keywords and be alerted when new credible content with those keywords has been created, or tell you when a medium you are currently looking at is not credible and link you to credible sources of both opinions.
Second Idea: 911 2.0
- 911 operating centers are underfunded, understaffed, and under-equipped. Uber and Lyft have a better GPS location system than 911 does.
- I would like to create a free GPS location service that works 90%-100% of the time (instead of 10%-90%) and can also locate what floor of a building someone is on
- I would also like to create a way to text 911 and have that text automatically send the coordinates of where your phone is to 911 without the user having to type it themselves
- I would like to create a system that can screen texts to see if it’s something simple that an AI could handle and get an immediate response or if it’s too complex and needs to be sent to an actual person.
Third Idea: Secure Paperless Voting Machines
- The paperless voting machines that we have at this very moment have been proven to have many flaws and vulnerabilities and are easy to hack into. Given Russian interference in the last presidential election, this is extremely concerning.
- I would like to create a paperless voting method that would allow people to securely cast their ballot and that could support a secure virtual caucus for those unable to make it to the physical location.
- I would make sure all data is encrypted to the highest standards we have at the moment and that all vulnerabilities noted in our current methods and efforts are addressed.
CS388 – Week 1 – First Idea
Name of My Project
Who’s Running?: In depth look at who’s running for what, when, and info about them.
What research topic/question is my project going to address?
- How to reduce the number of uninformed voters?
- How to dynamically update a website when something with certain keywords is uploaded on the web?
- How to do that accurately without mistaking an article for something else?
- How to make sure sources being provided are credible?
- How to make sure that sources from both sides are presented and credible?
- How to make the registration process easier and encourage people to vote?
What technology will be used in your project?
- Website: HTML, CSS, JavaScript
- Content Change Detection and Notification Service
- CRON job with scripting?
- FeedWelder?
- Scripting
- MDM Notifications API
What software and hardware will be needed for your project?
- Laptop
- Possibly access to CRON and FeedWelder
How are you planning to implement?
- The implementation would be deploying a live website that updates automatically and alerts users of important dates and info and if new info is on the site.
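The content change detection piece could work by storing a fingerprint of each page and comparing it on every scheduled run (for example, from a CRON job). Below is a minimal sketch of that idea using SHA-256 hashes; the URLs and page texts are placeholders, and actually fetching pages and sending notifications is left out.

```python
import hashlib


def fingerprint(content):
    """Stable hash of a page's text, used to notice when it changes."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def detect_changes(previous, current_pages):
    """Compare stored fingerprints with the current page texts.

    previous: {url: fingerprint} from the last run
    current_pages: {url: page text} fetched in this run
    Returns (list of new or changed URLs, updated fingerprint state).
    """
    changed = []
    new_state = {}
    for url, text in current_pages.items():
        digest = fingerprint(text)
        new_state[url] = digest
        if previous.get(url) != digest:
            changed.append(url)
    return changed, new_state


# Hypothetical pages; example.org URLs are placeholders.
first_run = {
    "https://example.org/candidate-a": "Statement on taxes",
    "https://example.org/candidate-b": "Statement on schools",
}
changed, state = detect_changes({}, first_run)

second_run = dict(first_run)
second_run["https://example.org/candidate-b"] = "Updated statement on schools"
changed_again, state = detect_changes(state, second_run)
```

On the first run every page counts as new; on later runs only pages whose text differs from the stored fingerprint are reported, which is what would trigger a user alert.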
How is your project different from others? What’s new in your project?
- It’s a one stop shop and automatically updates with more detailed info on how candidates stand on certain issues, their track record, and articles pro and against them.
- This also makes sure the sources about each candidate are actually credible and not peddling false info
- This will also alert people when new information is out there and important election/registration dates
What are the difficulties of your project? What problems might you encounter during your project?
- Making it update automatically reliably
- Making sure it successfully finds only credible sources
- Making sure my script parses the info correctly onto the website
- Making sure the new info it finds is accurate and about the right people