Detecting Fake News Using Machine Learning

Introduction

Hello, my name is Ihsan Alaeddin, and I am a senior Data Science major graduating in December 2023. This page hosts my capstone project, which uses machine learning to identify fake news. My primary interests are software engineering, machine learning, and finance.

Abstract

My plan is to use machine learning to detect fake news, using tools such as training sets and classifier algorithms. The program lets the user type (or copy and paste) the text of an article. It then identifies the most relevant frequent word in the text and passes the text to four different classification algorithms. The plan is illustrated in the diagram below. Each of the four classifiers gives its own prediction, along with an accuracy rate, of whether the text is fake or true. Once the predictions are in, the program returns a final result based on the majority of the four classifiers. If the classifiers are tied two to two, the article is reported as fake, since two classifiers judged it to be fake.
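The majority rule described above can be sketched in a few lines of Python. This is only an illustration, not the project's actual code: the label strings "FAKE" and "REAL", the function name majority_verdict, and the example votes are all assumptions made for this sketch.

```python
from collections import Counter

# Hypothetical label strings; the real project may use different ones.
FAKE = "FAKE"
REAL = "REAL"


def majority_verdict(predictions):
    """Combine per-classifier labels into one verdict; ties go to FAKE."""
    counts = Counter(predictions)
    # REAL wins only with a strict majority, so a 2-2 tie falls through to FAKE.
    if counts[REAL] > counts[FAKE]:
        return REAL
    return FAKE


# Made-up votes from the four classifiers, for illustration only.
example_votes = {
    "passive_aggressive": FAKE,
    "decision_tree": REAL,
    "k_neighbours": REAL,
    "logistic_regression": FAKE,
}
print(majority_verdict(example_votes.values()))
```

With four voters the only possible tie is two against two, so defaulting to "FAKE" in that case matches the rule stated above.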

The four classifiers are a Passive Aggressive Classifier, a Decision Tree Classifier, a K-Neighbour Classifier, and a Logistic Regression Classifier. A Passive Aggressive Classifier is a learning algorithm that keeps updating itself as it is fed more data: when it predicts correctly it stays passive, but when it predicts incorrectly it becomes aggressive and adjusts its parameters so that it predicts more accurately. A Decision Tree Classifier is structured like a tree: the input is tested at the root, the result of that test determines which node to visit next, and the comparisons continue until the input reaches a leaf and is classified. The K-Neighbour Classifier works by computing the distance between a query and every example in the data, selecting the specified number of examples closest to the query, and assigning the most frequent label among them (for regression, the labels would be averaged instead). The Logistic Regression Classifier computes a weighted combination of the input features and passes it through a sigmoid function, which maps any real number to a value between 0 and 1.
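To make three of these ideas concrete, here is a minimal pure-Python sketch of the sigmoid used by logistic regression, a single Passive Aggressive update, and a K-Neighbour vote. It is not taken from the project, which presumably relies on a machine learning library. All function names and the toy numbers are hypothetical. The Passive Aggressive step shown is the standard hinge-loss variant with labels +1 and -1, which stays passive only when the prediction is correct with a margin of at least 1.

```python
import math
from collections import Counter


def sigmoid(z):
    """Squash any real number into the range 0 to 1 (numerically stable form)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logistic_probability(weights, bias, features):
    """Weighted combination of the features, passed through the sigmoid."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def passive_aggressive_update(weights, features, label):
    """One Passive Aggressive step; label must be +1 or -1."""
    margin = label * sum(w * x for w, x in zip(weights, features))
    loss = max(0.0, 1.0 - margin)  # hinge loss
    norm_sq = sum(x * x for x in features)
    if loss == 0.0 or norm_sq == 0.0:
        return list(weights)  # passive: confident and correct, no change
    tau = loss / norm_sq  # aggressive: step just large enough to fix the margin
    return [w + tau * label * x for w, x in zip(weights, features)]


def knn_predict(examples, query, k):
    """examples is a list of (feature_vector, label) pairs."""
    nearest = sorted(examples, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data, invented purely for illustration.
toy_examples = [([0.0, 0.0], "REAL"), ([0.0, 1.0], "REAL"), ([5.0, 5.0], "FAKE")]
print(sigmoid(0.0))
print(passive_aggressive_update([0.0, 0.0], [1.0, 0.0], 1))
print(knn_predict(toy_examples, [0.0, 0.5], k=3))
```

In practice the text would first be converted into numeric feature vectors before any of these functions could be applied; that step is left out of the sketch.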

Data Visualization Diagram

Research Poster

GitLab

https://gitlab.com/ihsanalaeddin0/senior-capstone/-/blob/main/News_Detection_Algorithm.py

Software Demonstration video

Software Paper

https://drive.google.com/file/d/1B-55VpstxpQECJw2L0MQY-KXzpfrv3fw/view?usp=sharing