Hello, I am a senior Computer Science major graduating in May 2022. This page will host my capstone project. My primary interests are cybersecurity, machine learning, and system administration. My project uses machine learning to identify malicious URLs.
A Uniform Resource Locator (URL) is the address of a resource on the web, such as a website. A malicious URL leads to a website that was designed, or is being used, solely to harm the user. In recent years, malicious website attacks have been ranked first among the top 10 cyber-attack techniques. Before machine learning was applied to this problem, the most common method for identifying and limiting access to malicious websites was a blacklist. In the early days of the internet, when there were far fewer websites, crowdsourcing these blacklists was an efficient and robust solution; today it is impossible to maintain an exhaustive list. Other methods have been built as extensions of blacklisting, such as heuristic approaches, in which signatures are assigned to common attack types and web scanners then search for those signatures. However, as the technology for finding attacks has advanced, so have the attacks. Google estimates that 30 trillion unique URLs currently exist, and this sheer number, combined with evolving attack techniques, has shown blacklisting to be slow and rigid.

Previous studies on using machine learning to identify malicious URLs have focused primarily on batch learning, in which a model is trained on a fixed dataset all at once. While this allows large amounts of data to be processed quickly, a batch-trained model can be limited in identifying new attack types and can be circumvented by attackers who disguise their attacks. This project will instead use online learning, in which the model is updated incrementally as each new labeled URL arrives, so that it can adapt to new attacks, and it will apply preprocessing techniques to help counter attacker obfuscation.
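The plan above rests on two ideas, online learning and obfuscation-resistant preprocessing, so the following Python sketch illustrates both under stated assumptions. It is not the project's implementation. The choice of logistic regression trained by stochastic gradient descent, hashed character trigram features, and repeated percent-decoding plus lowercasing as the preprocessing step are all illustrative assumptions, and the names `preprocess`, `features`, `OnlineLogisticRegression`, and `N_FEATURES` are hypothetical. The key point is that `update` adjusts the model one URL at a time, so it can keep learning as new labeled URLs arrive instead of being retrained from scratch on a fixed batch.

```python
import math
import zlib
from typing import List
from urllib.parse import unquote

# Hypothetical size of the hashed feature space (an assumption, not a project setting).
N_FEATURES = 2 ** 18


def preprocess(url: str) -> str:
    """Normalize a URL so simple obfuscation maps to the same text.

    Assumption: repeated percent-decoding and lowercasing stand in for the
    project's preprocessing, which is not specified here.
    """
    previous = None
    text = url.strip()
    # Decode until nothing changes, undoing nested encodings like %2576 -> %76 -> v.
    while text != previous:
        previous, text = text, unquote(text)
    return text.lower()


def features(url: str, n: int = 3) -> List[int]:
    """Map a URL to hashed character n-gram indices (repeats act as counts)."""
    padded = f"^{preprocess(url)}$"
    return [
        zlib.crc32(padded[i:i + n].encode("utf-8")) % N_FEATURES
        for i in range(len(padded) - n + 1)
    ]


class OnlineLogisticRegression:
    """Logistic regression trained one example at a time with SGD."""

    def __init__(self, n_features: int = N_FEATURES, learning_rate: float = 0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, idx: List[int]) -> float:
        """Estimated probability that the URL is malicious."""
        z = self.bias + sum(self.weights[i] for i in idx)
        z = max(min(z, 35.0), -35.0)  # clamp so math.exp cannot overflow
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, idx: List[int], label: int) -> None:
        """Take one gradient step on the log loss for a single labeled URL."""
        error = self.predict_proba(idx) - label  # d(log loss)/dz
        self.bias -= self.learning_rate * error
        for i in idx:
            self.weights[i] -= self.learning_rate * error


if __name__ == "__main__":
    # Hypothetical labeled stream (1 = malicious, 0 = benign); not project data.
    stream = [
        ("http://example.com/index.html", 0),
        ("http://examp1e-login.test/%76erify", 1),
        ("http://example.org/about", 0),
        ("http://examp1e-login.test/%2576erify", 1),
    ]
    model = OnlineLogisticRegression()
    for url, label in stream:
        x = features(url)
        # Score the URL before learning from it, as a deployed online model would.
        print(f"{model.predict_proba(x):.3f}  {url}")
        model.update(x, label)
```

Scoring each URL before updating on it mirrors how an online classifier would be used in practice: it always judges URLs it has not yet learned from. Decoding repeatedly means a doubly encoded path such as `%2576erify` normalizes to the same text as `verify`, so both versions produce the same features.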