Idea #1
Use machine learning to identify whether a webpage is malicious. Malicious websites are often identified through blacklists, but maintaining a blacklist is cumbersome given how many websites exist and how quickly new ones appear. ML could instead identify malicious sites based on keyword density and improve upon existing methods. Other features that could be used include URL length, website age, and country of origin. Identifying the most important features for the model will be key to the project. A nuance I could add would be to identify the type of attack associated with the URL and rank its severity. Attackers also use short URLs to try to circumvent detection, so being able to expand short URLs in order to extract features could make current tools more effective.
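As a rough sketch of what URL-based feature extraction might look like, the snippet below pulls a few lexical features out of a URL using only the Python standard library. The function names, the feature names, and the tiny `SHORTENER_HOSTS` list are assumptions for illustration, not a settled feature set. Website age, country of origin, and keyword density would need outside data (for example WHOIS records or the page content) and are left out.

```python
import ipaddress
from urllib.parse import urlparse

# Illustrative list only (an assumption); a real project would need a maintained list.
SHORTENER_HOSTS = {"bit.ly", "tinyurl.com", "t.co"}


def host_is_ip(host):
    """Return True if the host is a literal IPv4/IPv6 address rather than a domain name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def extract_url_features(url):
    """Build a dict of simple lexical features from a URL string."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "host_is_ip": host_is_ip(host),
        # Short URLs hide the real destination, so they would need
        # expanding before the other features mean much.
        "is_short_url": host in SHORTENER_HOSTS,
    }
```

Calling `extract_url_features("http://bit.ly/abc123")` would flag the URL as a short link, which is the point where an expansion step could follow the redirect and re-run the extraction on the real destination.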
Idea #2
Calculate expected goals for Premier League soccer teams. Expected goals is commonly used as a predictor to help analysts identify skillful players and predict the winning team. Many datasets are available, and existing techniques could be analyzed for efficacy and improved upon. A possible nuance I could add is comparing a player’s expected goals to their wages, or a team’s expected goals to its total wage bill, to find efficient teams.
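To make the wage comparison concrete, here is a minimal sketch. It assumes a team's expected goals is the sum of the scoring probabilities of its shots, and it measures efficiency as expected goals per million of wage bill. The function names, the choice of "per million" as the unit, and the input layout are my own assumptions; the per-shot probabilities would come from whichever model the project ends up using.

```python
def expected_goals(shot_probabilities):
    """Expected goals: the sum of each shot's probability of being scored."""
    return sum(shot_probabilities)


def xg_per_wage(xg, wage_bill_millions):
    """Expected goals produced per million spent on wages."""
    if wage_bill_millions <= 0:
        raise ValueError("wage bill must be positive")
    return xg / wage_bill_millions


def rank_by_efficiency(teams):
    """Rank teams from most to least efficient.

    `teams` maps a team name to (shot_probabilities, wage_bill_millions).
    Returns a list of (name, xg_per_million) pairs, best first.
    """
    scored = [
        (name, xg_per_wage(expected_goals(shots), wages))
        for name, (shots, wages) in teams.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

The same functions would work per player by passing a player's shots and individual wage instead of team totals.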
Idea #3
Use machine learning to identify network attacks, specifically DoS attacks. Most current methods use huge and cumbersome MIB databases. I would explore methods for classifying the data and identifying anomalies within network traffic that are more efficient and consume less time and fewer resources. Data can be classified by where it comes from to help determine whether it may be malicious. There is less research specifically on this topic, as most existing work is tied to a particular domain or relies on private data.
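As a starting point for classifying traffic by where it comes from, a very simple non-ML baseline could count requests per source address and flag sources far above the typical volume. This is only a sketch under my own assumptions: the function name, the median-based rule, and the default `factor` of 10 are illustrative, and an ML model would be compared against something like this rather than replaced by it.

```python
from collections import Counter
from statistics import median


def flag_heavy_sources(source_ips, factor=10.0):
    """Flag sources whose request count exceeds `factor` times the median count.

    `source_ips` is an iterable with one source address per observed request.
    Returns a dict mapping each flagged address to its request count.
    """
    counts = Counter(source_ips)
    if not counts:
        return {}
    typical = median(counts.values())
    return {ip: n for ip, n in counts.items() if n > factor * typical}
```

A rule like this is cheap to run, but it would miss distributed attacks where no single source stands out, which is one reason to look at learned classifiers.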