With the non-stop improvements in technology, more and more fields are trying to apply computer science to achieve their goals in a more efficient and less time consuming way. Sports are no outsiders to this group of fields.
In sports, especially in soccer, technology has become an essential part. Soccer experts now make use of technology to evaluate a player’s or a team’s performances. Other than using their experience and their management abilities after many years being parts of the game, the soccer coaches also use statistics from data providers to improve their knowledge of their own players and teams so that they can come up with different strategies/tactics that bring them closer to the wins. Besides coaches, soccer analysts also make use of the data to predict results in the future as well as evaluate new talents emerging from the scene.
This is where Machine Learning techniques can become useful. Machine Learning is one of the intelligent methodologies that have shown promising results in the domains of classification and prediction. Therefore, Machine Learning can be used as the learning strategy to provide a sport prediction framework for experts. The end goal of this project is to produce a program/script that will automatically execute the complete procedure of results prediction.
For the paper, there should be several sections that explain the framework of using Machine Learning to predict results in the English Premier League. This will include basic knowledge about soccer and the League, data preparation and understanding, feature extraction (Scikit-learn, Machine Learning algorithms/models, Selections), training and testing. Other than those sections, I will also discuss the results of my program/script in my paper from predictions for the upcoming matches. Finally, I will talk about the difficulties/obstacles of the project, the conclusions, and a few possible directions for further development of this field/topic.
For the programming part, my plan is to just create a basic program/script that can carry out every step needed in the process of predicting the future results. This starts from writing code that retrieves the data from the source stated above and preprocesses the data in an usable format. As said above, during the course of the project and after training and testing have been performed, the best performing features and Machine Learning algorithms (provided and tested by using Scikit-learn) will be determined. I would then set those features and algorithms to be used in the program.
My current idea is that the program will first ask the user for the home team and the away team. Then, it will use the decided Machine Learning algorithms to predict the result between the specified teams and print that result out to the screen. However, the user can also choose to predict the 38 matches of one team for the whole season. The program will then write the predictions to an output file which can be accessed by the user.