By: Oluwademilade Bolatimi
This research aims to use different machine learning models for comparison purposes to see if we can reach a high enough accuracy in detecting an individual’s ethnicity based on an image of them. Similar works around this topic, such as Owusu et al and Kremic et al, show a baseline good accuracy score should be around 95%. This research is being done because there has recently been significant progress in classifying different biometrics such as age and gender, but “the recognition of ethnicity has not received the same attention from the scientific community” (Greco). This is concerning because of the racial bias of facial recognition given to specific races more than others. An example of this is seen in Marcos et al. They “noticed important biases, thus leading to higher error rates when images were taken from black people” (Ferreira) and how there is a “risk of deploying computational software that might affect minority groups that are historically neglected” (Ferreira). If we are able to determine the ethnicity of a person we can make accommodations for the current biases against those minorities with the aim of improving the overall facial recognition software. The paper will first give details about the dataset used and how it will be manipulated for my research. It will then move on to the different models to be used. The idea of the various models suitable for this sort of classification comes from past research such as Chen et al. They use the k-nearest Neighbor algorithm, a Support Vector Machine (SVM), a two-layer neural network, and a convolutional neural network (CNN) for comparison. After this is done we will then use confusion matrix diagrams to see how accurately the models performed and choose our best model with the highest metrics as measurement.
This paper aims to test the effectiveness of applying different machine learning models to accurately detect a person’s ethnicity based on their facial image. This is important due to the lack of research that has been done on that specific bio metric with other biometrics such as age and gender being given greater attention. There will be a performance comparison of a support vector machine, a convolutional neural network in the form of ResNet50, and a logistic regression neural network. The model with the highest scores of accuracy, precision, f1, and recall overall will be selected to be used to test facial images not involved in the building of the model to see the effectiveness of the model.
Link to project proposal: