I haven’t decide my topic yet, but I was reading papers related to my three ideas to gain a deeper understanding on these ideas.
My first idea is an AI tech for voice print recognition. It can be used for avoiding voice spoofing attacks on business, banks (like mobile phone customer service), etc. The main steps for voice recognition is: take vocal input -> identity-feature analysis -> deviating feature selection -> deviating feature comparison -> distance to reference pattern estimate -> check if voice match. The main algorithms for feature extractions are: GMM, JFA, GMM-SVM, etc. On the paper “Vulnerability of speaker verification systems against voice conversion spoofing attacks: The case of telephone speech”, the authors experimented several algorithms and concluded that although JFA has a high inaccuracy but the converted samples with JFA sounds very mechanical so human can easily distinguish. The authors of paper “Voice command recognition system based on MFCC and VQ algorithms” discuss and examine two significant modules: MFCC and DTW. Their results were good. So I will consider use these two modules.
For my second idea which is creating an AI tool for safety driving, the key tech is 3D dynamic facial recognition. I learned that the most Facial recognition tech can be decided into two main parts: facial detection and facial recognition. I can use open sources like opencv and dlib to do facial detection. There are 3 factors i need to care about: detection rate, misdetection rate, false alarm rate. The authors of the paper “BP4D-Spontaneous: A high-resolution spontaneous 3D dynamic facial expression database” reported a newly developed spontaneous 3D dynamic facial expression database in their paper. I am not sure if I can or should use their new database. Although the paper “Real time facial expression recognition in video using support vector machines” primary discuss detecting emotion from facial expression, it provides some facial recognition tech info for me.
My third idea is creating a smart tool to grade algebra on handwritten homework. The APP takes a photo of the handwritten homework and using OCR tech to extract the texts and grade them. The main tech is just OCR. Although the two papers I read both talk about their own APP and OCR system, I can refer some technologies they used, like matrix matching, fuzzy logic for facial extraction.