Daily posts (from Oct. 26 – Nov. 2)

Oct. 26:

Installed and configured TensorFlow on my personal desktop computer. Ran into a few issues getting TensorFlow to use the GPU, but got it up and running after some debugging.

Oct. 27:

Analyzed and dissected the repository for the TensorFlow implementation of WaveNet. Researched the concepts and features of TensorFlow in order to understand the basic implementation. Looked into the functionality of librosa, the audio library used in the project.

Oct. 28:

Attempted to download the MAPS piano dataset, but failed because the FTP authorization details were incomplete. Contacted the sysadmins and authors, who responded that the dataset is unavailable and hasn’t been updated in a while.

Oct. 29:

Gave up on downloading the MAPS dataset since the process was complicated and they didn’t respond back to me. Researched downloading piano MIDI files and writing a script to convert MIDI note files to WAV music files, but I would need to download the files individually since there was no option to download all of them at once.
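
A rough sketch of the note-to-audio half of such a conversion script, using only the Python standard library. It does not parse MIDI files: it assumes the notes have already been extracted as (MIDI note number, duration in seconds) pairs, and it renders each one as a plain sine tone rather than a piano sound. The names `midi_to_hz` and `render_notes` and the 44.1 kHz sample rate are illustrative choices, not part of the project.

```python
import math
import struct
import wave

SAMPLE_RATE = 44100  # samples per second (assumed for this sketch)


def midi_to_hz(note):
    """Equal-temperament frequency of a MIDI note number (note 69 = A4 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def render_notes(notes, path, amplitude=0.4):
    """Write (midi_note, seconds) pairs to a mono 16-bit WAV file as sine tones.

    Returns the number of audio frames written.
    """
    frames = bytearray()
    for note, seconds in notes:
        freq = midi_to_hz(note)
        for i in range(int(seconds * SAMPLE_RATE)):
            sample = amplitude * math.sin(2.0 * math.pi * freq * i / SAMPLE_RATE)
            # Pack each sample as a little-endian signed 16-bit integer.
            frames += struct.pack("<h", int(sample * 32767))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(bytes(frames))
    return len(frames) // 2


if __name__ == "__main__":
    # C4, E4, G4, each held for a quarter of a second.
    render_notes([(60, 0.25), (64, 0.25), (67, 0.25)], "chord_notes.wav")
```

A real converter would still need a MIDI parser to read note events from the files and a synthesizer or sound font to produce piano-like audio.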

Oct. 30:

Thought of writing a scraping script to download the files, but opted instead for a different site to collect music in bulk. Managed to download a few gigabytes of piano music. Started looking into how to train a model on the custom dataset.

Oct. 31:

Trained the model and generated sample music with it. Researched exporting the trained model graph so that it can be used on mobile.

Nov. 1:

Designed and broke down the architecture for the project and the parts needed for the Android application, which will generate music based on the model. Found an audio library that could be used for processing the audio files on Android. Generated illustrations for the presentation on project implementation.

Nov. 2:

Looked into the example of TensorFlow running on Android and its analysis. Started working on a new project based on the example.


Proposal Idea

I would like to create a generic software pipeline based on the recent breakthroughs in compressing deep neural networks. The pipeline could then be used with software such as TensorFlow to compress neural networks so that they can be deployed on resource-limited mobile devices. My initial idea for the pipeline is to use network trimming/pruning, quantization, weight sharing, and Huffman coding.
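
To make the idea more concrete, here is a minimal sketch of how those stages could be chained on a flat list of weights, in plain Python. Everything in it is an assumption for illustration: the function names, the pruning threshold, the uniformly spaced codebook used for weight sharing, and the bit-string output. A real pipeline would operate on a framework's tensors and would typically learn the shared values rather than space them evenly.

```python
import heapq
from collections import Counter


def prune(weights, threshold):
    """Magnitude pruning: zero out weights whose absolute value is below threshold."""
    return [0.0 if abs(w) < threshold else w for w in weights]


def quantize(weights, n_levels):
    """Weight sharing: replace each weight by an index into a small codebook.

    Index 0 is reserved for pruned (zero) weights; the remaining n_levels
    entries are evenly spaced between the smallest and largest surviving weight.
    Assumes n_levels >= 2.
    """
    nonzero = [w for w in weights if w != 0.0]
    if not nonzero:
        return [0] * len(weights), [0.0]
    lo, hi = min(nonzero), max(nonzero)
    step = (hi - lo) / (n_levels - 1) if hi > lo else 1.0
    codebook = [0.0] + [lo + i * step for i in range(n_levels)]
    indices = [0 if w == 0.0 else 1 + round((w - lo) / step) for w in weights]
    return indices, codebook


def huffman_code(symbols):
    """Build a prefix-free code that gives shorter bit strings to frequent symbols."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (count, unique tie-breaker, {symbol: code built so far}).
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]


def compress(weights, threshold=0.1, n_levels=4):
    """Prune, quantize, then Huffman-encode the codebook indices."""
    indices, codebook = quantize(prune(weights, threshold), n_levels)
    codes = huffman_code(indices)
    bits = "".join(codes[i] for i in indices)
    return bits, codes, codebook


def decompress(bits, codes, codebook):
    """Decode the bit string back into (approximate) weights."""
    decode = {code: symbol for symbol, code in codes.items()}
    restored, current = [], ""
    for bit in bits:
        current += bit
        if current in decode:
            restored.append(codebook[decode[current]])
            current = ""
    return restored


if __name__ == "__main__":
    weights = [0.02, -0.5, 0.9, 0.05, 0.88, -0.01, 0.0, 0.45]
    bits, codes, codebook = compress(weights)
    print(len(bits), "bits instead of", 32 * len(weights))
    print(decompress(bits, codes, codebook))
```

The restored weights are only approximations of the originals, so in practice the network would need to be re-evaluated, and usually retrained, after compression.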

Project Ideas

Update: my capstone project will be a survey of the techniques and methodologies used to make deep learning models smaller and more efficient so that they can run on mobile platforms. If time permits, I would also like to research voice conversion using deep learning, especially the recently published paper on the topic, WaveNet.

Advisor: David Barbella


Update: Annotated Bibliography


1. Deep learning on mobile

With the recent advances in deep learning and the increase in the amount of available data, we are now able to create smarter applications with more accurate recognition engines. Most mobile applications that use deep learning only work online, with the main processing done on cluster servers. This introduces unnecessary delays and network bandwidth usage, with the additional disadvantage of not working at all when the device is offline. Scaling down deep learning models to run on a mobile device is an interesting area of research that could have an impact on future mobile applications. The power of mobile hardware will surely keep increasing, but there are quite a few software techniques we can use to reduce the model size so that it can fit on an average mobile computing device.

More concretely, this research will analyze the current techniques used to reduce model size and possibly propose future optimizations.

Current Techniques


Low precision arithmetic
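
As a rough illustration of low precision arithmetic, the sketch below quantizes two float vectors to 8-bit integers with a symmetric linear scale and computes their dot product with an integer accumulator. The function names and the choice of symmetric int8 scaling are assumptions made for this example, not a method taken from the related research.

```python
def quantize_int8(values):
    """Symmetric linear quantization: real value is approximately scale * q,
    where q is an integer in [-127, 127]."""
    max_abs = max(abs(v) for v in values) or 1.0  # avoid a zero scale
    scale = max_abs / 127.0
    return [round(v / scale) for v in values], scale


def int8_dot(a, b):
    """Dot product computed on 8-bit integers, rescaled to a float at the end."""
    qa, scale_a = quantize_int8(a)
    qb, scale_b = quantize_int8(b)
    acc = sum(x * y for x, y in zip(qa, qb))  # integer-only accumulation
    return acc * scale_a * scale_b


if __name__ == "__main__":
    a = [0.5, -1.0, 0.25]
    b = [2.0, 0.5, -4.0]
    print("float result:", sum(x * y for x, y in zip(a, b)))
    print("int8 result: ", int8_dot(a, b))
```

Each value is stored in one byte instead of four, at the cost of a small rounding error in the result.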

Related Research

Compressing Deep Learning Models with Pruning, Quantization and Huffman Coding


2. Voice conversion/morphing

Voice conversion is about converting voice input from one person into the target’s voice signature. It used to be very hard to replicate and convert to another person’s voice due to the sheer complexity of the task, which includes accounting for different accents and specific individual quirks. But it has theoretically become possible to achieve a relatively effective conversion using neural networks. The practical applications are ample, ranging from entertainment to giving unique voices to people with disabilities. Some security systems that use voice recognition could even become obsolete.

Related Research

Voice Morphing

High Quality Voice conversion using Deep Neural Networks


3. Shopping Experience Enriched by Machine Learning

Recommendation engines are already a popular machine learning application in e-commerce. In this research, I would like to explore further applications of machine learning to enrich the user’s shopping experience. For example, we could train a model to understand reviews and summarize them, instead of having the user read through the most recent reviews, which may or may not be related to what they’re looking for. Also, the user could specify a problem statement (e.g. “I want to buy a present for my dad”), and the system could then suggest possible gifts based on some prior training dataset.