Herschel Darko,ideas about my 3 projects

by Herschel Darko on 2018-01-27 with No Comments

After speaking with Dave,I have realized my ideas are not focused enough and I need to inspire confidence.

I will narrow my ideas specifically to:

1.Using neural networks to improve G.O.A.P

2.Neural networks in Intelligent Character.

3.A examination of Case Based Reasoning A.I in Serious Games and RTS

Any project I decided to implement using these ideas will be built using the Unity Engine,As I have experience and familiarity with Unity.

Will find time to discuss with Dave to get his input.

1/27 Priscilla Coronado project ideas

by Priscilla Coronado on 2018-01-27 with No Comments

Initially I wanted to do something more towards HCI. These were the following ideas that I had:

(1) Creating a chatbot using NLP where a user would talk to the chatbot and see if there could be some sort of connection made between user and chatbot.

(2) Studying how websites (i.e. typography, colours, layout, etc) affect users emotions and decisions.

(3) Creating a robot that could study dirt temperature and feed it to a database for farmers to use.

However in the end I have to change scope and lean outside of HCI. I will continue to think of new ideas as the days go by.

Adhish Adhikari Senior Project Proposal

by Adhish Adhikari on 2018-01-24 with No Comments

Background

As a student I have realized that being able properly understand my personal academic data is very
important. From grades and transcripts to majors and courses, I need to be able to clearly acquire the information about my academic standing. We can take for example the data for a student’s degree progression. While a college degree has it’s hierarchy (a degree can be broken down into majors, minors and general requirements, majors can further be broken down into first year courses, major requirements and electives and so on), the course progression each semester is also chronological. Thus, I believe that the idea of using both treemaps and timeline data visualization for such data seems to be an idea worth exploring.

The Idea

Studying at Earlham College, it seems only natural that I’ll work with Earlham College’s degree progression data. As of right now, Earlham College provides information on academic standing (including, majors, grades, courses taken, etc) using a web based platform called Degreeworks. While Degreeworks does have all the relevant information, it lacks presentation. Thus, many students can’t really see the big picture. The interface is very traditional and major chunks are divided into lists. It’s difficult to imagine where you are in the road map to graduation by just staring at the list. A student can see that they have taken x credits out of 120 credits for their graduation requirement and y credits out of z requirement (for majors) credit. However, there is no relativity. These two things seem very disjointed even though they are deeply connected.

My goal for the senior project is therefore, to create a visual version of the Degreeworks which I call Degreeworks V2. By providing this visual interface for Earlham Degreeworks, I want to help Earlham College students to effectively visualize their academic standing. Like I discussed earlier, I will be using treemaps and timelines in order to visualize the data. Like I said, just being able to know how manycredits I have taken or how many are left does not give me a good sense of where I am. Neither does looking at a list of electives. If we can visualize this data, I think it would
hugely benefit the students as well as institution.

Software

Like I discussed in my survey, D3 (short for Data driven documents) is one of the frameworks that provides processing capabilities . D3 is a domain-specific programming language that uses these algorithms to provide a library of classes and objects to create data visualizations including treemaps and timelines. D3 is a JavaScript based framework. However, if I go with the web-based option, I might need a little bit of php to connect to the database.

D3 does most of the rendering for us so the frontend work is limited to styling the visuals. I will useHTML for page content, CSS for aesthetics and SVG for vector graphics while a further JavaScript layer can be added for interactions.

1/24 – Weekly Progress Report

by Jeremy Swerdlow on 2018-01-24 with No Comments

For this week, I reviewed the source material on decision tree creation, attempting to further my understanding, and think about where to begin editing.

For next week, I will be setting up a git repo with the source code from sci-kit learn, and begin to work on the design of how to add more data.

Ben Folk-Sullivan Second Entry, CS388

by Benjamin Folk-Sullivan on 2018-01-23 with No Comments

(Made my first entry as part of my Portfolio by accident)

January 23rd, 2018

Spent time thinking and researching for a third idea, decided to go with an adaptation of one of my other ideas. ~2 hours 30 minutes.

Niraj Senior Research Paper & Software

by Niraj on 2018-01-23 with No Comments

Background

Ever since their introduction in the late 2000s, smartphones have been changing the landscape of people’s daily lives. As these devices get more popular, they are also getting more powerful. Today, we can hold in our hand a device that is more powerful than a room full of computers in the 1960s. Not only this, smartphones today come with sensors like accelerometers, gyroscope, magnetometer, GPS, ambient light sensor and so on. Human activity recognition utilizes the increased computational power of smartphones and their diverse array of sensors to collect raw data from a subset of the phone sensors, use the computational power of the phone to detect motion patterns and recognize the activity that the user is engaged in.

Fitness monitoring trackers like fitbit and android watch have also steadily gained popularity worldwide. This reflects the increasing demand for ways to monitor fitness. Activity recognition also presents us with a marvelous prospect when it comes to fitness monitoring. Using techniques employed in activity recognition, not only will users be able to track the number of step taken, the number of calories spent, the number of stairs climbed, the number of hours slept, their quality of sleep and distance traveled but smartphones can also be used to alert idle users to move around if it notices that they have been sitting for too long. Since no extra sensors are required and they are accessed through the smartphone, these applications are zero cost and easy to use. Therefore, my motivation behind this project is to provide an affordable means of monitoring fitness through an Android device.

Software

The final product of my project will be an Android (possibly cross-platform) application that comes with a trained classification model (possibly based on decision trees) capable of classifying activities into separate classes based on the current input data stream from sensors like accelerometer, magnetometer, gyroscope, etc. Furthermore, the application will also keep track of how many steps the user has taken, stairs climbed, hours slept, distance travelled and so on. I plan to build a suitable visualization within the application to allow the users to better understand the data.

Paper

My paper will contain a detailed description of the framework I used to build the application, as well as the techniques I used to extract features from training and test datasets. Also, in this paper, I will justify my choice of the machine learning algorithm used and the visualization techniques used. More importantly, I will evaluate the accuracy of the current model and suggest further ways to improve on it.

Vitalii Senior Research Paper and Software

by Vitalii Stadnyk on 2018-01-23 with No Comments

Background

Cities are complicated systems that consist of numerous interconnected citizens, businesses, various transportation modes, services, and utilities. Scientists expect around 70% of the population to be located in cities and surrounding areas by 2050. Hence, the demand for smart cities which would provide everyone with high-quality services and ensure a suitable environment for economic and social well-being has appeared. Smart cities are mostly driven by the Internet of Things (IoT). A major part of costs that city municipalities face come from data collection. IoT applications in smart cities are argued to have a significant positive impact on efficiency and cost saving.

Garbage is a direct source of spreading diseases in underdeveloped countries and it contributes to the overall estimation of how clean an environment is. Since garbage collection process has to be repeated continuously, some countries simply cannot afford it which leads to some portion of garbage not being picked up. Studies have shown that garbage directly influences life expectancy which makes it a very important issue to be considered by governments all over the world. That is where the question arises of how to get a good implementation of waste collection system at a price that government can afford. The aim of my project is to create an affordable waste monitoring system that takes advantage of IoT, historic data, and various routing techniques.

Software and Hardware

The final product of my project is a waste monitoring system that consists of Arduino board mounted directly in the garbage bin along with ultrasonic sensor and RF transmitter, Arduino board acting as a control center with RF receiver, and software that connects all of the specified components. An ultrasonic sensor is capable of detecting the distance to objects it points to. Therefore, it can be used to measure garbage can fill level. Arduino board within a trash bin will continuously receive this information and send it using RF transmitter directly to the control center at every specified period for further analysis. The control center will receive this data using RF receiver. Later on, the software will analyze received information and make adjustments to routing plans for near future if a given value falls within a warning or critical zone. Such a system will create routing and scheduling policies that reflect a real-time data collected at physical locations of garbage bins.

I acknowledge that testing a full-scale monitoring system will be impossible with an amount of time and hardware components available to me. Therefore, after coming up with a prototype of the monitoring system, I will run simulations based on available data from Dr. Charles Peck of waste collection services within Richmond, IN. This approach will let me evaluate the effectiveness of a proposed system without necessarily setting up a complete system and complete a result section of my paper.

Paper

This paper will provide good background information about IoT and it’s applications in smart city. The primary focus of the paper will be to evaluate the effectiveness of the created waste monitoring system, various routing policies etc. Additionally, this paper will suggest optimal techniques that minimize the garbage spread based on the available resources.

Zeyang Gao senior research project

by Zeyang Gao on 2018-01-17 with No Comments

Aim of the project

More and more large-scale Monte Carlo simulations are now run on parallel systems like networked workstations or clusters. In a parallel environment, the quality of a PRNG is even more important, to some extent because feasible sample sizes are easily 1015 times larger than on a sequential machine [2]. The main problem is the parallelization of the PRNG itself. Some generators with good quality that do not run on parallel sacrifice their efficiency. Those parallelized generators cannot ensure their quality. This problem

becomes even difficult for TRNGs, due to their nature of instability of quality and complex implementation for parallelization and them- selves. Therefore, I think it is important to resolve this problem with a stable RNG design runs in parallel that can generate random numbers on a large scale. If applicable, I will definitely go for a TRNG based design. However, given the difficulties of imple- mentation and limited hardware knowledge, I will not feel guilty if I end up with a PRNG design.

Software and paper

The software of this project will be divided into two parts. The first part will the interface of TRNGs or the source code of a PRNG design. The second part will be a set of statistical test to certificate the randomness of output stream. My paper will introduce my implementation and design in great detail, including how to bring ordinary RNGs into parallel and how to optimize them for large scale purpose. It will also include a result analysis part where I run statistical test against sample streams. The success of a design is dependent on whether the design is able to generate random numbers on large scale and how successful (i.e. how many tests can the output pass) the output stream is.

Jeremy Swerdlow Senior Research Paper & Software

by Jeremy Swerdlow on 2018-01-17 with No Comments

In machine learning, there has been a shift away from focusing on the creation complex algorithms to solve problems. Instead, a large focus has been on the application of simpler algorithms which learn from datasets. This shift has been made possible through the ever-increasing computational power of modern computers, and the massive amounts of data generated and gathered through the internet of things. However, even given the power and storage cheaply available for creating these models, it can still be quite time and space intensive to make a useful machine learning model. Datasets can vary in size, but can range from hundreds of thousands, millions, or even billions of unique data points. Due to the copious amount of data, training even a relative fast machine learning model can take hours or days. Because of how time and resource intensive that process is, companies often wait to recreate the model, even though they may lose some performance due to it. Some companies even wait to do it on a monthly basis, such as CoverMyMeds, who update their models every 28 days.

Part of why updating models is so intensive is that many do not allow data to be added after they are initially trained. This means each time you want to add data, you must create a new version from scratch, using the old set and the new points. Other types of models do allows this though, so it is possible to add it. The aim of my research focuses on learning how to add data dynamically to the model from neural networks, a machine learning algorithm based off of how the brain works with neurons, and apply similar logic to classification decision trees. The hypothesis of my research is that the time intensity of updating a decision tree can be decreased by adding data incrementally, with little loss to the tree’s effectiveness.

Paper
For the paper associated with this research, I will focus on the theory behind neural networks and their dynamic data addition, how decision trees are created, and how I will be adapting their training to mimic the behavior of neural networks when it comes to training. However, it may not be the case that decision trees can be changed to act as neural networks do, but can be edited in some other manner. To confirm or discredit my hypothesis, the resulting software will be tested on a series of datasets which range in size, type, and topic, and recorded in the paper.

Software
For the software component of this research, I will be reviewing, editing, and testing the sci-kit learn package in Python, which comes with well-tested and documented implementations of both decision trees and neural networks. These will be gathered into a Git repository, along with the relevant datasets, my edited version of the code, and the necessary files to run to test the results.

Shihao_Chen Senior paper&software

by Shihao Chen on 2018-01-17 with No Comments

No matter how developed technologies may become, humans need to consume food and convert into energy. Autotrophs, usually are plants, take inorganic compounds and convert into organics which then could be digested by animals. Growing a feeble seed into a mature plant has always been carefully manually processed which is time consuming for large quantities. A sole seed takes approximate 7 days to germination, and the germination rate is difficult to control. In addition, different species require distinct environment even an expert could not predict the germination rate. Fully automation could not only help to reduce the resources cost but increase the efficiency as well. First, it is much more precise on each environmental condition, thus making sensitive changes more quickly. Then, with precise adjustment, reducing chemical waste and energy lost which means the cost would be decreased. Among developing countries, starvation is still a daily problem that needs to be considered.

Paper: As for paper, I will be mainly concentrate on the development of software frame and the interaction between the agent and the environemnt. Since this project is heavily depend on using machine learning to make rational decisions, applying algorithms to analyze data is essential. In addition, precise sensors could collect data which then should be labbled and pass on the the agent.

Software: Althogh this is a project involving both software and hardware components, using certain algorithms (such as Bayesian networks) to process collected data and make rational decision is critical. Upon that, by analyzing data from the past, a future prediction could be made for productivity and cost.

Victor Zuniga Senior Project Idea

by Victor Zuniga on 2018-01-16 with No Comments

The technology of blockchain has reached the mainstream conscience as a result of the popularity of Bitcoin and other cryptocurrencies. Blockchain as implemented in Bitcoin is a distributed ledger system in which transaction history is maintained by a consensus of participant nodes. These participant nodes also compete in using proof of work to decide the block of transaction added to the chain. This system has the benefits of being totally decentralized and creating a nearly unalterable transaction history. However, the proof of work method is resource intensive and slow. Another cryptocurrency, Etherium uses a consortium variation on blockchain in which a subset of participant nodes are selected to determine the next transaction block added to the chain. It is this consortium blockchain which my project will be based on.

In the healthcare industry of today data privacy is a major concern. There are numerous examples available of healthcare providers failing to maintain the security of their patients’ data and being hacked. As Internet of Things devices become more commonplace they will play an ever grMy senior project will focus on using blockchain technology to connect Internet of Things devices, specifically in a healthcare context where patient data is of high security concern. The implementation will make use of the consortium blockchain concepteater role in healthcare and form the basis of smart healthcare. Of primary concern with this fact is being able to secure these devices of low computational power. My project will use the consortium blockchain previously mentioned to secure such devices and improve the security of the data being transfered.

Paper

My paper will delve into the technology of blockchain and specifically focus on consortium blockchain. It will explain what the Internet of Things is and how these devices pertain to healthcare. And it will bring the two together explaining how a blockchain will provide increased security to an IoT network and how it allows providers to remain HIPAA compliant.

Software/Hardware

The hardware component of my project will utilize affordable single board computers (SBCs) like CHIP to model healthcare IoT devices. These SBCs will be set up in a network similar to one that could feasibly be found in smart healthcare. Additionally, another SBC or a more powerful computer, if need be, will be used as a sort of aggregator. For my blockchain implementation I will use the University of Sydney’s Red Belly Blockchain with the Linux Foundation’s Hyperledger as a backup. My code will use the blockchain framework which is currently geared towards currency and tweak it for communication. My prediction is that this repurposing will present the bulk of the challenge and time commitment for the project.

Honglie Hu Senior Project

by Honglie Hu on 2018-01-16 with No Comments

The paper of my project will be developed based on the proposal. And the software components of my project are all listed in the proposal.

The proposal is attached below.

HonglieHu_Proposal_FirstDraft

Minh Vo Senior Project

by Minh Vo on 2018-01-16 with No Comments

With the non-stop improvements in technology, more and more fields are trying to apply computer science to achieve their goals in a more efficient and less time consuming way. Sports are no outsiders to this group of fields.

In sports, especially in soccer, technology has become an essential part. Soccer experts now make use of technology to evaluate a player’s or a team’s performances. Other than using their experience and their management abilities after many years being parts of the game, the soccer coaches also use statistics from data providers to improve their knowledge of their own players and teams so that they can come up with different strategies/tactics that bring them closer to the wins. Besides coaches, soccer analysts also make use of the data to predict results in the future as well as evaluate new talents emerging from the scene.

This is where Machine Learning techniques can become useful. Machine Learning is one of the intelligent methodologies that have shown promising results in the domains of classification and prediction. Therefore, Machine Learning can be used as the learning strategy to provide a sport prediction framework for experts. The end goal of this project is to produce a program/script that will automatically execute the complete procedure of results prediction.

Paper Plan:

For the paper, there should be several sections that explain the framework of using Machine Learning to predict results in the English Premier League. This will include basic knowledge about soccer and the League, data preparation and understanding, feature extraction (Scikit-learn, Machine Learning algorithms/models, Selections), training and testing. Other than those sections, I will also discuss the results of my program/script in my paper from predictions for the upcoming matches. Finally, I will talk about the difficulties/obstacles of the project, the conclusions, and a few possible directions for further development of this field/topic.

Software Plan:

For the programming part, my plan is to just create a basic program/script that can carry out every step needed in the process of predicting the future results. This starts from writing code that retrieves the data from the source stated above and preprocesses the data in an usable format. As said above, during the course of the project and after training and testing have been performed, the best performing features and Machine Learning algorithms (provided and tested by using Scikit-learn) will be determined. I would then set those features and algorithms to be used in the program.

My current idea is that the program will first ask the user for the home team and the away team. Then, it will use the decided Machine Learning algorithms to predict the result between the specified teams and print that result out to the screen. However, the user can also choose to predict the 38 matches of one team for the whole season. The program will then write the predictions to an output file which can be accessed by the user.

Jeremy Swerdlow Project Proposal

by Jeremy Swerdlow on 2018-01-16 with No Comments

My senior research is on adapting how decision trees in Python add data, to allow them to grow further after their initial creation. By allowing data to be added later, we can greatly reduce the time required to update a model in production. Please find attached a copy of the proposal for this research.

jjswerd14_388_project_proposal

Jon Senior Project

by Abdullojon Abdulloev on 2018-01-16 with No Comments

In the past couple of decades, there has been a significant growing amount of research on Natural Language Processing (NLP) which has largely been motivated by its enormous applications. Some of the well-known systems that use NLP techniques include Siri from Apple, IBM Watson and Wolfram|Alpha. There has also been much work done on building efficient NLIDBs that allow people without SQL backgrounds to query a SQL database in natural language. For instance, a research team at Salesforce developed WikiSQL with the aim of democratizing SQL so databases can be queried in natural language. WikiSQL is a large crowd-sourced dataset for developing natural language interfaces for relational databases that generates structured queries from natural language using Reinforcement Learning.

The purpose of my senior project is to solve the inequitable distribution of a crowd’s resources. The goal is to build a large Natural Language to Interfaces Database System for the Sharing and Gig economies. In other words, this means building a database of our current resources and services that can be queried and modified in the English natural language.

Given the scope of this project I will start with a small database for the Earlham student community. The application will connect students with certain needs with students who can fulfill those needs. I will start with simple queries and sentences related to the following contexts: Homework, Transportation, and sharable items.

Facilitating the connections between crowd members requires communication between the users and the database. The functionality of the application will be dependent on the constant input of information from users about their daily activities so that the algorithm will be better able to connect users. I realized that communicating with a chat-bot in natural language will be the best option to facilitate the constant input of information. I decided to use one the most widely used relational database management systems, PostgreSQL. Hence, the goal of this project is to democratize SQL so that users can query the SQL database in natural language (for example: “Who graduated from Earlham College this year?”) and modify the SQL database in natural language (for example: “I graduated from Earlham College this year”). There is huge potential in such systems where people can query a database system using natural language as it can create accessibility to a lot of people without SQL backgrounds.

Description of Paper
The paper will include an outline of and an introduction to Natural Language Processing (NLP). I will base my final paper significantly on my survey paper. Therefore it will contain sections on aspects of NLP, such as Natural Language Understanding, Knowledge Bases, Information Extraction and part-of-speech tagging. However, the primary focus of the paper will be on comparing the techniques discussed in my survey paper.

Description of Software
The application will consist of the following components:

User Interface – The application user interface will be a web application through which users can query and modify the SQL database. The frontend will be built using React and BootStrap CSS. The plan is to build a chat area where a user can communicate with the chatbox by typing messages. The results of the queries and sentences will appear beside the chat-box. Since this application is only for Earlham students at the moment, people will be able to login only with Earlham email addresses.
Server – The server will be built using Python’s Django framework since the main natural language processing component will be written in Python as well. It will essentially serve the results to the queries made on the user interface as well as update the database based on the information given.
Natural Language Processor – This is the main component of the application which will require the most amount of time and effort. Essentially the goal of this component is to identify characteristic patterns in natural language queries and sentences and convert them into SQL statements. The natural language processor will be implemented according to the algorithm described in the paper.

Project Idea 3

by Maniz Shrestha on 2018-01-12 with No Comments

A Data Science and Machine Learning Project to explore the stock data of a particular stock exchange. The exploration will be focused on observing the repetitive trend in stock markets and relating it to the business cycles. Some questions that can be asked in this project is as follows:

Is there any particular pattern that stocks market follow in between the end of December and start of January. This time period is said to be a speculative time for investors and trader. Particularly, it is observed that traders can benefit by buying in December and selling in January because of the closure of accounting books of firms.
Another interesting phenomenon would be to ask if there is a trend in between bull market and bear market. That does a bull market always have to be followed by a bear market and vice versa.

The main resource for this project would be “Python for Finance” Analyze Big Financial Data by O’Reilly Media. Some other resources are as follows:

https://medium.com/mlreview/a-simple-deep-learning-model-for-stock-price-prediction-using-tensorflow-30505541d877
https://pythonprogramming.net/machine-learning-pattern-recognition-algorithmic-forex-stock-trading/
http://www.financial-hacker.com/build-better-strategies-part-4-machine-learning/

Project Idea 2

by Maniz Shrestha on 2018-01-12 with No Comments

A portfolio tracker that keep tracks of investments in stocks in a particular market. Keeping in mind the time limitation, it would be better to focus on small markets for this project. The web-based application will provide different portfolios to users to keep track of their investments and to easily look at their best and worst investment.

In this project, the major component of research would be figuring about how to structure the database design for such a system as well as enforcing multiple levels of database transactions logging. A further investigation might be in mirroring the data for backup. Along with this, the project can have a data analysis research segment for any market that might suffice the need of this project.

The research component of this project will also lie in using Model View Controller design pattern to develop such a system. This project essentially has two part, the software design, and the data analysis research. If this project is taken, serious amount of planning has to be done to ensure that all both the component of the project is completed,

Project Idea 1

by Maniz Shrestha on 2018-01-12 with No Comments

The project is about creating a software that can determine an optimal value for a company by looking at their balance sheets records in the past to predict future cash flows. Financial analysis methods such as DCF, DDM and FCE can be implemented in this approach (only one). This system would be automated using machine learning and data analysis.

The main research for this project is coming up with a model that can predict the future cash flows of a company by looking at past trends. Regression will be one of the core Machine Learning Techniques that will be applied in this research. Some resources for this project will be “Python for Finance” Analyze Big Financial Data by O’Reilly Media.

The valuation of the company is doing using what finance people call as the time value of money adjustment. Basically, what this means is that getting $100 today is better than getting in tomorrow or anytime in the future. Thus, all future cash flows that the company generates needs to be discounted at today’s value. In order to do this, we need to figure out the discount rate. There are different approaches we can take for this. For instance, we can use the interest rate provided by the Federal Reserve or we can make our own that can reflect the real financial scenario better. The Capital Asset Pricing Model can be used in this scenario but there are things such are beta and the free interest rate that needs to be estimated. This estimation can be the second part of the research.

Month: January 2018