The tangible work I have done on my project this week is finding the datasets and starting to set up a workflow to automate the process. I connected my simulation to SQL so that the simulation takes two arguments: the name of a database in which to store both the timestamp data and the result data, and the file path of a CSV containing the data. The simulation then sets up a database with three tables: two that hold results and one that holds the actual timestamp data. I have also started taking notes as I go so that I can refer to them when I begin writing my paper.
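The setup step described above might look roughly like the following minimal sketch. It assumes SQLite (Python’s built-in `sqlite3`) as a stand-in for whichever SQL database the simulation actually uses. The table names (`timestamps`, `results_a`, `results_b`) and CSV columns (`timestamp`, `value`) are hypothetical placeholders, since the update does not name them.

```python
import argparse
import csv
import sqlite3
from typing import List, Optional

# Hypothetical schema: the real simulation's table and column names are not
# given in the update, so these are placeholders.
SCHEMA = """
CREATE TABLE IF NOT EXISTS timestamps (
    ts    TEXT NOT NULL,   -- raw timestamp as it appears in the CSV
    value REAL             -- measurement recorded at that timestamp
);
CREATE TABLE IF NOT EXISTS results_a (run_id INTEGER, metric TEXT, result REAL);
CREATE TABLE IF NOT EXISTS results_b (run_id INTEGER, metric TEXT, result REAL);
"""


def setup_database(db_name: str, csv_path: str) -> sqlite3.Connection:
    """Create the three tables in db_name and load the CSV's timestamp data."""
    conn = sqlite3.connect(db_name)
    conn.executescript(SCHEMA)
    # Assumes the CSV has a header row with 'timestamp' and 'value' columns.
    with open(csv_path, newline="") as f:
        rows = [(r["timestamp"], float(r["value"])) for r in csv.DictReader(f)]
    conn.executemany("INSERT INTO timestamps (ts, value) VALUES (?, ?)", rows)
    conn.commit()
    return conn


def main(argv: Optional[List[str]] = None) -> None:
    """Parse the two arguments (database name, CSV path) and set up the database."""
    parser = argparse.ArgumentParser(description="Set up the simulation database.")
    parser.add_argument("db_name", help="database for timestamp and result data")
    parser.add_argument("csv_path", help="path to the CSV containing the data")
    args = parser.parse_args(argv)
    conn = setup_database(args.db_name, args.csv_path)
    count = conn.execute("SELECT COUNT(*) FROM timestamps").fetchone()[0]
    print(f"loaded {count} rows into {args.db_name}")
    conn.close()
```

To run it from the command line, add the usual `if __name__ == "__main__": main()` guard and call it as `python setup_db.py sim.db data.csv` (the script and file names here are also hypothetical).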
Weekly Update
My adviser for the remainder of the semester is Charlie Peck. We met on Tuesday to discuss my weekly plan. Over the next four weeks I will be building the database from source, which involves installing many dependencies. I will also be running statistical analyses on datasets that I have yet to find, in order to understand the research and database-user side of my project. I have a clear plan for the tasks that lie ahead.
Week 11 update
I have decided to go with the database Akumuli. My reasoning is based on both the index structure it uses and its documentation. The README clearly states that Akumuli uses a combination of an LSM tree and a B+ tree. These trees are also very well researched, so it is easy to become well informed about them. Other databases I’ve looked at are harder to study. InfluxDB, for example, is very well documented but uses its own tree, created specifically for that database, so it is more difficult to learn about. Akumuli also has a page describing its index structure, https://akumuli.org/akumuli/2017/04/29/nbplustree/. Furthermore, I have cloned the repository and looked at the index source code. It is written in C++, a language I am not as familiar with, though I have used it once or twice. The code is well organized and well commented. For the index, the programmers used a Boost property tree. While I couldn’t find much documentation about Boost’s property tree, Akumuli does a good job of explaining the functionality of its tree. On Thursday I met with Xunfei, who gave me helpful advice about where to focus my efforts to prepare for the proposal. One of these areas is preliminary testing of the database. At this point I do not have the time for that, since I have just spent a lot of time and effort finding and choosing this database. Preliminary testing requires learning how to use the database, setting up a testing environment, and finding datasets, and ingesting datasets into a database can often be very time-consuming. However, Akumuli’s documentation does include some details and figures about its performance. I am in a good spot and have a solid direction. While there are still some things I need to do, I am only at the first-draft stage of the proposal, so there is still time to complete them and include them in my final proposal.
Week 10 update
I have been taking notes and learning about different tree indexing structures. I currently have six pages of notes and understand how R trees work and how X trees use techniques from both R trees and B trees. I need to go into further detail and learn about the R* tree, the X tree, and the LSM tree, among others. I have also started looking into which database to use for my project; I need a database that uses R or R* trees. I am doing this by downloading each candidate database’s source code from GitHub and searching through it. I am starting with InfluxDB. Its documentation seems adequate, so it is a good option.
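The core R tree idea noted above, grouping entries under minimum bounding rectangles (MBRs) and only descending into subtrees whose rectangle overlaps the query, can be sketched as follows. This is an illustrative sketch under assumptions, not code from any database: the tree is built by hand, and insertion and node splitting are omitted, which is largely where R* trees and X trees refine the original design.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle, e.g. a (time, value) range for time series data."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other: "Rect") -> bool:
        # Two rectangles overlap unless one lies entirely to one side of the other.
        return not (self.xmax < other.xmin or other.xmax < self.xmin or
                    self.ymax < other.ymin or other.ymax < self.ymin)


def mbr(rects: List[Rect]) -> Rect:
    """Minimum bounding rectangle that encloses every rectangle in rects."""
    return Rect(min(r.xmin for r in rects), min(r.ymin for r in rects),
                max(r.xmax for r in rects), max(r.ymax for r in rects))


@dataclass
class Node:
    entries: List[Tuple[Rect, str]] = field(default_factory=list)  # leaf: (rect, label)
    children: List["Node"] = field(default_factory=list)           # internal node

    def bbox(self) -> Rect:
        """MBR of everything stored under this node."""
        if self.children:
            return mbr([c.bbox() for c in self.children])
        return mbr([r for r, _ in self.entries])


def search(node: Node, query: Rect) -> List[str]:
    """Return labels of leaf entries overlapping query, pruning subtrees by MBR."""
    if node.children:
        hits: List[str] = []
        for child in node.children:
            if child.bbox().intersects(query):  # skip subtrees that cannot match
                hits.extend(search(child, query))
        return hits
    return [label for rect, label in node.entries if rect.intersects(query)]
```

Recomputing each child’s MBR on every search keeps the sketch short; a real R tree stores the MBR alongside each child pointer in the parent node.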
Week 9 update
I am still in the process of writing my literature reviews. I have completed one, and I am currently outlining the second. I am learning a lot of new information from my new sources. Since I already know which of the two topics I will choose, I am using the second literature review, which covers a related topic, to learn more background on the general area that both topics fall under. The biggest challenge I am currently facing is accessing book sources through the ACM website.
Week 8 Update
This week I am writing my literature review on Indexing Time Series Databases. It is going well. Because this area has been researched for over two decades, there is a lot of background information to cover in this review. I have been focusing on time series database indexing as a whole rather than on a specific database. Since I am feeling confident about going in this direction with my senior project, my next steps after completing the literature review include researching specific databases and looking at the road maps for these individual time series databases, to learn what development stage each one is at and whether it fits my criteria for the database I am looking for.
Update Ideas
I am certain at this point that I will be doing research on a database. My first thought was to explore how to store tree data structures in PostgreSQL. That led me to think about indexes in relational databases, and about finding an open source relational database that is less mature than PostgreSQL and improving the way it stores its indexes. Firebird is one that popped up on my radar; I am unfamiliar with it and still need to do background research. Later, it was pointed out to me that relational databases in general are very mature and don’t have as much room for improvement, especially when compared to time series databases. My two ideas are to improve the way indexes are used in time series databases and in relational databases. My next steps include looking into which open source databases of these types exist and how they use their indexes.
Questions for Planning a Research Project
Python Module for Image Processing
• Is your proposed topic clearly a research activity? Is it consistent with the aims and purposes of research?
Yes. I plan to create something new and make a small breakthrough.
• How is your project different from, say, software development, essay writing, or data analysis?
It is different because, although those are necessary components of my project, they are not its purpose.
• In the context of your project, what are the area, topic, and research question? (How are these concepts distinct from each other?)
The topic and area of my research have to do with image processing. My research question is narrower and more specific than the topic: it looks at processing text that is overlaid on an image or video. For instance, it could involve reading the overlay text that a camera stamps onto a home video, which usually shows the date.
• Is the project of appropriate scale, with challenges that are a match to your skills and interests? Is the question narrow enough to give you confidence that the project is achievable?
This project seems to be of appropriate scale, since it will be built from scratch, and because it is a Python module I can always extend it by adding features. The challenge that matches my skill set is using Python; image processing itself does not, but it is an interest I am excited to learn more about. I am confident that this is achievable in a semester.
• Is the project distinct from other active projects in your research group? Is it clear that the anticipated outcomes are interesting enough to justify the work?
I am working individually; however, within the 388 group this project is very distinct because I am the only person who has proposed a Python module or a project related to image processing.
• Is it clear what skills and contributions you bring to the project? What skills do you need to develop?
The skill I need to develop is knowledge of image processing. I have experience with OpenCV, an image processing library, but I do not know the mechanics of how image processing works.
• What resources are required and how will you obtain them?
I will most likely need resources related to AI and image processing, such as specific book chapters along with online resources.
• What are the likely obstacles to completion, or the greatest difficulties? Do you know how these will be addressed?
Because I anticipate building this project from scratch, it will take specific deadlines and careful time management to overcome the difficulties in completing it.
Platform for viewing data
• Is your proposed topic clearly a research activity? Is it consistent with the aims and purposes of research?
Yes. I don’t simply plan to create something; I plan to make a minor breakthrough that is realistic for a senior undergraduate.
• How is your project different from, say, software development, essay writing, or data analysis?
It is different because, while these are components of this project, they are not the purpose or the specific question of my research. They are tools to help me answer my research question. My project is about creating a platform that will store data in a binary tree, and about seeing how storing large amounts of data in a binary tree structure would work.
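As a rough illustration of the storage side of this question, here is a minimal binary search tree in Python. It is a sketch under assumptions: the update does not say which kind of binary tree the platform would use, so this assumes an unbalanced binary search tree keyed by any comparable value, with iterative insertion and traversal so that large inputs do not hit Python’s recursion limit.

```python
from dataclasses import dataclass
from typing import Any, Iterator, Optional, Tuple


@dataclass
class TreeNode:
    key: Any
    value: Any
    left: Optional["TreeNode"] = None
    right: Optional["TreeNode"] = None


class BinarySearchTree:
    """Unbalanced binary search tree mapping keys to values (illustrative only)."""

    def __init__(self) -> None:
        self.root: Optional[TreeNode] = None
        self.size = 0

    def insert(self, key: Any, value: Any) -> None:
        """Insert or overwrite key, walking down the tree iteratively."""
        if self.root is None:
            self.root = TreeNode(key, value)
            self.size += 1
            return
        node = self.root
        while True:
            if key == node.key:
                node.value = value  # overwrite an existing key
                return
            if key < node.key:
                if node.left is None:
                    node.left = TreeNode(key, value)
                    self.size += 1
                    return
                node = node.left
            else:
                if node.right is None:
                    node.right = TreeNode(key, value)
                    self.size += 1
                    return
                node = node.right

    def get(self, key: Any) -> Optional[Any]:
        """Return the value stored for key, or None if it is absent."""
        node = self.root
        while node is not None:
            if key == node.key:
                return node.value
            node = node.left if key < node.key else node.right
        return None

    def items(self) -> Iterator[Tuple[Any, Any]]:
        """In-order traversal with an explicit stack, yielding pairs in key order."""
        stack = []
        node = self.root
        while stack or node is not None:
            while node is not None:
                stack.append(node)
                node = node.left
            node = stack.pop()
            yield node.key, node.value
            node = node.right
```

Because the tree is unbalanced, keys that arrive already sorted (timestamps, for instance) turn it into a linked list with linear-time inserts and lookups; storing large amounts of data this way would call for a self-balancing variant such as a red-black or AVL tree.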
• In the context of your project, what are the area, topic, and research question? (How are these concepts distinct from each other?)
My project has to do with storing data, specifically binary tree data. The area is big data and how to store it. The topic is storing and displaying binary tree data, and the algorithms for doing so. The broad problem I am investigating is storing binary tree data, and my research question is how large amounts of data can be stored in a binary tree structure.
• Is the project of appropriate scale, with challenges that are a match to your skills and interests? Is the question narrow enough to give you confidence that the project is achievable?
The scale is appropriate; if anything, it is too large. The challenges in this project that match my skills and interests are dealing with large amounts of data and creating a visualization for it. Furthermore, there will be a big challenge in figuring out the algorithmic part of this project: ingesting large amounts of data in a reasonable amount of time. Because most of the components of this project have already been done in other tools, figuring out how to do them should be within my reach.
• Is the project distinct from other active projects in your research group? Is it clear that the anticipated outcomes are interesting enough to justify the work?
I am working individually on this research with one adviser, but in the wider 388 group, none of the projects from last week’s presentations relate to this topic, so it is safe to say that this project is distinct from other projects in the research group. My anticipated outcome is a platform/database for ingesting and displaying binary tree data, which I personally find interesting and believe justifies an entire semester of work.
• Is it clear what skills and contributions you bring to the project? What skills do you need to develop?
Yes. I bring to the project some knowledge of big data and binary trees, some D3, and the programming languages needed to build the project. The skills I need to develop are ingesting large amounts of data, algorithms for storing data in binary trees, and the other complex algorithms and skills needed to build such a platform.
• What resources are required and how will you obtain them?
Many resources are required: some having to do with data structures, others with web development, and others with moving large amounts of data. I plan to obtain them primarily from the internet and by picking apart other such tools.
• What are the likely obstacles to completion, or the greatest difficulties? Do you know how these will be addressed?
Many components of this project will be difficult to push through; I plan to address them with time management and helpful pointers from my adviser. Furthermore, since this project entails creating a platform (probably a web platform) from scratch, I will probably need to set specific deadlines for myself to keep progress steady.
Computer Echolocation
• Is your proposed topic clearly a research activity? Is it consistent with the aims and purposes of research?
Yes. Although I will be creating something physical, it will be productive and novel.
• How is your project different from, say, software development, essay writing, or data analysis?
This project deals specifically with the intersection of software and hardware, so it has a hardware component in addition to those activities. Although hardware, software development, essay writing, and data analysis are all involved, my project is about exploring the intersection of hardware and software in the area of computer echolocation.
• In the context of your project, what are the area, topic, and research question? (How are these concepts distinct from each other?)
The topic is computer echolocation; however, my research question is much narrower and more specific because it deals with the intersection of software and hardware within that area.
• Is the project of appropriate scale, with challenges that are a match to your skills and interests? Is the question narrow enough to give you confidence that the project is achievable?
My project is of appropriate scale for the semester and is not so ambitious that it would take longer than a semester to create. This project matches my interest in echolocation and my skills in software development.
• Is the project distinct from other active projects in your research group? Is it clear that the anticipated outcomes are interesting enough to justify the work?
Although this project is individual and will not be done by an entire group, no one in the 388 group has so far mentioned a project having to do with echolocation. The anticipated outcome is a robot that is blind but uses microphones to understand its surroundings. I personally find this very interesting and am confident that the work is worthwhile given the anticipated outcome.
• Is it clear what skills and contributions you bring to the project? What skills do you need to develop?
Yes. I will need to develop knowledge of and experience with hardware. The overall contributions involve hardware, software, and an understanding of echolocation.
• What resources are required and how will you obtain them?
Most of the resources needed are available online. I will need resources that explain the various kinds of hardware that exist, what they do, and how to use them.
• What are the likely obstacles to completion, or the greatest difficulties? Do you know how these will be addressed?
The main obstacle for this project is obtaining hardware. I will address this difficulty with my adviser.
Very first draft of thesis ideas
Computer Echolocation
For my project, I would like to do something where I can use both hardware and software. One idea I am interested in that sits at the intersection of hardware and software is creating a robot that uses echolocation to find the walls that define its boundaries. For now, I am thinking of using an EV3 robot and sonar. However, this doesn’t seem like it will be enough for a final project, so I am considering using other hardware for the robot besides the EV3. I’m still narrowing that part down and need to do more research into hardware options.
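The distance measurement behind sonar-based echolocation is simple arithmetic: the pulse travels to the wall and back, so the distance is d = v · t / 2, where t is the round-trip time and v is the speed of sound (roughly 343 m/s in dry air at about 20 °C). A minimal sketch, assuming the round-trip time is available in seconds (some ultrasonic sensors report a distance directly instead); the function names and the wall threshold are hypothetical:

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # approximate, dry air at about 20 °C


def echo_distance_m(round_trip_s: float, speed: float = SPEED_OF_SOUND_M_PER_S) -> float:
    """Distance to the reflecting surface from the echo's round-trip time.

    The pulse travels to the wall and back, so the one-way distance is
    half of speed * time.
    """
    if round_trip_s < 0:
        raise ValueError("round-trip time must be non-negative")
    return speed * round_trip_s / 2.0


def wall_within(round_trip_s: float, threshold_m: float,
                speed: float = SPEED_OF_SOUND_M_PER_S) -> bool:
    """True if the echo implies a wall closer than threshold_m (hypothetical rule)."""
    return echo_distance_m(round_trip_s, speed) < threshold_m
```

For example, an echo that returns after 0.01 s puts the wall about 1.7 m away (343 × 0.01 / 2 = 1.715 m).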
Platform for viewing data
I am interested in data, and one idea related to this is creating my own platform for ingesting and displaying data. Having experimented with D3 recently, I think it is a tool I would like to use for a large portion of the project. During the summer, I used a platform that stored data in graph form and displayed it; I would like to make something similar but with a different data structure for the data. I will be doing research to decide what direction to take with this.
Python Module for Image Processing
My final and least developed idea is to make a Python module for image processing. I would build it from scratch: it would depend on other modules, but not on other image processing modules. Creating a Python module from scratch and image processing are two things that