Chapel IO module

by Ben Liebersohn on 2016-10-27

Currently working on tagging the individual data fields in each message entry, and saving the newly tagged tweets to a new directory.

Current approach: GeoBurst method

by Ben Liebersohn on 2016-10-14

The GeoBurst algorithm detects local news events by looking for spatiotemporal ‘bursts’ of activity. Its cluster analysis looks for clusters of geo-tagged phrases.

Phrase network analysis has historically been able to link user clouds. However, the spread of GPS in mobile devices has led many social media users to indicate their whereabouts on a reliable basis. Clusters appear not only in the spatial proximity of phrases but also in their temporal proximity. Each candidate cluster is compared against a recent history, which is sampled from a ‘sliding frame’ of historic phrases.
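The comparison against a sliding frame of history can be sketched roughly as follows. This is not the GeoBurst implementation, only a minimal illustration of the idea. The grid cell size, the history length, the threshold, and the names `grid_cell` and `detect_bursts` are all assumptions made for the example.

```python
from collections import Counter, deque
from statistics import mean, pstdev


def grid_cell(lat, lon, cell_deg=0.5):
    """Snap a coordinate to a coarse grid cell (the cell size is an assumption)."""
    return (int(lat // cell_deg), int(lon // cell_deg))


def detect_bursts(frames, history_len=4, threshold=3.0):
    """Return (frame_index, (phrase, cell), count) for counts that stand out
    from the sliding history of the previous `history_len` frames.

    `frames` is a list of time frames; each frame is a list of
    (phrase, lat, lon) tuples.
    """
    history = deque(maxlen=history_len)  # per-frame Counters, oldest dropped first
    bursts = []
    for i, frame in enumerate(frames):
        counts = Counter(
            (phrase, grid_cell(lat, lon)) for phrase, lat, lon in frame
        )
        # Only score a frame once the sliding history is full.
        if len(history) == history_len:
            for key, count in counts.items():
                past = [h.get(key, 0) for h in history]
                mu = mean(past)
                # Floor the deviation at 1.0 so a flat history cannot divide by zero.
                sigma = max(pstdev(past), 1.0)
                if (count - mu) / sigma >= threshold:
                    bursts.append((i, key, count))
        history.append(counts)
    return bursts
```

A phrase that appears once per frame in the same cell and then ten times in a single frame would be reported for that frame. A phrase that keeps appearing at its usual rate would not.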

The sampling process may change as I rework it to account for a larger historic context drawn from previous years of data, so that seasonal events, such as famous weather systems or sports, can be compared. In my research, the events are sports (specifically football). Sports are temporal events on Twitter that happen simultaneously across the USA, which gives me lots of clusters to look at. Though politics would be a fun topic, it is not resolved well in my dataset, which dates to 2013.

The eventual aim of this GeoBurst work is disaster relief, although in some disasters people arguably may not turn to social media. The objective is for existing cyberGIS infrastructure to benefit from social media and be used to inform disaster response decision making.

In the meantime, it’s time to get GeoBurst running and looking at the Twitter API.

Project Topic

by Ben Liebersohn on 2016-09-22

For my Senior Research, my topic will be a data mining project using data collected from Twitter. Twitter’s API offers users 1% of the traffic within a spatial extent (in my case, the continental U.S.A.) to collect. This data has been collected for over 3 years and represents well over one billion tweets. Of these, a significant percentage contain at least one hashtag, which is one kind of data I will be looking at. The other kind of data I am interested in is geo-tags, which are optional GPS coordinates that users may choose to include. Using machine learning algorithms, I hope to identify regular hashtags in order to classify different kinds of signals based on hashtag frequency. The purpose is to see whether I can predict hashtag occurrence, or whether hashtags are too noisy to classify or group into reliable frequencies.
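As a rough illustration of the two kinds of data described above, the sketch below pulls hashtags out of tweet text, counts them per day, and keeps only the tweets that carry a geo-tag. The record layout (a dict with `text`, an ISO-style `created_at` string, and an optional `coordinates` pair) is a simplified assumption for the example, not the exact format returned by the Twitter API, and the function names are hypothetical.

```python
import re
from collections import Counter, defaultdict

# A hashtag is taken to be '#' followed by word characters (a simplification).
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(text):
    """Return the lowercased hashtags found in a tweet's text."""
    return [tag.lower() for tag in HASHTAG_RE.findall(text)]


def daily_hashtag_counts(tweets):
    """Map each hashtag to a Counter of {day: occurrences}."""
    counts = defaultdict(Counter)
    for tweet in tweets:
        day = tweet["created_at"][:10]  # 'YYYY-MM-DD' prefix of the timestamp
        for tag in extract_hashtags(tweet["text"]):
            counts[tag][day] += 1
    return counts


def geotagged(tweets):
    """Keep only tweets whose optional coordinates are present."""
    return [t for t in tweets if t.get("coordinates") is not None]
```

The per-day counts for a single hashtag form the time series that a frequency-based classifier would then work from.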

My goal is to then study the noise, and to give that noise a geo-spatial context in which to understand the events which contributed to that noise.

Here’s a simple example:
Given that the State of Indiana tests tornado sirens on the first Tuesday of each month, it is likely that hashtags similar to #tornado or #siren appear in greater numbers on the days of those tests. This is a regular signal whose timing could be narrowed to a variability of ±6 hours, so it can be ignored. However, should a tornado strike on a different day, the sirens will go off, and #tornado or #siren might appear on an irregular day. The siren creates a spatial event which only affects the region that hears it, which might distinguish it from the more regular signals.
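The siren example can be sketched in a few lines: treat the first Tuesday of each month as the expected signal and flag any other day on which the hashtag count is high. This works at day granularity rather than the ±6 hour window mentioned above, and the `min_count` cutoff, the function names, and the sample counts in the usage lines are all made up for illustration.

```python
from datetime import date


def is_first_tuesday(day):
    """True if `day` is the first Tuesday of its month."""
    # weekday() == 1 is Tuesday; the first one always falls on day 1 through 7.
    return day.weekday() == 1 and day.day <= 7


def irregular_days(daily_counts, min_count=5):
    """Return days with at least `min_count` mentions that are not scheduled test days.

    `daily_counts` maps a datetime.date to the number of #tornado/#siren tweets.
    """
    return [
        day
        for day, count in sorted(daily_counts.items())
        if count >= min_count and not is_first_tuesday(day)
    ]


# Hypothetical counts: a scheduled test day, a quiet day, and an unscheduled spike.
counts = {date(2013, 10, 1): 40, date(2013, 10, 2): 1, date(2013, 11, 17): 55}
flagged = irregular_days(counts)
```

Here the scheduled test on 2013-10-01 is ignored as the regular signal, and only the spike on 2013-11-17 ends up in `flagged`.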

At a larger scale, looking at the noisy hashtags might give insights into real time, less predictable events. This can help de-obfuscate growing stories or events in real time, allowing us to find the meaningful information which hides under layers of signals.

I will be doing this research with David Barbella (Dave). Dave and I will be working with resources hosted by NCSA, including the CyberGIS Supercomputer ROGER (an XSEDE resource, for others that are interested).

Research project ideas

by Ben Liebersohn on 2016-08-29

Possible research: Spatial computational resource allocation

Panel overview

by Ben Liebersohn on 2016-08-29 with 1 Comment

Panel: Future Directions of CyberGIS and Geospatial Data Science (Chair: Shaowen Wang)
Panelists: Budhendra Bhaduri, Mike Goodchild, Daniel S. Katz, Mansour Raad, Tapani Sarjakoski, and Judy —

Selected topics by Ben Liebersohn

Michael:

3D domains are limited; more GIS integration with 3D rendition and simulation would be well received.
We need support for different types of data, some of which are proprietary or otherwise have limited longevity.
Can we analyze data that needs a 3D representation before simulations can be computed with it? Not everything is just landscapes (possibly meaning >3 dimensions? -B).
Decision support systems need more types of data. We need the integration with the applications as well.
Real time data streams and distributed loads which serve local decisions on broader, better networked scales.

Judy:

Integration needs quantification of size. What do we envision as the problem, and the scope? What technology (hardware, network) is needed?
What does all this data mean? What do we do about it? This gets you closer to the science policy area.

Paul:

“As an outsider, when I see what’s going on in this community I ask: what unique problems is this community facing versus common problems? I presented networking and cloud stuff you may not have seen before. The application can drive the network and the compute resources. Flexible and scalable networks. Maybe both sides can help one another.”

Author: Ben Liebersohn