Using SwiftRiver to Find the Needle in the Social Media Haystack

Ushahidi
Dec 9, 2014

Guest Post By Dan King @viewpointpro

A True Story

Thirty-four years ago, on a spring day in 1980 in the Puget Sound region of the Pacific Northwest corner of the United States, my younger brother Patrick, age 7 at the time, came running inside the house exclaiming, “Mount St. Helens has erupted! Look up in the sky! You can see the volcano ash. Come outside and look up in the sky. You can see the ash cloud from Mount St. Helens!”

At this extraordinary news, Patrick was scolded for telling lies and made to listen to the story of “The Boy Who Cried Wolf.” All the while, Mount St. Helens really had erupted, and the ash cloud could in fact be seen from our vantage point near the Puget Sound.

If my brother Patrick had had today’s technology in 1980, he would have tweeted the news. Our family, also with today’s technology, could have seen the #MtStHelens hashtag trending, and Patrick’s claim would have been corroborated.

Wolf!

That is not to say nobody cries wolf. Some people do. Last month a disgruntled gamer tweeted a bomb threat against the flight of a Sony PlayStation executive with whom the gamer had a bone to pick over a recent change in the platform. The tweet was taken seriously, the plane was diverted, and the gamer duly landed in hot water.

http://motherboard.vice.com/blog/hackers-and-trolls-target-sony-playstation-divert-plane-with-fake-bomb-threat-

Likewise, in the aftermath of the 2013 Boston Marathon bombings, rumors and misinformation were reported as fact. A Boston.com article summarizes many of these false rumors, a reminder that not every tweet can be taken as gospel truth.

http://www.boston.com/news/local/massachusetts/2014/04/18/what-twitter-got-wrong-during-the-boston-marathon-bombing-week/ZOYLJpEydYgJ8UYNUT674H/story.html

Yeah, but still.

The subduction zone beneath the Puget Sound region makes the area due for another big earthquake. A really big one. For the last three millennia, a catastrophic quake has struck the region every 300 to 600 years. The last one was in 1700. Let’s do the math: 1700 plus 300 to 600 years puts the next one anywhere between 2000 and 2300. Not only are we due for a big one, apparently we are unprepared. When the big one hits, whether tomorrow or in the year 2300, it will take down critical infrastructure and cause widespread damage.

http://en.wikipedia.org/wiki/Cascadia_subduction_zone

It’s Gonna Be Bad

And it’s not just earthquakes and volcanic eruptions we have to worry about. A whole host of disasters could strike at any minute. Remember the asteroid that struck Russia in 2013? How about the Ebola virus currently ravaging parts of West Africa? Wildfires, plagues, dirty bombs and nuclear disasters all have the potential to cause widespread chaos and damage.

We don’t know which of these disasters will strike next.

We don’t know when.

But we know it will be bad.

And we know people are going to tweet about it.

Enter SwiftRiver and FirstToSee

To harness the power of social media, a group of agencies in the Puget Sound region developed FirstToSee, which augments situational awareness in the aftermath of a disaster with information gathered from social media. The government of Pierce County and the Pacific Northwest Economic Region teamed up to lead a group of federal, state, local, tribal and private organizations in building a tool that captures, filters and reports on such trends in social media, using Ushahidi’s open source SwiftRiver platform.

http://www.firsttosee.org

http://www.ushahidi.com/product/swiftriver/

SwiftRiver uses a water metaphor to describe social media content.

A “droplet” is a single social media post, such as a tweet.

A “river” is a group of droplets matching a particular set of search terms or hashtags.

A “bucket” is a subset of a river that has been further filtered with additional keyword searches.

In the aftermath of an incident, a river tuned to capture tweets on trending hashtags can gather thousands of droplets per hour. After the Oklahoma City tornado in 2013, we created a river that gathered over 25,000 droplets per hour immediately after the event.

Such volumes are too large for manual review, especially for organizations that are short-staffed or otherwise spread too thin. Filtering the river with an additional keyword such as “injured” or “missing” diverts a much smaller subset of tweets into a bucket. Buckets like this tend to be small enough for a small team, or even an individual, to review manually and highlight important content.
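The droplet, river and bucket concepts above map naturally onto a small data model. The sketch below is a minimal illustration, assuming a droplet is just an author and some text, a river accepts droplets carrying any of its tracked hashtags (case-insensitively), and a bucket keeps the river’s droplets that contain one of its keywords as a whole word. The names `Droplet`, `River` and `make_bucket` are hypothetical; this is not SwiftRiver’s actual schema or API.

```python
import re
from dataclasses import dataclass, field

# Hypothetical model of SwiftRiver's droplet/river/bucket terms,
# for illustration only; not SwiftRiver's real schema or API.

HASHTAG = re.compile(r"#\w+")


@dataclass(frozen=True)
class Droplet:
    """A single social media post, such as one tweet."""
    author: str
    text: str

    def hashtags(self) -> set[str]:
        # Lowercase so "#OsoSlide" and "#ososlide" count as the same tag.
        return {tag.lower() for tag in HASHTAG.findall(self.text)}


@dataclass
class River:
    """Collects droplets that carry at least one tracked hashtag."""
    hashtags: set[str]
    droplets: list[Droplet] = field(default_factory=list)

    def ingest(self, droplet: Droplet) -> bool:
        tracked = {tag.lower() for tag in self.hashtags}
        if droplet.hashtags() & tracked:
            self.droplets.append(droplet)
            return True
        return False


def make_bucket(river: River, keywords: list[str]) -> list[Droplet]:
    """Return the river's droplets containing any keyword as a whole word."""
    if not keywords:
        return []
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    return [d for d in river.droplets if pattern.search(d.text)]


if __name__ == "__main__":
    # Made-up posts, not real tweets.
    river = River(hashtags={"#530Slide", "#OsoSlide"})
    for author, text in [
        ("user_a", "Praying for everyone in Oso #OsoSlide"),
        ("user_b", "Our neighbor is missing, last seen near Hwy 530 #530slide"),
        ("user_c", "Great game tonight #Seahawks"),
    ]:
        river.ingest(Droplet(author, text))
    for droplet in make_bucket(river, ["missing", "injured"]):
        print(droplet.author, droplet.text)
```

Matching keywords on word boundaries is one design choice among several: a bucket for “missing” will not pick up a word like “dismissing”, though it also will not catch misspellings or other languages.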

Oso Landslide #530Slide, #OsoSlide

In March 2014 a massive landslide in a rural area of Snohomish County, Washington State, killed 43 people. In the immediate aftermath of the disaster, a search and rescue effort was underway. FirstToSee was used to capture tweets from loved ones reporting individuals missing, including photos and details of when and where each person was last seen.

To accomplish this, we first set up a river on the trending hashtags #530Slide and #OsoSlide. When a river is being set up, FirstToSee uses the Twitter API to list the currently trending hashtags, which helps in choosing which ones to track. The river captured over 20,000 droplets in the first day.
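Choosing hashtags from a trends list can be sketched as follows. This assumes the JSON shape of Twitter’s v1.1 `trends/place` response (a one-element array whose object holds a `trends` list of entries with a `name` and a possibly missing or null `tweet_volume`); that endpoint has since changed, the HTTP request and authentication are left out, and `pick_hashtags` is a hypothetical helper, not FirstToSee code.

```python
import json

# Hypothetical helper; assumes the JSON shape of Twitter's v1.1 trends/place
# response: [{"trends": [{"name": ..., "tweet_volume": ...}, ...], ...}].


def pick_hashtags(response_body: str, limit: int = 10) -> list[str]:
    """Return up to `limit` trending hashtags, highest tweet volume first."""
    payload = json.loads(response_body)
    trends = payload[0].get("trends", []) if payload else []
    # Only hashtag trends can seed a hashtag river; skip plain phrases.
    tags = [t for t in trends if t.get("name", "").startswith("#")]
    # tweet_volume may be missing or null, so treat it as 0. Python's sort is
    # stable even with reverse=True, so ties keep Twitter's original order.
    tags.sort(key=lambda t: t.get("tweet_volume") or 0, reverse=True)
    return [t["name"] for t in tags[:limit]]


if __name__ == "__main__":
    # Made-up response body with invented volumes.
    sample = json.dumps([{"trends": [
        {"name": "Seahawks", "tweet_volume": 50000},
        {"name": "#OsoSlide", "tweet_volume": 12000},
        {"name": "#530Slide", "tweet_volume": None},
    ]}])
    print(pick_hashtags(sample))  # ['#OsoSlide', '#530Slide']
```

A person still makes the final choice; the helper only narrows the list to hashtag candidates.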

Next, we created a bucket by filtering the river on the keyword ‘missing’. This produced a subset of several hundred droplets, small enough for a staff of two to review in under an hour. FirstToSee uses the SwiftRiver API to present the user with a custom interface, which includes the ability to flag a given droplet in a bucket as ‘important’.
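The flag-then-report workflow can be sketched in a few lines. The real interface talks to the SwiftRiver API, whose calls are not shown here; `ReviewItem`, `flag` and `export_report` are hypothetical names, and the sketch produces plain text, leaving any PDF rendering to a separate tool.

```python
from dataclasses import dataclass


@dataclass
class ReviewItem:
    """One droplet in a bucket, as seen by a reviewer (hypothetical model)."""
    droplet_id: str
    author: str
    text: str
    important: bool = False


def flag(items: dict[str, ReviewItem], droplet_id: str) -> None:
    """Mark a droplet as important; an unknown ID raises KeyError."""
    items[droplet_id].important = True


def export_report(items: dict[str, ReviewItem], title: str) -> str:
    """Build a plain-text report containing only the flagged droplets."""
    flagged = [item for item in items.values() if item.important]
    lines = [title, f"{len(flagged)} flagged of {len(items)} reviewed", ""]
    for item in flagged:
        lines.append(f"[{item.droplet_id}] @{item.author}: {item.text}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up droplets for illustration.
    bucket = {
        i.droplet_id: i
        for i in [
            ReviewItem("101", "news_account", "Tally of missing rises"),
            ReviewItem("102", "relative", "Our neighbor is missing, last seen near Hwy 530"),
        ]
    }
    flag(bucket, "102")
    print(export_report(bucket, "Missing persons report"))
```

Keeping the flag on the reviewed item rather than copying droplets elsewhere means the report is always derived from the bucket’s current state.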

Amid dozens of retweets of news reports tallying the missing, announcements of hotlines set up to report missing persons, and prayers for those involved, we captured two dozen tweets about specific missing individuals. These tweets were posted by loved ones and described where and when the person was last seen and what they were wearing, and in some cases included a photograph.

Below are some of the actual tweets. This group of tweets was exported to a PDF report and shared with the incident commanders leading the search and rescue effort on the ground.

https://twitter.com/TomYazwinski/status/449575469411610625

https://twitter.com/realtimwilliams/status/448202651549716480

https://twitter.com/marialaganga/status/448195693572661248

Millions of Tweets

In addition to tracking tweets from the Oso landslide, FirstToSee has been gathering tweets on other trending hashtags, ranging from festivals to other large gatherings such as the Seattle Seahawks’ 2014 Super Bowl victory parade. In the vast majority of cases the droplets gathered are never needed, but had a disaster struck during any of these events, the social media record would have been there to help build a better picture and provide clues on how the pieces of the puzzle fit together.

[Graph: total droplets per month, drawn with D3JS]

In the past year FirstToSee has gathered over 4 million tweets. In addition to building a custom user interface on the SwiftRiver API, we are using open source data visualization tools such as D3JS (http://d3js.org/) to provide different ways of viewing the data. The simple D3JS graph above shows total ‘droplets’ (i.e., tweets) per month.
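The per-month totals behind a graph like this could be prepared with something like the sketch below. It assumes each droplet carries a timestamp as a naive ISO 8601 string (for example `2014-03-22T10:37:00`; Python versions before 3.11 reject a trailing `Z`) and that the D3JS chart reads a JSON array of `{"month", "droplets"}` objects. The helper name and field names are hypothetical, not FirstToSee’s actual format.

```python
import json
from collections import Counter
from datetime import datetime


def droplets_per_month(timestamps: list[str]) -> list[dict]:
    """Count droplets per calendar month, oldest month first."""
    # Parse each timestamp and bucket it by "YYYY-MM".
    counts = Counter(datetime.fromisoformat(ts).strftime("%Y-%m") for ts in timestamps)
    # "YYYY-MM" strings sort chronologically, so a plain sort orders the months.
    return [{"month": month, "droplets": n} for month, n in sorted(counts.items())]


if __name__ == "__main__":
    # Made-up timestamps for illustration.
    sample = ["2014-03-22T10:37:00", "2014-03-23T08:00:00", "2014-04-01T12:00:00"]
    print(json.dumps(droplets_per_month(sample)))
```

Months with no droplets are simply absent from the output; a chart that needs explicit zeros would have to fill those gaps separately.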

One of the challenges of the project is the extensive technology stack used in the development process. The images below show some of the tools required to run the FirstToSee and SwiftRiver programs.

[Image: puzzle pieces representing tools in the FirstToSee and SwiftRiver technology stack]

If any piece is missing, the picture is incomplete, and in some cases not visible at all. As such, a deployment requires at the very least a “jack of all trades” who understands the basics of all these technologies.

Super hero Emmanuel Kala, many hands make light work

Fortunately, the SwiftRiver community includes developer Emmanuel Kala (https://twitter.com/bytebandit), technical guru and superhero extraordinaire. And because the platform is open source, the community can support itself through channels such as GitHub, Google Groups, mailing lists and the regular community calls hosted by Ushahidi.

The FirstToSee development team is currently working on additions to SwiftRiver that will capture more social media sources, such as Vine videos, Instagram photos and Facebook posts. These enhancements will be open source and will keep improving through participation from the wider SwiftRiver community.

https://wiki.ushahidi.com/display/WIKI/SwiftRiver+-+Getting+Involved

The needle in the haystack

Despite the challenges of a complicated framework and a large volume of data with a low signal-to-noise ratio, there is valuable information to be captured from social media with the right tools. The open source community has such a tool in the SwiftRiver platform. Join us in this movement and get involved with SwiftRiver!

http://www.hagencartoons.com/cartoon656.gif

Dan presented FirstToSee and SwiftRiver at the annual international Free and Open Source Software for Geospatial (FOSS4G) conference in Portland, Oregon on September 11th, 2014. Slides from that presentation can be found at http://www.viewpoint.pro/foss4g

A video of the presentation can be found at https://2014.foss4g.org/live/

Dan can be found on Twitter at @viewpointpro (https://twitter.com/viewpointpro) and on the web at www.viewpoint.pro.