Portland01232010



About the event

The first CrisisCampPDX was held on January 23, 2010. Notes about what we worked on can be found on the daily update page for that day.

How to Participate:


Location:

NedSpace Old Town
117 NW 5th
between Couch and Davis, to the right of Backspace
Portland, OR 97209


Media: Follow us on Twitter at @crisiscamppdx

Hashtag: #crisiscamppdx

Contribute to our Flickr photo pool, crisiscamppdx4haiti: http://www.flickr.com/groups/1345168@N23/


SPONSORS

OSU Open Source Lab
NedSpace
Linux Fund
Backspace
Portland Office of Emergency Management

SCHEDULE

Next camp TBD

Online resources

Contact organizer: crisiscampPDX@gmail.com
Join mailing list: http://groups.google.com/group/crisiscamppdx
Flickr Group: http://www.flickr.com/groups/1345168@N23/
Follow local efforts on: @crisiscampPDX



Volunteer management people

  • Paige Saez: skype paigesaez, email paige.destroy@gmail.com, phone 971-227-4384, twitter @paigesaez
  • Laura Schultz: email misslauraschultz@gmail.com, phone 323-304-4762
  • Chris Blow: skype cgblow, email cgblow@gmail.com, phone 415-309-7900

Projects

EMERGENCY MANAGEMENT AND THE CRISIS WIKI:

RESEARCH SKILLS, WIKI SKILLS (easy with training)

Major disasters like the earthquake in Haiti are a key time to use heightened awareness to prepare for the next big disaster. During last week's crisis camp, hordes of volunteers added a wealth of information about Haiti to the wiki, and now we'd like to focus on creating a resource directory for our own community, including municipal, county, and state resources. If we have more than enough team members to do this, we could split up the group and have each group "adopt a city" – i.e., begin assembling disaster-related resources for a community in a disaster-prone area. You can use the previous wiki project, HurricaneWiki, as a model:

http://www.hurricanewiki.org/wiki/Main_Page

Mark Chubb, the Operations Manager from the Portland Office of Emergency Management, will be leading this project.

MarkDilley is interested in helping with the wiki part.


Haiti Hospital Capacity Finder

Members of the Haitian community have requested an application or tool providing real-time data on the capacities of local hospitals. A group of top developers using open platforms will be integrating the best available open source data with SOUTHCOM's All Partners Access Network (APAN) system to ensure the best visibility for all key stakeholders.

http://crisiscommons.org/HospitalCapacity

Reid Beels will be leading the Portland effort on this project.
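
To make the idea concrete, here is a minimal Perl sketch of the kind of capacity record such a tool might serve. The field names and numbers are made-up placeholders, not the actual APAN or CrisisCommons data model.

#!/usr/bin/perl
# Illustrative sketch only: field names and numbers below are
# placeholders, not the real APAN/CrisisCommons data model.
use strict;
use warnings;
use JSON;   # provides encode_json

# One record per hospital; beds_available is what responders most need,
# and updated_at tells them how fresh the report is.
my @hospitals = (
    { name           => "Example General Hospital",
      beds_total     => 200,
      beds_available => 15,
      updated_at     => "2010-01-23T18:00:00Z" },
    { name           => "Example Field Clinic",
      beds_total     => 40,
      beds_available => 0,
      updated_at     => "2010-01-23T17:40:00Z" },
);

# Emit JSON so any front end (a map, an APAN feed, etc.) can consume it.
print encode_json(\@hospitals), "\n";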


Mind Mapping

Here's a link to some open source mind mapping software:

http://borasky-research.net/open-source-mind-mapping-and-dialogue-mapping-tools/

  • Compendium (dialogue mapping): http://compendium.open.ac.uk/institute/
  • FreeMind: http://freemind.sourceforge.net/wiki/index.php/Main_Page
  • Semantik: http://freehackers.org/%7Etnagy/kdissert.html
  • View Your Mind: http://www.insilmaril.de/vym/

Crisis Filter

Collaboration between: CrisisFilter, Swift River, Ushahidi, Tweak the Tweet, Boston Crisis Camp, LA Crisis Camp, Portland Crisis Camp

This project seeks to comb through more than 75k tweets collected over 8 days by http://haiti.ushahidi.com.

NOTE: We are attempting to build a system that will allow us to invite many more crowdsourced participants; we will update the Simple Things page when we are ready.

This track is for people who are familiar with SQL, Twitter parsing, Rails, and PHP. The proposed plan is to get a copy of the production database from http://haiti.ushahidi.com; then we will have to quickly prepare an environment for tagging, likely using the CrisisFilter codebase with pieces picked from Swift River.
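
As a rough starting point, here is a minimal sketch of pulling a batch of messages out of such a local copy for tagging. The database name, table, and columns are guesses, not the actual Ushahidi schema.

#!/usr/bin/perl
# Sketch only: "ushahidi_copy" and the message table/columns below are
# assumptions about a local copy of the haiti.ushahidi.com database.
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('DBI:mysql:database=ushahidi_copy;host=localhost',
                       'username', 'password', { RaiseError => 1 });

# Grab a batch of messages that haven't been tagged yet.
my $sth = $dbh->prepare(q{
    SELECT id, message_from, message, message_date
      FROM message
     WHERE tagged = 0
     ORDER BY message_date
     LIMIT 100
});
$sth->execute();

while (my ($id, $from, $text, $date) = $sth->fetchrow_array) {
    # Hand each row off to the tagging UI (CrisisFilter / Swift River).
    print "$id\t$date\t$from\t$text\n";
}

$dbh->disconnect;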

11:38 UPDATE: We have a bunch of coders who are familiar with Managing News and Drupal, so we might work with that. For now we are still setting up the machine and getting a plan together.

Day after UPDATE from Erik Ostrom:

  • At least four new people contributed code: Alain Bloch, John Labovitz, and David Cato in Portland; and Mike Travers here in Silicon Valley. Let me know if I missed anyone.
    • John added code to use geo info from YQL.
    • David added code to (I think) geocode tweets based on their content.
    • Alain added duplicate detection, a simple tweet search, authentication (about which he sent, or will send, email for discussion), and maybe more!
    • Mike added tracking of downvotes to go with our implicit upvotes. There's also a field for views, which I don't think we're tracking.
  • It's not explicit in the code, but we seem to have settled on Ushahidi as an output channel (maybe THE output channel) for CrisisFilter.
  • We didn't implement a hot-or-not UI or a 'mark as handled' form, my two goals at the start of the day.
  • I wrote some tests to help us avoid breaking stuff.

So what's next? Our goal is to have something that goes "live", right? What needs to be done before that can happen?


Relevant repositories:

CrisisFilter Rails repo: http://github.com/eostrom/crisisfilter
Swift River Rails repo: http://github.com/ajturner/swiftriver
Ushahidi repo: http://github.com/ushahidi/Ushahidi_Web
Ushahidi edge: http://github.com/ushahidi/Ushahidi_Haiti
Managing News: http://managingnews.com/
Twitter data export: http://github.com/unthinkingly/haiti.ushahidi.com-twitter-export


Analysis of existing Twitter data - Ed Borasky

1. No geo data :-(

John Labovitz: fixed the reports model to pull and deliver lat/long when present, and is adding a "location" column.

2. The file has UNIX line endings - a DOS line-ending version is needed for Windows folks.

3. One 0x00 character choked PostgreSQL and had to be removed with Vim.

4. Uploaded to http://github.com/znmeb/Twitter-API-Perl-Utilities

5. Writing queries to get lists of the most recent and most frequent tweeters - these can be used to grab tweets, geotag tweeters, and filter out noisy or old tweeters. (See the sketch after this list.)

6. Uploading query results
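
A sketch covering items 2, 3, and 5 above: the line-ending and 0x00 fixes are Perl one-liners, and the frequency query is written with DBI against a hypothetical tweets table (the actual import schema may differ).

#!/usr/bin/perl
# Items 2 and 3 as one-liners (run from the shell):
#   perl -pe 's/\n/\r\n/'  twitter-unix.csv > twitter-dos.csv    # UNIX -> DOS line endings
#   perl -pe 's/\x00//g'   twitter-unix.csv > twitter-clean.csv  # strip stray 0x00 bytes
#
# Item 5, sketched with DBI against PostgreSQL. The "tweets" table and
# its columns are assumptions, not the actual import schema.
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('DBI:Pg:dbname=haiti_tweets', 'username', 'password',
                       { RaiseError => 1 });

# Most frequent tweeters: candidates for geotagging effort, or for
# filtering if they turn out to be bots or news echoes.
my $sth = $dbh->prepare(q{
    SELECT from_user, COUNT(*) AS n
      FROM tweets
     GROUP BY from_user
     ORDER BY n DESC
     LIMIT 50
});
$sth->execute();
while (my ($user, $n) = $sth->fetchrow_array) {
    printf "%-20s %d\n", $user, $n;
}

$dbh->disconnect;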


<><><>

Some of the team on this project is looking through the raw Twitter data for patterns that could eliminate much of the data that is not relevant to the project, reducing the need for human review. Some of the ideas so far:

  • Eliminate retweets
    • Filtering out RT (except where RT was part of a word) removed 55.9% of the tweets. Review showed the only "false positives" were cases of "please RT", and none of those appeared to be significant tweets.
    • Filtering out via (except where via was part of a word) removed 4.8% of the tweets. Review showed a significant portion where via was used in a context other than signifying a retweet, so the possibility of false positives is high.
  • Eliminate duplicate tweets
    • Eliminating tweets that were identical posts from the same user removed 4.7% of the tweets. This seems completely safe, as the only thing lost is the timestamps of the duplicates.
    • Eliminating duplicate tweets (same text) from different users removed 21.0% of the tweets, but two different people could request help using the same words, so this is not suggested. In the set of 85k records, identical text from different users was observed multiple times, although only where two different people were reporting the same issue in the same words; it was not observed as two people reporting two different situations in the same words.

All percentages above are independent. Many messages fall into multiple categories, so adding up these percentages will not give the total percentage of messages that could be eliminated quickly using simple rules.

The following Perl code could be used to accomplish the above tasks. (I'm not sure how to attach a file to a wiki, if that's even possible, to keep the full text of this code out of the body of the wiki; someone feel free to clean up my work here and place things in more appropriate places.)

Ed Borasky, 24 January 2010: As far as I know, all of the CrisisCamp projects involving software like this are using GitHub (http://github.com). If you're not a "git" user but have used other revision control systems, there's a minor learning curve. If you've never used a version control system, "git" is fairly easy to learn. I'd be happy to host this in my GitHub account, http://github.com/znmeb/Twitter-API-Perl-Utilities/, and add anyone else on GitHub as a committer. And yes, you *can* run "git" on a Windows machine! ;-)

#!/usr/bin/perl
#
# scantwit - gather information about a twitter data set.
#
# Written by Brian Martin and Jim Dorris, 1/23/10, Portland Crisis Camp.
#
# Input:
#   One or more CSV files of Twitter data with the following fields:
#	Unique key (not used)
#	Twitter ID
#	Twitter UserID
#	Message Text
#	Message Date
#
# Output:
#   nonretweet.csv - Messages with duplicates and retweets removed.
#   retweet.csv - Retweet messages with duplicates removed
#
# Duplicate messages are messages from the same Twitter ID with the same text.
# In case of duplicates, the first message is kept and later ones are dropped.
#	
# Bugs:
#   If multiple input files are provided, the output files only cover the last
#   input file.  We only had one input file, so this wasn't a problem for us.
#
use strict;
use warnings;
use Text::CSV;

# Verify the command line.
die "No files specified to scan.\n" unless (@ARGV > 0);

my $csv = Text::CSV->new ({binary => 1}) 
	or die "Cannot use CSV: " . Text::CSV->error_diag();

# Declare our file handles.
my $FH;
my $NONRT;
my $RT;

# Loop through the input files.
foreach  my $File (@ARGV) {
	if (!open($FH,"<:encoding(utf8)", $File)) {
		warn "Unable to open $File: $!\n";
	}
	else {
		# Initialize counters and hash arrays.
		my $Records = 0;
		my $RTCount = 0;
		my %TwitIDs;
		my %HashTags;
		my %DupCheck;
		my $DupCount = 0;	# start at zero so the report never prints undef
		# Open the output files.
		open ($NONRT,'>:encoding(utf8)','nonretweet.csv');
		open ($RT,'>:encoding(utf8)','retweet.csv');
		# Loop through the input data.
		while (my $row = $csv->getline($FH)) {
			my($Key,$TwitID,$From,$Message,$MDate) = @$row;
			my $MIdent = "$TwitID/$Message";
			$Records++;
			# Check for duplicates.
			if ($DupCheck{$MIdent}) {
				# It's a duplicate.
				$DupCount++;
				next;
			}
			else {
				# It's new.
				$DupCheck{$MIdent} = 1;
			}
			# Count unique twitter IDs.
			if (defined($TwitIDs{$TwitID})) {
				$TwitIDs{$TwitID}++;
			}
			else {
				$TwitIDs{$TwitID}=1;
			}
			# Identify retweets.  A retweet is "RT" not adjacent to
			# upper case alpha, or "via" not adjacent to lower alpha
			if (
				# the /x modifier lets the pattern span two lines
				$Message =~ /[^A-Z]RT[^A-Z]|^RT[^A-Z]|[^A-Z]RT$
				            |[^a-z]via[^a-z]|^via[^a-z]|[^a-z]via$/x
			   )
			{
				# It's a retweet.
				print $RT "$Key,$TwitID,$From,$Message,$MDate\n";
				$RTCount++;
			}
			else {
				# It's not a retweet.
				print $NONRT "$Key,$TwitID,$From,$Message,$MDate\n";
			}
			# Count the unique hash tags.
			foreach (split ('[^#A-Za-z]',$Message)) {
				if (/^#[A-Za-z]/) {
					$HashTags{uc($_)}++;
				}
			}
		}
		# Generate a count of unique twitter IDs.
		my $TwitIDCount = scalar keys %TwitIDs;
		# Calculate the retweet percentage.
		my $RTPercent = int(0.5+$RTCount*100/$Records);
		# Print stats.
		print "$File:\n\t$Records records\n\t$TwitIDCount unique twitter IDS\n\t$RTCount RT/vias ($RTPercent%)\n\t$DupCount dups\n";
		# Print hash tags used.
		print "Popular (>50) Hash Tag report:\n";
		foreach (keys(%HashTags)) {
			printf "%-20.20s %d\n", $_,$HashTags{$_}
				if ($HashTags{$_} > 50);
		}
	}
	$csv->eof;
	close $FH;
	close $NONRT;
	close $RT;
}

exit(0);

  • Other ideas that have not been explored yet:
    • Rate tweets for validity based on key words. This could best be done by getting word occurrence counts from samples of messages that have already been reviewed and marked as relevant or not relevant. (See the sketch after this list.)
    • Identify the language of a tweet?
    • Identify key words for "pre-classification" of tweets?
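
A rough sketch of the keyword-rating idea from the first bullet above, assuming two hand-labeled sample files, relevant.txt and irrelevant.txt (one tweet per line), that would come from messages already reviewed:

#!/usr/bin/perl
# Sketch of keyword-based relevance scoring. relevant.txt and
# irrelevant.txt are assumed hand-labeled samples (one tweet per line).
use strict;
use warnings;

# Count word occurrences in a labeled sample file.
sub word_counts {
    my ($file) = @_;
    my %count;
    open(my $fh, '<', $file) or die "Can't open $file: $!";
    while (my $line = <$fh>) {
        $count{lc $1}++ while $line =~ /(\w+)/g;
    }
    close $fh;
    return \%count;
}

my $rel = word_counts('relevant.txt');
my $irr = word_counts('irrelevant.txt');

# Score a tweet: words common in relevant samples add to the score,
# words common in irrelevant samples subtract. Crude, but enough to
# sort a review queue so humans see likely-relevant messages first.
sub score {
    my ($text) = @_;
    my $score = 0;
    for my $word (map { lc } $text =~ /(\w+)/g) {
        $score += ($rel->{$word} || 0) - ($irr->{$word} || 0);
    }
    return $score;
}

print score("trapped people need water near Jacmel"), "\n";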

Other statistical findings on hashtag occurrences (only those with over 50 occurrences in the 85k sample are shown):

twitter-unix.csv:
	82014 records
	78371 unique twitter IDS
	42348 RT/vias (52%)
	3643 dups
Popular (>50) Hash Tag report:
#HAITI               80312
#HAITIQUAKE          3977
#HELPHAITI           2140
#TWIBBON             1637
#EARTHQUAKE          1511
#FB                  628
#HELP                592
#REDCROSS            570
#P                   512
#YELE                486
#FF                  474
#NEWS                460
#CNN                 436
#TCOT                399
#BRESMA              352
#UN                  340
#LOC                 324
#NEED                319
#IPHONE              286
#IDF                 283
#ISRAEL              248
#QUAKE               237
#MSF                 229
#CNNHELPHAITI        227
#EQHAITI             214
#PRAY                197
#IRANELECTION        196
#FOLLOWFRIDAY        191
#PHOTOGRAPHY         189
#INFO                185
#GIRO                183
#DONATE              179
#RELIEF              175
#OBAMA               170
#RETWEETTHISIF       164
#FIRSTAID            163
#AID                 162
#CONTACT             144
#TWITTER             142
#ERDBEBEN            136
#QUOTE               133
#GOOGLE              131
#UNICEF              129
#JACMEL              126
#CJP                 122
#RESCUEMEHAITI       122
#HAITIJP             121
#TERREMOTO           120
#CHARITY             119
#LATISM              116
#NOWPLAYING          115
#HAITICNN            113
#PHOTOS              111
#SHOUTOUT            111
#CCHAITI             100
#US                  98
#FAIL                93
#NUM                 92
#JGF                 90
#VENEZUELA           89
#ORPHANS             88
#RESCUE              87
#CUBA                86
#PRAYERS             82
#IHAVEADREAM         81
#LIVESTRONG          81
#DOCTORSWITHOUTBORDE 80
#FDP                 80
#USAF                80
#VIDEO               80
#EQ                  79
#HOAX                75
#LETSBEHONEST        75
#PHOTOJOURNALISM     75
#MEDIA               74
#WYCLEFWARRIORS      74
#TLOT                73
#AFGHAN              71
#PRAYER              71
#GREENSAFE           70
#PORTAUPRINCE        70
#USHELPSHAITI        70
#ETSY                69
#OFFERING            69
#SRC                 68
#GOLDENGLOBES        67
#IRAN                67
#OXFAM               66
#RESCUEHAITI         66
#HAITIEARTHQUAKE     64
#HAITIRELIEF         64
#MLK                 64
#RADIO               64
#HUMANRIGHTS         63
#PDX                 61
#PIRATEN             61
#PATROBERTSON        60
#SUPPORT             60
#USA                 60
#EMERGENCY           59
#CLINTON             58
#GOD                 58
#AFGHANISTAN         56
#GAZA                56
#NAME                56
#SPENDEN             56
#DISASTER            55
#HCR                 55
#DONATION            54
#OSM                 54
#HOPE                53
#JESUS               53
#FOLLOW              52
#JOB                 52
#TECHHAITI           52

<><><>


Eventual integration of Twitter Streaming API with CrisisCamp tools - Ed Borasky

Twitter Search is changing - see my blog post at http://borasky-research.net/2010/01/18/twitter-search-changes-what-do-they-mean-2/

Bottom line - if you have high-volume Twitter Search needs, you should be migrating to the Streaming API.

Sent an email to Twitter telling them what we're doing. Copied Chris B. and A. Hook.

Streaming API page on Twitter: https://twitterapi.pbworks.com/Streaming-API-Documentation
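
A minimal sketch of consuming the Streaming API from Perl, assuming the statuses/filter endpoint and HTTP basic auth as documented in early 2010 (check the page above for current details; the credentials and track terms are placeholders):

#!/usr/bin/perl
# Sketch only: endpoint and basic auth reflect the early-2010 docs,
# and the credentials and track terms below are placeholders.
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

my $ua  = LWP::UserAgent->new;
my $req = HTTP::Request->new(
    POST => 'http://stream.twitter.com/1/statuses/filter.json');
$req->authorization_basic('username', 'password');
$req->content_type('application/x-www-form-urlencoded');
$req->content('track=haiti,crisiscamp');

# The connection stays open; LWP hands us the stream chunk by chunk.
# Each chunk contains one or more newline-delimited JSON statuses that
# a real consumer would parse and feed into CrisisFilter.
$ua->request($req, sub {
    my ($chunk, $response) = @_;
    print $chunk;
});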

If you are familiar with medical issues or are good at tracking down RSS feeds:

RSS FEED PROJECT

PORTLAND TEAM LEADER: Reid

http://wiki.crisiscommons.org/wiki/Haiti_RSS_Feed_Challenge

The wiki above has lots of contact info and specifics. PDX needs a lead to take on getting clarifications, start discussing task sharing/hand-off with other cities, etc.

The RSS project team turned into the "awesome translation and multi-lingual team" partway through the day.

  • Located and added RSS feeds to database in both French and English.
  • Coordinated with RSS teams in other cities to finalize version 2 of the RSS input form.
  • Translated RSS input form labels to French.
  • Worked on Sahana localization to French, and on data input at 4636.ushahidi.com (focusing first on French text messages, and later on Creole)

List of Sites Without RSS for possible scraping

If you are a geo hacker or GIS geek:

OpenStreetMap

PORTLAND TEAM LEADER: Rafael Gutierrez

Working with the geospatial projects for CrisisCampPDX does not require a strong background in GIS or mapping, but it helps. Outlined here are directions for the mapping process with OpenStreetMap, along with some of Portland's ongoing efforts and lessons learned to date.

The mapping platform that is currently being used, and that has one of the largest user bases, is OpenStreetMap (OSM). There are several beginner guides available, as well as basic editing videos. There are other mapping applications, such as Google MapMaker, but those have not been used for this local effort.

Once you have familiarized yourself with OSM and the tutorials for basic editing, visit the OSM Haiti Mapping site. This site details all the current tasks that are still needed, as well as other resources useful for mapping, such as satellite imagery. A word of caution: this site contains an overwhelming amount of information, so it is suggested that you at least familiarize yourself with the Mapping Coordination section to get started. The Mapping Coordination section covers:

  • 1 General
    • 1.1 What needs to be mapped?
    • 1.2 How to map
    • 1.3 Data collection
    • 1.4 Coordination of tracing/mapping efforts
    • 1.5 Diff Reports
    • 1.6 OpenStreetMap-Emergencies
    • 1.7 Error Reports
  • 2 Best Practices
  • 3 Requests from Responders
  • 4 Who's Helping
  • 5 Places
    • 5.1 Administrative units and boundaries
  • 6 Features
    • 6.1 Lists of Tags
    • 6.2 Specific Tags
    • 6.3 JOSM presets
    • 6.4 Additional Ideas
      • 6.4.1 Fix error logs for better data

Pay careful attention to items in BOLD, particularly the Best Practices.


Notes about Getting Started

Again, if you have read how to edit in OSM and Potlatch, then head back to the OSM site for details before you get started. If you feel comfortable enough with Potlatch to edit OSM, make sure you check the Task Ideas first to confirm certain tasks haven't already been addressed.

One of the main tasks that people have contributed to extensively is basemapping. The tool being used to track this progress is called the OSM Matrix, which is a status map. It's not clear whether this is the best tool at the moment, since it is not regularly updated. OpenStreetEmergencies, a clone of OpenStreetBugs, is another tool being used and appears to be a better alternative for status updates. The issue of tracking thousands of users over such a vast area within a short time frame is unprecedented and will likely be addressed at some point. Some analyses are already underway.


Basemapping

When selecting areas for editing, there are two approaches, depending on status. One is to go to the OSM Matrix and check the status of updated areas. The site is tricky to understand, so be sure to read the legend for a good understanding of the symbology and classifications. If you've found a suitable area to edit, you can launch Potlatch straight from the Tools tab in the lower right-hand side and move on to editing. Always check that you are using the best imagery available. A great resource for viewing most of the available imagery and other maps is the Haiti Crisis Map from Telascience; it also has permalinks for launching straight into Potlatch, and there is a YouTube demo.

When editing in Potlatch, it is best to use the Edit with Save option.

Other Tasks

Street Naming

Street naming conventions and how-tos are here: http://wiki.openstreetmap.org/wiki/WikiProject_Haiti/Street_names To show unnamed streets in Potlatch, set Options to 'Highlight unnamed roads'.

Hospital Locating - DONE.

I believe this was completed even before the start of Portland's Crisis Camp. Nice work, everyone!

Tagging

When tagging your data after edits, it's important to follow the protocols that have already been established, both by the OSM community and by the humanitarian effort. It is also very important to document your source data (IKONOS, GPS Garmin trace, GeoEye, etc.) with the appropriate tags.


Resources

As you get mapping and need more resources, help, or guidance along the way, it's best to check the CrisisMappers Google Group or the OpenStreetMap IRC channel (#osm); it's a great resource for quick answers.

Imagery footprints are being updated at a very rapid pace. Many remote sensing companies are providing free, very high-resolution imagery for most of the areas affected by the earthquake in Haiti.



Portland's OSM Mapping Experience

The Portland CrisisCamp (#ccamppdx) mapping effort began in fits and starts and could have benefited greatly from reading this material more carefully. One very confounding issue was the desire to find areas that needed mapping while also avoiding duplication of effort.

If you speak French and want to manage a translation group:

Languages and Translation:


OPEN TWEET FIND

Twitter data set on github: http://github.com/unthinkingly/haiti.ushahidi.com-twitter-export

Ushahidi source code: http://github.com/ushahidi/Ushahidi_Haiti

Ushahidi dev site: http://haiti1.osuosl.org - up and functional using Ushahidi_Web; ping gchaix (IRC) or gchaix@osuosl.org for an account.

Managing News 7 dev site: http://haiti2.osuosl.org - site is up; ping gchaix (IRC) or gchaix@osuosl.org for an account. ON HOLD - focusing on the Ruby/Rails-based filtering tool, CrisisFilter. See: http://etherpad.com/rhok

For important, simple things anyone can do:

Please see the tasks on the "Important, Simple Things" Page: Simple Tasks Anyone Can Do
