Portland01232010
About the event
How to Participate:
The first CrisisCampPDX was held on January 23, 2010. Notes about what we worked on can be found on the daily update page for that day.
Location:
- NedSpace Old Town
- 117 NW 5th
- btw Couch and Davis, to the right of Backspace
- Portland, OR 97209
Media:
Follow us on Twitter at @crisiscamppdx
Hash tag: #crisiscamppdx
Contribute to our Flickr photo pool, crisiscamppdx4haiti: http://www.flickr.com/groups/1345168@N23/
SPONSORS
- OSU Open Source Lab
- NedSpace
- Linux Fund
- Backspace
- Portland Office of Emergency Management
SCHEDULE
Next camp TBD
Online resources
- Contact organizer: crisiscampPDX@gmail.com
- Join mailing list: http://groups.google.com/group/crisiscamppdx
- Flickr Group: http://www.flickr.com/groups/1345168@N23/
- Follow local efforts on: @crisiscampPDX
Volunteer management people
- Paige Saez skype paigesaez email paige.destroy@gmail.com phone 971-227-4384 twitter @paigesaez
- Laura Schultz misslauraschultz@gmail.com 323-304-4762
- Chris Blow skype cgblow email cgblow@gmail.com 415-309-7900
Projects
EMERGENCY MANAGEMENT AND THE CRISIS WIKI:
RESEARCH SKILLS, WIKI SKILLS (easy with training)
Major disasters like the earthquake in Haiti are a key time to use heightened awareness to prepare for the next big disaster. During last week's crisis camp, hordes of volunteers added enough information for Haiti to the wiki. Now we'd like to create a resource directory for our own community, including municipal, county and state resources. If we have more than enough team members to do this, we could split into groups and have each group “adopt a city”, i.e., begin assembling disaster-related resources for a community in a disaster-prone area. You can use the previous wiki project, HurricaneWiki, as a model:
http://www.hurricanewiki.org/wiki/Main_Page
Mark Chubb, the Operations Manager from the Portland Office of Emergency Management, will be leading this project.
- MarkDilley interested in helping with the wiki part
Haiti Hospital Capacity Finder
Members of the Haitian community have requested an application or tool that provides real-time data on the capacities of local hospitals. A group of top developers using open platforms will be integrating the best available open source data with SOUTHCOM’s All Partners Access Network (APAN) system, to ensure the best visibility to all key stakeholders.
http://crisiscommons.org/HospitalCapacity
Reid Beels will be leading the Portland effort on this project.
Mind Mapping
Here are links to some open source mind mapping software:
http://borasky-research.net/open-source-mind-mapping-and-dialogue-mapping-tools/
http://compendium.open.ac.uk/institute/ Compendium (Dialog Mapping)
http://freemind.sourceforge.net/wiki/index.php/Main_Page FreeMind
http://freehackers.org/%7Etnagy/kdissert.html Semantik
http://www.insilmaril.de/vym/ View Your Mind
Crisis Filter
Collaboration between: CrisisFilter, Swift River, Ushahidi, Tweak the Tweet, Boston Crisis Camp, LA Crisis Camp, Portland Crisis Camp
This project seeks to comb through more than 75k tweets collected over 8 days by http://haiti.ushahidi.com.
NOTE: We are attempting to build a system that will allow us to invite many more crowdsourced participants. We will update the Simple Things page when we are ready.
This track is for people who are familiar with SQL, Twitter parsing, Rails and PHP. The proposed plan is to get a copy of the production database from http://haiti.ushahidi.com and then quickly prepare an environment for tagging, likely using the CrisisFilter codebase with pieces picked from Swift River.
11:38 UPDATE: We have a bunch of coders who are familiar with Managing News and Drupal, so we might work with that. For now we are still setting up the machine and getting a plan together.
Day after UPDATE from Erik Ostrom:
- At least four new people contributed code: Alain Bloch, John Labovitz, and David Cato in Portland; and Mike Travers here in Silicon Valley. Let me know if I missed anyone.
- John added code to use geo info from YQL.
- David added code to (I think) geocode tweets based on their content.
- Alain added duplicate detection, a simple tweet search, authentication (about which he sent, or will send, email for discussion), and maybe more!
- Mike added tracking of downvotes to go with our implicit upvotes. There's also a field for views, which I don't think we're tracking.
- It's not explicit in the code, but we seem to have settled on ushahidi as an output channel (maybe THE output channel) for CrisisFilter.
- We didn't implement a hotornot UI or a 'mark as handled' form, my two goals at the start of the day.
- I wrote some tests to help us avoid breaking stuff.
So what's next? Our goal is to have something that goes "live", right? What needs to be done before that can happen?
Relevant repositories:
- CrisisFilter Rails repo: http://github.com/eostrom/crisisfilter
- Swift River Rails repo: http://github.com/ajturner/swiftriver
- Ushahidi repo: http://github.com/ushahidi/Ushahidi_Web
- Ushahidi edge: http://github.com/ushahidi/Ushahidi_Haiti
- Managing News http://managingnews.com/
- http://github.com/unthinkingly/haiti.ushahidi.com-twitter-export
Analysis of existing Twitter data - Ed Borasky
1. No geo data :-(
John Labovitz: fixed the reports model to pull and deliver lat/long when it is present, and is adding a "location" column
2. File has UNIX line endings - need a DOS line ending version for Windows folks
3. One 0x00 character choked PostgreSQL and had to be deleted with Vim
4. Uploaded to http://github.com/znmeb/Twitter-API-Perl-Utilities
5. Writing queries to get list of most recent and most frequent tweeters - can be used to grab tweets, geotag tweeters, filter out noisy or old tweeters
6. Uploading query results
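The two cleanup steps in the list above (removing the stray 0x00 character and producing a DOS line ending copy) could be scripted rather than done by hand. The following is a minimal sketch, not the procedure that was actually used: the helper name to_dos_text and the output file name twitter-dos.csv are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original utilities): strip NUL
# bytes and normalize every line ending to DOS-style CRLF.
sub to_dos_text {
    my ($text) = @_;
    $text =~ tr/\x00//d;        # drop 0x00 characters, which PostgreSQL rejects
    $text =~ s/\r?\n/\r\n/g;    # LF or CRLF becomes CRLF, never CR CR LF
    return $text;
}

# Usage (file names are examples): perl to_dos.pl twitter-unix.csv twitter-dos.csv
if (@ARGV == 2) {
    my ($in_name, $out_name) = @ARGV;
    open(my $in,  '<:raw', $in_name)  or die "Unable to open $in_name: $!\n";
    open(my $out, '>:raw', $out_name) or die "Unable to open $out_name: $!\n";
    local $/;                   # slurp mode: read the whole file in one go
    my $data = <$in>;
    $data = '' unless defined $data;
    print {$out} to_dos_text($data);
    close $in;
    close $out or die "Unable to finish writing $out_name: $!\n";
}
```

The files are opened in raw mode so the bytes pass through untouched apart from the two substitutions, which keeps the UTF-8 message text intact.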
<><><>
Some of the team on this project is looking through the raw Twitter data for patterns that identify tweets not relevant to the project, so that those tweets can be dropped without human review. Some of the ideas so far:
- Eliminate ReTweets
- Filtering out RT (except where RT was part of a word) removed 55.9% of the tweets. Reviewing it showed the only "false positives" were cases of "please RT" and none of those appeared to be significant tweets.
- Filtering out via (except where via was part of a word) removed 4.8% of the tweets. Reviewing it showed a significant portion where via was used in a context other than signifying a retweet so the possibility for false positives is high.
- Eliminate duplicate tweets
- Eliminating tweets that were identical posts from the same user eliminated 4.7% of the tweets. This seems completely safe as the only thing lost is the time stamps of the duplicates.
- Eliminating duplicate tweets (same text) from different users eliminated 21.0% of the tweets, but two different people could request help using the same words, so this is not suggested. In the set of 85k records, this was observed multiple times, although only in situations where two different people were reporting the same issue using the same words. It was not observed as two people reporting two different situations using the same words.
All %'s above are independent. Many messages fall into multiple categories above, so adding up these %'s will not give a total % of messages that could be eliminated quickly using simple rules.
The following Perl code could be used to accomplish the above tasks. (I'm not sure how to add a file to a wiki, if that's possible, to keep the full text of this code out of the body of the page. Someone feel free to clean up my work here and place things in more appropriate places.)
Ed Borasky, 24 January 2010: As far as I know, all of the CrisisCamp projects involving software like this are using Github (http://github.com). If you're not a "git" user but have used other revision control systems, there's a minor learning curve. If you've never used a version control system, "git" is fairly easy to learn. I'd be happy to host this in my Github account, http://github.com/znmeb/Twitter-API-Perl-Utilities/, and add anyone else on Github as a committer. And yes, you *can* run "git" on a Windows machine! ;-)
#!/usr/bin/perl
#
# scantwit - gather information about a twitter data set.
#
# Written by Brian Martin and Jim Dorris, 1/23/10, Portland Crisis Camp.
#
# Input:
# One or more CSV files of Twitter data with the following fields:
# Unique key (not used)
# Twitter ID
# Twitter UserID
# Message Text
# Message Date
#
# Output:
# nonretweet.csv - Messages with duplicates and retweets removed.
# retweet.csv - Retweet messages with duplicates removed
#
# Duplicate messages are messages from the same Twitter ID with the same text.
# In case of duplicates, the first message is kept and later ones are dropped.
#
# Bugs:
# If multiple input files are provided, the output files only cover the last
# input file. We only had one input file, so this wasn't a problem for us.
#
use strict;
use warnings;
use Text::CSV;

# Verify the command line.
die "No files specified to scan.\n" unless (@ARGV > 0);

# Parser for the input files.
my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot use CSV: " . Text::CSV->error_diag();

# Writer for the output files. It quotes fields that contain commas,
# quotes or newlines, so the output stays valid CSV.
my $out = Text::CSV->new({ binary => 1, eol => "\n" })
    or die "Cannot use CSV: " . Text::CSV->error_diag();

# Loop through the input files.
foreach my $File (@ARGV) {
    my $FH;
    if (!open($FH, "<:encoding(utf8)", $File)) {
        warn "Unable to open $File: $!\n";
        next;
    }

    # Initialize counters and hashes.
    my $Records  = 0;
    my $RTCount  = 0;
    my $DupCount = 0;
    my %TwitIDs;
    my %HashTags;
    my %DupCheck;

    # Open the output files.
    open(my $NONRT, '>:encoding(utf8)', 'nonretweet.csv')
        or die "Unable to write nonretweet.csv: $!\n";
    open(my $RT, '>:encoding(utf8)', 'retweet.csv')
        or die "Unable to write retweet.csv: $!\n";

    # Loop through the input data.
    while (my $row = $csv->getline($FH)) {
        my ($Key, $TwitID, $From, $Message, $MDate) = @$row;
        my $MIdent = "$TwitID/$Message";
        $Records++;

        # Check for duplicates: keep the first copy, drop later ones.
        if ($DupCheck{$MIdent}) {
            $DupCount++;
            next;
        }
        $DupCheck{$MIdent} = 1;

        # Count messages per twitter ID.
        $TwitIDs{$TwitID}++;

        # Identify retweets. A retweet is "RT" not adjacent to
        # upper case alpha, or "via" not adjacent to lower alpha.
        if ($Message =~ /(?<![A-Z])RT(?![A-Z])|(?<![a-z])via(?![a-z])/) {
            # It's a retweet.
            $out->print($RT, $row);
            $RTCount++;
        }
        else {
            # It's not a retweet.
            $out->print($NONRT, $row);
        }

        # Count the unique hash tags.
        foreach (split /[^#A-Za-z]/, $Message) {
            $HashTags{uc($_)}++ if (/^#[A-Za-z]/);
        }
    }

    # getline also returns false on a parse error, so report one if it happened.
    $csv->eof or $csv->error_diag();

    # Generate a count of unique twitter IDs.
    my $TwitIDCount = scalar keys %TwitIDs;

    # Calculate the retweet percentage (guarding against an empty file).
    my $RTPercent = $Records ? int(0.5 + $RTCount * 100 / $Records) : 0;

    # Print stats.
    print "$File:\n\t$Records records\n\t$TwitIDCount unique twitter IDS\n\t$RTCount RT/vias ($RTPercent%)\n\t$DupCount dups\n";

    # Print hash tags used, most frequent first.
    print "Popular (>50) Hash Tag report:\n";
    foreach (sort { $HashTags{$b} <=> $HashTags{$a} || $a cmp $b } keys %HashTags) {
        printf "%-20.20s %d\n", $_, $HashTags{$_}
            if ($HashTags{$_} > 50);
    }

    close $FH;
    close $NONRT;
    close $RT;
}
exit(0);
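Because the percentages reported for the RT, via and duplicate rules are independent, the share of messages removed by at least one rule has to be counted directly rather than summed. The sketch below shows one way to do that on a handful of made-up messages. The helper name rule_overlap and the sample rows are assumptions for illustration, and the patterns mirror the retweet rules described in the findings, not any agreed project standard.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count how many messages each simple rule would
# remove, and how many would be removed by at least one rule.
# Each row is [ $user, $text ].
sub rule_overlap {
    my (@rows) = @_;
    my %count = (total => 0, rt => 0, via => 0, same_user_dup => 0, any => 0);
    my %seen;
    foreach my $row (@rows) {
        my ($user, $text) = @$row;
        $count{total}++;
        my $is_rt  = $text =~ /(?<![A-Z])RT(?![A-Z])/  ? 1 : 0;
        my $is_via = $text =~ /(?<![a-z])via(?![a-z])/ ? 1 : 0;
        my $is_dup = $seen{"$user/$text"}++            ? 1 : 0;
        $count{rt}++            if $is_rt;
        $count{via}++           if $is_via;
        $count{same_user_dup}++ if $is_dup;
        $count{any}++           if $is_rt || $is_via || $is_dup;
    }
    return \%count;
}

# Made-up sample messages, for illustration only.
my $stats = rule_overlap(
    [ 'alice', 'RT @bob: water needed via @carol' ],
    [ 'alice', 'RT @bob: water needed via @carol' ],
    [ 'dave',  'Clinic open in Jacmel' ],
    [ 'erin',  'Road blocked, via radio report' ],
);
printf "%-14s %d of %d\n", $_, $stats->{$_}, $stats->{total}
    for qw(rt via same_user_dup any);
```

In this sample the individual rule counts add up to 6, yet only 3 of the 4 messages are removed, because the first two messages match more than one rule.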
- Other ideas that have not been explored yet:
- Rate tweets for validity based on key words. This could best be done by getting word occurrence counts from samples of messages that have already been reviewed and marked as relevant or not relevant.
- Identify language of tweet?
- Identify key words for "pre-classification" of tweets?
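The key word rating idea in the list above could start from something as simple as the following sketch. It builds word occurrence counts from reviewed messages and scores a new message by summing the log ratio of how often each word appears in relevant versus not relevant samples. This scoring scheme, the function names (words, train, score) and the sample messages are all assumptions for illustration; nothing like this had been built or evaluated at the camp.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a message into lower-case word tokens.
sub words {
    my ($text) = @_;
    return map { lc($_) } $text =~ /[A-Za-z']+/g;
}

# Build word occurrence counts from messages a reviewer has already marked.
# Each sample is [ $text, $is_relevant ].
sub train {
    my (@samples) = @_;
    my %model = (relevant => {}, other => {}, n_relevant => 0, n_other => 0);
    foreach my $sample (@samples) {
        my ($text, $is_relevant) = @$sample;
        my $class = $is_relevant ? 'relevant' : 'other';
        foreach my $word (words($text)) {
            $model{$class}{$word}++;
            $model{"n_$class"}++;
        }
    }
    return \%model;
}

# Score a message: positive leans relevant, negative leans not relevant.
# Add-one smoothing keeps unseen words from producing log(0).
sub score {
    my ($model, $text) = @_;
    my %vocab = (%{ $model->{relevant} }, %{ $model->{other} });
    my $v = (scalar keys %vocab) + 1;    # one extra slot for unseen words
    my $score = 0;
    foreach my $word (words($text)) {
        my $p_rel = (($model->{relevant}{$word} || 0) + 1) / ($model->{n_relevant} + $v);
        my $p_oth = (($model->{other}{$word}    || 0) + 1) / ($model->{n_other}    + $v);
        $score += log($p_rel / $p_oth);
    }
    return $score;
}

# Made-up reviewed samples, for illustration only.
my $model = train(
    [ 'Need water and medical help at clinic', 1 ],
    [ 'People trapped need rescue',            1 ],
    [ 'Donate now to help Haiti',              0 ],
    [ 'Pray for Haiti',                        0 ],
);
printf "%+.2f  %s\n", score($model, $_), $_
    for ('trapped people need water', 'pray for haiti');
```

With these four samples the first test message scores above zero and the second below zero. A real model would need far more reviewed messages, and probably separate handling for French and Creole text.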
Other statistical findings on hash tag occurrences (only those with over 50 occurrences in the 85k sample shown):
twitter-unix.csv:
82014 records
78371 unique twitter IDS
42348 RT/vias (52%)
3643 dups
Popular (>50) Hash Tag report:
#HAITI 80312
#HAITIQUAKE 3977
#HELPHAITI 2140
#TWIBBON 1637
#EARTHQUAKE 1511
#FB 628
#HELP 592
#REDCROSS 570
#P 512
#YELE 486
#FF 474
#NEWS 460
#CNN 436
#TCOT 399
#BRESMA 352
#UN 340
#LOC 324
#NEED 319
#IPHONE 286
#IDF 283
#ISRAEL 248
#QUAKE 237
#MSF 229
#CNNHELPHAITI 227
#EQHAITI 214
#PRAY 197
#IRANELECTION 196
#FOLLOWFRIDAY 191
#PHOTOGRAPHY 189
#INFO 185
#GIRO 183
#DONATE 179
#RELIEF 175
#OBAMA 170
#RETWEETTHISIF 164
#FIRSTAID 163
#AID 162
#CONTACT 144
#TWITTER 142
#ERDBEBEN 136
#QUOTE 133
#GOOGLE 131
#UNICEF 129
#JACMEL 126
#CJP 122
#RESCUEMEHAITI 122
#HAITIJP 121
#TERREMOTO 120
#CHARITY 119
#LATISM 116
#NOWPLAYING 115
#HAITICNN 113
#PHOTOS 111
#SHOUTOUT 111
#CCHAITI 100
#US 98
#FAIL 93
#NUM 92
#JGF 90
#VENEZUELA 89
#ORPHANS 88
#RESCUE 87
#CUBA 86
#PRAYERS 82
#IHAVEADREAM 81
#LIVESTRONG 81
#DOCTORSWITHOUTBORDE 80
#FDP 80
#USAF 80
#VIDEO 80
#EQ 79
#HOAX 75
#LETSBEHONEST 75
#PHOTOJOURNALISM 75
#MEDIA 74
#WYCLEFWARRIORS 74
#TLOT 73
#AFGHAN 71
#PRAYER 71
#GREENSAFE 70
#PORTAUPRINCE 70
#USHELPSHAITI 70
#ETSY 69
#OFFERING 69
#SRC 68
#GOLDENGLOBES 67
#IRAN 67
#OXFAM 66
#RESCUEHAITI 66
#HAITIEARTHQUAKE 64
#HAITIRELIEF 64
#MLK 64
#RADIO 64
#HUMANRIGHTS 63
#PDX 61
#PIRATEN 61
#PATROBERTSON 60
#SUPPORT 60
#USA 60
#EMERGENCY 59
#CLINTON 58
#GOD 58
#AFGHANISTAN 56
#GAZA 56
#NAME 56
#SPENDEN 56
#DISASTER 55
#HCR 55
#DONATION 54
#OSM 54
#HOPE 53
#JESUS 53
#FOLLOW 52
#JOB 52
#TECHHAITI 52
<><><>
Eventual integration of Twitter Streaming API with CrisisCamp tools - Ed Borasky
Twitter Search is changing - see my blog post at http://borasky-research.net/2010/01/18/twitter-search-changes-what-do-they-mean-2/
Bottom line - if you have high-volume Twitter Search needs, you should be migrating to the Streaming API.
Sent an email to Twitter telling them what we're doing. Copied Chris B. and A. Hook
Streaming API page on Twitter: https://twitterapi.pbworks.com/Streaming-API-Documentation
If you are familiar with medical issues or are good at tracking down RSS feeds:
RSS FEED PROJECT
PORTLAND TEAM LEADER: Reid
http://wiki.crisiscommons.org/wiki/Haiti_RSS_Feed_Challenge
Wiki above has lots of contact info, specifics. PDX needs a lead to take on getting clarifications, start discussing task sharing/hand-off with other cities, etc.
The RSS project team turned into the "awesome translation and multi-lingual team" partway through the day.
- Located and added RSS feeds to database in both French and English.
- Coordinated with RSS teams in other cities to finalize version 2 of the RSS input form.
- Translated RSS input form labels to French.
- Worked on Sahana localization to French, and data input on 4636.ushahidi.com (focusing first on French txts, and later on Creole)
List of Sites Without RSS for possible scraping
If you are a geo hacker or GIS geek:
OpenStreetMap
PORTLAND TEAM LEADER: Rafael Gutierrez
Working with the geospatial projects for CrisisCampPDX does not require that you have a strong background in GIS or mapping but it helps. Outlined here are directions for the mapping process with OpenStreetMap. Also included are some of Portland's ongoing efforts and lessons learned to date.
The mapping platform currently being used, which has one of the largest user bases, is OpenStreetMap (OSM). There are several Beginner guides here as well as basic editing videos here. There are other mapping applications such as Google MapMaker, but they have not been used for this local effort.
Once you have familiarized yourself with OSM and the tutorials for basic editing, visit the OSM Haiti Mapping site. This site details all the current tasks that are still needed as well as other resources useful for mapping such as satellite imagery. A word of caution: this site contains an overwhelming amount of information and it is suggested to at least familiarize yourself with the Mapping Coordination section to get started. The Mapping Coordination section covers:
- 1 General
- 1.1 What needs to be mapped?
- 1.2 How to map
- 1.3 Data collection
- 1.4 Coordination of tracing/mapping efforts
- 1.5 Diff Reports
- 1.6 OpenStreetMap-Emergencies
- 1.7 Error Reports
- 2 Best Practices
- 3 Requests from Responders
- 4 Who's Helping
- 5 Places
- 5.1 Administrative units and boundaries
- 6 Features
- 6.1 Lists of Tags
- 6.2 Specific Tags
- 6.3 JOSM presets
- 6.4 Additional Ideas
- 6.4.1 Fix errorlogs for better data
Pay careful attention to items in BOLD, particularly the Best Practices.
Notes about Getting Started
Again, if you have read how to edit in OSM and Potlatch then head back to the OSM site for details before you get started. If you feel comfortable enough with Potlatch to edit OSM, then make sure you check with the Task Ideas first to make sure certain tasks haven't already been addressed. One of the main tasks that people have contributed extensively to is the basemapping. The tool being used to track this progress is called the OSM Matrix which is a status map. It's not clear if this is the best tool at the moment since it is not regularly updated. OpenStreetEmergencies is a clone of OpenStreetBugs which is another tool that is being used and appears to be a better alternative for status updates. The issue of tracking thousands of users over such a vast area within a short time frame is unprecedented and will likely be addressed at some point. Some analyses are already underway.
Basemapping
When selecting areas for editing, there are two approaches to address the needs based on the status. One is to go to the OSM Matrix and check the status of updated areas. The site is tricky to understand so be sure to read the Legend for a good understanding of the symbology and classifications. If you've found a suitable area to edit, you can launch Potlatch straight from the Tools Tab in the lower right-hand side and move on to editing. Always check to see if you are using the best imagery available. A great resource for viewing most of the available imagery and other maps is the Haiti Crisis Map from Telascience. The Haiti Crisis Map also has Permalinks for launching straight into Potlatch. There is a YouTube demo here.
When editing in Potlatch, it is best to use the Edit with Save option.
Other Tasks
- Street Naming
Street naming conventions and how-tos are here: http://wiki.openstreetmap.org/wiki/WikiProject_Haiti/Street_names To show unnamed streets in Potlatch, set Options to 'Highlight unnamed roads'.
Hospital Locating - DONE.
I believe this was even completed before the start of Portland's Crisis Camp. Nice work everyone!
Tagging
When Tagging your data after edits, it's important that you follow protocols that have already been established, both by the OSM community and the Humanitarian effort. Also, it is very important to document your source data (IKONOS, GPS Garmin trace, Geo-Eye, etc.) with appropriate tags.
Resources
As you get mapping and need more resources, help, or guidance along the way, it's best to check the CrisisMappers Google Group or the IRC Chat Channel for OpenStreetMap #OSM Channel; this is a great resource for quick answers.
Imagery footprints are being updated at a very rapid pace. Many remote sensing companies are providing free and very high resolution imagery for most of the areas affected by the earthquake in Haiti.
Portland's OSM Mapping Experience
The Portland CrisisCamp (#ccamppdx) mapping effort began in fits and starts and could have benefitted greatly from reading this more carefully. One very confounding issue was the desire to find areas that needed mapping while also avoiding duplication of effort.
If you speak French and want to manage a translation group
OPEN TWEET FIND
Twitter data set on github: http://github.com/unthinkingly/haiti.ushahidi.com-twitter-export
Ushahidi source code: http://github.com/ushahidi/Ushahidi_Haiti
Ushahidi dev site: http://haiti1.osuosl.org - Up and functional using Ushahidi_Web, ping gchaix (IRC) or gchaix@osuosl.org for an account.
Managing News 7 dev site: http://haiti2.osuosl.org - site is up, ping gchaix (IRC) or gchaix@osuosl.org for an account. ON HOLD - focusing on Ruby/Rails-based filtering tool - CrisisFilter See: http://etherpad.com/rhok
For important, simple things anyone can do:
Please see the tasks on the "Important, Simple Things" Page: Simple Tasks Anyone Can Do
