Introduction to Cloud Computing - UMIACS

Data-Intensive Text Processing with MapReduce (Bonus session)

Tutorial at the 2009 North American Chapter of the Association for Computational Linguistics - Human Language Technologies Conference (NAACL HLT 2009)

Jimmy Lin, The iSchool, University of Maryland
Chris Dyer, Department of Linguistics, University of Maryland

Sunday, May 31, 2009

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States license. See http://creativecommons.org/licenses/by-nc-sa/3.0/us/ for details.

Agenda
  • Hadoop nuts and bolts
  • Hello World Hadoop example (distributed word count)
  • Running Hadoop in standalone mode
  • Running Hadoop on EC2
  • Open-source Hadoop ecosystem
  • Exercises and office hours
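The "Hello World" example named in the agenda is distributed word count. As a rough illustration of the idea only, here is a minimal single-process sketch in plain Python. It does not use Hadoop's Java API, and the function names (`map_words`, `reduce_counts`, `word_count`) are hypothetical names chosen for this sketch.

```python
from collections import defaultdict


def map_words(doc_id, text):
    """Mapper: emit a (word, 1) pair for every token in one input record."""
    for word in text.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reducer: sum all partial counts that were grouped under one word."""
    return word, sum(counts)


def word_count(documents):
    """Run map, group-by-key (the shuffle), and reduce in a single process."""
    grouped = defaultdict(list)
    for doc_id, text in documents.items():
        for word, one in map_words(doc_id, text):
            grouped[word].append(one)
    # Reducers receive keys in sorted order, so sort before reducing.
    return dict(reduce_counts(w, c) for w, c in sorted(grouped.items()))


if __name__ == "__main__":
    docs = {"d1": "hello hadoop hello world", "d2": "hello cloud"}
    print(word_count(docs))
```

On a real cluster the mappers and reducers run on different machines and the grouping step happens over the network; the sketch keeps only the logical structure.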

Hadoop nuts and bolts

Source: http://davidzinger.wordpress.com/2007/05/page/2/

Hadoop Zen
  • Don't get frustrated (take a deep breath)
  • This is bleeding-edge technology:
      • Remember this when you experience those W$*#T@F! moments
      • Lots of bugs
      • Stability issues
      • Even lost data
      • To upgrade or not to upgrade (damned either way)?
      • Poor documentation (or none)
  • But Hadoop is the path to data nirvana?

Cloud9
  • Library used for teaching cloud computing courses at Maryland

  • Demos, sample code, etc.
      • Computing conditional probabilities
      • Pairs vs. stripes
      • Complex data types
      • Boilerplate code for working with various IR collections
  • Dog food for research
  • Open source, anonymous svn access

[Diagram: clients, a JobTracker on the master node, and a TaskTracker on each slave node]

From Theory to Practice
  1. Scp data to cluster
  2. Move data into HDFS
  3. Develop code locally
  4. Submit MapReduce job
     4a. Go back to Step 3
  5. Move data out of HDFS
  6. Scp data from cluster
[Diagram labels: You, Hadoop Cluster]

Data Types in Hadoop
  • Writable: Defines a de/serialization protocol. Every data type in Hadoop is a Writable.
  • WritableComparable: Defines a sort order. All keys must be of this type (but not values).
  • IntWritable, LongWritable, Text: Concrete classes for different data types.

Complex Data Types in Hadoop
How do you implement complex data types?
  • The easiest way:
      • Encode it as Text, e.g., (a, b) = "a:b"
      • Use regular expressions to parse and extract data
      • Works, but pretty hack-ish
  • The hard way:
      • Define a custom implementation of WritableComparable
      • Must implement: readFields, write, compareTo
      • Computationally efficient, but slow for rapid prototyping
  • Alternatives:
      • Cloud9 offers two other choices: Tuple and JSON
      • Plus, a number of frequently-used data types

[Diagram: input file (on HDFS) → InputSplit → RecordReader → Mapper → Partitioner → Reducer → RecordWriter → output file (on HDFS). The InputFormat supplies the InputSplits and RecordReaders; the OutputFormat supplies the RecordWriters.]
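To make the Mapper, Partitioner, and Reducer stages of the data flow above more concrete, here is a minimal single-process sketch in plain Python (not Hadoop's Java API). Its keys are pairs encoded as text in the "a:b" form from the Complex Data Types slide and parsed back with a regular expression. All function names are hypothetical, and the CRC32 hash is an assumption chosen for the sketch rather than the hash Hadoop itself uses.

```python
import re
import zlib
from collections import defaultdict

PAIR = re.compile(r"^([^:]+):([^:]+)$")  # parses the "a:b" text encoding


def encode_pair(a, b):
    """The 'easiest way': flatten the pair (a, b) into the text key 'a:b'."""
    return f"{a}:{b}"


def decode_pair(key):
    """Recover (a, b) from 'a:b' with a regular expression."""
    match = PAIR.match(key)
    if match is None:
        raise ValueError(f"not a pair key: {key!r}")
    return match.group(1), match.group(2)


def mapper(line):
    """Emit ('a:b', 1) for each adjacent word pair in one input record."""
    words = line.split()
    for left, right in zip(words, words[1:]):
        yield encode_pair(left, right), 1


def partitioner(key, num_reducers):
    """Choose a reducer from a stable hash of the key, so that every copy
    of the same key reaches the same reducer."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reducer(key, values):
    """Sum the counts for one key and decode the key back into a tuple."""
    return decode_pair(key), sum(values)


def run_job(lines, num_reducers=2):
    """Push records through map, partition, sort, and reduce in one process."""
    # One bucket of key -> values per reducer, filled by the partitioner.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for line in lines:  # stands in for the RecordReader: one record per line
        for key, value in mapper(line):
            buckets[partitioner(key, num_reducers)][key].append(value)
    output = []
    for bucket in buckets:  # each reducer sees its own keys in sorted order
        for key in sorted(bucket):
            output.append(reducer(key, bucket[key]))
    return output
```

The sketch also shows why the slide calls the text encoding hack-ish: a word that itself contains ":" produces a key that `decode_pair` rejects, which is the kind of problem a custom WritableComparable avoids.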

What version should I use?

Hello World Hadoop example

Hadoop in standalone mode

Hadoop in EC2

From Theory to Practice
  1. Scp data to cluster
  2. Move data into HDFS
  3. Develop code locally
  4. Submit MapReduce job
     4a. Go back to Step 3
  5. Move data out of HDFS
  6. Scp data from cluster
[Diagram labels: You, Hadoop Cluster]

On Amazon: With EC2
  0. Allocate Hadoop cluster
  1. Scp data to cluster
  2. Move data into HDFS
  3. Develop code locally
  4. Submit MapReduce job
     4a. Go back to Step 3
  5. Move data out of HDFS
  6. Scp data from cluster
  7. Clean up!
[Diagram labels: You, EC2, Your Hadoop Cluster]
Uh oh. Where did the data go?

On Amazon: EC2 and S3
  • Copy from S3 to HDFS
  • Copy from HDFS to S3
[Diagram labels: S3 (Persistent Store), EC2 (The Cloud), Your Hadoop Cluster]

Open-source Hadoop ecosystem
  • Hadoop/HDFS
  • Hadoop streaming

  • HDFS/FUSE
  • EC2/S3/EBS
  • EMR
  • Pig
  • HBase
  • Hypertable
  • Hive
  • Mahout
  • Cassandra
  • Dryad
  • CUDA
  • CELL

Beware of toys!

Exercises

Questions? Comments?

Thanks to the organizations who support our work:
