Day 1 Module 1 - Introduction to Big Data

Microsoft Big Data Essentials
Module 1 - Introduction to Big Data
Saptak Sen, Microsoft
Bill Ramos, Advaiya

Why Big Data?

Agenda
  • Big Data Lambda Architecture
  • Getting started with Windows Azure HDInsight Service

The Business Imperative
  1. Human Fault Tolerance
  2. Minimize CapEx
  3. Hyper Scale on Demand
  4. Low Learning Curve

CAP Theorem
  • C: Consistency
  • A: Availability
  • P: Partition Tolerance

Big Data Lambda Architecture
  Batch layer
    • Stores master dataset
    • Computes arbitrary views
  Speed layer
    • Fast, incremental algorithms
    • Batch layer eventually overrides speed layer
  Serving layer
    • Random access to batch views
    • Updated by batch layer

The Batch Layer
  • Stores master dataset (in append mode)
  • Unrestrained computation
  • Horizontally scalable
  • High latency
  Diagram: incoming data streams -> master dataset -> batch views

The Speed Layer
  • Real-time views
  • Stream processing of data
  • Stores a limited window of data
  • Dynamic computation
  Diagram: incoming data streams -> process stream -> real-time increments -> increment views

The Serving Layer
  • Queries the batch and real-time views
  • Merges the results
  Diagram: batch views and real-time views -> querying and merging -> output
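The three layers above can be illustrated with a toy example. The following is a minimal sketch, not part of the original deck: it assumes the "view" is a simple per-key event count, and the class name `LambdaCounter` and its methods are hypothetical names chosen for illustration. It shows the master dataset being kept in append mode, the batch view being recomputed from scratch, the speed layer holding only increments since the last batch run, and the serving layer merging both at query time.

```python
from collections import Counter


class LambdaCounter:
    """Toy Lambda Architecture that counts events per key (illustrative only)."""

    def __init__(self):
        self.master = []                # batch layer: append-only master dataset
        self.batch_view = Counter()     # batch view, recomputed from the master dataset
        self.realtime_view = Counter()  # speed layer: increments since the last batch run

    def ingest(self, key):
        """An incoming event goes to both the batch layer and the speed layer."""
        self.master.append(key)         # never updated in place, only appended
        self.realtime_view[key] += 1    # fast, incremental update

    def run_batch(self):
        """High-latency recomputation over the whole master dataset."""
        self.batch_view = Counter(self.master)
        # The batch view now covers everything, so it overrides the speed layer.
        self.realtime_view.clear()

    def query(self, key):
        """Serving layer: merge the batch view and the real-time view."""
        return self.batch_view[key] + self.realtime_view[key]


if __name__ == "__main__":
    views = LambdaCounter()
    for event in ["a", "a", "b"]:
        views.ingest(event)
    views.run_batch()       # batch view catches up with the master dataset
    views.ingest("a")       # arrives after the batch run, lives in the speed layer
    print(views.query("a"))
```

In a real deployment the batch recomputation would run on a system such as Hadoop and the real-time view would cover only a limited window of data; the sketch keeps everything in memory to make the merge step visible.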

Microsoft Lambda Architecture Support
  Layers: Batch Layer, Speed Layer, Serving Layer
  Technologies:
  • Windows Azure HDInsight
  • Azure Blob storage
  • MapReduce, Hive, Pig, Oozie, SSIS
  • Federations in Windows Azure SQL Database
  • Azure tables
  • Memcached/MongoDB
  • Azure Storage Explorer
  • Microsoft Excel: Power Query, PowerPivot, Power View, Power Map
  • SQL Server database engine
  • SQL Server VM: Reporting Services, Columnstore indexes, Analysis Services
  • LINQ to Hive

Yahoo!
  Batch Layer: Apache Hadoop
  Speed Layer: Staging Database
  Serving Layer: SQL Server Analysis Services (SSAS), Microsoft Excel and PowerPivot
  Diagram components:
  • SQL Server Connector (Hadoop Hive ODBC)
  • Other BI Tools and Custom Applications
  • SQL Server Analysis Services (SSAS Cube)
  • Hadoop Data
  • Third Party Database + Custom Applications
  • Microsoft Excel & PowerPivot for Excel

Ferranti Computer Systems
  Batch Layer: Windows Azure HDInsight
  Speed Layer: Reactive Extensions (Rx), SQL Server Database (In-Memory OLTP)
  Serving Layer: Microsoft Dynamics AX, SQL Server Analysis Services, SQL Server Reporting Services
  Diagram components:
  • Data Feed from Smart Meters
  • Reactive Extensions (Rx)
  • Windows Azure HDInsight
  • SQL Server (In-Memory OLTP)
  • Microsoft Dynamics AX

Windows Azure Storage

Demo 1: Setting up the Windows Azure storage account
  Batch Layer: Azure Blob storage
  Speed Layer
  Serving Layer
  Diagram: Azure Storage Explorer -> Windows Azure Blob storage

Blob Storage Concepts
  • Store large amounts of unstructured text or binary data with the fastest read performance
  • Highly scalable, durable, and available file system
  • Blobs can be exposed publicly over HTTP
  • Securely lock down permissions to blobs
  URL pattern: http://<account>.blob.core.windows.net/<container>/<blob>
  Example hierarchy:
  • Account: Contoso
  • Containers: Images, Video
  • Blobs: PIC01.JPG, PIC02.JPG, VID1.AVI
  • Pages/Blocks: Block/Page

Getting started with HDInsight Service

Demo 2: Setting up the Windows Azure HDInsight cluster
  Batch Layer: Windows Azure HDInsight
  Speed Layer
  Serving Layer
  Diagram: HDInsight Console, Windows Azure HDInsight (https:// .azurehdinsight.net/), Windows Azure Blob storage

Demo 3: Loading data into Windows Azure storage for use with HDInsight
  Batch Layer: Windows Azure HDInsight
  Speed Layer
  Serving Layer
  Diagram: CSV files from local disk, HDInsight Console, Windows Azure HDInsight (https:// .azurehdinsight.net/), Windows Azure Blob storage
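Data loaded into Blob storage is addressed by account, then container, then blob name, all under the blob.core.windows.net host. The following is a minimal sketch, not from the original deck, that assembles such an address. The function name `blob_url` and the lowercase sample names are hypothetical and chosen for illustration; only the Python standard library is used, and nothing is sent over the network.

```python
from urllib.parse import quote


def blob_url(account, container, blob, scheme="http"):
    """Build a blob address from its account, container, and blob name.

    Illustrative helper only: it formats the address and does not check
    that the account, container, or blob actually exists.
    """
    # quote() percent-encodes characters such as spaces; "/" is left intact
    # so blob names that contain virtual folder paths keep their structure.
    return (
        f"{scheme}://{account}.blob.core.windows.net/"
        f"{quote(container)}/{quote(blob)}"
    )


if __name__ == "__main__":
    print(blob_url("contoso", "images", "PIC01.JPG"))
    print(blob_url("contoso", "video", "VID1.AVI", scheme="https"))
```

Whether such an address can be fetched anonymously depends on the container's permissions, which matches the slide's two points that blobs can be exposed publicly over HTTP or securely locked down.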

Easy Access to Data, Big & Small
  Search, Access & Shape
    • Simplify access to public & corporate data
    • Easily preview, shape, & format your data
  Combine with Unstructured
    • Combine and refine data across multiple sources
  Easily Manage & Query
    • Common management of structured & unstructured data
    • Gain insight across relational, unstructured, & semistructured data
    • Query across relational DB & Hadoop with single T-SQL Query
  Key Features: Power Query, Windows Azure Marketplace, Windows Azure HDInsight Service, Parallel Data Warehouse with Polybase

Learn more
  • Getting Started with HDInsight
    http://blogs.msdn.com/b/windowsazure/archive/2013/03/19/getting-started-with-hdinsight.aspx
  • Azure HDInsight and Azure Storage
    http://blogs.msdn.com/b/windowsazure/archive/2013/03/21/azure-hdinsight-and-azure-storage.aspx

Questions?
