Microsoft Big Data Essentials
Module 1 - Introduction to Big Data
Saptak Sen, Microsoft
Bill Ramos, Advaiya

Why Big Data?

Agenda
- Big Data Lambda Architecture
- Getting started with Windows Azure HDInsight Service

The Business Imperative
1. Human Fault Tolerance
2. Minimize CapEx
3. Hyper Scale on Demand
4. Low Learning Curve

CAP Theorem
- C: Consistency
- A: Availability
- P: Partition Tolerance

Big Data Lambda Architecture
- Batch layer
  - Stores master dataset
  - Computes arbitrary views
- Speed layer
  - Fast, incremental algorithms
  - Batch layer eventually overrides speed layer
- Serving layer
  - Random access to batch views
  - Updated by batch layer
[Diagram labels: Batch Layer, Speed Layer, Serving Layer]

The Batch Layer
- Stores master dataset (in append mode)
- Unrestrained computation
- Horizontally scalable
- High latency
[Diagram labels: Incoming data streams, Master dataset, Batch views]

The Speed Layer
- Stream processing of data
- Stores a limited window of data
- Dynamic computation
[Diagram labels: Incoming data streams, Process stream, Real-time increments, Increment views, Real-time views]

The Serving Layer
- Queries the batch and real-time views
- Merges the results
[Diagram labels: Batch views, Real-time views, Querying and merging, Output]
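The interplay of the three layers can be illustrated with a minimal Python sketch. It is not taken from the slides: the class name `LambdaCounter`, its methods, and the page-view example are hypothetical, and a plain in-memory list and counters stand in for the master dataset and the views. It also assumes that no events arrive while a batch run is in progress, which a real system would have to handle.

```python
from collections import Counter


class LambdaCounter:
    """Toy Lambda Architecture that counts events per key (illustrative only)."""

    def __init__(self):
        self.master_dataset = []        # batch layer: append-only record of all events
        self.batch_view = Counter()     # view precomputed by the batch layer
        self.realtime_view = Counter()  # speed layer: increments since the last batch run

    def ingest(self, key):
        # Incoming data goes to both the master dataset and the speed layer.
        self.master_dataset.append(key)
        self.realtime_view[key] += 1

    def run_batch(self):
        # Recompute the batch view from the whole master dataset, then drop
        # the real-time increments that the new batch view now covers.
        self.batch_view = Counter(self.master_dataset)
        self.realtime_view.clear()

    def query(self, key):
        # Serving layer: merge the batch view with the real-time view.
        return self.batch_view[key] + self.realtime_view[key]


arch = LambdaCounter()
for page in ["home", "about", "home"]:
    arch.ingest(page)
before = arch.query("home")   # answered from the real-time view only
arch.run_batch()
arch.ingest("home")
after = arch.query("home")    # batch view plus one real-time increment
print(before, after)          # 2 3
```

Clearing the real-time view after each batch run is one simple way to express "batch layer eventually overrides speed layer": the speed layer only ever holds the window of data that the batch views have not yet absorbed.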
Microsoft Lambda Architecture Support
Layers: Batch Layer, Speed Layer, Serving Layer
Technologies shown:
- Windows Azure HDInsight
- Azure Blob storage
- MapReduce, Hive, Pig, Oozie, SSIS
- Federations in Windows Azure SQL Database
- Azure tables
- Memcached/MongoDB
- Azure Storage Explorer
- Microsoft Excel: Power Query, PowerPivot, Power View, Power Map
- SQL Server database engine
- SQL Server VM: Reporting Services, Columnstore indexes, Analysis Services
- LINQ to Hive

Yahoo!
- Batch Layer: Apache Hadoop
- Speed Layer: Staging Database
- Serving Layer: SQL Server Analysis Services (SSAS), Microsoft Excel and PowerPivot
[Diagram labels: SQL Server Connector (Hadoop Hive ODBC), Other BI Tools and Custom Applications, SQL Server Analysis Services (SSAS Cube), Hadoop Data, Third Party Database + Custom Applications]
Ferranti Computer Systems
- Batch Layer: Windows Azure HDInsight
- Speed Layer: Reactive Extensions (Rx), SQL Server Database (In-Memory OLTP)
- Serving Layer: Microsoft Dynamics AX, SQL Server Analysis Services, SQL Server Reporting Services
[Diagram labels: Data Feed from Smart Meters, Windows Azure HDInsight, SQL Server (In-Memory OLTP), Microsoft Dynamics AX, SQL Server Analysis Services, SQL Server Reporting Services]

Windows Azure Storage

Demo 1: Setting up the Windows Azure storage account
- Batch Layer: Azure Blob storage
- Speed Layer
- Serving Layer: Azure Storage Explorer
[Diagram labels: Azure Storage Explorer, Windows Azure Blob storage]

Blob Storage Concepts
- Store large amounts of unstructured text or binary data with the fastest read performance
- Highly scalable, durable, and available file system
- Blobs can be exposed publicly over HTTP
- Securely lock down permissions to blobs

http://<account>.blob.core.windows.net/<container>/<blob>

Account    Container    Blob         Pages/Blocks
Contoso    Images       PIC01.JPG    Block/Page
                        PIC02.JPG    Block/Page
           Video        VID1.AVI     Block/Page

Getting started with HDInsight Service

Demo 2: Setting up the Windows Azure HDInsight cluster
- Batch Layer: Windows Azure HDInsight, Azure Blob storage
- Speed Layer
- Serving Layer: HDInsight Console
[Diagram labels: HDInsight Console, Windows Azure HDInsight, https://<clustername>.azurehdinsight.net/, Windows Azure Blob storage]

Demo 3: Loading data into Windows Azure storage for use with HDInsight
- Batch Layer: Windows Azure HDInsight, Azure Blob storage
- Speed Layer
- Serving Layer: HDInsight Console
[Diagram labels: CSV files from local disk, HDInsight Console, Windows Azure HDInsight, https://<clustername>.azurehdinsight.net/, Windows Azure Blob storage]
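The blob addressing scheme from the Blob Storage Concepts slide (account, then container, then blob) can be sketched in a few lines of Python. This is only an illustration of how the public URL is composed: the helper name `blob_url` and its `scheme` parameter are hypothetical, the lowercase account and container names are assumptions, and the code does not contact any storage service.

```python
from urllib.parse import quote


def blob_url(account, container, blob, scheme="http"):
    """Compose the address of a blob from its account, container, and blob name."""
    # Percent-encode the path parts; "/" is left alone in the blob name
    # so that names containing a directory-like prefix stay readable.
    return "{}://{}.blob.core.windows.net/{}/{}".format(
        scheme, account, quote(container), quote(blob, safe="/")
    )


url = blob_url("contoso", "images", "PIC01.JPG")
print(url)  # http://contoso.blob.core.windows.net/images/PIC01.JPG
```

Whether a URL built this way can actually be fetched depends on the container's permissions, since blobs can be exposed publicly or locked down.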
Easy Access to Data, Big & Small

Search, Access & Shape
- Simplify access to public & corporate data
- Easily preview, shape, & format your data

Combine with Unstructured
- Combine and refine data across multiple sources

Easily Manage & Query
- Common management of structured & unstructured data
- Gain insight across relational, unstructured, & semi-structured data
- Query across relational DB & Hadoop with single T-SQL Query

Key Features
- Power Query
- Windows Azure Marketplace
- Windows Azure HDInsight Service
- Parallel Data Warehouse with Polybase

Learn more
- Getting Started with HDInsight: http://blogs.msdn.com/b/windowsazure/archive/2013/03/19/getting-started-with-hdinsight.aspx
- Azure HDInsight and Azure Storage: http://blogs.msdn.com/b/windowsazure/archive/2013/03/21/azure-hdinsight-and-azure-storage.aspx

Questions?