AWS Summit Stockholm 2014 – B4 – Business intelligence on AWS

Business intelligence is often described as a set of methodologies and technologies that transform raw data into meaningful and useful information for business purposes. But this simple description hides many technical challenges IT teams struggle with. This session will show how to build business intelligence applications leveraging AWS, from the raw data import, consumption and storage down to the information production. We will also cover best practices for services such as Amazon Redshift or Amazon RDS, and how to use applications such as SAP Hana, Jaspersoft and others.

  • EMR supports multiple instance types, including the latest HS1 instance types. EMR now supports High Storage Instances (hs1.8xlarge) in US East. These new instances offer 48 TB of storage across 24 hard disk drives, 35 EC2 Compute Units (ECUs) of compute capacity, 117 GB of RAM, 10 Gbps networking, and 2.4+ GB per second of sequential I/O performance. High Storage Instances are ideally suited for Hadoop, and they significantly reduce the cost of processing very large data sets on EMR. We look forward to adding support for High Storage Instances in additional regions early next year.
  • And the concept of adding nodes works well with Hadoop, especially in the cloud, since 10 nodes running for 10 hours cost the same as 100 nodes running for 1 hour.
  • Horizontal scaling on commodity hardware. Perfect for Hadoop.

Presentation Transcript

  • Business Intelligence Applications on AWS. Steffen Krause, Amazon Web Services, @sk_bln. © 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
  • Overview: designing BI & big data solutions in the cloud. Not the only way to do it (but one that we have seen).
  • Generation Collection & storage Analytics & computation Collaboration & sharing
  • Data has gravity. Diagram labels: Data, App, Compute, Storage, Big Data. Source: http://blog.mccrory.me/2010/12/07/data-gravity-in-the-clouds/
  • …and inertia at volume… (same diagram and source)
  • …so it is easier to move applications to the data (same diagram and source)
  • S3 as a “single source of truth”. Courtesy http://techblog.netflix.com/2013/01/hadoop-platform-as-service-in-cloud.html
  • Getting your data into AWS (from the corporate data center to Amazon S3): console upload, FTP, AWS Import/Export, S3 API, Direct Connect, Storage Gateway, third-party commercial apps, Tsunami UDP
  • Write directly to a data source: your application writes to Amazon S3, DynamoDB, or any other data store. Diagram labels: Amazon S3, Amazon EC2.
  • Queue, pre-process, and then write: Amazon Simple Queue Service (SQS) in front of Amazon S3, DynamoDB, or any other data store
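The queue, pre-process, write pattern on this slide can be sketched locally. This is a minimal illustration only: a standard-library `queue.Queue` stands in for Amazon SQS and a plain list stands in for the target data store, and the `user_id` validation rule and the message layout are hypothetical, not taken from the talk.

```python
import json
import queue
from typing import Optional

# Stand-ins (assumptions): a local queue plays the role of Amazon SQS,
# and a plain list plays the role of S3, DynamoDB, or another data store.
incoming: queue.Queue = queue.Queue()
data_store: list = []


def preprocess(raw_message: str) -> Optional[dict]:
    """Parse a JSON message; drop it if malformed or missing user_id (hypothetical rule)."""
    try:
        record = json.loads(raw_message)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict) or "user_id" not in record:
        return None
    # Normalise the event name so downstream queries see one spelling.
    record["event"] = str(record.get("event", "unknown")).lower()
    return record


def drain(source: queue.Queue, sink: list) -> int:
    """Pull every queued message, pre-process it, and write the survivors to the sink."""
    written = 0
    while True:
        try:
            raw = source.get_nowait()
        except queue.Empty:
            break
        record = preprocess(raw)
        if record is not None:
            sink.append(record)
            written += 1
    return written


# Producers enqueue raw messages; a worker drains and writes them later.
for message in ['{"user_id": 1, "event": "LOGIN"}', "not json", '{"event": "click"}']:
    incoming.put(message)
written_count = drain(incoming, data_store)
```

The point of the queue is decoupling: producers never wait for the data store, and bad records are filtered before they are written.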
  • Choose depending upon design: Amazon SQS, Amazon S3, DynamoDB, any SQL or NoSQL store, log aggregation tools
  • Generation Collection & storage Analytics & computation Collaboration & sharing
  • Hadoop-based analysis. Diagram components: Amazon SQS, DynamoDB, any SQL or NoSQL store, log aggregation tools, Amazon S3, Amazon EMR.
  • Amazon Elastic MapReduce (EMR)? EMR is Hadoop in the cloud.
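For readers new to Hadoop, the MapReduce programming model it implements can be sketched in a few lines. This is an in-process illustration of the map, shuffle, and reduce phases using word count as the job; it does not use EMR or Hadoop, and the function names are chosen here for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence on an in-memory list of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

On a real cluster the map and reduce calls run on different nodes and the shuffle moves data over the network, which is why adding nodes shortens the job.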
  • How does EMR work? (1) Put the data into S3. (2) Choose: Hadoop distribution, number of nodes, types of nodes, custom configs, Hive/Pig/etc. (3) Launch the cluster using the EMR console, CLI, SDK, or APIs. (4) Get the output from S3. You can also store everything in HDFS.
  • Resizing an EMR cluster: you can easily add and remove nodes
  • 1 instance for 100 hours = 100 instances for 1 hour
  • Either way, on small instances: $5.50 (including the EMR charge; $4.40 without it)
  • 1 instance for 1000 hours = 1000 instances for 1 hour
  • Either way, on small instances: $55 (including the EMR charge; $44 without it)
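The arithmetic behind these slides can be checked with a short sketch. The hourly rates below are not quoted in the talk; they are derived from the slide totals ($4.40 / 100 h and $5.50 / 100 h) and should be read as assumptions, since actual prices vary by region and over time.

```python
from decimal import Decimal

# Rates implied by the slide totals (assumptions, not current AWS prices):
EC2_RATE = Decimal("0.044")  # USD per small-instance hour, from $4.40 / 100 h
EMR_RATE = Decimal("0.011")  # EMR surcharge per hour, from ($5.50 - $4.40) / 100 h


def cluster_cost(instances: int, hours: int, include_emr: bool = True) -> Decimal:
    """Cost depends only on total instance-hours, not on how they are split."""
    rate = EC2_RATE + (EMR_RATE if include_emr else Decimal("0"))
    return rate * instances * hours


# Same instance-hours, same bill: only the wall-clock time differs.
slow = cluster_cost(instances=1, hours=100)
fast = cluster_cost(instances=100, hours=1)
```

Because the bill is a function of instance-hours alone, running wide for a short time costs the same as running narrow for a long time, and the results arrive sooner.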
  • When you turn off your cloud resources, you actually stop paying for them
  • SQL-based processing. Diagram components: Amazon SQS, DynamoDB, any SQL or NoSQL store, log aggregation tools, Amazon S3, Amazon EMR (pre-processing framework), Amazon Redshift (petabyte-scale columnar data warehouse).
  • What is Amazon Redshift? Amazon Redshift is a fast and powerful, fully managed, petabyte-scale data warehouse service in the AWS cloud. Easy to provision and scale. No upfront costs, pay as you go. High performance at a low price. Open and flexible, with support for popular BI tools.
  • Demo: Amazon Redshift
  • Generation Collection & storage Analytics & computation Collaboration & sharing
  • Your choice of BI tools. Diagram components: Amazon SQS, DynamoDB, any SQL or NoSQL store, log aggregation tools, Amazon S3, Amazon EMR, Amazon Redshift, pre-processing framework.
  • Demo: Jaspersoft as a BI frontend
  • Sharing results and visualizations. Diagram components: Amazon SQS, DynamoDB, any SQL or NoSQL store, log aggregation tools, Amazon S3, Amazon EMR, Amazon Redshift, web app server, visualization tools.
  • Sharing results and visualizations. Diagram components: Amazon SQS, DynamoDB, any SQL or NoSQL store, log aggregation tools, Amazon S3, Amazon EMR, Amazon Redshift, business intelligence tools.
  • Geospatial visualizations. Diagram components: Amazon SQS, DynamoDB, any SQL or NoSQL store, log aggregation tools, Amazon S3, Amazon EMR, Amazon Redshift, business intelligence tools, GIS tools on Hadoop, GIS tools, visualization tools.
  • Rinse and repeat. Diagram components: Amazon SQS, DynamoDB, any SQL or NoSQL store, log aggregation tools, Amazon S3, Amazon EMR, Amazon Redshift, visualization tools, business intelligence tools, GIS tools on Hadoop, GIS tools, AWS Data Pipeline.
  • The complete architecture. Diagram components: Amazon SQS, DynamoDB, any SQL or NoSQL store, log aggregation tools, Amazon S3, Amazon EMR, Amazon Redshift, visualization tools, business intelligence tools, GIS tools on Hadoop, GIS tools, AWS Data Pipeline.
  • Real Time
  • Amazon Kinesis: real-time processing, massive scale, integrated. Use cases: real-time log analysis, real-time data analytics, social media monitoring, financial transactions, online machine learning.
  • Amazon Kinesis data flow. Diagram labels: data sources; AWS endpoint; Shard 1, Shard 2, … Shard N across three Availability Zones; App.1 [Aggregate & De-Duplicate]; App.2 [Metric Extraction]; App.3 [Sliding Window Analysis]; App.4 [Machine Learning]; S3, DynamoDB, Redshift.
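As an illustration of what a consumer such as App.3 [Sliding Window Analysis] might compute, here is a minimal sliding-window event counter. It is a local sketch only: it does not call the Kinesis API, the class name is hypothetical, and it assumes events arrive with non-decreasing timestamps.

```python
from collections import deque
from typing import Deque


class SlidingWindowCounter:
    """Counts events whose timestamps fall in the last `window_seconds`."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._events: Deque[float] = deque()

    def add(self, timestamp: float) -> int:
        """Record one event and return the number of events still inside the window.

        Assumes timestamps are non-decreasing, as they would be when reading
        one shard in order.
        """
        self._events.append(timestamp)
        cutoff = timestamp - self.window_seconds
        # Evict events that are at or before the start of the window.
        while self._events and self._events[0] <= cutoff:
            self._events.popleft()
        return len(self._events)


counter = SlidingWindowCounter(window_seconds=10)
counts = [counter.add(t) for t in (0, 4, 9, 12, 25)]
```

The deque keeps only the events inside the window, so memory use tracks the event rate rather than the length of the stream.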
  • Use cases
  • Customer use case: SkillPages. Everyone needs skilled people: at home, at work, in life, repeatedly.
  • Data architecture. Diagram labels: user actions (Join via Facebook, Add a Skill Page, Invite Friends); web servers; trace events; Amazon S3 (raw events); EMR Hive scripts; Amazon S3 (aggregated data); Amazon Redshift; Excel; Tableau; internal web; data analyst (raw data, get data). Process content: process log files with regular expressions to parse out the info we need; process cookies into useful searchable data such as Session, UserId, and API security token; filter surplus info such as internal Varnish logging.
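The log-processing steps on this slide (regular-expression parsing, cookie extraction, filtering Varnish noise) could look roughly like the sketch below. The log line layout, the cookie names kept, and the filtering rule are all assumptions made for illustration; the talk does not show the SkillPages format, and their processing ran as Hive scripts on EMR rather than as Python.

```python
import re
from typing import Dict, Optional

# Hypothetical log line layout (an assumption, not the SkillPages format):
#   <ip> [<timestamp>] "<METHOD> <path>" <status> "<cookie header>"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \[(?P<ts>[^\]]+)\] "(?P<method>[A-Z]+) (?P<path>\S+)" '
    r'(?P<status>\d{3}) "(?P<cookies>[^"]*)"$'
)
COOKIE_RE = re.compile(r"(?P<name>[^=;\s]+)=(?P<value>[^;]*)")
WANTED_COOKIES = {"Session", "UserId"}  # cookie names assumed for this sketch


def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Turn one raw log line into a flat record, or None if it should be dropped."""
    # Filter surplus info: skip internal Varnish logging (assumed to mention "varnish").
    if "varnish" in line.lower():
        return None
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record = {key: match.group(key) for key in ("ip", "ts", "method", "path", "status")}
    # Promote selected cookies to searchable top-level fields.
    for cookie in COOKIE_RE.finditer(match.group("cookies")):
        if cookie.group("name") in WANTED_COOKIES:
            record[cookie.group("name")] = cookie.group("value").strip()
    return record


sample = '1.2.3.4 [2014-05-01T10:00:00Z] "GET /skills" 200 "Session=abc; UserId=42; theme=dark"'
parsed = parse_line(sample)
```

Flattening cookies into named columns at this stage is what makes the aggregated data easy to load into a warehouse and query from BI tools.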
  • “We found that Amazon Redshift offers the performance we needed while freeing us from the licensing costs of our previous solution.” “With Amazon Redshift and Tableau, anyone in the company can set up any queries they like, from how users are reacting to a feature, to growth by demographic or geography, to the impact sales efforts have had in different areas. It’s very flexible.” Jon Hoffman, Software Engineer, Foursquare. [Charts: Gender (Female, Male); Age; “When do people go to a place?” for Gorilla Coffee, Gray's Papaya, Amorino]
  • Stack: analysis and sharing. Application stack: Scala/Liftweb; API machines, WWW machines, batch jobs; Scala application code; databases (Mongo/Postgres/flat files); logs. Data stack: Amazon S3 (database dumps, log files); Hadoop on Elastic MapReduce; Hive/Ruby/Mahout; MapReduce jobs; analytics dashboard. Transfer tools shown: mongoexport, postgres dump, Flume.
  • Everything that was a limited resource is now a programmable resource
  • Resources
    • Hadoop Technology and Use Cases: http://www.powerof60.com/
    • http://aws.amazon.com/de
    • Start with the Free Tier: http://aws.amazon.com/de/free/
    • 25 US$ credits for new German customers: http://aws.amazon.com/de/campaigns/account/
    • Twitter: @AWS_Aktuell
    • Facebook: http://www.facebook.com/awsaktuell
    • Webinars: http://aws.amazon.com/de/about-aws/events/