Here are 27 public repositories matching the amazon-emr topic...
Reference Architectures for Data Lakes on AWS
Updated May 13, 2020
HTML
An example Terraform project that will configure a Secure and Customizable Spark Cluster on Amazon EMR.
Project files for the post: Running PySpark Applications on Amazon EMR using Apache Airflow: Using the new Amazon Managed Workflows for Apache Airflow (MWAA) on AWS.
Updated Jul 6, 2022
Python
Bits of code I use during live demos
Updated Aug 10, 2022
Jupyter Notebook
Jupyter notebooks and an AWS CloudFormation template that show how Hudi, Iceberg, and Delta Lake work
Updated Jul 13, 2022
Jupyter Notebook
Project files for the post: Running PySpark Applications on Amazon EMR: Methods for Interacting with PySpark on Amazon Elastic MapReduce.
Updated Sep 1, 2022
Python
Run templatable playbooks of Hadoop/Spark/et al jobs on Amazon EMR
Amazon EMR Notebook to show how to read from and write to Delta tables with Amazon EMR
Updated Aug 27, 2022
Jupyter Notebook
3NF-normalize Yelp data on S3 with Spark, load it into Redshift, and automate the whole pipeline with Apache Airflow
Updated Aug 17, 2019
Jupyter Notebook
📓 Repository/Tutorial for initializing a Jupyter Notebook and Spark cluster on Amazon EMR
Updated Dec 4, 2016
Python
Orchestrate an Amazon EMR on Amazon EKS Spark job with AWS Step Functions
Samples related to data engineering, e.g., Spark, Embulk, and Airflow
Updated Oct 7, 2022
Python
Updated Jun 9, 2021
Jupyter Notebook
A VS Code Extension to make it easier to manage and develop Spark jobs on EMR
Updated Oct 13, 2022
TypeScript
PageRank implementation in Spark that ranks authors and venues based on their publications in the DBLP dataset.
Updated Apr 21, 2019
Scala
This repo provides cross-account integration code samples using Amazon S3 Access Points
Updated Dec 28, 2021
Java
Project files for the post: Installing Apache Superset on Amazon EMR: Add data exploration and visualization to your analytics cluster.
Updated Dec 29, 2020
Python
Configure Hadoop YARN CapacityScheduler on Amazon EMR on Amazon EC2 for multi-tenant heterogeneous workloads
Updated Aug 11, 2022
Shell
Udacity Data Engineering Nanodegree Program
Updated Jun 1, 2020
Python
A simple Java-Scala mixed project template for Apache Spark
Updated May 11, 2020
Scala