The enterprise-grade behavioral data engine (web, mobile, server-side, webhooks), running cloud-natively on AWS and GCP
Updated Jul 13, 2023 - Scala
SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).
Kestra is an infinitely scalable orchestration and scheduling platform, creating, running, scheduling, and monitoring millions of complex pipelines.
A list of useful resources to learn Data Engineering from scratch
Memphis.dev is an intelligent, frictionless message broker. Made to enable developers to build real-time and streaming features fast.
The open standard for data logging
A task management and automation tool
A lightweight stream processing library for Go
BitSail is a distributed, high-performance data integration engine that supports batch, streaming, and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of data every day.
Open-source data observability for analytics engineers.
Source code accompanying the book Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
An example end-to-end data engineering project.
Smarter data pipelines for audio.
A list about Apache Kafka
Pythonic tool for running machine-learning, high-performance, and quantum-computing workflows in heterogeneous environments.
Streaming reactive and dataflow graphs in Python
Code review for data in dbt
Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!
Conduit streams data between data stores. Kafka Connect replacement. No JVM required.