The enterprise-grade behavioral data engine (web, mobile, server-side, webhooks), running cloud-natively on AWS and GCP
Updated Dec 7, 2022 - Scala
SeaTunnel is a distributed, high-performance data integration platform for synchronizing and transforming massive volumes of data, both offline and in real time.
Kestra is an infinitely scalable orchestration and scheduling platform for creating, running, scheduling, and monitoring millions of complex pipelines.
A list of useful resources for learning data engineering from scratch
Memphis is an open-source, real-time data processing platform
The open standard for data logging
Task management and automation tool
A lightweight stream processing library for Go
Source code accompanying the book Data Science on the Google Cloud Platform (Valliappa Lakshmanan, O'Reilly, 2017)
BitSail is a distributed, high-performance data integration engine that supports batch, streaming, and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of data records every day.
Open-source data observability for analytics engineers.
Smarter data pipelines for audio.
Example end-to-end data engineering project.
A list of resources about Apache Kafka
Streaming reactive and dataflow graphs in Python
Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!
Use SQL to build ELT pipelines on a data lakehouse.
Conduit streams data between data stores. Kafka Connect replacement. No JVM required.
Pythonic tool for running data-science, high-performance, and quantum-computing workflows in heterogeneous environments.