data-pipeline

Here are 499 public repositories matching this topic...

snowplow / snowplow

The enterprise-grade behavioral data engine (web, mobile, server-side, webhooks), running cloud-natively on AWS and GCP

data analytics snowplow data-collection data-pipeline product-analytics marketing-analytics snowplow-pipeline snowplow-events

Updated Jul 13, 2023
Scala

apache / seatunnel

SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).

real-time offline high-performance apache data-integration sql-engine data-pipeline etl-framework seatunnel

Updated Jul 17, 2023
Java

kestra-io / kestra

Kestra is an infinitely scalable orchestration and scheduling platform for creating, running, scheduling, and monitoring millions of complex pipelines.

workflow data pipeline etl workflow-engine scheduler orchestration data-engineering data-integration elt data-pipeline data-quality low-code data-orchestration data-orchestrator reverse-etl

Updated Jul 15, 2023
Java

adilkhash / Data-Engineering-HowTo

A list of useful resources to learn Data Engineering from scratch

distributed-systems scala cloud-providers data-engineering data-pipeline

Updated May 8, 2023

memphisdev / memphis

Memphis.dev is an intelligent, frictionless message broker, made to enable developers to build real-time and streaming features fast.

kubernetes golang data enrichment microservices schema-registry message-bus message-queue data-engineering data-pipeline message-broker data-streaming data-stream-processing messaging-queue

Updated Jul 16, 2023
Go

whylabs / whylogs

The open standard for data logging

python data-science machine-learning analytics logging constraints dataset dataops data-pipeline data-quality calculate-statistics data-constraints mlops model-performance ml-pipelines ai-pipelines approximate-statistics statistical-properties

Updated Jul 14, 2023
Jupyter Notebook

pydoit / doit

Task management & automation tool

python workflow data-science build-automation task-runner build-tool build-system workflow-management hacktoberfest data-pipeline workflow-automation

Updated May 10, 2023
Python

reugn / go-streams

A lightweight stream processing library for Go

Updated Jun 17, 2023
Go

BitSail is a distributed, high-performance data integration engine that supports batch, streaming, and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of data records every day.

real-time big-data high-performance data-lake data-integration flink data-synchronization data-pipeline