data-pipeline

Here are 413 public repositories matching this topic...

snowplow / snowplow

The enterprise-grade behavioral data engine (web, mobile, server-side, webhooks), running cloud-natively on AWS and GCP

data analytics snowplow data-collection data-pipeline product-analytics marketing-analytics snowplow-pipeline snowplow-events

Updated Dec 7, 2022
Scala

apache / incubator-seatunnel

SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).

real-time offline high-performance apache data-integration sql-engine data-pipeline etl-framework seatunnel

Updated Dec 24, 2022
Java

kestra-io / kestra

Kestra is an infinitely scalable orchestration and scheduling platform, creating, running, scheduling, and monitoring millions of complex pipelines.

java kubernetes yaml workflow data kafka scale etl workflow-engine scheduler orchestration data-engineering elt data-pipeline low-code data-orchestration kestra data-orchestrator

Updated Dec 23, 2022
Java

adilkhash / Data-Engineering-HowTo

A list of useful resources to learn Data Engineering from scratch

distributed-systems scala cloud-providers data-engineering data-pipeline

Updated Jun 28, 2022

memphisdev / memphis-broker

Memphis is an Open-Source, Real-Time Data Processing Platform

kubernetes golang data data-engineering data-pipeline data-streaming data-stream-processing messaging-queue

Updated Dec 24, 2022
Go

whylabs / whylogs

The open standard for data logging

python data-science machine-learning analytics logging constraints dataset dataops data-pipeline data-quality calculate-statistics data-constraints mlops model-performance ml-pipelines ai-pipelines approximate-statistics statistical-properties

Updated Dec 24, 2022
Jupyter Notebook

pydoit / doit

task management & automation tool

python workflow data-science build-automation task-runner build-tool build-system workflow-management hacktoberfest data-pipeline workflow-automation

Updated Sep 8, 2022
Python

reugn / go-streams

A lightweight stream processing library for Go

Updated Nov 17, 2022
Go

GoogleCloudPlatform / data-science-on-gcp

Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017

data-science machine-learning data-visualization data-engineering cloud-computing data-analysis data-processing data-pipeline

Updated Dec 20, 2022
Jupyter Notebook

BitSail is a distributed, high-performance data integration engine that supports batch, streaming, and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of data every day.

real-time big-data high-performance data-lake data-integration flink data-synchronization data-pipeline