# data-pipelines
Here are 75 public repositories matching this topic...
edogrigqv2
commented
Dec 13, 2020
🚨 🚨 Feature Request
We need description, citation, license, and version meta info to be added to the dataset.
Is your feature request related to a problem?
Some datasets need this info inside them for legal reasons.
If your feature will improve HUB
Easy to implement, won't hurt for sure.
Description of the possible solution
Currently, we have all metadata store
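The four fields named in this request (description, citation, license, version) could be kept together in one small record that travels with the dataset. The following is only a sketch under assumptions: `DatasetMeta` and its methods are hypothetical names chosen for illustration and are not part of Hub's API, and the sample values are placeholders.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DatasetMeta:
    """Hypothetical metadata record; field names follow the feature request."""

    description: str
    citation: str
    license: str
    version: str

    def to_json(self) -> str:
        # Serialize with sorted keys so the stored form is stable across runs.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "DatasetMeta":
        # Rebuild the record from its stored JSON form.
        return cls(**json.loads(text))


# Placeholder values, for illustration only.
meta = DatasetMeta(
    description="Example dataset",
    citation="Placeholder citation",
    license="MIT",
    version="1.0.0",
)
restored = DatasetMeta.from_json(meta.to_json())
```

Keeping the record immutable (`frozen=True`) is one way to make sure legal information such as the license is not changed by accident after the dataset is loaded.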
MLeap: Deploy ML Pipelines to Production
-
Updated
Aug 3, 2021 - Scala
ricklamers
commented
Aug 14, 2021
Describe the bug
Editing a job prompts "unsaved changes" on save
To Reproduce
Steps to reproduce the behavior:
- Go to a running cron job
- Make changes
- Save edits, see warning "Leave with unsaved changes"
Expected behavior
It should just save, not warn.
Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.
golang
bigquery
airflow
automation
etl
analytics
data-transformation
data-warehouse
business-intelligence
elt
workflows
data-pipelines
data-modelling
analytics-engineering
-
Updated
Aug 20, 2021 - Go
Dataform is a framework for managing SQL-based data operations in BigQuery, Snowflake, and Redshift
-
Updated
Aug 19, 2021 - TypeScript
Relational data pipelines for the science lab
mysql
python
s3
databases
pipeline-framework
scientific-computing
cloud-computing
data-analysis
relational-databases
data-pipelines
workflow-management
datajoint
relational-algebra
relational-model
-
Updated
Aug 9, 2021 - Python
An open-source PHP reporting framework for writing data reports and building dashboards in PHP. Works with all PHP versions from 5.6 to 8.0 and is compatible with MVC frameworks such as Laravel, CodeIgniter, and Symfony.
php
framework
reporting
data-visualization
data-viz
data-analysis
reporting-engine
data-pipelines
report-generator
php-reports
mysql-reporting-tools
php-reporting-tools
data-pivot
data-summarization
reporting-tool
-
Updated
Aug 8, 2021 - PHP
A data pipeline that automates data warehouse ETL with custom Airflow operators handling the extraction, transformation, validation, and loading of data from S3 -> Redshift -> S3
-
Updated
Sep 30, 2019 - Python
Functional reactive data pipelines
data-science
sql
etl
pipelines
immutability
data-engineering
functional-reactive-programming
data-analysis
data-pipelines
data-pipeline
etl-framework
etl-pipeline
etl-pipelines
-
Updated
Aug 20, 2021 - Python
soffest
commented
Aug 4, 2021
Cloud-native, data onboarding architecture for Google Cloud Datasets
bigquery
airflow
google-cloud
cloud-storage
open-data
data-engineering
cloud-native
datasets
data-pipelines
cloud-composer
data-architecture
-
Updated
Aug 19, 2021 - Python
Beneath is a serverless real-time data platform ⚡️
python
go
kubernetes
data-science
streaming
sql
etl
analytics
data-warehouse
data-engineering
dataops
developer-tools
data-pipelines
mlops
beneath
-
Updated
Aug 6, 2021 - Go
Framework to quickly build and maintain Smart Data Lakes
scala
spark
hive
hadoop
transform-data
data-lake
data-pipelines
comprehensive
deltalake
smart-data-lake
-
Updated
Aug 20, 2021 - Scala
kevin-hanselman
opened
May 25, 2021
Spark-Transformers: Library for exporting Apache Spark MLlib models so they can be used in any Java application with no other dependencies.
java
export
machine-learning
scala
spark
apache-spark
machine-learning-algorithms
transformers
mllib
machine-learning-library
data-pipelines
-
Updated
Dec 15, 2017 - Java
ELT for the DataOps era: an open-source data integration tool. This is a read-only mirror of https://gitlab.com/meltano/meltano
open-source
tap
data
opensource
integration
pipelines
target
dataops
loaders
elt
extract-data
data-pipelines
singer
connectors
dataengineering
targets
taps
meltano
dataops-platform
meltano-sdk
-
Updated
Aug 19, 2021 - Python
Example of an ETL Pipeline using Airflow
-
Updated
Aug 30, 2017 - Python
ARAKAT - Big Data Analysis and Business Intelligence Application Development Platform
docker
distributed-systems
docker-swarm
business-intelligence
data-pipelines
big-data-analytics
predictive-maintenance
cloud-native-applications
-
Updated
Aug 4, 2021 - Python
Framework for data processing
-
Updated
Nov 10, 2019 - Python
Practical use cases showing how to make machine learning pipelines robust and reliable using Apache Airflow.
-
Updated
Jun 2, 2021 - Python
A Pachyderm deep learning tutorial for conference workshops
python
docker
kubernetes
data-science
machine-learning
deep-learning
containers
data-engineering
data-pipelines
-
Updated
Aug 2, 2017 - Python
Classwork projects and homework completed for the Udacity Data Engineering Nanodegree
data-science
big-data
spark
etl
s3-bucket
data-analysis
redshift
data-pipelines
classwork
emr-cluster
data-lake-analytics
data-engineering-pipeline
airflow-dags
-
Updated
Jun 6, 2021 - Jupyter Notebook
Provides an extensible solution for creating Data Processing Pipelines in F#.
-
Updated
Apr 7, 2018 - F#
Rivery CLI
data-science
database
etl
dataops
database-management
dwh
elt
data-pipelines
data-pipeline
dwh-team
dataops-platform
rivery
-
Updated
Jul 29, 2021 - Python
The official wiki for the Data Engineering community.
-
Updated
Jun 23, 2021 - CSS
Using Apache Airflow to author, run and monitor complex data pipelines.
-
Updated
Oct 24, 2018 - Jupyter Notebook
Building data processing pipelines for document processing with NLP, using Apache NiFi and related services
nlp
elasticsearch
kibana
rest
data-integration
nifi
apache-nifi
data-pipelines
electronic-health-records
-
Updated
May 26, 2021 - Jupyter Notebook
Marshmallow serializer integration with pyspark
schema
spark
pyspark
data-engineering
marshmallow
data-pipelines
data-cleaning
data-engineering-pipeline
data-schemas
-
Updated
May 5, 2021 - Python
Source code for a guide to running Apache Airflow on Kubernetes
-
Updated
Apr 13, 2020 - Python
Issue from the Dagster Slack
[dagster_shell] defer environ copy to solid run time
This issue was generated from the slack conversation at: https://dagster.slack.com/archives/C01U954MEER/p1623181007055200?thread_ts=1623181007.055200&cid=
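The issue title refers to deferring a copy of the environment until run time. The general difference between copying at definition time and copying at run time can be sketched in plain Python. This is not dagster_shell's actual code: `make_runner_eager`, `make_runner_deferred`, and the variable name `PIPELINE_DEMO_VAR` are hypothetical and exist only for illustration.

```python
import os

# Hypothetical variable name; make sure it starts out unset for the demo.
os.environ.pop("PIPELINE_DEMO_VAR", None)


def make_runner_eager():
    # The environment is copied once, when the runner is created.
    env_snapshot = os.environ.copy()

    def run(key):
        return env_snapshot.get(key)

    return run


def make_runner_deferred():
    def run(key):
        # The environment is copied each time the runner executes,
        # so variables set after creation are still visible.
        env = os.environ.copy()
        return env.get(key)

    return run


eager = make_runner_eager()
deferred = make_runner_deferred()

# Set the variable only after both runners exist.
os.environ["PIPELINE_DEMO_VAR"] = "set-late"

eager_value = eager("PIPELINE_DEMO_VAR")
deferred_value = deferred("PIPELINE_DEMO_VAR")
```

In this sketch the eager runner misses a variable that was set after it was created, while the deferred runner sees it, which is the kind of behavior a run-time copy is meant to provide.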