big-data

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

python aws data-science machine-learning caffe theano big-data spark deep-learning hadoop tensorflow numpy scikit-learn keras pandas kaggle scipy matplotlib mapreduce

Updated Oct 10, 2023
Python

apache / flink

Apache Flink

python java scala sql big-data flink

Updated Nov 18, 2023
Java

amark / gun

An open source cybersecurity protocol for syncing decentralized graph data.

Updated Nov 17, 2023
JavaScript

prestodb / presto

The official home of the Presto distributed SQL query engine for big data

java data query sql big-data presto hive hadoop lakehouse

Updated Nov 19, 2023
Java

heibaiying / BigData-Notes

A beginner's guide to big data ⭐

phoenix scala kafka big-data spark yarn hive hadoop storm bigdata hbase zookeeper hdfs mapreduce flume azkaban sqoop

Updated Sep 15, 2023
Java

questdb / questdb

An open source time-series database for fast ingest and SQL queries

java iot postgres sql database big-data time-series analytics cpp grafana postgresql simd low-latency financial-analysis tsdb hacktoberfest time-series-database questdb

Updated Nov 18, 2023
Java

apache / predictionio

PredictionIO, a machine learning server for developers and ML engineers.

scala big-data predictionio

Updated Jan 9, 2021
Scala

andkret / Cookbook

The Data Engineering Cookbook

big-data best-practices cookbook data-engineering data-engineer

Updated Apr 11, 2023

yahoo / CMAK

CMAK is a tool for managing Apache Kafka clusters

scala kafka big-data cluster-management

Updated Aug 2, 2023
Scala

vesoft-inc / nebula

A distributed, fast open-source graph database featuring horizontal scalability and high availability

distributed-systems database big-data cpp graph raft scalability distributed graph-database graphdb hacktoberfest nebula nebula-graph nebulagraph

Updated Nov 15, 2023
C++

trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

java distributed-systems data-science sql database big-data presto hive hadoop analytics jdbc databases distributed-database query-engine iceberg datalake prestodb trino delta-lake

Updated Nov 19, 2023
Java

cython / cython

The most widely used Python to C compiler

python c performance big-data cpp cython cpython cpython-extensions

Updated Nov 18, 2023
Python

catboost / catboost

A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

python data-science machine-learning data-mining tutorial r big-data gpu cuda kaggle gbdt gbm gpu-computing decision-trees gradient-boosting coreml catboost categorical-features

Updated Nov 18, 2023
Python

provectus / kafka-ui

Open-Source Web UI for Apache Kafka Management

opensource kafka big-data web-ui streams kafka-connect apache-kafka kafka-producer kafka-client kafka-streams hacktoberfest streaming-data kafka-manager kafka-cluster event-streaming cluster-management kafka-ui kafka-brokers

Updated Nov 16, 2023
Java

apache / beam

Apache Beam is a unified programming model for Batch and Streaming data processing.

python java golang streaming sql big-data beam batch

Updated Nov 19, 2023
Java

h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Updated Nov 19, 2023
Jupyter Notebook

delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs

big-data spark analytics acid delta-lake

Updated Nov 18, 2023
HTML
