The Patterns of Scalable, Reliable, and Performant Large-Scale Systems
Updated Nov 5, 2023
ClickHouse® is a free analytics DBMS for big data
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
An open source cybersecurity protocol for syncing decentralized graph data.
An open source time-series database for fast ingest and SQL queries
PredictionIO, a machine learning server for developers and ML engineers.
The Data Engineering Cookbook
CMAK is a tool for managing Apache Kafka clusters
A distributed, fast open-source graph database featuring horizontal scalability and high availability
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
The most widely used Python to C compiler
A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.
Open-Source Web UI for Apache Kafka Management
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs