Apache Spark
Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
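"Implicit data parallelism and fault tolerance" can be made concrete with a small, pure-Python sketch. It assumes a toy model loosely inspired by Spark's RDDs and is not PySpark's API. The names `MiniRDD`, `map_partitions`, `compute_partition`, and `word_count` are hypothetical, and threads stand in for cluster executors. The user writes ordinary per-record functions, and the framework applies them to every partition in parallel. Transformations are only recorded as a lineage, so a lost partition can be rebuilt by replaying that lineage on its input split instead of rerunning the whole job.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


class MiniRDD:
    """Hypothetical toy stand-in for a partitioned dataset (not Spark's API)."""

    def __init__(self, source_partitions, lineage=()):
        self.source_partitions = source_partitions  # list of lists: the input splits
        self.lineage = tuple(lineage)  # per-partition functions, applied in order

    @property
    def num_partitions(self):
        return len(self.source_partitions)

    def map_partitions(self, fn):
        # Lazy: record the step in the lineage instead of running it now.
        return MiniRDD(self.source_partitions, self.lineage + (fn,))

    def map(self, fn):
        return self.map_partitions(lambda part: [fn(x) for x in part])

    def flat_map(self, fn):
        return self.map_partitions(lambda part: [y for x in part for y in fn(x)])

    def compute_partition(self, index):
        # Rebuild one partition from its source split by replaying the lineage.
        data = list(self.source_partitions[index])
        for fn in self.lineage:
            data = fn(data)
        return data

    def collect(self, lost_partition=None):
        # The caller never schedules per-partition work: the same user code
        # runs on every partition in parallel (the "implicit" parallelism).
        with ThreadPoolExecutor() as pool:
            results = dict(enumerate(pool.map(self.compute_partition,
                                               range(self.num_partitions))))
        if lost_partition is not None:
            # Simulate a failed worker: discard its output, then recompute
            # only that partition from lineage rather than rerunning the job.
            del results[lost_partition]
            results[lost_partition] = self.compute_partition(lost_partition)
        return [x for i in sorted(results) for x in results[i]]


def word_count(rdd, lost_partition=None):
    # Classic word count: split lines into words, emit (word, 1), then sum.
    counts = Counter()
    pairs = rdd.flat_map(str.split).map(lambda w: (w, 1)).collect(lost_partition)
    for word, n in pairs:
        counts[word] += n
    return counts


lines = MiniRDD([["spark is fast", "spark is general"], ["clusters run spark"]])
print(word_count(lines))
print(word_count(lines, lost_partition=1) == word_count(lines))  # True
```

Recomputing from lineage, rather than replicating every intermediate result, is the design choice this sketch illustrates. A real cluster runs tasks in separate processes across many machines, which Python threads only approximate.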
The following are among the 7,989 public repositories matching this topic.
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Updated Oct 10, 2023 - Python
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
Updated Dec 24, 2023 - Python
Learn and understand Docker and container technologies, with real DevOps practice!
Updated Dec 22, 2023 - Go
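Flink learning blog. http://www.54tianzhisheng.cn/ Covers Flink introduction, concepts, principles, hands-on practice, performance tuning, and source code analysis. Includes learning examples for Flink Connector, Metrics, Library, DataStream API, and Table API & SQL, as well as case studies of large-scale Flink projects in production (PV/UV, log storage, real-time deduplication of data at the ten-billion-record scale, monitoring and alerting). Your support for my column "Big Data Real-Time Computing Engine Flink: Practice and Performance Optimization" (《大数据实时计算引擎 Flink 实战与性能优化》) is welcome.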
Updated Sep 15, 2023 - Java
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
Updated Dec 22, 2023 - Python
Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for Keras, TensorFlow, and ONNX/PyTorch, a modular and tiny C++ library for running math code, and a Java-based math library on top of the core C++ library. Also includes SameDiff: a PyTorch/TensorFlow-like library for running deep learni…
Updated Dec 22, 2023 - Java
List of Data Science Cheatsheets to rule the world
Updated Nov 19, 2022
A Flexible and Powerful Parameter Server for large-scale machine learning
Updated Nov 24, 2022 - Java
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
Updated Dec 22, 2023 - HTML
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Updated Dec 26, 2023 - Jupyter Notebook
Alluxio, data orchestration for analytics and machine learning in the cloud
Updated Dec 25, 2023 - Java
🧙 The modern replacement for Airflow. Build, run, and manage data pipelines for integrating and transforming data.
Updated Dec 26, 2023 - Python
macOS development environment setup: Easy-to-understand instructions with automated setup scripts for developer tools like Vim, Sublime Text, Bash, iTerm, Python data analysis, Spark, Hadoop MapReduce, AWS, Heroku, JavaScript web development, Android development, common data stores, and dev-based OS X defaults.
Updated Feb 27, 2023 - Python
Created by Matei Zaharia
Released May 26, 2014
- Followers: 406
- Repository: apache/spark
- Website: spark.apache.org
- Wikipedia