apache-spark

Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.

Here are 694 public repositories matching this topic...

mlflow

The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, and optimize production-quality AI applications while controlling costs and managing access to models and data.

  • Updated Apr 17, 2026
  • Python

An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All components are containerized with Docker for easy deployment and scalability.

  • Updated Feb 14, 2025
  • Python

Created by Matei Zaharia

Released May 26, 2014

Followers: 434
Repository: apache/spark
Website: github.com/topics/spark
Wikipedia: Wikipedia

Related topics

hadoop, scala