apache / spark
Apache Spark - A unified analytics engine for large-scale data processing
See what the GitHub community is most excited about today.
The Scala 3 compiler, also known as Dotty.
In-memory dimensional time series database.
Build highly concurrent, distributed, and resilient message-driven applications on the JVM
Spark RAPIDS plugin - accelerate Apache Spark with GPUs
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Rocket Chip Generator
TheHive: a Scalable, Open Source and Free Security Incident Response Platform
Elasticsearch Scala Client - Reactive, Non Blocking, Type Safe, HTTP Client
Chisel 3: A Modern Hardware Design Language
An open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.
State of the Art Natural Language Processing
Removes large or troublesome blobs like git-filter-branch does, but faster. Written in Scala.
Network components (NIC, Switch) for FireBox
Ergo protocol description & reference client implementation
Kyuubi is a distributed multi-tenant JDBC server for large-scale data processing and analytics, built on top of Apache Spark
CMAK is a tool for managing Apache Kafka clusters
Multi-language coverage reporter for Codacy
Utility code for Scala: logging, testing, configuration and more
A simple alternative to the Amazon SQS Daemon ("sqsd") used on AWS Beanstalk worker tier instances, based on https://github.com/mozart-analytics/sqsd
Streaming reference architecture for ETL with Kafka and Kafka-Connect. See http://lenses.io for more on how we provide a unified solution to manage your connectors, the most advanced SQL engine for Kafka and Kafka Streams, cluster monitoring and alerting, and more.
Apache Spark Connector for Azure Cosmos DB
Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
A minimal, idiomatic Scala interface for HTTP