awslabs / deequ
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Apache Spark - A unified analytics engine for large-scale data processing
An open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.
A better notebook for Scala (and more)
A Scala API for Apache Beam and Google Cloud Dataflow.
State of the Art Natural Language Processing
ZIO — A type-safe, composable library for async and concurrent programming in Scala
Code, exercises, answers, and hints to go along with the book "Functional Programming in Scala"
Scala 2 compiler and standard library. For bugs, see scala/bug
Scalable genomic data analysis.
sbt, the interactive build tool
A STAC/OGC API Features Web Service