Skip to content

ray-project/deltacat

main
Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
Code

DeltaCAT

DeltaCAT is a Pythonic Data Catalog powered by Ray.

Its data storage model allows you to define and manage fast, scalable, ACID-compliant data catalogs through git-like stage/commit APIs, and has been used to successfully host exabyte-scale enterprise data lakes.

DeltaCAT uses the Ray distributed compute framework together with Apache Arrow for common table management tasks, including petabyte-scale change-data-capture, data consistency checks, and table repair.

Getting Started


Install

pip install deltacat

About

A Pythonic Data Catalog powered by Ray that brings exabyte-level scalability and fast, ACID-compliant, change-data-capture to your big data workloads.

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages