re_data - fix data issues before your users & CEO would discover them
Updated Mar 17, 2023 - HTML
Data reliability tools for SQL- and Spark-accessible data
Code review for data in dbt
Great Expectations Airflow operator
Soda Spark is a PySpark library that helps you test your data in Spark DataFrames
Various files useful for manual testing, test automation, etc.
This library is inspired by the Great Expectations library. It makes the various expectations found in Great Expectations available through Python's built-in unittest assertions.
Simple DB Fixtures for Sails.js v1 (fake data for testing).
data_check is a simple data validation tool
Software Testing in Open Source and Data Science: A talk delivered at the Data Umbrella speaker series
A sample repository showcasing the implementation of tests for an ETL pipeline developed with Apache Spark
Dynamic data testing engine based on PySpark
Chrome extensions for developers and testers who want to easily see data attributes for testing directly on the page.
National Grid (Python, SQL Server, SSIS, SSRS, Tableau, Power BI, SQL Server Import Export Wizard, Data Validations, Data Integrations, Data Conversions)
This project creates machine learning models capable of classifying candidate exoplanets using the raw dataset from NASA's Kepler Space Telescope
I'm learning how to use dbt with BigQuery so I can apply that knowledge wherever we end up working. It seems like a good DWH interface tool to know for data transformation and testing, and it lets me solidify testing concepts in data ops.