-
Updated
May 27, 2022 - Python
# data-profiling
Here are 51 public repositories matching this topic...
Create HTML profiling reports from pandas DataFrame objects
python
data-science
machine-learning
statistics
deep-learning
jupyter
pandas-dataframe
exploratory-data-analysis
jupyter-notebook
eda
pandas
exploration
data-analysis
html-report
data-exploration
hacktoberfest
pandas-profiling
data-quality
data-profiling
big-data-analytics
enhancement
help wanted
Issues we'd love to see community contributions for. Join #contributors-contributing in our Slack!
good first issue
Good issues for new contributors. Join #contributors-contributing in our Slack for help!
core-team
pmbrull
commented
Apr 4, 2022
Let's prepare a mixin for interacting with Roles and Policies with the Python client, in case users want to use the API directly.
Provide not only the basic list, get, etc. operations, but also utility methods, such as updating a default role. It should wrap the following logic:
import requests
import json
# Get the ID
data_consumer = requests.get("http://localhost:8585/api/v1/roles/name/DataCo
Data profiling, testing, and monitoring for SQL accessible data.
python
data-science
airflow
monitoring
metrics
data-engineering
data-analytics
data-quality
data-profiling
data-monitoring
data-quality-monitoring
data-unit-tests
airflow-operators
data-testing
data-pipeline-monitoring
data-observability
data-reliability
data-quality-framework
-
Updated
May 27, 2022 - Python
Engine for ML/Data tracking, visualization, dashboards, and model UI for Polyaxon.
visualization
python
tracking
data-science
machine-learning
ui
deep-learning
jupyter
tensorflow
models
plotly
data-visualization
pytorch
bokeh
matplotlib
data-processing
data-profiling
mlops
-
Updated
May 23, 2022 - Python
A library for managing, validating, summarizing, and visualizing data.
data-science
statistics
spark
plotly
pandas
data-visualization
dataops
data-analysis
matplotlib
dask
data-exploration
pandas-summary
dataframes
data-summary
data-quality-checks
data-quality
data-profiling
mlops
data-quality-monitoring
data-reporting
-
Updated
May 23, 2022 - Python
sbrugman
commented
Oct 27, 2021
Add type hints to serve as code documentation, enable IDE hints, and possibly catch errors. Add flake8-annotations to pre-commit and enable mypy in CI.
good first issue
Good for newcomers
help wanted
Extra attention is needed
API
Programmable user interface
CI
Continuous integration/GitHub Actions related
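The type-hint request in the issue above can be illustrated with a minimal sketch. The function name `column_null_ratio` and its signature are hypothetical and are not taken from the library in question; they only show how annotations double as documentation and give a checker such as mypy something to verify.

```python
from typing import List, Optional


def column_null_ratio(values: List[Optional[float]]) -> float:
    """Return the fraction of entries that are None (0.0 for an empty list).

    Hypothetical helper: the annotations state that the input is a list of
    optional floats and that the result is always a float.
    """
    if not values:
        return 0.0
    missing = sum(1 for value in values if value is None)
    return missing / len(values)
```

With annotations like these in place, a type checker would be expected to flag a call such as `column_null_ratio("abc")` before the code runs, which is the kind of error catching the issue asks for.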
DementevNikita
opened
May 16, 2022
python
gui
gpu
datasets
dask
optimus
data-preparation
data-cleaning
data-profiling
bumblebee
prepare-data
cudf
dask-cudf
-
Updated
May 26, 2022 - Vue
Dataset search engine, discovering data from a variety of sources, profiling it, and allowing advanced queries on the index
-
Updated
May 16, 2022 - Python
The official http://raymon.ai data profiling and logging library.
-
Updated
Feb 21, 2022 - Python
Desbordante is a high-performance data profiler that is capable of discovering many different patterns in data using various algorithms. It also allows running data cleaning scenarios using these algorithms. Desbordante has a console version and an easy-to-use web application.
data-science
data-mining
exploratory-data-analysis
tabular-data
feature-selection
data-engineering
feature-extraction
data-analytics
knowledge-discovery
data-wrangling
data-preprocessing
feature-engineering
spreadsheets
data-exploration
data-mining-algorithms
data-cleaning
data-profiling
anomaly-detection
data-cleansing
correlations
-
Updated
May 25, 2022 - C++
A Node.js tool to examine the correctness of Open Data Metadata and build custom dataset profiles
-
Updated
Jun 20, 2018 - JavaScript
pandas
data-analysis
data-manipulation
data-quality-checks
data-quality
data-profiling
data-quality-measurement
data-quality-monitoring
streamlit
data-quality-assessment
-
Updated
Jan 14, 2022 - Python
Data cleaning tool.
-
Updated
Apr 20, 2021 - JavaScript
Metadata and data identification tool and Python library. Identifies PII, common identifiers, and language-specific identifiers. Fully customizable and flexible rules.
-
Updated
May 23, 2022 - Python
Open Data Profiling, Quality and Analysis on NYC OpenData dataset with semantic profiling using fuzzy ratio, Levenshtein distance and regex
big-data
pandas
pyspark
levenshtein-distance
hdfs
dask
regular-expressions
fuzzywuzzy
fuzzy-logic
data-profiling
nyc-opendata
modin
nyc-311-dataset
dask-distributed
-
Updated
Nov 10, 2020 - Jupyter Notebook
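The NYC OpenData project above mentions semantic profiling with Levenshtein distance and regex. As a rough, standard-library-only sketch of those two ideas (not the project's own code), the snippet below computes an edit distance, a normalized similarity score, and a pattern check. The names `levenshtein`, `similarity`, and `looks_like_zip` are hypothetical, and the similarity score is a simple stand-in rather than the ratio computed by fuzzywuzzy.

```python
import re


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b, using the row-by-row dynamic program."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


# Assumed pattern for illustration: 5-digit US ZIP code with optional +4 suffix.
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")


def looks_like_zip(value: str) -> bool:
    """Return True when the whole value matches the ZIP pattern."""
    return ZIP_PATTERN.fullmatch(value) is not None
```

A column could then be labeled with a semantic type when most of its values either match a pattern or score above some similarity threshold against a known vocabulary; the threshold itself would need tuning per dataset.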
Simplify usage of the RDS API for TypeScript/JavaScript developers
javascript
metadata
data-science
typescript
data-validation
data-mapping
data-transformation
open-data
rds
data-profiling
data-ingestion
data-dissemination
rich-data-services
metadata-technology-north-america
mtna
rds-js
rds-api
-
Updated
Aug 9, 2020 - TypeScript
A project for exploring how Great Expectations can be used to ensure data quality and validate batches within a data pipeline defined in Airflow.
python
docker
postgres
ecommerce
airflow
csv
sql
pipeline
etl
data-engineering
parquet
elt
data-pipeline
data-quality
data-profiling
great-expectations
-
Updated
Apr 8, 2022 - Python
-
Updated
Mar 1, 2022 - Python
Open-source metadata collector based on ODD Specification
data-catalog
data-discovery
data-platform
lineage
data-profiling
data-governance
datacatalog
data-observability
data-piplines
-
Updated
May 27, 2022 - Python
R package to simplify the usage of the RDS REST API and provide convenience in accessing data and metadata.
metadata
data-science
r
data-validation
data-mapping
data-transformation
open-data
rds
data-profiling
data-ingestion
data-dissemination
rich-data-services
metadata-technology-north-america
mtna
-
Updated
Apr 14, 2022 - R
HPCC Systems ECL bundle that provides some basic data profiling and research tools to an ECL programmer
-
Updated
May 19, 2022 - ECL
TypeScript/JavaScript example code using the RDS API
javascript
metadata
data-science
typescript
data-validation
data-mapping
data-transformation
open-data
rds
data-profiling
data-ingestion
data-dissemination
rich-data-services
metadata-technology-north-america
mtna
rds-api
-
Updated
Aug 9, 2021 - TypeScript
MetricDoc is an interactive visual exploration environment for assessing data quality
data-wrangling
data-quality-checks
visual-analytics
interactive-visualizations
data-quality
data-profiling
quality-metrics
-
Updated
Mar 30, 2020 - JavaScript
Automated exploration of files in a folder structure to extract metadata and potential usage of information.
-
Updated
Apr 19, 2022 - Python
The program compares two files at a time and does the following:
1. Gathers metadata on the individual tables (column count, record count, list of columns with data types, etc.).
2. Identifies matching columns between tables based on names as well as data, using machine learning to handle syntactic and semantic variations of column names for accurate matching.
3. Finds duplicate columns in a single table, with the option to deduplicate if required.
4. Finds columns with missing data or null values.
-
Updated
Feb 17, 2018 - Python
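The four steps of the file-comparison program described above can be sketched in plain Python. This is only an illustration under stated assumptions: tables are modeled as dictionaries mapping column names to lists of values, all function names are hypothetical, and column matching uses simple name normalization rather than the machine-learning approach the project describes.

```python
import re
from typing import Dict, List, Optional, Tuple

# Assumed in-memory representation: column name -> list of cell values.
Table = Dict[str, List[Optional[str]]]


def table_metadata(table: Table) -> Dict[str, object]:
    """Step 1: column count, record count, and column names."""
    record_count = max((len(values) for values in table.values()), default=0)
    return {"column_count": len(table),
            "record_count": record_count,
            "columns": list(table)}


def normalize(name: str) -> str:
    """Lowercase a column name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def matching_columns(left: Table, right: Table) -> List[Tuple[str, str]]:
    """Step 2 (name-based only): pair columns whose normalized names agree."""
    return [(l, r) for l in left for r in right if normalize(l) == normalize(r)]


def duplicate_columns(table: Table) -> List[Tuple[str, str]]:
    """Step 3: pairs of columns in one table that hold identical values."""
    names = list(table)
    return [(a, b)
            for i, a in enumerate(names)
            for b in names[i + 1:]
            if table[a] == table[b]]


def columns_with_nulls(table: Table) -> Dict[str, int]:
    """Step 4: count of None values per column, for columns that have any."""
    counts = {name: sum(value is None for value in values)
              for name, values in table.items()}
    return {name: count for name, count in counts.items() if count > 0}
```

For example, with `left = {"Cust_ID": ["1", "2"], "name": ["a", None]}` and `right = {"custid": ["1", "2"]}`, `matching_columns(left, right)` pairs `Cust_ID` with `custid` because both normalize to the same string.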
Demo on Data Engineering using Great Expectations API
-
Updated
Aug 25, 2021 - Jupyter Notebook
RimWorld game save data analyzer
data-science
time-series
analytics
xml
plotly
pandas
data-visualization
data-engineering
data-analysis
elt
rimworld
historical-data
data-profiling
rimworld-mod
simulation-data
-
Updated
May 23, 2022 - Python
Describe the bug
data docs columns shrink to 1 character width with long query
To Reproduce
Steps to reproduce the behavior: