data-cleaning

Here are 670 public repositories matching this topic...

sfd99 commented Apr 7, 2020

GREAT, Sam!
janitor is wonderful.

btw, a shortcut to get total sums for BOTH rows AND cols:

    mtcars %>%
      tabyl(am, cyl) %>%
      adorn_totals(c("row", "col"))

        am  4  6  8 Total
         0  3  4 12    19
         1  8  3  2    13
     Total 11  7 14    32

So, an (easy) SUGGESTION: also allow the keyword "both" as a param, as in adorn_totals("both"), or maybe simply adorn_totals().

Less coding, easier.
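
A rough Python sketch of the suggestion above, purely for illustration: it is not janitor's R API, and `add_totals` is a hypothetical helper. It treats "both" as shorthand for asking for the totals row and the totals column together, using the am-by-cyl counts from the output shown earlier.

```python
def add_totals(rows, where="both"):
    """Hypothetical helper: add totals to a table of counts.

    rows maps a row label to its list of counts.
    where is "row" (add a Total row), "col" (add a Total column),
    or "both" (shorthand for doing both, the behavior being suggested).
    """
    if where not in ("row", "col", "both"):
        raise ValueError('where must be "row", "col", or "both"')

    # Copy the lists so the caller's table is left untouched.
    result = {label: list(counts) for label, counts in rows.items()}

    if where in ("row", "both"):
        # The Total row holds the sum of each column.
        result["Total"] = [sum(col) for col in zip(*rows.values())]

    if where in ("col", "both"):
        # The trailing Total entry holds the sum of each row
        # (including the Total row, if one was just added).
        for label in result:
            result[label].append(sum(result[label]))

    return result


# Counts of cyl (4, 6, 8) for am = 0 and am = 1, taken from the output above.
counts = {"0": [3, 4, 12], "1": [8, 3, 2]}
totals = add_totals(counts)  # "both" is the default here
```

Making "both" the default in this sketch mirrors the second half of the suggestion, where a bare adorn_totals() would add both sets of totals.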

cosmicBboy commented Oct 28, 2020

Documentation problem

The current documentation demonstrates pandera usage with the pa.PandasDtype enum, which can look unfamiliar to newcomers, especially since pandera now supports Python types and NumPy scalar types. For example, see:

janagombitova commented May 13, 2020

Context

Why do we add this issue?

Our goal is to make it easy to visualise data and to make those visualisations of good quality and thus trustworthy. This also means setting limitations to what users can do, so they do not make mistakes.

Problem or idea

What is the cause?

Line charts are similar to scatter plots except that the measurement points are ordered by their x-axis va

Skytrax-Data-Warehouse

A full data warehouse infrastructure with ETL pipelines running inside Docker on Apache Airflow for data orchestration, AWS Redshift as the cloud data warehouse, and Metabase for data visualizations such as analytical dashboards.

  • Updated Apr 18, 2020
  • Python
