pydata

Here are 89 public repositories matching this topic...

gjoseph92 commented Jan 27, 2022

Array.reshape "only allows for reshapings that collapse or merge dimensions" (xref dask/dask#2561). However, when you attempt one of these unsupported reshapings, the error message "Shapes not compatible" doesn't make it clear that dask simply doesn't support what you're asking for. Instead, it sounds as though your inputs are invalid.

A more descriptive e

bdice commented Feb 3, 2022

Is your feature request related to a problem? Please describe.
While reviewing PR #9817 to introduce DataFrame.diff, I noticed that it is restricted to acting on numeric types.

A time-series diff is probably a very common user need: given a series of timestamps, users want the durations between observations.

Pandas supports diffs on non-numeric types like timestamps:
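For reference, here is a minimal pandas sketch of the behaviour being requested. Calling Series.diff on datetime values returns timedeltas (the gaps between consecutive timestamps), with NaT in the first position. The timestamps are made up for illustration.

```python
import pandas as pd

# Made-up observation times, for illustration only.
timestamps = pd.Series(
    pd.to_datetime(["2022-02-03 09:00", "2022-02-03 09:15", "2022-02-03 10:00"])
)

# diff() on datetime values yields timedelta durations; the first entry is NaT.
durations = timestamps.diff()
print(durations)

# The durations convert easily to plain numbers, e.g. seconds.
print(durations.dt.total_seconds().tolist())  # [nan, 900.0, 2700.0]
```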

fjetter commented Oct 20, 2021

As a dask maintainer, I want to trust the code coverage report.

Our coverage badge is a bit misleading, showing coverage below 90%. This is because we don't collect coverage in a few places. We also have a few modules that exist only for debugging and/or historical reasons.

The most relevant parts (scheduler, worker, etc.) do have quite good coverage. I believe the <90% badge does

thatlittleboy commented Jan 2, 2022

Background

This thread was born out of the discussion in #968, in an effort to make the documentation more beginner-friendly and easier to understand.
One of the subtasks mentioned in that thread was to go through the function docstrings and include a minimal working example in each of the public functions in pyjanitor.
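To illustrate the subtask, one possible docstring layout puts a short, runnable example under an Examples heading, which Python's doctest module can then check. add_suffix is a hypothetical helper used only to show the layout. It is not a pyjanitor function, and the heading style is an assumption rather than pyjanitor's documented convention.

```python
import doctest


def add_suffix(names, suffix):
    """Append ``suffix`` to every name in ``names``.

    Hypothetical helper, used only to illustrate the docstring layout.

    Examples:
        >>> add_suffix(["a", "b"], "_x")
        ['a_x', 'b_x']
    """
    return [name + suffix for name in names]


if __name__ == "__main__":
    # Runs every >>> example in this module's docstrings; silent when all pass.
    doctest.testmod()
```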

Criteria reiterated here for the benefit of discussion:

It sh

yhuang85 commented Aug 26, 2021

Description

There are several directives that are part of the rST / Sphinx spec but are not supported in this theme (at least, they have no effect in the built docs). We should add support for these directives. Here are a few known ones:

  • highlights
  • pull-quotes
  • epigraphs

Implementation

The way to accomplish this would be to:

  1. See wha

NeroCorleone commented Aug 11, 2020

Problem description

Reading a dataset with eager's read functionality raises a ValueError when providing columns.

Example code (ideally copy-pastable)

import pandas as pd

from tempfile import TemporaryDirectory
from functools import partial
from storefact import get_store_from_url

from kartothek.io.eager import store_dataframes_as_dataset, read_dataset_as_data

randyzwitch commented Mar 28, 2019

In trying to write tests for #189, I'm finding it very difficult to add columns to existing tests. In some cases, like the all_types table, the table is defined in a separate file from the tests, and multiple tests write to the same table.

Additionally, our test suite doesn't verify, for all types, that the data uploaded are the same as the data downloaded.

We should consider m
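One way to address both points is to give each test its own uniquely named table and assert that what comes back equals what went in. The sketch below uses the standard library's sqlite3 purely as a stand-in for the project's real database backend. roundtrip is a hypothetical helper, and the column types are illustrative.

```python
import sqlite3
import uuid


def roundtrip(conn, columns, rows):
    """Load rows into a fresh, uniquely named table and read them back.

    Hypothetical test helper; `columns` is a list of (name, SQL type) pairs.
    """
    # A random table name keeps tests from writing to a shared table.
    table = f"test_{uuid.uuid4().hex}"
    column_defs = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    conn.execute(f"CREATE TABLE {table} ({column_defs})")
    try:
        placeholders = ", ".join("?" for _ in columns)
        conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
        # ORDER BY rowid returns rows in insertion order.
        return conn.execute(f"SELECT * FROM {table} ORDER BY rowid").fetchall()
    finally:
        # Clean up so each test leaves the database as it found it.
        conn.execute(f"DROP TABLE {table}")


conn = sqlite3.connect(":memory:")
columns = [("i", "INTEGER"), ("f", "REAL"), ("s", "TEXT"), ("b", "BLOB")]
rows = [(1, 1.5, "a", b"\x00\x01"), (2, -0.25, "b", None)]
# Every value, including NULL, should come back unchanged.
assert roundtrip(conn, columns, rows) == rows
```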

eric-czech commented Jun 15, 2021

For association testing and PCA (at least), it may be useful to have a function that imputes dosages/allele counts. With floating point values (i.e. from bgen), this is very simple to do as a user, e.g. ds.call_genotype_probability.fillna(ds.call_genotype_probability.mean(dim="samples")). With alternate allele counts that use a sentinel integer for missing values, it is a little more complicated. The best way t
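For the integer case, here is a minimal NumPy sketch of mean imputation. It assumes the counts form a (variants, samples) array and that missing calls use a sentinel such as -1; both are assumptions, and impute_mean_counts is a hypothetical name, not an sgkit function. It mirrors the fillna/mean approach above: mask the sentinel, average over samples while ignoring missing entries, and fill. The result is floating point, since means are generally not integers.

```python
import numpy as np


def impute_mean_counts(counts, sentinel=-1, samples_axis=1):
    """Replace sentinel entries with the per-variant mean over samples.

    Hypothetical helper: assumes integer alternate allele counts shaped
    (variants, samples), with missing calls marked by `sentinel`.
    """
    counts = np.asarray(counts)
    missing = counts == sentinel
    # Convert to float so missing entries can become NaN and means fit.
    values = np.where(missing, np.nan, counts.astype(float))
    # A variant with every call missing yields NaN here (with a RuntimeWarning).
    means = np.nanmean(values, axis=samples_axis, keepdims=True)
    return np.where(missing, means, values)


imputed = impute_mean_counts([[0, 1, -1, 2], [2, -1, 2, 2]])
print(imputed)
```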
