
pydata

Here are 90 public repositories matching this topic...

orf
orf commented Jan 25, 2022

We're trying to introduce Parquet into our team, and the largest blocker that we've seen is the dreaded "schemas are inconsistent" error message:

RuntimeError: Schemas are inconsistent, try using to_parquet(..., schema="infer"), or pass an explicit pyarrow schema. Such as to_parquet(..., schema={"column1": pa.string()})

This error message is super unhelpful: surely Dask knows what th

good first issue dataframe parquet
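A more helpful message could say exactly which partitions disagree and how. The sketch below assumes each partition's schema has already been reduced to a plain mapping of column name to type name. That is a simplification, since Dask and pyarrow work with `pyarrow.Schema` objects. `describe_schema_mismatch` is a hypothetical helper, not part of Dask.

```python
from typing import List, Mapping


def describe_schema_mismatch(schemas: List[Mapping[str, str]]) -> List[str]:
    """Describe how each partition's schema differs from partition 0.

    Each schema is a hypothetical simplified form: column name -> type name.
    """
    reference = schemas[0]
    problems = []
    for i, schema in enumerate(schemas[1:], start=1):
        # Compare the union of columns so both missing and extra ones show up.
        for column in sorted(set(reference) | set(schema)):
            if column not in schema:
                problems.append(f"partition {i}: missing column {column!r}")
            elif column not in reference:
                problems.append(f"partition {i}: unexpected column {column!r}")
            elif schema[column] != reference[column]:
                problems.append(
                    f"partition {i}: column {column!r} has type {schema[column]}, "
                    f"but partition 0 has {reference[column]}"
                )
    return problems


# Illustrative input: partition 1 has an all-null "b", partition 2 lacks "b".
schemas = [
    {"a": "int64", "b": "string"},
    {"a": "int64", "b": "null"},
    {"a": "int64"},
]
for line in describe_schema_mismatch(schemas):
    print(line)
```

Listing every offending partition and column, rather than only suggesting `schema="infer"`, tells the user which explicit pyarrow type to pass.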
vyasr
vyasr commented Apr 21, 2022

Is your feature request related to a problem? Please describe.
Our Python docstrings have various style violations relative to standards such as PEP 257. This hurts readability (which may be subjective), and it also reduces the effectiveness of tools like Sphinx or numpydoc that rely on specific formatting to parse docstrings.

feature request 0 - Backlog doc good first issue
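Tools such as pydocstyle already implement PEP 257 checks, for example D205 (one blank line between the summary line and the description) and D400 (the first line should end with a period). As a rough illustration of what those checks inspect, the sketch below flags just those two using the standard-library `ast` module. `check_docstrings` is a hypothetical helper; a real project would more likely run pydocstyle itself.

```python
import ast
from typing import List, Tuple


def check_docstrings(source: str) -> List[Tuple[str, str, str]]:
    """Flag two PEP 257 issues in module, class and function docstrings."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        doc = ast.get_docstring(node)  # dedented docstring, or None
        if not doc:
            continue
        name = getattr(node, "name", "<module>")
        lines = doc.splitlines()
        if not lines[0].strip().endswith("."):
            problems.append((name, "D400", "first line should end with a period"))
        if len(lines) > 1 and lines[1].strip():
            problems.append((name, "D205", "1 blank line required between summary line and description"))
    return problems


SOURCE = '''
def good():
    """Return one.

    More detail here.
    """
    return 1


def bad():
    """return two
    without a blank line"""
    return 2
'''

for name, code, message in check_docstrings(SOURCE):
    print(f"{name}: {code} {message}")
```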
fjetter
fjetter commented Apr 20, 2022

tornado.IOLoop.run_sync is deprecated and must be removed from our code base.

All the CLI scripts call this, and replacing it with asyncio.run should be possible.

Caveats

  • The way we handle signals needs to be adjusted
  • Once asyncio.run finishes we need to ensure the tornado loop is also closed
  • Behaviour of preload modules may be affected if they are using loops about whe
good first issue
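A minimal sketch of the replacement, assuming the CLI entry point can be written as a coroutine function `main`. Signal handling moves to the standard `loop.add_signal_handler`, which is unavailable on Windows event loops and outside the main thread, so those cases are skipped. The names `run` and `_run_until_signal` are hypothetical, and the sketch does not address the Tornado-loop and preload caveats listed above.

```python
import asyncio
import signal
from typing import Any, Awaitable, Callable


async def _run_until_signal(main: Callable[[], Awaitable[Any]]) -> Any:
    """Run main() until it finishes or SIGINT/SIGTERM arrives."""
    loop = asyncio.get_running_loop()
    stop = asyncio.Event()
    for sig in (signal.SIGINT, signal.SIGTERM):
        try:
            loop.add_signal_handler(sig, stop.set)
        except (NotImplementedError, RuntimeError):
            # Not supported on Windows event loops or outside the main thread.
            pass

    task = asyncio.ensure_future(main())
    stopper = asyncio.ensure_future(stop.wait())
    done, _ = await asyncio.wait({task, stopper}, return_when=asyncio.FIRST_COMPLETED)

    stopper.cancel()
    if task in done:
        return task.result()  # re-raises any exception from main()

    # A signal arrived first: cancel main() and wait for its cleanup to finish.
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    raise SystemExit(1)


def run(main: Callable[[], Awaitable[Any]]) -> Any:
    """Hypothetical stand-in for IOLoop.current().run_sync(main)."""
    # asyncio.run creates a fresh event loop and closes it when main() is done.
    return asyncio.run(_run_until_signal(main))
```

A CLI script would then call `run(main)` where it previously called `run_sync`.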
thatlittleboy
thatlittleboy commented Jan 2, 2022

Background

This thread was born out of the discussion in #968, in an effort to make the documentation more beginner-friendly and easier to understand.
One of the subtasks mentioned in that thread was to go through the function docstrings and add a minimal working example to each public function in pyjanitor.

Criteria reiterated here for the benefit of discussion:

It sh
good first issue
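One way to keep such examples working is to write them as doctests, so they can be executed in CI. The function below is a hypothetical illustration of the format, not an actual pyjanitor function; it shows a NumPy-style `Examples` section whose `>>>` lines are checked with the standard-library `doctest` module.

```python
import doctest


def clean_column_name(name: str) -> str:
    """Lower-case a column name and replace spaces with underscores.

    Examples
    --------
    >>> clean_column_name("First Name")
    'first_name'
    >>> clean_column_name("  Total Sales ")
    'total_sales'
    """
    return name.strip().lower().replace(" ", "_")


if __name__ == "__main__":
    # Prints a report only if an example's output no longer matches the code.
    doctest.run_docstring_examples(clean_column_name, {"clean_column_name": clean_column_name})
```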
yhuang85
yhuang85 commented Aug 26, 2021

Description

Several directives that are part of the rST / Sphinx spec are not supported by this theme (at least, they have no visible effect in the built docs). We should add support for them. Here are a few known ones:

  • highlights
  • pull-quotes
  • epigraphs

Implementation

The way to accomplish this would be to:

  1. See wha
enhancement good first issue CSS improvement
NeroCorleone
NeroCorleone commented Aug 11, 2020

Problem description

Reading a dataset with the eager read functionality raises a ValueError when columns are provided.

Example code (ideally copy-pastable)

import pandas as pd

from tempfile import TemporaryDirectory
from functools import partial
from storefact import get_store_from_url

from kartothek.io.eager import store_dataframes_as_dataset, read_dataset_as_data
good first issue usability
randyzwitch
randyzwitch commented Mar 28, 2019

In trying to write tests for #189, I'm finding it very difficult to add columns to existing tests: in some cases, such as the all_types table, the table is defined in a separate file from the tests, and multiple tests try to write to the same table.

Additionally, our test suite doesn't prove that the data that are uploaded are the same as the data downloaded for all types.

We should consider m
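The sketch below illustrates both points from the issue: each test creates its own uniquely named table instead of sharing one such as all_types, and every type is checked by comparing the downloaded values with the uploaded ones. It uses the standard-library `sqlite3` module purely as a stand-in for the project's real database and client, which is an assumption; `roundtrip` and `ALL_TYPES` are hypothetical names.

```python
import sqlite3
import uuid
from typing import Any, Dict, List

# One column per type; the values are illustrative only.
ALL_TYPES: Dict[str, List[Any]] = {
    "INTEGER": [0, -1, 2**62],
    "REAL": [0.5, -1.25],
    "TEXT": ["", "héllo"],
    "BLOB": [b"\x00\x01\xff"],
}


def roundtrip(conn: sqlite3.Connection, column_type: str, values: List[Any]) -> List[Any]:
    """Upload values into a fresh table, read them back, then drop the table."""
    # A unique table name per call keeps tests from writing to a shared table.
    table = f"t_{uuid.uuid4().hex}"
    conn.execute(f"CREATE TABLE {table} (v {column_type})")
    try:
        conn.executemany(f"INSERT INTO {table} (v) VALUES (?)", [(v,) for v in values])
        rows = conn.execute(f"SELECT v FROM {table} ORDER BY rowid").fetchall()
        return [row[0] for row in rows]
    finally:
        conn.execute(f"DROP TABLE {table}")


def test_roundtrip_all_types() -> None:
    conn = sqlite3.connect(":memory:")
    try:
        for column_type, values in ALL_TYPES.items():
            # Data downloaded must equal data uploaded, for every type.
            assert roundtrip(conn, column_type, values) == values, column_type
    finally:
        conn.close()
```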

eric-czech
eric-czech commented Jun 15, 2021

For association testing and PCA (at least), it may be useful to have a function that imputes dosages/allele counts. With floating-point values (i.e. from bgen), this can be very simple for a user, e.g. ds.call_genotype_probability.fillna(ds.call_genotype_probability.mean(dim="samples")). When alternate allele counts use a sentinel integer for missing values, it is a little more complicated. The best way t

good first issue help wanted core operations
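For the integer case, one option is to mask the sentinel and divide per-variant sums by per-variant counts of observed calls. The sketch assumes a plain NumPy array of shape (variants, samples) with -1 as the missing sentinel, rather than an xarray Dataset variable; `impute_allele_counts` is a hypothetical name, not an existing library function.

```python
import numpy as np

MISSING = -1  # assumed sentinel for a missing allele count


def impute_allele_counts(counts: np.ndarray, missing: int = MISSING) -> np.ndarray:
    """Mean-impute an integer (variants, samples) allele-count array.

    Sentinel entries are replaced by the mean of the observed counts for the
    same variant; a variant with no observed calls becomes NaN.
    """
    mask = counts == missing
    observed = (~mask).sum(axis=1)                   # observed calls per variant
    totals = np.where(mask, 0, counts).sum(axis=1)   # sum of observed counts
    means = np.divide(
        totals,
        observed,
        out=np.full(totals.shape, np.nan),
        where=observed > 0,                          # avoid dividing by zero
    )
    # Broadcast the per-variant means across samples; the result is float.
    return np.where(mask, means[:, None], counts)


counts = np.array([
    [0, 1, -1, 2],
    [-1, -1, -1, -1],
    [2, 2, -1, 0],
])
imputed = impute_allele_counts(counts)
```

The result is floating point, matching what the fillna approach produces for probabilities.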
eriknw
eriknw commented Apr 15, 2022

As we write and update more docstrings, I think it would be helpful to specify what is expected and to do some checks in CI (and git pre-commit).

Like other libraries in the PyData ecosystem, I think we should rely heavily on the NumPy-style docstrings:

We can even use velin to help enforce this and identify common mistakes:

  • htt
documentation good first issue good second issue hygiene
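As an illustration of the kind of check that could run in CI or a pre-commit hook, the sketch below validates only NumPy-style section headers: each header must be a known numpydoc section, and this sketch treats an underline whose length differs from the header as a style violation. It does not use velin; `check_numpy_sections` is a hypothetical stdlib-only helper.

```python
import inspect
from typing import List

# Section names from the numpydoc docstring guide.
NUMPYDOC_SECTIONS = {
    "Parameters", "Returns", "Yields", "Receives", "Other Parameters",
    "Raises", "Warns", "Warnings", "See Also", "Notes", "References",
    "Examples", "Attributes", "Methods",
}


def check_numpy_sections(doc: str) -> List[str]:
    """Flag unknown section headers and mismatched underlines."""
    lines = inspect.cleandoc(doc).splitlines()
    problems = []
    for header, underline in zip(lines, lines[1:]):
        header, underline = header.strip(), underline.strip()
        # A line made only of dashes marks the line above it as a section header.
        if not underline or set(underline) != {"-"}:
            continue
        if header not in NUMPYDOC_SECTIONS:
            problems.append(f"unknown section {header!r}")
        elif len(underline) != len(header):
            problems.append(f"underline of {header!r} should be {len(header)} dashes")
    return problems


DOC = """Add two numbers.

    Parameters
    ----------
    a, b : int
        Values to add.

    Return
    ------
    int
        The sum.

    Examples
    -----
    >>> add(1, 2)
    3
    """

for problem in check_numpy_sections(DOC):
    print(problem)
```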
