pydata

The Dask documentation references a to_numeric method: https://docs.dask.org/en/latest/dataframe-api.html#dask.dataframe.DataFrame.astype

See Also:
to_numeric
    Convert argument to a numeric type.

I can't seem to find where that code exists. Is to_numeric implemented in Dask?
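For context, the "See Also" entry presumably comes from the pandas docstring that Dask reuses; that is an assumption, not something confirmed here. A minimal sketch of what `pandas.to_numeric` does on a plain pandas Series:

```python
import pandas as pd

# Strings as they might arrive from a CSV; "n/a" cannot be parsed as a number.
raw = pd.Series(["1", "2.5", "n/a"])

# errors="coerce" turns unparseable entries into NaN instead of raising.
numeric = pd.to_numeric(raw, errors="coerce")
```

On a Dask Series, one possible workaround is to apply the same function to each partition through `map_partitions`. Whether Dask exposes its own `to_numeric` is the open question above.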

Implement Series.reindex.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.reindex.html
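As a reference for the expected behavior, here is a short sketch of `Series.reindex` in plain pandas. The data is invented for illustration, and it says nothing about how the feature would be implemented elsewhere.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# Labels missing from the original index ("d") become NaN;
# labels left out of the new index ("b") are dropped.
reindexed = s.reindex(["c", "a", "d"])

# fill_value replaces the NaN that a missing label would otherwise get.
filled = s.reindex(["c", "a", "d"], fill_value=0)
```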

IEX has a free plan that offers 500,000 messages per month; the next cheapest plan, which is paid, gets you 5,000,000 per month.

Their API offers a way to retrieve only the adjusted close for historical data, which returns a df with date, close, and volume only. Open, low, and high are dropped and not transmitted, so you won't be billed for them. This saves you 50% of the messages if you don't need those values.

When workers die or are halted, it can be useful to see when each worker was last seen by the scheduler. We should bubble this information up to the dashboard.

client.scheduler_info()

'tcp://172.17.0.2:43161': {'type': 'Worker',
   'id': 'tcp://172.17.0.2:43161',
   'host': '172.17.0.2',
   'resources': {},
   'local_directory': '/notebooks/dask-worker-space/worker-k540965f',
   'name':
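A rough sketch of how the "last seen" age could be derived from that output. It assumes each worker entry carries a `last_seen` Unix timestamp and that the entries sit under a `workers` key. The output above is cut off before either can be confirmed, and the helper name `seconds_since_last_seen` is hypothetical.

```python
import time

def seconds_since_last_seen(scheduler_info, now=None):
    """Map each worker address to the seconds elapsed since it was last seen."""
    now = time.time() if now is None else now
    workers = scheduler_info.get("workers", {})  # assumed nesting
    return {
        address: now - details["last_seen"]
        for address, details in workers.items()
        if "last_seen" in details  # assumed key; skip workers that lack it
    }

# Hypothetical data in the shape of client.scheduler_info(); values are invented.
sample_info = {
    "workers": {
        "tcp://172.17.0.2:43161": {"type": "Worker", "last_seen": 1000.0},
        "tcp://172.17.0.3:40000": {"type": "Worker"},
    }
}

ages = seconds_since_last_seen(sample_info, now=1012.5)
```

Passing `now` explicitly keeps the calculation testable; a dashboard would presumably use the current time instead.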

It would be nice to add one or more tutorials that reproduce the Matrix Profile Top Ten paper. The accompanying data at their Google Sites page can be found here.

It might be best to make each of the top ten sections a separate item (i.e., a sub-list) that rolls up under one tutorial.

UPDATE FROM MAINTAINERS: ANYBODY WHO IS INTERESTED IN THIS ISSUE, PLEASE SEE THIS COMMENT FOR PROPOSED CODE.

Brief Description

The clean_names method does not work when an integer is used as a column name.
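One way the failure could be avoided is to cast every label to a string before cleaning it. The sketch below uses a hypothetical helper, `clean_column_names`, written only for illustration. It is not pyjanitor's implementation or the code proposed by the maintainers.

```python
import re
import pandas as pd

def clean_column_names(df):
    """Return a copy of df with lowercased, underscore-separated string column names."""
    def clean(name):
        text = str(name).strip().lower()         # str() lets integer labels through
        text = re.sub(r"[^0-9a-z]+", "_", text)  # collapse other characters to "_"
        return text.strip("_")
    return df.rename(columns=clean)

# Invented example with one string label and one integer label.
df = pd.DataFrame({"Team Name": ["a"], 2019: [1]})
cleaned = clean_column_names(df)
```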

Minimally Reproducible Code

rankings = {
    "countries_to_play_cricket": [
        "Indi

Problem description

The distributed scheduler usually relies on knowledge of the size of a computation result and bases certain scheduling decisions on it (e.g. work stealing). Our main data class, the MetaPartition, should implement a __sizeof__ that performs a deep size calculation (including data frames, indices, etc.) to give the scheduler the best chance of making the
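A minimal sketch of a deep `__sizeof__`, assuming the container holds a dict of pandas DataFrames. `PartitionSketch` is a hypothetical stand-in and does not reflect the real MetaPartition attributes.

```python
import sys
import pandas as pd

class PartitionSketch:
    """Hypothetical stand-in for a container holding several DataFrames."""

    def __init__(self, data):
        self.data = data  # dict of name -> pandas.DataFrame

    def __sizeof__(self):
        # Start from the shallow size of the object itself.
        total = object.__sizeof__(self)
        for frame in self.data.values():
            # deep=True inspects object columns (e.g. strings) rather than
            # counting pointers only; index=True includes the index.
            total += int(frame.memory_usage(index=True, deep=True).sum())
        return total

frame = pd.DataFrame({"name": ["alpha", "beta"], "value": [1, 2]})
partition = PartitionSketch({"core": frame})
deep_size = sys.getsizeof(partition)
```

`sys.getsizeof` calls `__sizeof__` and may add garbage-collector overhead, so any caller that relies on it would see the deep figure rather than the shallow one.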

In trying to write tests for #189, I'm finding it very difficult to add columns to existing tests. In some cases, such as the all_types table, the table is defined in a separate file from the tests, and multiple tests try to write to the same table.

Additionally, our test suite doesn't prove that the data that are uploaded are the same as the data downloaded for all types.

We should consider m
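Both concerns above (shared tables and unproven round trips) could be addressed along these lines. This is only a sketch: `InMemoryStore` is a hypothetical stand-in for the database, and `unique_table_name` and `check_round_trip` are invented helper names, not part of any existing test suite.

```python
import uuid
import pandas as pd

class InMemoryStore:
    """Hypothetical stand-in for the database; not a real client API."""

    def __init__(self):
        self._tables = {}

    def upload(self, name, df):
        self._tables[name] = df.copy()

    def download(self, name):
        return self._tables[name].copy()

def unique_table_name(prefix="test"):
    # A fresh name per test keeps tests from writing to a shared table.
    return f"{prefix}_{uuid.uuid4().hex}"

def check_round_trip(store, df):
    name = unique_table_name()
    store.upload(name, df)
    result = store.download(name)
    # Raises AssertionError if values or dtypes differ.
    pd.testing.assert_frame_equal(result, df)
    return name

store = InMemoryStore()
df = pd.DataFrame({"int_": [1, 2], "float_": [0.5, 1.5], "text_": ["a", "b"]})
first = check_round_trip(store, df)
second = check_round_trip(store, df)
```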

Consider looking into pandas.applymap() (which may or may not be usable in the presence of novel levels and missing values) and also into a data_algebra version of .transform().
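To illustrate the concern about novel levels and missing values, here is a small sketch that uses `Series.map` with a dict rather than `applymap`. The codes are invented, and the 0.0 fallback is an arbitrary choice for the example.

```python
import pandas as pd

# Per-level codes learned from training data (values are invented).
level_codes = {"a": 0.2, "b": 0.7}

column = pd.Series(["a", "b", "c", None])

# Series.map with a dict yields NaN for levels missing from the dict ("c")
# and for missing values (None), so both cases need an explicit fallback.
mapped = column.map(level_codes)
encoded = mapped.fillna(0.0)
```

Novel levels and missing values both fall through to NaN here, so they cannot be told apart after mapping unless they are flagged beforehand.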

Here are 71 public repositories matching this topic...

dask / dask

databricks / koalas

pydata / pandas-datareader

dask / distributed

TDAmeritrade / stumpy

ericmjl / pyjanitor

DataTau / datascience-anthology-pydata

rasbt / pydata-chicago2016-ml-tutorial

JasonKessler / Scattertext-PyData

JDASoftwareGroup / kartothek

omnisci / pymapd

mattilyra / pydataberlin-2017

WinVector / pyvtreat

martinapugliese / tales-science-data

gcampanella / pydata-london-2018

pydataberlin / meetup-slides

josephofiowa / pydata-dc-2018

PyDataKR / pydata.kr

cytora / clickbait-workshop

bweigel / ml_at_awslambda_pydatabln2018

yinleon / pydata2017

sachin-kmr / Neural-Image-Captioning

Shinichi-Nakagawa / scrapy-sample-baseball

AlexIoannides / lime-interpretable-ml

quasiben / kubernetes-pydata-parallel

GapData / PyDataBratislava

dimgold / pycon_social_networkx

TwentyBN / 20bn-video-data-loading-talk

koaning / kadro

PyDataCyprus / meetups
