dataframe
Here are 323 public repositories matching this topic...
Updated Jul 1, 2020 - Java
Hi,
I am trying to load a CSV with no header using
df = vaex.open('data/star0000-1.csv',sep=",", header=None, error_bad_lines=False)
but I get
could not convert column 0, error: TypeError('getattr(): attribute name must be string'), will try to convert it to string
Giving up column 0, error: TypeError('getattr(): attribute name must be string')
could not convert column
Hi again,
a second issue I ran into concerns the user guide:
The binning example under "Grouping on calculated columns" doesn't compile in v0.37.3 and, as far as I can see, wouldn't produce a reasonable result even if it did.
- Compilation isn't possible, as bin() returns a DoubleColumn (even if c
Series.reindex
Implement Series.reindex.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.reindex.html
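To make the requested behavior concrete, here is a minimal plain-Python sketch of what reindexing means: values are realigned to a new list of labels, and labels missing from the original get a fill value. The `reindex` function and the dict-based "series" are illustrative assumptions, not the pandas implementation or this project's API; pandas fills with NaN by default, while this sketch uses `None`.

```python
def reindex(series, new_index, fill_value=None):
    """Align a label -> value mapping to new_index.

    Labels that are absent from the original mapping receive fill_value.
    This is a toy model of the semantics, not the pandas implementation.
    """
    return {label: series.get(label, fill_value) for label in new_index}


original = {"a": 1, "b": 2, "c": 3}
# "d" is not in the original, so it is filled; "a" is dropped.
realigned = reindex(original, ["c", "b", "d"])
```

The result keeps the order of the new index, which is the property a real implementation would also need to preserve.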
Support the error function and Fresnel integrals listed at https://docs.scipy.org/doc/scipy/reference/special.html#error-function-and-fresnel-integrals. Those that are not universal functions may not need to be supported.
The documentation file appears to have been generated with no space between the hashes and the header text. This prevents the headers from rendering correctly and makes the file difficult to read. See below for an example with and without the space:
##
Mobius API Documentation
###Microsoft.Spark.CSharp.Core.Accumulator</
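One possible way to repair such a file after generation is a small post-processing pass that inserts the missing space. The sketch below is an assumption about how that could be done, not the project's actual doc generator; `fix_headers` is a hypothetical helper name. It only touches lines that start with one to six hashes followed directly by text, so it would also alter lines such as `#include` inside code blocks, which would need to be skipped separately.

```python
import re

# One to six leading hashes immediately followed by a character that is
# neither another hash nor whitespace.
_HEADER_WITHOUT_SPACE = re.compile(r"^(#{1,6})(?=[^#\s])", flags=re.MULTILINE)


def fix_headers(markdown_text):
    """Insert the missing space between the hashes and the header text."""
    return _HEADER_WITHOUT_SPACE.sub(r"\1 ", markdown_text)


broken = "##Mobius API Documentation\n###Some.Type.Name\n## Already fine"
fixed = fix_headers(broken)
```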
Updated Jan 6, 2019 - Python
Updated Jun 19, 2020 - C++
Hi, would it be possible to make the user warnings display only when using pipes that actually depend on these imports? Or at least display them in a way that allows filtering out (with logging package perhaps)?
It's just a minor flaw in an otherwise great package. Awesome work!
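Until something like that is available, warnings raised through Python's standard `warnings` module can already be filtered on the caller's side. The sketch below assumes the package emits a `UserWarning` whose message starts with a known phrase; `noisy_import` and the message text are stand-ins invented for illustration, not the package's real code.

```python
import warnings


def noisy_import():
    # Stand-in for a package that warns about a missing optional dependency.
    warnings.warn("optional dependency 'foo' not found", UserWarning)
    return "imported"


def call_quietly(func, message_pattern):
    """Run func while ignoring UserWarnings whose message matches the pattern.

    Warnings that do not match are recorded and returned to the caller.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        # Filters added later are checked first, so this "ignore" rule
        # takes precedence over the "always" rule above.
        warnings.filterwarnings(
            "ignore", message=message_pattern, category=UserWarning
        )
        result = func()
    return result, caught


result, remaining = call_quietly(noisy_import, "optional dependency")
```

The `message` argument is a regular expression matched against the start of the warning text, so a short, stable prefix is enough.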
janitor.biology could do with a to_fasta function, I think. The intent here would be to conveniently export a dataframe of sequences as a FASTA file, using one column as the fasta header.
strawman implementation below:
import pandas_flavor as pf
from Bio.SeqRecord import SeqRecord
from Bio.Seq import Seq
from Bio import SeqIO
@pf.register_dataframe_method
def to_fasta(d
Any plans to get this into DefinitelyTyped?
Originally posted by @danielgwilson in Gmousse/dataframe-js#43 (comment)
improve csv import
* fix column header issues in preview
* handle arbitrary whitespace
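As a rough illustration of what tolerant CSV import could look like, the sketch below trims whitespace around header names and cell values and skips blank lines. It uses only Python's standard `csv` module; `read_csv_trimmed` is a hypothetical helper, not code from this project.

```python
import csv
import io


def read_csv_trimmed(text):
    """Parse CSV text into a list of dicts, trimming stray whitespace.

    Header names and cell values are stripped; blank lines are skipped.
    """
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    rows = [[cell.strip() for cell in row] for row in reader if row]
    if not rows:
        return []
    header, *records = rows
    return [dict(zip(header, record)) for record in records]


sample = " name , age \n\n alice ,  30\nbob,41 \n"
parsed = read_csv_trimmed(sample)
```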
Updated May 6, 2020 - Python
Updated May 27, 2019 - Python
Updated Jun 28, 2020 - Go
Hello,
I haven't tested append() yet, and I was wondering whether duplicates are removed when an append is performed.
I had a look at the collection.py script, and the following pandas function is used:
combined = dd.concat([current.data, new]).drop_duplicates(keep="last")
After a look into the pandas documentation, I understand that duplicate rows are removed and only the last occurrence is kept.
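The keep-last behavior can be illustrated without any dataframe library. The sketch below is a plain-Python model of "concatenate, then drop fully identical rows, keeping the last occurrence"; it is not the code from collection.py, and the function name and sample rows are made up for illustration. Rows are modeled as tuples of all column values. Note that in pandas, `drop_duplicates` compares column values and ignores the index, so two rows with the same timestamp but different values would both survive.

```python
def concat_drop_duplicates_keep_last(current, new):
    """Concatenate two row lists and drop identical rows, keeping the last one.

    Rows are tuples of column values. The surviving rows stay in their
    original relative order, as they would after a keep="last" de-duplication.
    """
    combined = list(current) + list(new)
    seen = set()
    kept_reversed = []
    # Walk backwards so the first time a row is seen is its last occurrence.
    for row in reversed(combined):
        if row not in seen:
            seen.add(row)
            kept_reversed.append(row)
    return kept_reversed[::-1]


current = [("2020-01-01", 1.0), ("2020-01-02", 2.0)]
new = [("2020-01-02", 2.0), ("2020-01-02", 2.5), ("2020-01-03", 3.0)]
merged = concat_drop_duplicates_keep_last(current, new)
```

In this sample the exact duplicate `("2020-01-02", 2.0)` is collapsed to one row, while `("2020-01-02", 2.5)` is kept because its value differs.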
Updated Dec 17, 2019 - Go
Updated Jun 26, 2020 - Python
Updated Jun 6, 2020 - Rust
Update docs
In order to update https://bluenote10.github.io/NimData/nimdata.html I tried running build_docs.sh, but ran into the following Nim doc gen issues:
The following command is somewhat working, besides the missing dochack.js and with the `git.c
Updated Nov 26, 2018 - Java
To improve spotting differences between datasets visually
(especially when there are many columns) it would be helpful if one could sort the categorical columns by the Jensen–Shannon divergence.
The code below tries to do so but it seems to distort the labels on the y-axis. Also, in case the jsd column contains missing values, those variables are deleted from the graph.
library(in
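For readers unfamiliar with the measure, the following Python sketch shows one way to compute the Jensen–Shannon divergence between two categorical samples and to order shared columns by it. It is an independent illustration under stated assumptions (non-empty samples, datasets given as dicts of column name to list of values, base-2 logarithm), not the R code from the issue; both function names are hypothetical.

```python
import math
from collections import Counter


def js_divergence(values_a, values_b):
    """Jensen-Shannon divergence (base 2, between 0 and 1) of two samples.

    Both samples are assumed to be non-empty sequences of category labels.
    """
    counts_a, counts_b = Counter(values_a), Counter(values_b)
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    divergence = 0.0
    for category in set(counts_a) | set(counts_b):
        p = counts_a[category] / total_a
        q = counts_b[category] / total_b
        m = (p + q) / 2  # mixture distribution
        if p > 0:
            divergence += 0.5 * p * math.log2(p / m)
        if q > 0:
            divergence += 0.5 * q * math.log2(q / m)
    return divergence


def rank_columns_by_jsd(dataset_a, dataset_b):
    """Return the shared column names, most divergent first."""
    shared = [name for name in dataset_a if name in dataset_b]
    return sorted(
        shared,
        key=lambda name: js_divergence(dataset_a[name], dataset_b[name]),
        reverse=True,
    )


first = {"color": ["red", "red"], "size": ["s", "m"]}
second = {"color": ["blue", "blue"], "size": ["s", "m"]}
ordering = rank_columns_by_jsd(first, second)
```

Sorting the column names before plotting, rather than reordering the plotted data afterwards, may help keep axis labels and values aligned.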
Describe the problem
We should test on larger datasets that are commonly used in