dataframe

Here are 323 public repositories matching this topic...

argenisleon commented Feb 22, 2020

Hi,
I am trying to load a CSV with no header using

df = vaex.open('data/star0000-1.csv',sep=",", header=None, error_bad_lines=False)

but I get

could not convert column 0, error: TypeError('getattr(): attribute name must be string'), will try to convert it to string
Giving up column 0, error: TypeError('getattr(): attribute name must be string')
could not convert column 
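
One plausible reading of the TypeError above is that, with no header row, the column labels end up as something other than strings. That is a guess, not a confirmed diagnosis. A library-agnostic workaround is to write a copy of the file with an explicit header row first, so that any CSV reader can load it with its default header handling. The sketch below uses only the standard library; the function name `add_header` and the `col_0`, `col_1`, ... naming scheme are illustrative assumptions, not part of any library.

```python
import csv


def add_header(src_path, dst_path, names=None):
    """Copy a headerless CSV to dst_path, prepending a row of string column names.

    `names` is optional; the default col_0, col_1, ... scheme is only an
    illustrative choice.
    """
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        first_row = next(reader, None)
        if first_row is None:
            raise ValueError("empty CSV: cannot infer the number of columns")
        if names is None:
            names = [f"col_{i}" for i in range(len(first_row))]
        if len(names) != len(first_row):
            raise ValueError("expected exactly one name per column")
        writer.writerow(names)      # the new header row
        writer.writerow(first_row)  # the first data row, already consumed above
        writer.writerows(reader)    # stream the remaining rows unchanged
    return names
```

The rewritten file can then be opened without `header=None`.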
brainbytes42 commented Apr 14, 2020

Hi again,
a second issue I ran into is related to the user guide:
The binning example under "Grouping on calculated columns" doesn't compile in v0.37.3 and, apart from that, doesn't lead to a reasonable result, as far as I can see.

  • Compilation isn't possible, as bin() returns a DoubleColumn (even if c
andrewjw1995 commented May 16, 2018

The documentation file appears to have been generated with no space between the hashes and the header text. As a result, the headers do not display correctly and the file is difficult to read. See below for an example with and without the space:

##

Mobius API Documentation


###Microsoft.Spark.CSharp.Core.Accumulator</
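
A fix along these lines could be applied to the generated file after the fact. The following is a minimal sketch, assuming the headers always start at the beginning of a line and use one to six hashes; the function name is hypothetical. It would also touch lines such as `#include` or `#!/bin/sh`, so it should not be run over files that contain code blocks without checking the result.

```python
import re

# One to six hashes at the start of a line, directly followed by a character
# that is neither another hash nor whitespace.
_HEADING_WITHOUT_SPACE = re.compile(r"^(#{1,6})(?=[^#\s])", flags=re.MULTILINE)


def add_space_after_heading_hashes(markdown_text):
    """Insert the missing space between the leading hashes and the header text."""
    return _HEADING_WITHOUT_SPACE.sub(r"\1 ", markdown_text)
```

Lines that already have the space, and lines that consist of hashes only, are left unchanged.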

pdpipe
Devligue commented May 4, 2020

Hi, would it be possible to make the user warnings display only when using pipes that actually depend on these imports? Or at least display them in a way that allows filtering them out (with the logging package, perhaps)?

It's just a minor flaw in an otherwise great package. Awesome work!
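
For context, warnings raised with the standard `warnings` module can already be filtered or rerouted on the caller's side. The sketch below shows both options using only the standard library. `import_optional_dependency` is a hypothetical stand-in for a package that warns about a missing optional import; it is not pdpipe's actual code, and the warning text is made up.

```python
import logging
import warnings


def import_optional_dependency():
    """Hypothetical stand-in for a package that warns about a missing import."""
    warnings.warn("optional dependency not found; some stages are disabled", UserWarning)
    return "imported"


def call_quietly(func):
    """Run func with UserWarnings ignored; also return any warnings that leaked."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("ignore", category=UserWarning)
        result = func()
    return result, caught


def call_with_logged_warnings(func):
    """Route warnings to the 'py.warnings' logger so logging filters can apply."""
    messages = []

    class ListHandler(logging.Handler):
        def emit(self, record):
            messages.append(record.getMessage())

    handler = ListHandler()
    logger = logging.getLogger("py.warnings")
    logger.addHandler(handler)
    logging.captureWarnings(True)  # warnings now go through the logging system
    try:
        with warnings.catch_warnings():
            warnings.simplefilter("always")  # make sure the warning is emitted
            result = func()
    finally:
        logging.captureWarnings(False)
        logger.removeHandler(handler)
    return result, messages
```

The first helper suppresses the warning entirely; the second keeps it but hands it to `logging`, where levels, handlers, and filters decide what is shown.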

ericmjl commented Mar 12, 2020

janitor.biology could do with a to_fasta function, I think. The intent here would be to conveniently export a dataframe of sequences as a FASTA file, using one column as the fasta header.

strawman implementation below:

import pandas_flavor as pf
from Bio.SeqRecord import SeqRecord
from Bio.Seq import Seq
from Bio import SeqIO

@pf.register_dataframe_method
def to_fasta(d
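
The strawman above is cut off, so the following is not a reconstruction of it. It is a dependency-free sketch of the same intent, assuming each row is available as a mapping from column name to value (for instance, one dict per dataframe row). The function name, parameter names, and the default line width of 60 are illustrative choices.

```python
def to_fasta_text(records, id_column, sequence_column, line_width=60):
    """Render rows as FASTA text, using one column as the header line.

    `records` is an iterable of mappings, one per row.
    """
    if line_width < 1:
        raise ValueError("line_width must be positive")
    lines = []
    for record in records:
        lines.append(f">{record[id_column]}")
        sequence = str(record[sequence_column])
        # Split the sequence into fixed-width chunks, one per output line.
        for start in range(0, len(sequence), line_width):
            lines.append(sequence[start:start + line_width])
    return "\n".join(lines) + "\n"
```

Writing the returned string to a file gives the export; a version built on Biopython's `SeqIO`, as the imports above suggest, would delegate the formatting instead.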
pystore
yohplala commented Jan 6, 2020

Hello,

I haven't tested append() yet, and I was wondering whether duplicates are removed when an append is performed.
I had a look at the collection.py script, and the following pandas function is used:
combined = dd.concat([current.data, new]).drop_duplicates(keep="last")

After a look at the pandas documentation, I understand that duplicate lines are removed and only the last occurrence is kept.
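
The `keep="last"` behaviour described here can be illustrated in plain Python. This is a sketch of the semantics only, not the pandas or dask implementation: rows are modelled as tuples of all their values, and two rows count as duplicates only when every value matches. How an index is treated is not covered.

```python
def drop_duplicates_keep_last(rows):
    """Remove duplicate rows, keeping the last occurrence of each one.

    The surviving rows stay in their original relative order.
    """
    seen = set()
    kept_reversed = []
    for row in reversed(rows):  # walk backwards so the last occurrence wins
        if row not in seen:
            seen.add(row)
            kept_reversed.append(row)
    return kept_reversed[::-1]


current = [("2020-01-01", 1.0), ("2020-01-02", 2.0)]
new = [("2020-01-02", 2.0), ("2020-01-02", 2.5), ("2020-01-03", 3.0)]
combined = drop_duplicates_keep_last(current + new)
```

Note that in this model a row with the same date but a different value is not a duplicate, so both `("2020-01-02", 2.0)` and `("2020-01-02", 2.5)` survive.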

inspectdf
RoelVerbelen commented Apr 1, 2020

To make differences between datasets easier to spot visually (especially when there are many columns), it would be helpful if one could sort the categorical columns by the Jensen–Shannon divergence.

The code below tries to do so but it seems to distort the labels on the y-axis. Also, in case the jsd column contains missing values, those variables are deleted from the graph.

library(in
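
The requested ordering can be sketched independently of the plotting code. The Python example below is an illustration under stated assumptions, not inspectdf's implementation: it uses log base 2 (so the divergence lies between 0 and 1), represents each dataset as a dict of column name to list of labels, and keeps columns whose divergence is undefined at the end of the ranking instead of dropping them. All function names are hypothetical.

```python
import math
from collections import Counter


def jensen_shannon_divergence(values_a, values_b):
    """JSD (log base 2) between two samples of category labels, or None if undefined."""
    if not values_a or not values_b:
        return None  # undefined for an empty sample
    counts_a, counts_b = Counter(values_a), Counter(values_b)
    divergence = 0.0
    for level in set(counts_a) | set(counts_b):
        p = counts_a[level] / len(values_a)
        q = counts_b[level] / len(values_b)
        m = (p + q) / 2  # mixture distribution
        if p > 0:
            divergence += 0.5 * p * math.log2(p / m)
        if q > 0:
            divergence += 0.5 * q * math.log2(q / m)
    return divergence


def rank_columns_by_jsd(table_a, table_b):
    """Return (column, jsd) pairs, largest divergence first, undefined values last."""
    shared = [name for name in table_a if name in table_b]
    scored = [
        (name, jensen_shannon_divergence(table_a[name], table_b[name]))
        for name in shared
    ]
    return sorted(scored, key=lambda item: (item[1] is None, -(item[1] or 0.0)))
```

The resulting column order could then be used as the factor level order of the y-axis, so that labels and bars stay aligned.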
