data

We need to support writing to and reading from TFRecord format.

Reference doc: https://www.tensorflow.org/tutorials/load_data/tfrecord

More on TypeTransformers can be found here.

Related PR that adds PyTorch tensor and module as Flyte types: flyteorg/flytekit#1032

Right now the tutorial is coherently designed, tested, and even documented. However, it doesn't build up in a way that's very beginner friendly. It establishes glom's value and then immediately uses it at an intermediate level.

I'd like it to be a bit more drawn out, using basic features first and then adding a multi-line Coalesce as the

References

https://cgg.mff.cuni.cz/publications/an-openexr-layout-for-spectral-images/

Hi,

I am using some basic functions from pyjanitor, such as clean_names() and collapse_levels(), in code that I want to productionise.
There are limits on the size of the production code base.
Currently, the requirements.txt for pyjanitor alone is huge.
I don't think I require all of those dependencies in my code.
How can I remove the unnecessary ones?
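
One way to picture a lighter footprint is to reimplement only the two helpers in question with the standard library. The sketch below is an illustration under assumptions, not pyjanitor's code: `clean_name` and `collapse_level_names` are hypothetical names, and their output only approximates what `clean_names()` and `collapse_levels()` produce.

```python
import re


def clean_name(name):
    """Hypothetical helper: lowercase a label and turn each run of
    non-alphanumeric characters into a single underscore."""
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", str(name).strip().lower())
    # Drop underscores left over at either end.
    return cleaned.strip("_")


def collapse_level_names(levels, sep="_"):
    """Hypothetical helper: join the non-empty parts of a multi-level
    column label into one flat string."""
    return sep.join(str(part) for part in levels if str(part) != "")


if __name__ == "__main__":
    print(clean_name("  First Name "))
    print(collapse_level_names(("sales", "mean")))
```

With pandas these could be applied as `df.columns = [clean_name(c) for c in df.columns]`, which would leave pandas as the only third-party dependency for this part of the code.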

Describe the bug

Columns move order in the export formatting section. That is obviously not intended. Given that it's one of our main pro features, we should def fix it up!

To Reproduce

Go pro.
Go to Excel export. Add a few columns to the formatting.
Change the columns selected using their selects. Watch them switch order.

Expected behavior

They shouldn't ch

This is probably a bit of an arduous task, but I think quite a valuable one. Fangraphs has a collection of the most valuable and in-depth stats, and having this sort of granularity would be invaluable. I would even help out with some of the annoying stuff if some of the other contributors are on board.

For pipeline stages provided by pdpipe.basic_stages, supplying conditions to the prec and post keyword arguments may not produce the correct error messages.

Example Code

import pandas as pd
import pdpipe as pdp
df = pd.DataFrame([[1,4],[4,5],[1,11]], [1,2,3], ['a','b'])
pline = pdp.PdPipeline([
  pdp.FreqDrop(2, 'a', prec=pdp.cond.HasAllColumns(['x']))
])
pline.apply(

Describe the bug
We have validation methods that return objects containing Optional collections of issues and things of that nature. In particular, the ModelBuildResult has an optional "issues" tuple which gets populated with a tuple of validation errors whenever validation runs. Making this optional leads to callsite shenanigans like this:

errors = ModelValidator.validate_model.issue
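
The call-site cost described above can be sketched with two simplified result types. Both classes here are hypothetical stand-ins rather than the project's actual definitions: one keeps the `issues` tuple optional, the other always carries a tuple that may be empty.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class OptionalIssuesResult:
    """Hypothetical stand-in for the current shape: issues may be None."""

    issues: Optional[Tuple[str, ...]] = None


@dataclass(frozen=True)
class ModelBuildResult:
    """Hypothetical simplified shape: issues is always a tuple."""

    issues: Tuple[str, ...] = ()

    @property
    def has_issues(self) -> bool:
        return len(self.issues) > 0


def count_issues_optional(result: OptionalIssuesResult) -> int:
    # Every call site has to guard against None before using the tuple.
    return len(result.issues) if result.issues is not None else 0


def count_issues(result: ModelBuildResult) -> int:
    # No guard needed: an empty tuple already means "no issues".
    return len(result.issues)
```

With the non-optional field, callers can iterate over `result.issues` directly, and "no issues" is represented by the empty tuple instead of a second, `None`-valued state.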

Here are 2,119 public repositories matching this topic...

pwxcoo / chinese-xinhua

akfamily / akshare

airbnb / knowledge-repo

lk-geimfari / mimesis

ckan / ckan

tensorflow / datasets

flyteorg / flyte

pydata / pandas-datareader

Belval / TextRecognitionDataGenerator

emirozer / fake2db

EntilZha / PyFunctional

mara / mara-pipelines

justinzm / gopup

kayak / pypika

nerevu / riko

mahmoud / glom

colour-science / colour

DataBrewery / cubes

sepandhaghighi / pycm

diffgram / diffgram

pyjanitor-devs / pyjanitor

mito-ds / monorepo

turicas / brasil.io

turicas / rows

san089 / Udacity-Data-Engineering-Projects

jldbc / pybaseball

pdpipe / pdpipe

addisonlynch / iexfinance

transform-data / metricflow

Squarespace / datasheets
