arrow

Here are 248 public repositories matching this topic...

nvdbaranec commented Dec 7, 2021

Based on @karthikeyann's work in PR rapidsai/cudf#9767, I'm wondering if it makes sense to consider removing the defaults for the stream parameters in various detail functions. It is pretty surprising how often these get missed.

The most common case seems to be in factory functions and various ::create functions. Maybe just do it for those?

tfeda commented Jan 2, 2022

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

#1514 points out a secondary goal: the Python library should expose the datafusion and arrow-rust versions.
I think Spark has a good implementation of this: it binds the Spark version to SparkContext in the JVM code, then exposes it through the PySpark API. PySpark's own version is hard-coded.

NeroCorleone commented Aug 11, 2020

Problem description

Reading a dataset with eager's read functionality raises a ValueError when columns are provided.

Example code (ideally copy-pastable)

import pandas as pd

from tempfile import TemporaryDirectory
from functools import partial
from storefact import get_store_from_url

from kartothek.io.eager import store_dataframes_as_dataset, read_dataset_as_data
Max-Meldrum commented Jan 10, 2022

An Operator that both filters and maps.

Akin to Rust's own FilterMap but on a Stream rather than Iterator.

let strings = ["1", "two", "NaN", "four", "5"];
let mut app = Application::default()
  .iterator(strings, |conf| {
     conf.set_arcon_time(ArconTime::Process);
  })
  .filter_map(|s| s.parse().ok())
  .b
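
For comparison, here is a minimal sketch of the standard library's `Iterator::filter_map` applied to the same input, which is the behaviour the proposed Stream operator is described as mirroring. The helper name `parse_valid` and the choice of `i32` as the parse target are assumptions made for illustration; the excerpt above leaves the target type to inference and does not show the rest of the pipeline.

```rust
/// Hypothetical helper (not part of Arcon): keep only the inputs that
/// parse as `i32` and drop everything else.
fn parse_valid(inputs: &[&str]) -> Vec<i32> {
    inputs
        .iter()
        // `parse` returns a Result; `.ok()` turns it into an Option so that
        // `filter_map` keeps the `Some` values and discards the `None`s.
        .filter_map(|s| s.parse::<i32>().ok())
        .collect()
}

fn main() {
    let strings = ["1", "two", "NaN", "four", "5"];
    let numbers = parse_valid(&strings);
    println!("{:?}", numbers);
}
```

The closure filters and maps in a single step: an element that fails to parse is dropped, and one that parses is emitted already converted. With `i32` as the target, "NaN" is rejected; a floating-point target would accept it.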
