Skip to content

ARROW-9403: [Python] add Array.tolist as alias of .to_pylist#7701

Closed
maartenbreddels wants to merge 1 commit intoapache:masterfrom
maartenbreddels:ARROW-9403
Closed

ARROW-9403: [Python] add Array.tolist as alias of .to_pylist#7701
maartenbreddels wants to merge 1 commit intoapache:masterfrom
maartenbreddels:ARROW-9403

Conversation

@maartenbreddels
Copy link
Contributor

No description provided.

@github-actions
Copy link

@wesm
Copy link
Member

wesm commented Jul 10, 2020

Since interchangeability with numpy.ndarray is not a goal of pyarrow.Array, I am curious where you think the line should be drawn with respect to compatibility functions. I'm concerned it could be a slippery slope.

@maartenbreddels
Copy link
Contributor Author

I'm mainly looking at my own code, especially the unittests. They are littered with .tolist, similar to how this project has a lot of .to_pylist. For very basic operations, you'd expect similar outputs, so that would mean many tests (and code) can stay the same. One could argue the same for .item() and .to_py for scalars, but comparing scalars is less common, and some effort to support two libraries is ok. I'm happy to see that both libraries have a .take method, but only Arrow has .filter. I'm ok with that, again, it's not that common.
Overloading pyarrow's getitem to act as .filter would make them compatible, but putting on my purist hat, I'd rather vote for numpy to adopt a .filter instead.

I'd refrain from ambiguity, so no overloading, an alias like this is... not pretty, but I doubt it will affect readability or maintainability. I could change the docstring not to stimulate more of these ideas :).

@wesm
Copy link
Member

wesm commented Jul 10, 2020

From a forward looking perspective, I don't expect new analytics libraries in the future to be implemented with a hybrid of Arrow and NumPy, they'll just use Arrow, except when they need to export to a NumPy (or NumPy-like tensor library, more likely), so I think we should try not to accumulate "NumPy cruft" that will just look anachronistic in 5 years.

@kszucs or @jorisvandenbossche what do you think about the addition of tolist?

@kszucs
Copy link
Member

kszucs commented Jul 10, 2020

I like tolist in general because we're on python land and it's shorter, but pylist aligns better with scalar.as_py() and prevents any confusion with a list array.

In summary I don't have a strong opinion on this, but don't like the idea of having aliases.

Copy link
Member

@wesm wesm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. I will merge this with the caveat that it could be deprecated in the future. I think adding NumPy-compatible APIs creates the wrong expectations and so I'll likely push back on future efforts to do more NumPy duck-typing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants