Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sort: -S/--buffer-size does not support memory percentages % #3500

Open
zacchiro opened this issue May 6, 2022 · 2 comments
Open

sort: -S/--buffer-size does not support memory percentages % #3500

zacchiro opened this issue May 6, 2022 · 2 comments
Labels

Comments

@zacchiro
Copy link

@zacchiro zacchiro commented May 6, 2022

GNU sort v. Rust sort:

$ /usr/bin/sort --buffer-size=50% < /dev/null
$ target/release/coreutils sort --buffer-size=50% < /dev/null
sort: invalid --buffer-size argument '50%'

Relevant excerpts from GNU sort manpage:

       -S, --buffer-size=SIZE
              use SIZE for main memory buffer

[...]

       SIZE may be followed by the following multiplicative suffixes: % 1% of memory, b 1, K 1024 (default), and so on for M, G, T, P, E, Z, Y.

(this could be a good first issue)

@sylvestre sylvestre added the good first issue label May 6, 2022
@miDeb
Copy link
Member

@miDeb miDeb commented May 6, 2022

There is an even bigger problem here, which is that sort precomputes data for each line that will be sorted (this makes comparing them faster). Due to that the exact amount of memory usage depends on the number of lines in a chunk (shorter lines -> more lines in a chunk -> more memory usage).
Right now, when you specify a buffer size, the actual memory usage will be less or more than that. I think we should first solve that problem.

@zacchiro
Copy link
Author

@zacchiro zacchiro commented May 7, 2022

@miDeb I wasn't aware of that issue, but it looks to me the two can be fixed independently from one another.

jack-t added a commit to jack-t/coreutils that referenced this issue May 7, 2022
Addresses issue uutils#3500, which calls for sort to permit percentage values
for the buffer-size flag. Portability is an issue here, but I don't
think this change constrains `sort`'s portability any more than its
dependencies already do.

To calculate a fraction of the system's memory, naturally we need to
retrieve the system's total memory. GNU's equivalent -- see
`physmem_total` in `lib/physmem.c` in gnulib -- has a series of `ifdef`s
to decide how to retrive the value. Each platform has its own
considerations, and the result is a pretty serious mess.

This patch omits that mess. In exchange, it will only work on platforms
with a functioning libc. The thing is, `sort` already includes libc,
which is why I said this patch doesn't introduce new constraints on
`sort`'s portability.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants