New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Envmap #10796
Envmap #10796
Conversation
32c67a6
to
88066f2
Compare
lib/std/process.zig
Outdated
| if (upcase(c_a) != upcase(c_b)) | ||
| return false; | ||
| } | ||
| if (it_b.nextCodepoint()) |_| return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if (it_b.nextCodepoint()) |_| return false; | |
| if (it_b.nextCodepoint()) |_| return false; | |
| return true; |
Without the return true here, it falls down to eqlString which is case sensitive.
|
Did some benchmarking comparing the implemention in this PR vs #10650 (some context: #10650 (comment)) This PR is in blue, #10650 is in orange: Benchmark code is here: https://gist.github.com/squeek502/13fdaa4255c6c5e926ceb00a728e1ee1 Some notes:
EDIT: Here's a zip containing the |
|
These benchmarks are good data. I feel I should also share some of the things I think matter when it comes to deciding on a design like this.
In the case of environment variables, I would actually put runtime perfomance lower in the list of importance than with other types of modules because I don't think managing environment variables would usually be in the critical path of performance. Still important, but not the only criteria to consider. When I was younger I use to always put performance at the top of my list, but now I realize how complex performance is and now I tend to start with the "simplest" design and only optimize for performance when the simplest design isn't fast enough. This not only results in less work, it also ends up having faster code overall because once you have a working design, you don't need to guess what will be faster, you can actually benchmark the application that isn't performing fast enough. This focuses your efforts on optimizing code that will actually make your application faster instead of trying to shoot fish in a barrel. All this being said, I find the benchmarks interesting and we can use that data to help guide us if we find that environment variables management performance becomes an issue. Adding a mechanism to cache string conversion like you've done trades higher memory usage and higher complexity for less calculations, which can increase performance in certain situations. But to reiterate, I think it would be best to leave a performance optimization like that out until we see that it has a noticeable benefit, and not at the cost of making other usages slower if those usages are just as common. |
|
Unifying directory path traversal (without opening directories for perf) on Linux and Windows has the same problem, so it looks to me like the performance optimization could go into a lib for doing string stuff on paths (without the syscall overhead). |
|
@matu3ba could you be more specific? I'm unsure what performance optimization you're talking about or how it would apply to path traversal. |
|
@squeek502 My intention was to provide a faster iterator that works forward and backwards on paths |
|
@matu3ba, sorry still confused. Faster than what? What parts of these PRs could it use to go faster? |
I think the idea is stupid, so never mind. Sorry for noise. |
|
Yeah I think we'll need clarification from @andrewrk on what he meant by 8bf7cff:
This PR uses |
My assumption is that he was assuming it would require putting Unicode data in the standard library. |
Yeah that could definitely be it. Out of curiosity, does anyone know how Windows implements |
My guess is |
|
I was unaware that NtDll had functions for environment variables when I made that comment. This changes my position. What I do not want to do is have the Zig std lib depend on shipping with Unicode.txt or similar data. However, if NtDll provides us with an API that allows us to match the Windows-native method of accessing environment variables, then I am in favor of using that API. This really has nothing to do with proper Unicode support; really what's happening is that Windows has some set of weird rules about what environment variables do or don't match via string equality, and we just have to match those rules by calling the appropriate NtDll functions. For example, the In conclusion, I'm in favor of this change, assuming that it matches the equality checking properties of RtlEqualUnicodeString.
|
Unsure if this is what you're looking for, but for ReactOS:
This PR uses |
EnvMap provides the same API as the previously used BufMap (besides `putMove` and `getPtr`), so usage sites of `getEnvMap` can usually remain unchanged. For non-Windows, EnvMap is a wrapper around BufMap. On Windows, it uses a new EnvMapWindows to handle some Windows-specific behavior: - Lookups use Unicode-aware case insensitivity (but `get` cannot return an error because EnvMapWindows has an internal buffer to use for lookup conversions) - Canonical names are returned when iterating the EnvMap Fixes ziglang#10561, closes ziglang#4603
…e applicable # Conflicts: # lib/std/build/RunStep.zig
Fixes a process.EnvMap compile error on 32-bit architectures
`key` was never being given a value, caused by b83cea1
Now that BufMap.BufMapHashMap is pub, we can just get Size directly
Co-authored-by: Veikka Tuominen <git@vexu.eu>
|
@squeek502 I am satisfied with this. Thank you for digging! |
I've rebased and repushed, looks like the Azure CI failure is no more. Probably an intermittent failure from long ago. |





This is my take on fixing #4603 which is an alternative to #10650
This PR includes commits from that PR and modifies it to a different design. In this PR, instead of having
EnvMapstore environment variables in both their original casing and a canonicalized casing, this one keeps the original casing and modifies the hash map to ignore case.