Join GitHub today
GitHub is home to over 50 million developers working together to host and review code, manage projects, and build software together.
Sign upVocabulary min_count should error when using a nonexistent namespace #2292
Comments
|
We agree it'd be great to have a fix here, since a number of us have hit it, but we're not sure what the right solution is presently. |
|
When iterating through token counts in the loop at line 553 (https://github.com/allenai/allennlp/blob/master/allennlp/data/vocabulary.py#L553), the An alternate method is iterating through all the keys in If you think either solution is suitable, I could implement it. |
|
It seems to me like the second solution is correct - (I think) we want to fall back to a default of 1 for namespaces that aren't mentioned, so the first solution doesn't work. @nelson-liu, what do you think? |
|
@nelson-liu @matt-gardner If this issue is still pending, I would like to take this up. If so, should I go forward with the second approach to solve it? |
|
Yes, @nadgeri14, that would be great. The line numbers referenced above have changed, so the place to do it is probably right above this line, at the top of allennlp/allennlp/data/vocabulary.py Line 436 in c9bd9b2 |
|
@matt-gardner Thanks, that was really helpful. I need a small help: Could you please provide a test case, as it will help me to validate my solution before submitting the PR. Thanks in advance for the help. |
|
You just need a |
If you use the
min_countparameter of theVocabulary, but you specify a namespace that does not exist, the vocabulary creation will just silently proceed. It'd be great if it could error in this case, perhaps by popping off the namespaces inmin_countand erroring if any are left at the end of vocab creation (would probably go at the end of https://github.com/allenai/allennlp/blob/master/allennlp/data/vocabulary.py#L454)?