1,149 contributions in the last year
Contribution activity
July 2020
- thomwolf/nlp Python
Created a pull request in huggingface/transformers that received 2 comments
[pipelines] Update fill mask pipeline to remove special tokens in the output
Small fix to remove the special tokens from the output of the fill mask pipeline.
+4 −2
- [AutoModels] Fix config params handling of all PT and TF AutoModels
- Fix Trainer in DataParallel setting
- More explicit error when failing to tensorize overflowing tokens
- Fix #5507
- Various tokenizers fixes
- GPT2 tokenizer should not output token type IDs
- The `add_space_before_punct_symbol` is only for TransfoXL
- Gradient checkpointing BERT & ALBERT poc
- Exposing prepare_for_model for both slow & fast tokenizers
- Change model outputs types to self-document outputs
Created an issue in huggingface/nlp that received 3 comments
[Dataset requests] New datasets for Text Classification
We are missing a few datasets for Text Classification, which is an important field. Namely, it would be really nice to add: TREC-6 dataset (see her…
- Conversion through to_pandas outputs numpy arrays for lists instead of Python objects
- [dataset] Structure of MLQA seems unnecessarily nested
- to_pandas conversion doesn't always work and outputs numpy arrays instead of lists
- Features should be updated when `map()` changes schema
- [Dataset requests] New datasets for Open Question Answering