The Split class accepts a SplitDelimiterBehavior, which is really useful. The Punctuation pre-tokenizer, however, always uses SplitDelimiterBehavior::Isolated (and Whitespace, on the other hand, behaves like SplitDelimiterBehavior::Removed).
    impl PreTokenizer for Punctuation {
        fn pre_tokenize(&self, pretokenized: &mut PreTokenizedString) -> Result<()> {
            // The delimiter behavior is hard-coded to Isolated here
            pretokenized.split(|_, s| s.split(is_punc, SplitDelimiterBehavior::Isolated))
        }
    }
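Until Punctuation itself takes a behavior, one workaround in the Python bindings is the generic Split pre-tokenizer, which already exposes a behavior parameter. A minimal sketch, assuming the [[:punct:]] character class is an acceptable stand-in for the punctuation predicate:

    from tokenizers import Regex, pre_tokenizers

    # Split on punctuation ourselves so we can choose the delimiter behavior;
    # "isolated" mirrors what Punctuation does today.
    punct = pre_tokenizers.Split(
        pattern=Regex(r"[[:punct:]]"),
        behavior="merged_with_next",  # or "removed", "isolated", ...
    )
    print(punct.pre_tokenize_str("Hello, world!"))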
I'm playing around with this wonderful code, but I'm running into a curious issue when I try to train the model with my own data.
I replicated the personachat_self_original.json file structure and added my own data. I deleted the dataset_cache_OpenAIGPTTokenizer file, but when I try to train, I get this error:

    INFO:train.py:Pad inputs and convert to Tensor
    Traceback (most recent call last):
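A frequent cause of this kind of traceback is a dataset file whose nesting doesn't exactly match what the loader expects. As a sanity check, the sketch below validates a file against the layout of personachat_self_original.json (top-level "train"/"valid" splits, dialogs with "personality" and "utterances", each utterance carrying "candidates" and "history"); the key names are read off the original dataset, and my_dataset.json is a hypothetical path.

    import json

    def check_dataset(path="my_dataset.json"):  # hypothetical file name
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
        for split in ("train", "valid"):
            for dialog in data[split]:
                assert isinstance(dialog["personality"], list)
                for utt in dialog["utterances"]:
                    assert isinstance(utt["candidates"], list)  # candidate replies, gold last
                    assert isinstance(utt["history"], list)     # alternating dialog turns

    check_dataset()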
Add better error message to HubertForCTC, Wav2Vec2ForCTC if labels are bigger than vocab size.

Motivation

Following this issue: huggingface/transformers#12264, it is clear that an error message should be thrown if any of the labels are > self.config.vocab_size, or else silent errors can sneak into the training script. So w
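A guard along these lines at the start of the loss computation in each ForCTC forward pass would turn the silent failure into a clear error. This is only an illustration of the proposed check, not the actual patch:

    import torch

    def validate_ctc_labels(labels: torch.Tensor, vocab_size: int) -> None:
        # CTC labels index into the vocabulary, so any id >= vocab_size
        # silently corrupts the loss instead of failing loudly.
        if labels.max() >= vocab_size:
            raise ValueError(
                f"Label values must be < vocab_size ({vocab_size}), "
                f"but found {labels.max().item()}"
            )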