The Split class accepts a SplitDelimiterBehavior, which is really useful. The Punctuation pre-tokenizer, however, always uses SplitDelimiterBehavior::Isolated (and Whitespace, on the other hand, behaves like SplitDelimiterBehavior::Removed).
    impl PreTokenizer for Punctuation {
        fn pre_tokenize(&self, pretokenized: &mut PreTokenizedString) -> Result<()> {
            // The delimiter behavior is hard-coded to Isolated here
            pretokenized.split(|_, s| s.split(is_punc, SplitDelimiterBehavior::Isolated))
        }
    }
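Until Punctuation itself takes a behavior, one workaround in the Python bindings is the generic Split pre-tokenizer, which already exposes a behavior parameter. A minimal sketch, assuming the [[:punct:]] character class is an acceptable stand-in for the punctuation predicate:

    from tokenizers import Regex, pre_tokenizers

    # Split on punctuation ourselves so we can choose the delimiter behavior;
    # "isolated" mirrors what Punctuation does today.
    punct = pre_tokenizers.Split(
        pattern=Regex(r"[[:punct:]]"),
        behavior="merged_with_next",  # or "removed", "isolated", ...
    )
    print(punct.pre_tokenize_str("Hello, world!"))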
I'm playing around with this wonderful code, but I'm running into a curious issue when I try to train the model with my own data.
I replicated the personachat_self_original.json file structure and added my own data. I deleted the dataset_cache_OpenAIGPTTokenizer file, but when I try to train, I get this error:

    INFO:train.py:Pad inputs and convert to Tensor
    Traceback (most recent call last):
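A frequent cause of this kind of traceback is a dataset file whose nesting doesn't exactly match what the loader expects. As a sanity check, the sketch below validates a file against the layout of personachat_self_original.json (top-level "train"/"valid" splits, dialogs with "personality" and "utterances", each utterance carrying "candidates" and "history"); the key names are read off the original dataset, and my_dataset.json is a hypothetical path.

    import json

    def check_dataset(path="my_dataset.json"):  # hypothetical file name
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
        for split in ("train", "valid"):
            for dialog in data[split]:
                assert isinstance(dialog["personality"], list)
                for utt in dialog["utterances"]:
                    assert isinstance(utt["candidates"], list)  # candidate replies, gold last
                    assert isinstance(utt["history"], list)     # alternating dialog turns

    check_dataset()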
Add better error message to HubertForCTC, Wav2Vec2ForCTC if labels are bigger than vocab size.

Motivation

Following this issue: huggingface/transformers#12264, it is clear that an error message should be thrown if any of the labels are > self.config.vocab_size, or else silent errors can sneak into the training script. So w
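A guard along these lines at the start of the loss computation in each ForCTC forward pass would turn the silent failure into a clear error. This is only an illustration of the proposed check, not the actual patch:

    import torch

    def validate_ctc_labels(labels: torch.Tensor, vocab_size: int) -> None:
        # CTC labels index into the vocabulary, so any id >= vocab_size
        # silently corrupts the loss instead of failing loudly.
        if labels.max() >= vocab_size:
            raise ValueError(
                f"Label values must be < vocab_size ({vocab_size}), "
                f"but found {labels.max().item()}"
            )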