gpt
Here are 109 public repositories matching this topic...
The Split pre-tokenizer accepts a SplitDelimiterBehavior, which is really useful. Punctuation, however, always uses SplitDelimiterBehavior::Isolated (and Whitespace, on the other hand, behaves like SplitDelimiterBehavior::Removed).
impl PreTokenizer for Punctuation {
    fn pre_tokenize(&self, pretokenized: &mut PreTokenizedString) -> Result<()> {
        pretokenized.split(|_, s| s.split(is_punc, SplitDelimiterBehavior::Isolated))
    }
}
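To illustrate the difference between the two behaviors, here is a minimal self-contained sketch; `Behavior` and `toy_split` are hypothetical stand-ins, not the tokenizers API. Isolated keeps each matched delimiter as its own standalone piece, while Removed drops delimiters entirely.

```rust
// Hypothetical stand-in for SplitDelimiterBehavior (not the tokenizers API).
#[derive(Clone, Copy, PartialEq)]
enum Behavior {
    Isolated, // keep each matched delimiter as its own piece
    Removed,  // drop matched delimiters entirely
}

// Toy splitter showing how the two behaviors treat delimiter characters.
fn toy_split(s: &str, is_delim: fn(char) -> bool, behavior: Behavior) -> Vec<String> {
    let mut pieces = Vec::new();
    let mut current = String::new();
    for c in s.chars() {
        if is_delim(c) {
            if !current.is_empty() {
                pieces.push(std::mem::take(&mut current));
            }
            if behavior == Behavior::Isolated {
                pieces.push(c.to_string());
            }
        } else {
            current.push(c);
        }
    }
    if !current.is_empty() {
        pieces.push(current);
    }
    pieces
}

fn main() {
    let is_punc = |c: char| c == ',' || c == '.';
    // What Punctuation does today (Isolated): delimiters survive as pieces.
    println!("{:?}", toy_split("Hi, there.", is_punc, Behavior::Isolated));
    // ["Hi", ",", " there", "."]
    // What Whitespace effectively does (Removed): delimiters disappear.
    println!("{:?}", toy_split("Hi, there.", is_punc, Behavior::Removed));
    // ["Hi", " there"]
}
```

Exposing the behavior as a parameter on Punctuation, as Split already does, would let callers pick Removed (or any other variant) instead of the hard-coded Isolated.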
I'm playing around with this wonderful code, but I'm running into a curious issue when I try to train the model with my own data.
I replicated the personachat_self_original.json file structure and added my own data. I deleted the dataset_cache_OpenAIGPTTokenizer file, but when I try to train, I get this error:
INFO:train.py:Pad inputs and convert to Tensor
Traceback (most recent call last):
Hello, I was thinking it would be of great help if I could get the time offsets for the start and end of each word.
Motivation
I was going through the Google Speech-to-Text documentation and found this feature, and thought it would be really amazing if I could have something similar here.