gpt
Here are 104 public repositories matching this topic...
The Split pre-tokenizer accepts a SplitDelimiterBehavior, which is really useful. The Punctuation pre-tokenizer, however, always uses SplitDelimiterBehavior::Isolated (and Whitespace, on the other hand, behaves like SplitDelimiterBehavior::Removed).
impl PreTokenizer for Punctuation {
    fn pre_tokenize(&self, pretokenized: &mut PreTokenizedString) -> Result<()> {
        // The behavior is hard-coded to Isolated; it is not configurable.
        pretokenized.split(|_, s| s.split(is_punc, SplitDelimiterBehavior::Isolated))
    }
}
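To illustrate what the two behaviors mean, here is a minimal standalone sketch (a hypothetical re-implementation for illustration only, not the tokenizers crate API; `split_with_behavior` and its signature are made up):

```rust
// Hypothetical, self-contained sketch of two SplitDelimiterBehavior modes.
#[derive(Clone, Copy)]
enum SplitDelimiterBehavior {
    Removed,  // drop the delimiter entirely (like Whitespace)
    Isolated, // keep the delimiter as its own piece (like Punctuation)
}

fn split_with_behavior(
    s: &str,
    is_delim: fn(char) -> bool,
    behavior: SplitDelimiterBehavior,
) -> Vec<String> {
    let mut pieces = Vec::new();
    let mut current = String::new();
    for c in s.chars() {
        if is_delim(c) {
            // Flush the piece accumulated so far, if any.
            if !current.is_empty() {
                pieces.push(std::mem::take(&mut current));
            }
            // Isolated additionally emits the delimiter as its own piece.
            if let SplitDelimiterBehavior::Isolated = behavior {
                pieces.push(c.to_string());
            }
        } else {
            current.push(c);
        }
    }
    if !current.is_empty() {
        pieces.push(current);
    }
    pieces
}

fn main() {
    let is_punc = |c: char| c.is_ascii_punctuation();
    // Isolated keeps "," and "!" as standalone pieces:
    println!(
        "{:?}",
        split_with_behavior("Hello, world!", is_punc, SplitDelimiterBehavior::Isolated)
    ); // → ["Hello", ",", " world", "!"]
    // Removed drops the punctuation entirely:
    println!(
        "{:?}",
        split_with_behavior("Hello, world!", is_punc, SplitDelimiterBehavior::Removed)
    ); // → ["Hello", " world"]
}
```

With a configurable behavior field on Punctuation, callers could pick either mode instead of always getting Isolated.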
I'm playing around with this wonderful code, but I'm running into a curious issue when I try to train the model with my own data.
I replicated the personachat_self_original.json file structure and added my own data. I deleted the dataset_cache_OpenAIGPTTokenizer file, but when I try to train, I get this error:
INFO:train.py:Pad inputs and convert to Tensor
Traceback (most recent call last):
Hi, I am interested in using the DeBERTa model that was recently implemented here and incorporating it into FARM so that it can also be used in open-domain QA settings through Haystack.
Just wondering why there is only a Slow Tokenizer implemented for DeBERTa, and whether there are plans to create the Fast Tokenizer as well.