natural-language-understanding

🚀 Feature request

Fast Tokenizer for DeBERTa-V3 and mDeBERTa-V3

Motivation

DeBERTa V3 is an improved version of DeBERTa. With the V3 release, the authors also published a multilingual model, "mDeBERTa-base", that outperforms XLM-R-base. However, DeBERTa V3 currently lacks a FastTokenizer implementation, which makes it impossible to use with some of the example scripts (they require a Fa

Description

When calling tokenizers.create with the model and vocab file for a custom corpus, the code throws an error and cannot generate the BERT vocab file.

Error Message

ValueError: Mismatch vocabulary! All special tokens specified must be control tokens in the sentencepiece vocabulary.
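The kind of check this message describes can be sketched in plain Python. This is only an illustration, not gluonnlp's actual implementation: the `vocab_types` mapping, the `CONTROL` label, and both function names are hypothetical stand-ins for a SentencePiece vocabulary and its piece types.

```python
# Hypothetical stand-in for a SentencePiece vocabulary: piece -> piece type.
# NOT gluonnlp's code; it only illustrates the rule stated in the error message.
CONTROL = "control"
NORMAL = "normal"
UNKNOWN = "unknown"


def find_non_control_specials(vocab_types, special_tokens):
    """Return the special tokens that are missing or are not control pieces."""
    return [tok for tok in special_tokens if vocab_types.get(tok) != CONTROL]


def check_special_tokens(vocab_types, special_tokens):
    """Raise ValueError if any special token is not a control piece."""
    bad = find_non_control_specials(vocab_types, special_tokens)
    if bad:
        raise ValueError(
            "Mismatch vocabulary! Special tokens that are not control "
            "tokens in the vocabulary: {}".format(bad)
        )


# Toy vocabulary (assumed for illustration only).
vocab_types = {
    "<s>": CONTROL,
    "</s>": CONTROL,
    "<unk>": UNKNOWN,
    "hello": NORMAL,
}

# Passes: both tokens are control pieces in the toy vocabulary.
check_special_tokens(vocab_types, ["<s>", "</s>"])
```

Under this reading, passing a token such as `<mask>` that the model was not trained with as a control symbol would trigger the error, so listing the offending tokens in the message makes the mismatch easier to diagnose.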

To Reproduce

from gluonnlp.data import tokenizers
tokenizers.create('spm', model_p


Here are 585 public repositories matching this topic...

huggingface / transformers

DeBERTa V3 Fast Tokenizer


Make `CLIPFeatureExtractor` accept batch of images as `torch.Tensor`.

Encapsulate all forward passes of integration tests with "with torch.no_grad()"

google-research / bert

hanxiao / bert-as-service

microsoft / nlp-recipes

huggingface / tokenizers

dmlc / gluon-nlp

[Error Message] Improve error message in SentencepieceTokenizer when arguments are not expected.


Use official MXNet batchify to implement the batchify functions

NMT Inference: Chunk overlength sequences and translate in sequence

opencog / opencog

google / sling

namisan / mt-dnn

explosion / spacy-transformers

thunlp / OpenPrompt

turtlesoupy / this-word-does-not-exist

KartikChugh / Otto

chatopera / insuranceqa-corpus-zh

declare-lab / conv-emotion

microsoft / DeBERTa

MITESHPUTHRANNEU / Speech-Emotion-Analyzer

practical-nlp / practical-nlp-code

huggingface / autonlp

Decalogue / chat

suragnair / seqGAN

Picovoice / rhino

BotLibre / BotLibre

JohnSnowLabs / nlu

soulbliss / NLP-conference-compendium

graphbrain / graphbrain

jayparks / tf-seq2seq

gkiril / oie-resources

chatopera / clause

Droidtown / ArticutAPI
