While using tokenizers.create with the model and vocab file for a custom corpus, the code throws an error and is not able to generate the BERT vocab file
Error Message
ValueError: Mismatch vocabulary! All special tokens specified must be control tokens in the sentencepiece vocabulary.
To Reproduce
from gluonnlp.data import tokenizers
tokenizers.create('spm', model_p
Java API for Natural Language Generation. Originally developed by Ehud Reiter at the University of Aberdeen’s Department of Computing Science and co-founder of Arria NLG. This git repo is the official SimpleNLG version.
Accelerated Text is a no-code natural language generation platform. It will help you construct document plans which define how your data is converted to textual descriptions varying in wording and structure.
Pen.el stands for Prompt Engineering in emacs. It facilitates the creation, discovery and usage of prompts to language models. Pen supports OpenAI, EleutherAI, Aleph-Alpha, HuggingFace and others. It's the engine for the LookingGlass imaginary web browser.
This repository have scripts and Jupyter-notebooks to perform all the different steps involved in Transforming Delete, Retrieve, Generate Approach for Controlled Text Style Transfer
Description
While using tokenizers.create with the model and vocab file for a custom corpus, the code throws an error and is not able to generate the BERT vocab file
Error Message
ValueError: Mismatch vocabulary! All special tokens specified must be control tokens in the sentencepiece vocabulary.
To Reproduce
from gluonnlp.data import tokenizers
tokenizers.create('spm', model_p