gpt
Here are 104 public repositories matching this topic...
The Split pre-tokenizer accepts a SplitDelimiterBehavior, which is really useful. The Punctuation pre-tokenizer, however, always uses SplitDelimiterBehavior::Isolated (and Whitespace, on the other hand, behaves like SplitDelimiterBehavior::Removed).
impl PreTokenizer for Punctuation {
    fn pre_tokenize(&self, pretokenized: &mut PreTokenizedString) -> Result<()> {
        // The behavior is hard-coded to Isolated; it is not configurable.
        pretokenized.split(|_, s| s.split(is_punc, SplitDelimiterBehavior::Isolated))
    }
}
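To illustrate what the two behaviors mean, here is a minimal standalone sketch (a hypothetical re-implementation for illustration only, not the tokenizers crate API; `split_with_behavior` and its signature are made up):

```rust
// Hypothetical, self-contained sketch of two SplitDelimiterBehavior modes.
#[derive(Clone, Copy)]
enum SplitDelimiterBehavior {
    Removed,  // drop the delimiter entirely (like Whitespace)
    Isolated, // keep the delimiter as its own piece (like Punctuation)
}

fn split_with_behavior(
    s: &str,
    is_delim: fn(char) -> bool,
    behavior: SplitDelimiterBehavior,
) -> Vec<String> {
    let mut pieces = Vec::new();
    let mut current = String::new();
    for c in s.chars() {
        if is_delim(c) {
            // Flush the piece accumulated so far, if any.
            if !current.is_empty() {
                pieces.push(std::mem::take(&mut current));
            }
            // Isolated additionally emits the delimiter as its own piece.
            if let SplitDelimiterBehavior::Isolated = behavior {
                pieces.push(c.to_string());
            }
        } else {
            current.push(c);
        }
    }
    if !current.is_empty() {
        pieces.push(current);
    }
    pieces
}

fn main() {
    let is_punc = |c: char| c.is_ascii_punctuation();
    // Isolated keeps "," and "!" as standalone pieces:
    println!(
        "{:?}",
        split_with_behavior("Hello, world!", is_punc, SplitDelimiterBehavior::Isolated)
    ); // → ["Hello", ",", " world", "!"]
    // Removed drops the punctuation entirely:
    println!(
        "{:?}",
        split_with_behavior("Hello, world!", is_punc, SplitDelimiterBehavior::Removed)
    ); // → ["Hello", " world"]
}
```

With a configurable behavior field on Punctuation, callers could pick either mode instead of always getting Isolated.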
I'm playing around with this wonderful code, but I'm running into a curious issue when I try to train the model with my own data.
I replicated the personachat_self_original.json file structure and added my own data. I deleted the dataset_cache_OpenAIGPTTokenizer file, but when I try to train, I get this error:
INFO:train.py:Pad inputs and convert to Tensor
Traceback (most recent call last):
Hi, I am interested in using the DeBERTa model that was recently implemented here and incorporating it into FARM so that it can also be used in open-domain QA settings through Haystack.
Just wondering why there is only a Slow Tokenizer implemented for DeBERTa, and whether there are plans to create the Fast Tokenizer as well.