gpt
Here are 109 public repositories matching this topic...
The Split pre-tokenizer accepts a SplitDelimiterBehavior, which is really useful. Punctuation, however, always uses SplitDelimiterBehavior::Isolated (and Whitespace, on the other hand, behaves like SplitDelimiterBehavior::Removed).
impl PreTokenizer for Punctuation {
    fn pre_tokenize(&self, pretokenized: &mut PreTokenizedString) -> Result<()> {
        pretokenized.split(|_, s| s.split(is_punc, SplitDelimiterBehavior::Isolated))
    }
}
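To illustrate the difference between the two behaviors, here is a minimal self-contained sketch; `Behavior` and `toy_split` are hypothetical stand-ins, not the tokenizers API. Isolated keeps each matched delimiter as its own standalone piece, while Removed drops delimiters entirely.

```rust
// Hypothetical stand-in for SplitDelimiterBehavior (not the tokenizers API).
#[derive(Clone, Copy, PartialEq)]
enum Behavior {
    Isolated, // keep each matched delimiter as its own piece
    Removed,  // drop matched delimiters entirely
}

// Toy splitter showing how the two behaviors treat delimiter characters.
fn toy_split(s: &str, is_delim: fn(char) -> bool, behavior: Behavior) -> Vec<String> {
    let mut pieces = Vec::new();
    let mut current = String::new();
    for c in s.chars() {
        if is_delim(c) {
            if !current.is_empty() {
                pieces.push(std::mem::take(&mut current));
            }
            if behavior == Behavior::Isolated {
                pieces.push(c.to_string());
            }
        } else {
            current.push(c);
        }
    }
    if !current.is_empty() {
        pieces.push(current);
    }
    pieces
}

fn main() {
    let is_punc = |c: char| c == ',' || c == '.';
    // What Punctuation does today (Isolated): delimiters survive as pieces.
    println!("{:?}", toy_split("Hi, there.", is_punc, Behavior::Isolated));
    // ["Hi", ",", " there", "."]
    // What Whitespace effectively does (Removed): delimiters disappear.
    println!("{:?}", toy_split("Hi, there.", is_punc, Behavior::Removed));
    // ["Hi", " there"]
}
```

Exposing the behavior as a parameter on Punctuation, as Split already does, would let callers pick Removed (or any other variant) instead of the hard-coded Isolated.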
I'm playing around with this wonderful code, but I'm running into a curious issue when I try to train the model with my own data.
I replicated the personachat_self_original.json file structure and added my own data. I deleted the dataset_cache_OpenAIGPTTokenizer file, but when I try to train, I get this error:
INFO:train.py:Pad inputs and convert to Tensor
Traceback (most recent call last):
Hello, I was thinking it would be of great help if I could get the time offsets for the start and end of each word.
Motivation
I was going through the Google Speech-to-Text documentation and found this feature, and thought it would be really amazing if I could have something similar here.