gpt-2

Here are 122 public repositories matching this topic...

transfer-learning-conv-ai
jb33k commented Jun 4, 2019

I'm playing around with this wonderful code but I'm running into a curious issue when I try to train the model with my own data.

I replicated the structure of the personachat_self_original.json file and added my own data. I deleted the dataset_cache_OpenAIGPTTokenizer file, but when I try to train, I get this error:

INFO:train.py:Pad inputs and convert to Tensor
Traceback (most recent call last)
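
The traceback above is cut off, so the actual cause is not shown. As a first check on a hand-built dataset, a minimal sketch like the following can confirm that the file follows the layout it was copied from. The layout assumed here (top-level "train" and "valid" lists, each dialog holding a "personality" list and an "utterances" list whose items carry "candidates" and "history") is an assumption based on the personachat_self_original.json format, and the helper names check_dataset and check_file are hypothetical, not part of the repository. Uneven candidate counts are only flagged as worth a look, not as a confirmed cause of the error.

```python
import json
from typing import Any, Dict, List


def check_dataset(data: Dict[str, Any]) -> List[str]:
    """Return a list of layout problems; an empty list means none were found.

    Assumes a personachat-style layout (an assumption, see the text above).
    """
    problems: List[str] = []
    for split in ("train", "valid"):
        dialogs = data.get(split)
        if not isinstance(dialogs, list):
            problems.append(f"{split}: missing or not a list")
            continue
        counts = set()  # distinct numbers of candidates seen in this split
        for i, dialog in enumerate(dialogs):
            where = f"{split}[{i}]"
            if not isinstance(dialog, dict):
                problems.append(f"{where}: not an object")
                continue
            if not isinstance(dialog.get("personality"), list):
                problems.append(f"{where}: 'personality' missing or not a list")
            utterances = dialog.get("utterances")
            if not isinstance(utterances, list):
                problems.append(f"{where}: 'utterances' missing or not a list")
                continue
            for j, utt in enumerate(utterances):
                spot = f"{where}.utterances[{j}]"
                if not isinstance(utt, dict):
                    problems.append(f"{spot}: not an object")
                    continue
                for key in ("candidates", "history"):
                    if not isinstance(utt.get(key), list):
                        problems.append(f"{spot}: '{key}' missing or not a list")
                if isinstance(utt.get("candidates"), list):
                    counts.add(len(utt["candidates"]))
        if len(counts) > 1:
            problems.append(
                f"{split}: candidate counts differ across utterances: {sorted(counts)}"
            )
    return problems


def check_file(path: str) -> List[str]:
    """Load a JSON dataset from disk and run the layout check on it."""
    with open(path, encoding="utf-8") as f:
        return check_dataset(json.load(f))
```

Calling check_file on the custom JSON file and printing the returned list shows where, if anywhere, the file departs from the assumed layout.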
lamthuy commented Apr 1, 2020

Hi,
When we try to tokenize the following sentence with spaCy:

import spacy

# Load the large English pipeline (it must be installed separately)
a = spacy.load('en_core_web_lg')
doc = a("I like the link http://www.idph.iowa.gov/ohds/oral-health-center/coordinator")
list(doc)

We get:

[I, like, the, link, http://www.idph.iowa.gov, /, ohds, /, oral, -, health, -, center, /, coordinator]

But if we use the spaCy transformer tokenizer:
