language-model

Many models have identical implementations of prune_heads. It would be nice to store that implementation as a method on PretrainedModel and reduce the redundancy.
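
The shared logic could live once on the base class, and each model would supply only its model-specific weight surgery. Below is a minimal pure-Python sketch of that split. `PretrainedModelSketch`, `ToyModel` and `_prune_heads` are hypothetical names, not the library's API, and the toy model tracks head indices in a dict instead of pruning real weight matrices.

```python
from typing import Dict, List, Set


class PretrainedModelSketch:
    """Hypothetical base class holding the shared prune_heads logic once."""

    def __init__(self) -> None:
        # layer index -> set of (original) head indices already pruned
        self.pruned_heads: Dict[int, Set[int]] = {}

    def prune_heads(self, heads_to_prune: Dict[int, List[int]]) -> None:
        new_heads: Dict[int, List[int]] = {}
        for layer, heads in heads_to_prune.items():
            already = self.pruned_heads.setdefault(layer, set())
            # skip heads that were removed by an earlier call
            fresh = sorted(set(heads) - already)
            if fresh:
                already.update(fresh)
                new_heads[layer] = fresh
        if new_heads:
            self._prune_heads(new_heads)

    def _prune_heads(self, heads_to_prune: Dict[int, List[int]]) -> None:
        # model-specific part: each subclass removes the weights for these heads
        raise NotImplementedError


class ToyModel(PretrainedModelSketch):
    """Stand-in model that only records which heads remain per layer."""

    def __init__(self, num_layers: int, num_heads: int) -> None:
        super().__init__()
        self.heads = {layer: list(range(num_heads)) for layer in range(num_layers)}

    def _prune_heads(self, heads_to_prune: Dict[int, List[int]]) -> None:
        for layer, heads in heads_to_prune.items():
            self.heads[layer] = [h for h in self.heads[layer] if h not in heads]
```

With this split, a new model only implements `_prune_heads`; the bookkeeping (which heads are already gone, ignoring repeats) is written once.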

The position embedding in BERT is not the same as in the Transformer. Why not use the form in BERT?
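
For reference, the original Transformer adds fixed sinusoidal encodings, PE(pos, 2k) = sin(pos / 10000^(2k/d_model)) and PE(pos, 2k+1) = cos(pos / 10000^(2k/d_model)), while BERT learns a position-embedding table with one trainable vector per position. A minimal pure-Python sketch of both follows; the function names are illustrative, and the learned table is only initialised here, not trained.

```python
import math
import random
from typing import List


def sinusoidal_position_encoding(max_len: int, d_model: int) -> List[List[float]]:
    """Fixed encoding as in the original Transformer: no trainable parameters."""
    table = []
    for pos in range(max_len):
        row = []
        for i in range(d_model):
            # dimensions 2k and 2k+1 share the same frequency (k = i // 2)
            angle = pos / (10000 ** (2 * (i // 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def learned_position_embedding(max_len: int, d_model: int, seed: int = 0) -> List[List[float]]:
    """BERT-style: a randomly initialised lookup table that training would update.

    std 0.02 mirrors BERT's default initializer range; BERT uses a truncated
    normal, this sketch uses a plain normal. The training loop is out of scope.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(d_model)] for _ in range(max_len)]
```

The practical difference: the sinusoidal table is a deterministic function of position and can be computed for any length, whereas the learned table has a fixed number of rows and its values come from training.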

spaCy has customizable word-level tokenizers with rules for multiple languages. I think porting that to Rust would be a nice addition to this package. Customizable, uniform word-level tokenization across platforms (client web, server) and languages would be beneficial. Currently, I don't know of any clean way to write bindings for spaCy's Cython code, or whether it's even possible.

Spacy Tokenizer Code

https:
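
spaCy's tokenizer is built around whitespace splitting plus prefix, suffix and infix rules and a table of special cases; a port to another language would follow the same shape. The sketch below is a simplified Python illustration of that scheme, not spaCy's code: the regexes and the exception table are made-up placeholders, whereas real per-language rule sets are much larger.

```python
import re
from typing import Dict, List

# Placeholder rules; a real language would ship much larger tables.
PREFIX_RE = re.compile(r"""^[(\["']""")
SUFFIX_RE = re.compile(r"""[)\]"'.,!?;:]$""")
# Capturing group so re.split keeps the hyphen as its own token.
INFIX_RE = re.compile(r"(?<=[A-Za-z])(-)(?=[A-Za-z])")
# Special cases, checked before any further prefix/suffix stripping.
EXCEPTIONS: Dict[str, List[str]] = {"don't": ["do", "n't"], "U.S.": ["U.S."]}


def tokenize(text: str) -> List[str]:
    tokens: List[str] = []
    for chunk in text.split():  # 1. split on whitespace
        prefixes: List[str] = []
        suffixes: List[str] = []
        # 2. peel off prefixes and suffixes until an exception or a bare word remains
        while chunk and chunk not in EXCEPTIONS:
            match = PREFIX_RE.search(chunk)
            if match:
                prefixes.append(match.group())
                chunk = chunk[match.end():]
                continue
            match = SUFFIX_RE.search(chunk)
            if match:
                suffixes.insert(0, match.group())  # keep original order
                chunk = chunk[:match.start()]
                continue
            break
        tokens.extend(prefixes)
        if chunk in EXCEPTIONS:  # 3. special cases win
            tokens.extend(EXCEPTIONS[chunk])
        elif chunk:  # 4. otherwise split on infixes such as hyphens
            tokens.extend(piece for piece in INFIX_RE.split(chunk) if piece)
        tokens.extend(suffixes)
    return tokens
```

Because all language-specific behaviour lives in the rule tables, the same engine can be reused across languages by swapping the tables.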

Hi,

First, thanks for releasing this; it has been quite helpful.
It would be great if the software requirements in the README mentioned the dependency on pytorch-qrnn (for QRNN-based models). Currently, following the instructions and running one of the standard QRNN models just throws a ModuleNotFoundError with no instructions. It would be great if there were a prior mention and/or a try/catch w
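
A guard of the kind suggested above might look like the following minimal sketch (Python spells it try/except). The helper name `require_optional` is hypothetical; `ModuleNotFoundError` is a subclass of `ImportError`, so catching the latter covers it.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, package_name: str, feature: str) -> ModuleType:
    """Import an optional dependency, or fail with an actionable message."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        # Re-raise with a hint instead of a bare ModuleNotFoundError.
        raise ImportError(
            f"{feature} require the optional package '{package_name}' "
            f"(module '{module_name}'). Install it as described in its README "
            f"and try again."
        ) from err
```

Usage, assuming pytorch-qrnn is imported as `torchqrnn` (an assumption; check the package's README): `torchqrnn = require_optional("torchqrnn", "pytorch-qrnn", "QRNN-based models")`.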

On the home page of the website (https://nlp.johnsnowlabs.com/) I read "Full Python, Scala, and Java support".

Unfortunately, I have been trying to use Spark NLP in Java for 3 days now without any success.

I cannot find a Java API reference (JavaDoc) for the framework.
Not even a single example in Java is available.
I do not know Scala, and I do not know how to convert things like:
val testData = spark.createDataFrame(

It would be great to have instructions on how to train a language model from scratch, not just how to load the paper's model.

Hi,
When we try to tokenize the following sentence:

If we use spaCy:

import spacy

nlp = spacy.load("en_core_web_lg")
doc = nlp("I like the link http://www.idph.iowa.gov/ohds/oral-health-center/coordinator")
print(list(doc))

we get:

[I, like, the, link, http://www.idph.iowa.gov, /, ohds, /, oral, -, health, -, center, /, coordinator]
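
Keeping a URL as a single token, instead of splitting it at slashes and hyphens as in the output above, can be sketched with a URL check that runs before any punctuation splitting. This is a plain-regex illustration that does not use spaCy (spaCy's tokenizer has a `token_match` setting for this kind of rule); the function name and patterns are assumptions.

```python
import re
from typing import List

# Whole whitespace-delimited chunk that looks like an http(s) URL.
URL_RE = re.compile(r"^https?://\S+$")
# Fallback: runs of word characters, or single punctuation marks.
WORD_RE = re.compile(r"\w+|[^\w\s]")


def tokenize_keep_urls(text: str) -> List[str]:
    tokens: List[str] = []
    for chunk in text.split():
        if URL_RE.match(chunk):
            tokens.append(chunk)  # keep the whole URL as one token
        else:
            tokens.extend(WORD_RE.findall(chunk))
    return tokens
```

One limitation: punctuation directly after a URL (e.g. a trailing comma) stays attached, so a fuller version would strip such suffixes before the URL check.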

But if we use the spaCy transformer tokenizer:

I think the filenames referred to on lines 4-9 of models.sh should refer to kaldi-generic-en-tdnn_f-r20190609*, which is downloaded on line 3.

  File "main.py", line 40, in <module>
    tf.app.run()
  File "/home/luban/anaconda2/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "main.py", line 30, in main
    train(args)
  File "/nfs/private/proj/chatbot/lib/train.py", line 32, in train
    model = seq2seq_model_utils.create_model(sess, arg


Here are 517 public repositories matching this topic...

huggingface / transformers

brightmart / nlp_chinese_corpus

codertimo / BERT-pytorch

huggingface / tokenizers


tensorflow / lingvo

chiphuyen / lazynlp

CyberZHG / keras-bert

salesforce / awd-lstm-lm

zzw922cn / awesome-speech-recognition-speech-synthesis-papers

JohnSnowLabs / spark-nlp

NVIDIA / OpenSeq2Seq

huggingface / pytorch-openai-transformer-lm

CLUEbenchmark / CLUE

mihail911 / nlp-library

brightmart / bert_language_understanding

explosion / spacy-transformers

LiyuanLucasLiu / LM-LSTM-CRF

codekansas / keras-language-modeling

smilelight / lightNLP

prabhuomkar / pytorch-cpp

IsaacChanghau / DL-NLP-Readings

pykaldi / pykaldi

nlpodyssey / spago

brightmart / sentiment_analysis_fine_grain

lonePatient / albert_pytorch

githubharald / CTCDecoder

ymcui / Chinese-ELECTRA

cedrickchee / awesome-bert-nlp

SKTBrain / KoBERT

Marsan-Ma-zz / tf_chatbot_seq2seq_antilm
