Natural language processing
Natural language processing (NLP) is a field of computer science concerned with the interactions between computers and human language. In the 1950s, Alan Turing published an article proposing a measure of machine intelligence, now called the Turing test. More recent techniques, such as deep learning, have produced strong results in language modeling, parsing, and many other natural-language tasks.
Here are 16,680 public repositories matching this topic...
In gensim/models/fasttext.py:

    model = FastText(
        vector_size=m.dim,
        window=m.ws,
        epochs=m.epoch,
        negative=m.neg,
        # FIXME: these next lines read in unsupported FB FT modes (loss=3 softmax
        # or loss=4 onevsall, or model=3 supervised)
    )
Describe the bug
I'm having major trouble with from_csv.
Context: I'm writing a tutorial for building a simple text search engine with Jina + Hub. I don't want to include a whole section on processing datasets, so I'm just passing a CSV into from_csv. I tried with the meme dataset (converted from TSV) before, and now I'm using the [superhero dataset](https://www.kaggle.com/jonathanbesomi/superheroes-nlp-datas
Motivated by huggingface/transformers#12789 in Transformers, one welcome change would be to replace assertions with proper exceptions. The only assertions we should keep are those used as sanity checks.
Currently, there are a total of 87 files with assert statements (located under datasets and src/datasets), so when working on this, to manage the PR s
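As a sketch of the kind of change intended (the function below is hypothetical, not taken from the datasets codebase): a user-facing validation check becomes an explicit exception, while an internal invariant stays an assertion.

```python
def select_columns(table, columns):
    # User-facing validation: raise a proper exception instead of asserting,
    # so the error survives `python -O` and carries a clear message.
    missing = [c for c in columns if c not in table]
    if missing:
        raise ValueError(f"Columns not found in the table: {missing}")
    result = {c: table[c] for c in columns}
    # Internal sanity check: this invariant must hold if the code above is
    # correct, so an assert is still the right tool here.
    assert len(result) == len(columns)
    return result
```

The rule of thumb: exceptions for conditions a caller can trigger, assertions for conditions that only a bug in the library itself can trigger.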
Is your feature request related to a problem? Please describe.
I typically use compressed datasets (e.g. gzipped) to save disk space. This works fine during AllenNLP training because I can write my dataset reader to load the compressed data. However, the predict command opens the file and reads lines for the Predictor, which fails when it tries to load data from my compressed files.
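One possible shape for the fix (a stand-alone sketch, not AllenNLP's actual API; the helper name is illustrative) is to open input files through a small wrapper that dispatches on the extension, so plain and compressed datasets read identically:

```python
import bz2
import gzip


def smart_open(path, mode="rt", encoding="utf-8"):
    # Dispatch on the file extension so gzipped, bzipped, and plain files
    # can all be read line-by-line with the same call.
    if path.endswith(".gz"):
        return gzip.open(path, mode, encoding=encoding)
    if path.endswith(".bz2"):
        return bz2.open(path, mode, encoding=encoding)
    return open(path, mode, encoding=encoding)
```

If the predict command used such a wrapper instead of a bare open(), compressed inputs would work without any change to the Predictor itself.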
Currently, the EncoderDecoderModel class in PyTorch automatically creates the decoder_input_ids based on the labels provided by the user (similar to how this is done for T5/BART). This should also be implemented for TFEncoderDecoderModel, because currently users have to manually provide decoder_input_ids to the model. One can take a look at the TF implementation
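The behavior described, deriving decoder_input_ids from labels, amounts to shifting the label sequence one position to the right for teacher forcing. A simplified stand-alone sketch of that idea (plain lists rather than tensors, not the library's actual code):

```python
def shift_tokens_right(labels, decoder_start_token_id, pad_token_id):
    # Prepend the decoder start token and drop the last label so that at
    # each position the decoder input is the previous target token.
    shifted = [decoder_start_token_id] + labels[:-1]
    # Labels use -100 as the ignore index for loss masking; those positions
    # must become real pad tokens before being fed to the decoder.
    return [pad_token_id if t == -100 else t for t in shifted]
```

The TF version of the model would apply the same shift internally whenever labels are passed but decoder_input_ids are not.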