bert

🚀 Feature request

Currently, the EncoderDecoderModel class in PyTorch automatically creates the decoder_input_ids based on the labels provided by the user (similar to how this is done for T5/BART). This should also be implemented for TFEncoderDecoderModel, because users currently have to provide decoder_input_ids to the model manually.

One can take a look at the TF implementation
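To make the request concrete, here is a minimal sketch of how decoder_input_ids are commonly derived from labels: shift every label row one position to the right and put a start token in front. It uses plain Python lists rather than tensors. The function name, the `-100` ignore value, and the token ids are assumptions for illustration, not the library's actual code.

```python
from typing import List


def shift_labels_right(
    labels: List[List[int]],
    pad_token_id: int,
    decoder_start_token_id: int,
    ignore_index: int = -100,  # assumed marker for positions excluded from the loss
) -> List[List[int]]:
    """Derive decoder inputs from labels by shifting each row one step right."""
    decoder_input_ids = []
    for row in labels:
        # Prepend the start token and drop the last label to keep the length.
        shifted = [decoder_start_token_id] + list(row[:-1])
        # Ignored label positions are not valid token ids, so use padding instead.
        shifted = [pad_token_id if tok == ignore_index else tok for tok in shifted]
        decoder_input_ids.append(shifted)
    return decoder_input_ids


# Hypothetical ids: 101 as the decoder start token, 0 as padding.
example = shift_labels_right([[5, 6, -100, -100]], pad_token_id=0, decoder_start_token_id=101)
print(example)
```

With such a helper applied inside the model whenever labels are passed without decoder_input_ids, the caller would only need to supply the labels.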

The paper states:

Instead, the training data generator chooses 15% of tokens at random, e.g., in the sentence my dog is hairy it chooses hairy.

This means that exactly 15% of the tokens are always chosen.

In https://github.com/codertimo/BERT-pytorch/blob/master/bert_pytorch/dataset/dataset.py#L68, however, each token independently has a 15% chance of going through the follow-up procedure.
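The difference between the two readings can be sketched as follows. Both helpers are hypothetical and written only for illustration. The first selects a fixed share of positions, the second decides for each position independently. The rounding rule and the minimum of one selected token in the first helper are assumptions.

```python
import random
from typing import List, Optional


def select_exact_fraction(
    num_tokens: int, fraction: float = 0.15, rng: Optional[random.Random] = None
) -> List[int]:
    """Pick a fixed number of positions: about `fraction` of all tokens."""
    rng = rng or random.Random()
    # Assumption: round to the nearest integer and select at least one token.
    k = min(num_tokens, max(1, round(num_tokens * fraction)))
    return sorted(rng.sample(range(num_tokens), k))


def select_per_token(
    num_tokens: int, prob: float = 0.15, rng: Optional[random.Random] = None
) -> List[int]:
    """Decide independently for each position with probability `prob`."""
    rng = rng or random.Random()
    return [i for i in range(num_tokens) if rng.random() < prob]


rng = random.Random(0)
print(len(select_exact_fraction(100, rng=rng)))  # always the same count
print(len(select_per_token(100, rng=rng)))       # count varies from run to run
```

With per-token sampling the expected share is still 15%, but a short sentence can end up with no selected token at all, or with many more than 15%.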

While running the tutorials, it is not rare to encounter UserWarnings caused by underlying dependencies such as transformers or pytorch. I think UserWarnings triggered by Haystack's or the user's code should stay visible, but those coming from dependencies could be hidden, as there is nothing we or the end users can do about them.

Examples:

Tutorial 1: `/home/sara/work/hayst
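One possible way to do this is with the standard `warnings` module, which can filter by the module a warning is attributed to. The sketch below is only an illustration: the helper name and the list of module prefixes are assumptions. Note that a dependency raising a warning with a `stacklevel` that points at the caller's code would not be caught by a module-based filter.

```python
import warnings

# Assumed list of dependency packages whose UserWarnings should be hidden.
DEPENDENCY_MODULES = ("transformers", "torch")


def silence_dependency_user_warnings(modules=DEPENDENCY_MODULES):
    """Ignore UserWarnings attributed to the given packages and their submodules."""
    for name in modules:
        # `module` is a regular expression matched against the start of the
        # name of the module the warning is attributed to.
        warnings.filterwarnings(
            "ignore", category=UserWarning, module=rf"{name}(\.|$)"
        )


silence_dependency_user_warnings()
```

UserWarnings attributed to any other module, including Haystack's own and the user's code, are not affected by these filters.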

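Thank you for reporting an issue with PaddleNLP; we really appreciate your contribution to PaddleNLP!
When filing your issue, please also provide the following information:

Version and environment information
1) PaddleNLP and PaddlePaddle versions: please provide your PaddleNLP and PaddlePaddle version numbers, e.g. PaddleNLP 2.0.4, PaddlePaddle 2.1.1
2) System environment: please describe the system type, e.g. Linux/Windows/MacOS, and the Python version
Reproduction information: if reporting an error, please give the environment and the steps to reproduce it
paddle version 2.0.8, paddlenlp version 2.1.0
Suggestion: could the PaddleNLP documentation list which kind of tokenization each model's tokenizer is based on, e.g. the BERT tokenizer is WordPiece-based and the XLNet tokenizer is SentencePiece-based, together with corresponding input and output examples?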

Here are 1,938 public repositories matching this topic...

huggingface / transformers

graykode / nlp-tutorial

hanxiao / bert-as-service

brightmart / nlp_chinese_corpus

ymcui / Chinese-BERT-wwm

huggingface / tokenizers

codertimo / BERT-pytorch

PaddlePaddle / ERNIE

macanv / BERT-BiLSTM-CRF-NER

brightmart / albert_zh

jessevig / bertviz

bentrevett / pytorch-sentiment-analysis

shibing624 / pycorrector

IntelLabs / nlp-architect

deepset-ai / haystack

JohnSnowLabs / spark-nlp

CLUEbenchmark / CLUE

PaddlePaddle / PaddleNLP

CyberZHG / keras-bert

BrikerMan / Kashgari

asyml / texar

Separius / awesome-sentence-embedding

brightmart / roberta_zh

namisan / mt-dnn

km1994 / nlp_paper_study

bytedance / lightseq

dbiir / UER-py

Jiakui / awesome-bert

utterworks / fast-bert

MaartenGr / BERTopic
