bert
Here are 1,357 public repositories matching this topic...
The Split pre-tokenizer accepts a SplitDelimiterBehavior, which is really useful. The Punctuation pre-tokenizer, however, always uses SplitDelimiterBehavior::Isolated (and Whitespace, on the other hand, behaves like SplitDelimiterBehavior::Removed).
impl PreTokenizer for Punctuation {
    fn pre_tokenize(&self, pretokenized: &mut PreTokenizedString) -> Result<()> {
        // The behavior is hard-coded to Isolated rather than configurable
        pretokenized.split(|_, s| s.split(is_punc, SplitDelimiterBehavior::Isolated))
    }
}
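To illustrate the difference between the two behaviors, here is a minimal pure-Python sketch: the names mirror the Rust enum variants, but the helper functions are illustrative, not the tokenizers library API.

```python
import re

def split_isolated(text: str, pattern: str) -> list[str]:
    # Isolated: each matched delimiter becomes a standalone token
    return [t for t in re.split(f"({pattern})", text) if t]

def split_removed(text: str, pattern: str) -> list[str]:
    # Removed: matched delimiters are dropped entirely
    return [t for t in re.split(pattern, text) if t]

print(split_isolated("Hello, world!", r"[,.!?]"))  # ['Hello', ',', ' world', '!']
print(split_removed("Hello, world!", r"[,.!?]"))   # ['Hello', ' world']
```

With Isolated, the punctuation survives as its own token; with Removed, it disappears, which is why the hard-coded behavior matters for downstream tokenization.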
Chooses 15% of tokens
The paper says:
Instead, the training data generator chooses 15% of tokens at random, e.g., in the sentence my dog is hairy it chooses hairy.
This wording suggests that exactly 15% of the tokens are chosen.
However, in https://github.com/codertimo/BERT-pytorch/blob/master/bert_pytorch/dataset/dataset.py#L68, every token independently has a 15% chance of going through the follow-up masking procedure, so the actual fraction of masked tokens varies around 15% rather than being fixed.
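The two interpretations can be contrasted with a short sketch (the token list is a toy example, not real training data):

```python
import random

random.seed(0)
tokens = ["my", "dog", "is", "hairy"] * 250  # 1,000 tokens for illustration

# Interpretation A: pick exactly 15% of positions
# (what the paper's wording suggests)
exact = random.sample(range(len(tokens)), k=int(0.15 * len(tokens)))

# Interpretation B: each token independently has a 15% chance of selection
# (what the linked dataset.py implements)
independent = [i for i in range(len(tokens)) if random.random() < 0.15]

print(len(exact))        # always exactly 150
print(len(independent))  # varies from run to run, centered on 150
```

Interpretation A always masks exactly 150 of the 1,000 tokens; interpretation B masks a binomially distributed number of tokens whose mean is 150.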
Create a suite of tools to easily manipulate SQuAD-format data. It would be useful to have tools for tasks such as merging annotations, converting SQuAD format to a Pandas DataFrame and vice versa, and convenience functions for removing samples, paragraphs, or annotations.
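As a starting point for the conversion direction, here is a hedged sketch of flattening SQuAD-format JSON into flat rows; `squad_to_rows` is a hypothetical helper name, and the field names follow the SQuAD v1.1 JSON schema.

```python
def squad_to_rows(squad: dict) -> list[dict]:
    """Flatten a SQuAD-format dict into one row per answer (hypothetical helper)."""
    rows = []
    for article in squad["data"]:
        for para in article["paragraphs"]:
            for qa in para["qas"]:
                # Unanswerable questions get a single row with empty answer fields
                for ans in qa.get("answers") or [{}]:
                    rows.append({
                        "title": article.get("title"),
                        "context": para["context"],
                        "id": qa["id"],
                        "question": qa["question"],
                        "answer_text": ans.get("text"),
                        "answer_start": ans.get("answer_start"),
                    })
    return rows

sample = {
    "data": [{
        "title": "Example",
        "paragraphs": [{
            "context": "My dog is hairy.",
            "qas": [{
                "id": "q1",
                "question": "What is the dog like?",
                "answers": [{"text": "hairy", "answer_start": 10}],
            }],
        }],
    }]
}

rows = squad_to_rows(sample)
print(rows[0]["answer_text"])  # hairy
```

From there, `pandas.DataFrame(rows)` would give the tabular view, and the inverse (re-nesting rows by title and context) would cover the DataFrame-to-SQuAD direction.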
Question about the training dataset
Hello, I see the code uses Restaurants_Train.xml.seg as training data. Was this downloaded somewhere, or was it generated from the xml files of SemEval-14 Task 4? If it was generated, is the data-generation code available?
Hi, I am interested in using the DeBERTa model that was recently implemented here and incorporating it into FARM, so that it can also be used in open-domain QA settings through Haystack.
I am just wondering why only a slow tokenizer is implemented for DeBERTa, and whether there are plans to create a fast tokenizer as well.