bert
Here are 1,458 public repositories matching this topic...
The `Split` pre-tokenizer accepts a `SplitDelimiterBehavior`, which is really useful. `Punctuation`, however, always uses `SplitDelimiterBehavior::Isolated` (while `Whitespace`, on the other hand, behaves like `SplitDelimiterBehavior::Removed`).
```rust
impl PreTokenizer for Punctuation {
    fn pre_tokenize(&self, pretokenized: &mut PreTokenizedString) -> Result<()> {
        pretokenized.split(|_, s| s.spl…
```
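The difference between the delimiter behaviors is easiest to see on a concrete string. Below is a minimal pure-Python sketch, not the `tokenizers` implementation: `split_with_behavior` and `is_punc` are hypothetical helpers, the five behavior names are assumed to mirror the `SplitDelimiterBehavior` variants (`Removed`, `Isolated`, `MergedWithPrevious`, `MergedWithNext`, `Contiguous`) in the lowercase form the Python bindings use, and the punctuation test is an assumption. A configurable `Punctuation` would amount to calling it with `is_punc` and a caller-chosen behavior instead of a hard-coded `isolated`.

```python
import string
import unicodedata


def is_punc(ch):
    # Assumption: ASCII punctuation or any Unicode "P*" category character
    # counts as punctuation (not necessarily the library's exact rule).
    return ch in string.punctuation or unicodedata.category(ch).startswith("P")


def split_with_behavior(text, is_delim, behavior):
    """Split `text` on characters matching `is_delim`, handling the
    delimiters according to `behavior` (hypothetical helper)."""
    # First pass: group runs of non-delimiters; each delimiter is its own
    # piece, except under "contiguous", where adjacent delimiters are joined.
    pieces = []  # list of (substring, is_delimiter)
    for ch in text:
        d = is_delim(ch)
        if pieces and pieces[-1][1] == d and (not d or behavior == "contiguous"):
            pieces[-1] = (pieces[-1][0] + ch, d)
        else:
            pieces.append((ch, d))

    if behavior == "removed":
        return [s for s, d in pieces if not d]
    if behavior in ("isolated", "contiguous"):
        return [s for s, _ in pieces]
    if behavior == "merged_with_previous":
        out, can_merge = [], False
        for s, d in pieces:
            if d and can_merge:  # attach to the preceding non-delimiter piece
                out[-1] += s
                can_merge = False
            else:
                out.append(s)
                can_merge = not d
        return out
    if behavior == "merged_with_next":
        out, pending = [], None
        for s, d in pieces:
            if d:
                if pending is not None:  # two delimiters in a row: flush the first
                    out.append(pending)
                pending = s
            else:
                out.append((pending or "") + s)  # prepend a waiting delimiter
                pending = None
        if pending is not None:
            out.append(pending)
        return out
    raise ValueError(f"unknown behavior: {behavior!r}")


text = "The-final--countdown"
print(split_with_behavior(text, is_punc, "removed"))               # ['The', 'final', 'countdown']
print(split_with_behavior(text, is_punc, "isolated"))              # ['The', '-', 'final', '-', '-', 'countdown']
print(split_with_behavior(text, is_punc, "merged_with_previous"))  # ['The-', 'final-', '-', 'countdown']
print(split_with_behavior(text, is_punc, "merged_with_next"))      # ['The', '-final', '-', '-countdown']
print(split_with_behavior(text, is_punc, "contiguous"))            # ['The', '-', 'final', '--', 'countdown']
```

Only the `isolated` case corresponds to what the excerpt says `Punctuation` currently does; the other four show what a behavior parameter would make possible.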
chooses 15% of tokens
The paper says:
"Instead, the training data generator chooses 15% of tokens at random, e.g., in the sentence my dog is hairy it chooses hairy."
That means exactly 15% of the tokens will be chosen.
However, in https://github.com/codertimo/BERT-pytorch/blob/master/bert_pytorch/dataset/dataset.py#L68, every single token independently has a 15% chance of going through the follow-up procedure, so the number of chosen tokens varies from sentence to sentence.
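The two readings can be contrasted in a short sketch. The function names below are hypothetical and not taken from BERT-pytorch; `choose_exact` picks a fixed count of positions (rounding, and the floor of one position, are assumptions), while `choose_bernoulli` gives each position an independent 15% chance. `corrupt` applies the follow-up procedure described in the BERT paper: a chosen token becomes `[MASK]` 80% of the time, a random token 10% of the time, and stays unchanged 10% of the time.

```python
import random

MASK_TOKEN = "[MASK]"


def choose_exact(n_tokens, rate=0.15, rng=random):
    """Fixed-count reading: pick exactly round(rate * n) distinct positions.
    Assumption: at least one position is chosen for a non-empty sequence."""
    if n_tokens == 0:
        return []
    k = max(1, round(rate * n_tokens))
    return sorted(rng.sample(range(n_tokens), k))


def choose_bernoulli(n_tokens, rate=0.15, rng=random):
    """Per-token reading: each position is picked independently with
    probability `rate`, so the count varies between sentences."""
    return [i for i in range(n_tokens) if rng.random() < rate]


def corrupt(tokens, positions, vocab, rng=random):
    """Apply the paper's 80/10/10 rule to the chosen positions."""
    out = list(tokens)
    for i in positions:
        p = rng.random()
        if p < 0.8:
            out[i] = MASK_TOKEN          # 80%: replace with [MASK]
        elif p < 0.9:
            out[i] = rng.choice(vocab)   # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return out


rng = random.Random(0)
tokens = "my dog is hairy".split()
print(choose_exact(len(tokens), rng=rng))  # always exactly one position for 4 tokens
print([len(choose_bernoulli(len(tokens), rng=rng)) for _ in range(5)])  # counts vary
```

Over many sequences both readings select about 15% of tokens on average; they differ only in whether the per-sequence count is fixed or random.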
PositionalEmbedding
Create a suite of tools to easily manipulate SQuAD-format data. It would be useful to have tools for tasks such as:
- merging annotations
- converting SQuAD format to a Pandas DataFrame and vice versa
- simpler functions for removing samples / paragraphs / annotations
- simpler functions for counting
- simpler functions for returning all questions or all answers
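Several of these helpers reduce to flattening the nested SQuAD v1.1 JSON layout (`data`, then `paragraphs`, then `qas`, then `answers`) into one flat record per question and rebuilding it afterwards. The sketch below assumes that layout; all function names are hypothetical, and fields beyond those shown (for example SQuAD 2.0's `is_impossible`) are assumed absent and would be dropped.

```python
def squad_to_rows(squad):
    """Flatten SQuAD-style JSON into one dict per question."""
    rows = []
    for article in squad["data"]:
        for para in article["paragraphs"]:
            for qa in para["qas"]:
                rows.append({
                    "title": article.get("title", ""),
                    "context": para["context"],
                    "id": qa["id"],
                    "question": qa["question"],
                    "answers": [dict(a) for a in qa.get("answers", [])],
                })
    return rows


def rows_to_squad(rows, version="1.1"):
    """Rebuild nested SQuAD JSON, grouping rows by title, then by context."""
    articles = {}  # dicts preserve insertion order (Python 3.7+)
    for r in rows:
        contexts = articles.setdefault(r["title"], {})
        contexts.setdefault(r["context"], []).append(
            {"id": r["id"], "question": r["question"],
             "answers": [dict(a) for a in r["answers"]]})
    return {
        "version": version,
        "data": [
            {"title": title,
             "paragraphs": [{"context": c, "qas": qas} for c, qas in contexts.items()]}
            for title, contexts in articles.items()
        ],
    }


def remove_questions(squad, ids_to_drop):
    """Drop questions by id; paragraphs left without questions disappear too."""
    drop = set(ids_to_drop)
    kept = [r for r in squad_to_rows(squad) if r["id"] not in drop]
    return rows_to_squad(kept, version=squad.get("version", "1.1"))


def all_questions(squad):
    return [r["question"] for r in squad_to_rows(squad)]


def all_answers(squad):
    return [a["text"] for r in squad_to_rows(squad) for a in r["answers"]]
```

The flat rows are the shape a data frame expects: `pandas.DataFrame(squad_to_rows(squad))` gives one row per question, and `rows_to_squad(df.to_dict("records"))` converts back. Merging two annotation files becomes `rows_to_squad(squad_to_rows(a) + squad_to_rows(b))`; this sketch neither deduplicates question ids nor keeps apart two articles that share a title.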
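Training dataset question
Hello, judging from the code, the training data used is Restaurants_Train.xml.seg. Where can this file be downloaded? Or was it generated from the XML files of SemEval-2014 Task 4? If it was generated afterwards, is there code for the data-generation step?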
This is a feature request to add Wav2Vec2 Pretraining functionality to the transformers library. This is a "Good Second Issue" feature request, which means that interested contributors should have some experience with the transformers library and ideally also with training/fine-tuning Wav2Vec2.
Motivation
The popular [Wav2Vec2](https://huggingface.co/models?filter=w