# bert

Here are 1,458 public repositories matching this topic...

transformers
patrickvonplaten commented Apr 14, 2021

🚀 Feature request

This is a feature request to add Wav2Vec2 Pretraining functionality to the transformers library. This is a "Good Second Issue" feature request, which means that interested contributors should have some experience with the transformers library and ideally also with training/fine-tuning Wav2Vec2.

Motivation

The popular [Wav2Vec2](https://huggingface.co/models?filter=w

tokenizers
david-waterworth commented Feb 27, 2021

The Split class accepts a SplitDelimiterBehavior, which is really useful. The Punctuation pre-tokenizer, however, always uses SplitDelimiterBehavior::Isolated, while Whitespace behaves like SplitDelimiterBehavior::Removed.
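To make the difference between the two behaviors mentioned above concrete, here is a minimal std-only sketch. It does not use the tokenizers crate: `DelimiterBehavior` and `split_on` are hypothetical names introduced for illustration, and dropping empty pieces is an assumption of this sketch rather than a statement about the library.

```rust
// Illustrative model only; not the tokenizers crate API.
// `DelimiterBehavior` and `split_on` are hypothetical names.
#[derive(Clone, Copy)]
enum DelimiterBehavior {
    /// Delimiter characters are dropped from the output.
    Removed,
    /// Each delimiter character becomes its own piece.
    Isolated,
}

/// Split `text` at every char for which `is_delim` returns true.
/// Empty pieces are skipped (an assumption of this sketch).
fn split_on(
    text: &str,
    is_delim: impl Fn(char) -> bool,
    behavior: DelimiterBehavior,
) -> Vec<String> {
    let mut pieces = Vec::new();
    let mut current = String::new();
    for c in text.chars() {
        if is_delim(c) {
            // Flush the text accumulated before the delimiter.
            if !current.is_empty() {
                pieces.push(std::mem::take(&mut current));
            }
            // Keep the delimiter as a separate piece only when isolating.
            if let DelimiterBehavior::Isolated = behavior {
                pieces.push(c.to_string());
            }
        } else {
            current.push(c);
        }
    }
    if !current.is_empty() {
        pieces.push(current);
    }
    pieces
}

fn main() {
    let punct = |c: char| c.is_ascii_punctuation();
    // Punctuation-like: delimiters survive as their own pieces.
    println!("{:?}", split_on("Hello, world!", punct, DelimiterBehavior::Isolated));
    // Whitespace-like: delimiters disappear.
    println!("{:?}", split_on("Hello world", char::is_whitespace, DelimiterBehavior::Removed));
}
```

In this model, making the behavior a parameter is all it takes for a punctuation splitter to support either mode, which is the kind of configurability the issue asks for.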

impl PreTokenizer for Punctuation {
    fn pre_tokenize(&self, pretokenized: &mut PreTokenizedString) -> Result<()> {
        pretokenized.split(|_, s| s.spl
haystack
