Here are 25 public repositories matching this topic...
Fast and customizable text tokenization library with BPE and SentencePiece support
🌿 An easy-to-use Japanese text processing tool that lets you switch tokenizers with only small code changes.
Updated
Sep 9, 2022
Python
Minimal example of using a traced huggingface transformers model with libtorch
R package for Byte Pair Encoding / Unigram modelling based on Sentencepiece
Learning BPE embeddings by first learning a segmentation model and then training word2vec
Updated
Jul 9, 2022
Python
A Robustly Optimized BERT Pretraining Approach for Vietnamese
Updated
Sep 8, 2021
Python
Extremely simple and understandable GPT2 implementation with minor tweaks
Updated
Dec 6, 2019
Python
PyTorch implementation of BERT
Updated
Mar 16, 2020
Python
An investigation of various DNN text classifiers, including MLP, CNN, RNN, and BERT approaches.
Updated
Jan 15, 2020
Jupyter Notebook
Rust binding for the sentencepiece library
Updated
Oct 11, 2022
Rust
Bengali language Tokenizer (SentencePiece)
Updated
Oct 20, 2019
Python
Sentencepiece Dart is a wrapper for Google's Sentencepiece C++ library modified
NMT with RNN Models: (1) in Vanilla style, (2) with Sentencepiece, (3) using Pre-trained models from FairSeq
Updated
Sep 19, 2021
Python
Updated
May 16, 2020
JavaScript
Automated generation of new `The Office` scripts using deep neural networks
Updated
Sep 23, 2022
Jupyter Notebook
This repository contains code for the experiments in "An Experimental Evaluation of Japanese Tokenizers for Sentiment-Based Text Classification", presented at https://www.anlp.jp/nlp2021/. Authors: Andre Rusli and Makoto Shishido (Tokyo Denki University).
Updated
Mar 8, 2022
Jupyter Notebook
Updated
Mar 8, 2020
Jupyter Notebook
Bengali SentencePiece Model created with wiki dump data.
Escape unknown symbols in SentencePiece vocabularies
Updated
Sep 28, 2022
Python
Workshops of natural language processing
Updated
Jan 6, 2021
Jupyter Notebook