language-classification

Here are 20 public repositories matching this topic...

Muhtasham commented Oct 5, 2021

Is your feature request related to a problem? Please describe.
OSCAR is limited by the fastText language classifier, which was trained on Wikipedia, so its datasets also contain sentences in other languages. For instance, the Tajik file (tg.txt) contains large chunks of Uzbek sentences in Cyrillic script.

Describe the solution you'd like
Train new models using other data othe

Classified sentences as Slovak, Czech, or English. Implemented relevant preprocessing steps, addressed class imbalance in the training set by applying the theory of Naive Bayes models, and implemented subword units.

  • Updated Jun 4, 2020
  • Smalltalk
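
The approach in the description above (Naive Bayes over subword units, with some handling of class imbalance) can be illustrated with a minimal Python sketch. This is not the repository's code: the class name, the choice of character trigrams as the subword unit, the add-one smoothing, the uniform class prior used to offset imbalance, and the toy training sentences are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Split text into overlapping character n-grams (assumed subword unit)."""
    padded = f" {text.lower()} "  # pad so word boundaries form n-grams too
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesLanguageClassifier:
    """Multinomial Naive Bayes over character n-grams (hypothetical sketch)."""

    def __init__(self, n=3, alpha=1.0, uniform_prior=True):
        self.n = n
        self.alpha = alpha                  # additive (Laplace) smoothing
        self.uniform_prior = uniform_prior  # ignore class frequencies if True
        self.counts = defaultdict(Counter)  # label -> n-gram counts
        self.totals = Counter()             # label -> total n-grams seen
        self.docs = Counter()               # label -> number of sentences
        self.vocab = set()

    def fit(self, sentences, labels):
        for sentence, label in zip(sentences, labels):
            grams = char_ngrams(sentence, self.n)
            self.counts[label].update(grams)
            self.totals[label] += len(grams)
            self.docs[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, sentence):
        grams = char_ngrams(sentence, self.n)
        vocab_size = len(self.vocab)
        n_docs = sum(self.docs.values())
        best_label, best_score = None, -math.inf
        for label in self.docs:
            # A uniform prior is one simple way to keep a majority class
            # from dominating; the empirical prior is the alternative.
            if self.uniform_prior:
                prior = 1.0 / len(self.docs)
            else:
                prior = self.docs[label] / n_docs
            score = math.log(prior)
            denom = self.totals[label] + self.alpha * vocab_size
            for gram in grams:
                score += math.log((self.counts[label][gram] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, deliberately imbalanced training data (made up for this sketch).
train = [
    ("the cat is on the table", "en"),
    ("this is a good day", "en"),
    ("where is the station", "en"),
    ("mačka je na stole", "sk"),
    ("dnes je pekný deň", "sk"),
    ("kočka je na stole", "cs"),
    ("dnes je hezký den", "cs"),
]
clf = NaiveBayesLanguageClassifier().fit(
    [sentence for sentence, _ in train],
    [label for _, label in train],
)
print(clf.predict("the dog is on the table"))
print(clf.predict("mačka je pekná"))
```

Character n-grams help here because closely related languages such as Slovak and Czech share many whole words but differ in short letter sequences and diacritics. With realistic data, the n-gram length and smoothing constant would need tuning on held-out sentences.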
