
word-segmentation

Here are 85 public repositories matching this topic...

loretoparisi commented Jan 23, 2019

It would be worth providing a tutorial on how to train a simple cross-language classification model using sentencepiece. Suppose we have a training set and have chosen a model (let's say a simple Word2Vec plus softmax, an LSTM, etc.): how do we use the created sentencepiece model (vocabulary/codes) to feed that model for training and inference?
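The pipeline the comment asks about is roughly: train a subword model, encode each sentence into piece IDs, and feed those IDs to the classifier's embedding layer. As a rough, self-contained illustration of the subword-learning step only (a toy BPE-style variant, not the actual sentencepiece API or its unigram model), assuming a tiny hand-made corpus:

```python
from collections import Counter

def _merge(syms, pair):
    """Replace every adjacent occurrence of `pair` in `syms` with the fused symbol."""
    out, i = [], 0
    while i < len(syms):
        if i + 1 < len(syms) and (syms[i], syms[i + 1]) == pair:
            out.append(syms[i] + syms[i + 1])
            i += 2
        else:
            out.append(syms[i])
            i += 1
    return out

def train_bpe(corpus, num_merges):
    """Learn merge rules from a word list (a toy stand-in for subword training)."""
    vocab = Counter(tuple(w) + ("</w>",) for w in corpus)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for syms, freq in vocab.items():
            for pair in zip(syms, syms[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        new_vocab = Counter()
        for syms, freq in vocab.items():
            new_vocab[tuple(_merge(list(syms), best))] += freq
        vocab = new_vocab
    return merges

def encode(word, merges):
    """Segment an unseen word with the learned merges; pieces map to integer IDs downstream."""
    syms = list(word) + ["</w>"]
    for pair in merges:
        syms = _merge(syms, pair)
    return syms

corpus = ["low", "low", "lower", "lowest", "newer", "newest"]
merges = train_bpe(corpus, 10)
print(encode("lowest", merges))
```

In practice one would call sentencepiece's own trainer and processor instead, build a piece-to-ID table, and pass the resulting ID sequences to the Word2Vec-plus-softmax or LSTM model; the point of shared subword pieces is that one vocabulary can cover several languages at once.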

Ekphrasis is a text processing tool geared towards text from social networks such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora (English Wikipedia, and a Twitter collection of 330 million English tweets).
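The word-segmentation step for splitting hashtags is typically a dynamic program that picks the cheapest sequence of known words, where a word's cost comes from its corpus frequency. A minimal sketch of that idea, using a tiny hand-made frequency table in place of the Wikipedia/Twitter statistics Ekphrasis actually uses:

```python
import math

# Hypothetical unigram frequencies; Ekphrasis derives these from its two corpora.
FREQ = {"good": 800, "morning": 500, "so": 900, "go": 700, "od": 5, "mor": 3, "ning": 4}
TOTAL = sum(FREQ.values())

def cost(word):
    """Negative log probability; unknown words get a high cost that grows with length."""
    if word in FREQ:
        return -math.log(FREQ[word] / TOTAL)
    return 12.0 * len(word)

def segment(text):
    """Split a lowercase string (e.g. a hashtag body) into the cheapest word sequence."""
    best = [(0.0, [])]  # best[i] = (total cost, words) for text[:i]
    for i in range(1, len(text) + 1):
        c, words = min(
            (best[j][0] + cost(text[j:i]), best[j][1] + [text[j:i]])
            for j in range(max(0, i - 15), i)  # cap candidate word length at 15
        )
        best.append((c, words))
    return best[-1][1]

print(segment("goodmorning"))  # ['good', 'morning'] with these toy counts
```

With real corpus statistics the same recurrence disambiguates much harder cases; the quality of the split is entirely a function of the frequency table behind `cost`.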

  • Updated Oct 22, 2019
  • Python
DoumanAsh commented Jan 4, 2018

For Juman++ to be widely usable, we want a documented and stable C API and the option to build it as a dynamically linked library.
That library should probably use -fvisibility=hidden with explicit visibility attributes on exported symbols on Unix, and __declspec(dllexport)/__declspec(dllimport) on Windows.

The minimal API should be:

  1. Loading a model using a config file
  2. Analyzing a sentence
  3. Accessing
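The payoff of such a stable C ABI is that any language with a foreign-function interface could bind the shared library directly, with no C++ toolchain coupling. The sketch below shows the standard ctypes binding pattern from Python; libc's `strlen` stands in for the eventual entry points (the model-loading and sentence-analysis function names do not exist yet, so none are invented here):

```python
import ctypes
import ctypes.util

# Load a dynamically linked C library by name; a Juman++ binding would load
# its own shared object here instead of libc.
libc = ctypes.CDLL(ctypes.util.find_library("c") or None)

# For each exported function, declare the C signature so ctypes marshals
# arguments correctly -- this is what a documented, stable C API enables.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

print(libc.strlen(b"segmentation"))  # 12
```

The same pattern applies from Rust (`libloading`), Java (JNA), or any FFI consumer, which is why hidden-by-default symbol visibility plus an explicitly exported, versioned C surface matters.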
