@bitextor

Bitextor Team

Translation memories generator

Pinned

  1. bitextor Public

    Bitextor generates translation memories from multilingual websites

    Python 187 37

  2. bicleaner Public

    Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus.

    Python 86 17

  3. bifixer Public

    Tool to fix bitexts and tag near-duplicates for removal

    Python 11 1

  4. biroamer Public

    Utility that helps you ROAM (Random, Omit, Anonymize and Mix) your parallel corpus.

    Python 3 2

  5. pdf-extract Public

    PDF parser and converter to HTML

    Java 45 13

  6. warc2text Public

    Extracts plain text, language identification and more metadata from WARC records

    C++ 3 1

Repositories

  • bitextor

    Bitextor generates translation memories from multilingual websites

    Python 187 GPL-3.0 37 1 0 Updated Sep 30, 2021
  • vecalign

    Improved Sentence Alignment in Linear Time and Space

    Python 0 Apache-2.0 12 0 0 Updated Sep 29, 2021
  • bitextor-neural

    Bitextor Neural generates translation memories from multilingual websites using state-of-the-art Machine Learning tools

    Python 1 GPL-3.0 0 0 0 Updated Sep 29, 2021
  • warc2text

    Extracts plain text, language identification and more metadata from WARC records

    C++ 3 MIT 1 3 0 Updated Sep 27, 2021
  • neural-document-aligner

    Document aligner which uses neural technologies to search matches across bilingual documents

    Python 1 GPL-3.0 0 0 0 Updated Sep 23, 2021
  • bicleaner-ai

    Bicleaner fork that uses neural networks

    Python 5 GPL-3.0 0 0 0 Updated Sep 20, 2021
  • deferred-crawling

    Reconstructs sentences using deferred crawling standoff annotations from Bitextor

    Python 0 MIT 0 0 0 Updated Sep 7, 2021
  • pdf-extract

    PDF parser and converter to HTML

    Java 45 GPL-3.0 13 1 2 Updated Aug 13, 2021
  • bicleaner-hardrules

    Pre-filtering step for bicleaner

    Python 2 GPL-3.0 0 0 0 Updated Jul 12, 2021
  • bicleaner

    Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus.

    Python 86 GPL-3.0 17 0 0 Updated Jul 5, 2021
