-
Updated
Oct 22, 2020
# corpus
Here are 686 public repositories matching this topic...
Large Scale Chinese Corpus for NLP
nlp
news
wiki
text-classification
word2vec
corpus
dataset
question-answering
chinese
chinese-nlp
language-model
bert
chinese-corpus
pretrain
chinese-dataset
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
benchmark
tensorflow
nlu
glue
corpus
transformers
pytorch
dataset
chinese
pretrained-models
language-model
albert
bert
roberta
chineseglue
-
Updated
Jan 7, 2022 - Python
Deep learning and deep reinforcement learning research papers and some code
-
Updated
Mar 10, 2022
Search all Chinese NLP datasets, with commonly used English NLP datasets included
nlp
qa
sentiment-analysis
text-classification
match
machine-translation
text-similarity
corpus
knowledge-graph
chinese
text-summarization
datasets
ner
machine-reading-comprehension
-
Updated
Mar 1, 2020 - Python
Awesome Chatbot Projects, Corpus, Papers, Tutorials. Chinese Chatbot =>:
-
Updated
Feb 10, 2020 - Python
machine-learning
natural-language-processing
insurance
chatbot
corpus
dataset
question-answering
natural-language-understanding
qasystem
insuranceqa-corpus-zh
-
Updated
Oct 11, 2020 - Python
Chatbot in 200 lines of code using TensorLayer
-
Updated
Oct 5, 2021 - Python
An R package for the Quantitative Analysis of Textual Data
-
Updated
Mar 20, 2022 - R
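A collection of high-quality Chinese pre-trained models: state-of-the-art large models, fastest small models, and specialized similarity models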
text-classification
corpus
dataset
chinese
semantic-similarity
pretrained-models
sentence-classification
albert
bert
sentence-analysis
distillation
sentence-pairs
roberta
-
Updated
Jul 8, 2020 - Python
Curated List of Persian Natural Language Processing and Information Retrieval Tools and Resources
natural-language-processing
information-retrieval
corpus
language-detection
embeddings
named-entity-recognition
normalizer
spell-check
persian-language
stemmer
dependency-parser
persian-nlp
part-of-speech-tagger
morphological-analysis
persian-stemmer
shallow-parser
-
Updated
Mar 22, 2022
WeChat official account corpus
nlp
natural-language-processing
corpus
linguistics
weixin
chinese-nlp
corpora
weixin-data
wei-xin
yu-liao
yu-liao-ku
-
Updated
Jan 7, 2019
An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation
multilingual
language
translation
tokenizer
corpus
tagger
literature
corpus-linguistics
lemmatizer
corpus-tools
corpus-processing
corpus-statistics
stopword
corpus-analysis
-
Updated
Mar 12, 2022 - Python
adbar
commented
Jan 9, 2020
I have mostly tested trafilatura on a set of English, German and French web pages I had run into by surfing or during web crawls. There are definitely further web pages and cases in other languages for which the extraction doesn't work so far.
Corresponding bug reports can either be filed as a list in an issue like this one or in the code as XPath expressions in [xpaths.py](https://github.com
A dataset of millions of news articles scraped from a curated list of data sources.
nlp
machine-learning
natural-language-processing
database
corpus
artificial-intelligence
dataset
fakenews
-
Updated
Jan 25, 2020
CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark
-
Updated
Aug 28, 2021 - Python
A Curated List of Dataset and Usable Library Resources for NLP in Bahasa Indonesia
nlp
natural-language-processing
library
sentiment-analysis
packages
corpus
dataset
corpus-linguistics
indonesian-language
bahasa-indonesia
indonesian
sentiment-analysis-dataset
nlp-bahasa-resources
-
Updated
Mar 15, 2022
-
Updated
Oct 11, 2020 - Python
A very comprehensive parallel corpus of Classical Chinese (ancient texts) and modern Chinese
-
Updated
Feb 8, 2022