# wikipedia-dump
Here are 53 public repositories matching this topic...
WP2TXT extracts plain text from Wikipedia dump files (XML, compressed with bzip2), stripping all MediaWiki markup and other metadata.
Updated Jan 10, 2018 - Ruby
html
docker
nginx
wiki
docker-compose
mediawiki
wikipedia
archiving
datascience
zim
wikipedia-dump
openzim
kimix
xowa
internet-archiving
mwdumper
Updated Feb 13, 2020 - PLpgSQL
Corpus creator for Chinese Wikipedia
Updated Mar 11, 2020 - Python
Wikipedia-based Explicit Semantic Analysis, as described by Gabrilovich and Markovitch
Updated May 13, 2020 - Java
Extracting useful metadata from Wikipedia dumps in any language.
multilingual
redirects
wikipedia
python3
disambiguation
wikipedia-dump
metadata-extraction
wikiextractor
Updated Sep 20, 2019 - Python
A simple utility to index Wikipedia dumps using Lucene.
Updated Aug 24, 2019 - Java
Reading the data from OPIEC - an Open Information Extraction corpus
nlp
natural-language-processing
wiki
wikipedia
corpus
information-extraction
dataset
corpora
corpus-data
nlp-resources
wikipedia-dump
corpus-tools
natural-language-understanding
open-information-extraction
dataset-interface
wikipedia-corpus
corpus-processing
nlp-datasets
Updated Jun 12, 2019 - Java
Research for a master's degree, operation projizz-I/O
Updated Dec 27, 2017 - Python
Node.js module for parsing the content of Wikipedia articles into JavaScript objects
Updated Jul 9, 2017 - JavaScript
Downloads and imports Wikipedia page histories to a git repository
Updated Jun 4, 2019 - Python
Ranking of Programming Languages on English Wikipedia (Spark/Scala)
Updated Mar 17, 2017 - Scala
Extracts geodata from a Wikipedia dump
converter
json
geojson
mapping
wikipedia
conversion
geodata
geotagged-wikipedia-articles
wikipedia-dump
geotagging
wikipedia-scraper
Updated Feb 12, 2020 - Go
A Python toolkit to generate a tokenized dump of Wikipedia for NLP
Updated Dec 2, 2019 - Python
Python package for working with MediaWiki XML content dumps
Updated Apr 21, 2020 - Python
A Kotlin project which extracts ngram counts from Wikipedia data dumps.
Updated Apr 29, 2020 - Kotlin
Extract human names from Wikipedia
Updated Jul 19, 2019 - HTML
Wiki dump parser (jupyter)
python
parser
tutorial
jupyter
wiki
wikipedia
xml
jupyter-notebook
tutorials
python3
xml-parser
wikia
jupyter-notebooks
demos
wikipedia-dump
bz2
tutorial-code
wiktionary
wikipedia-corpus
Updated Sep 23, 2018 - Jupyter Notebook
Visualize/explore word2vec datasets with pygame
Updated Mar 30, 2018 - Python
Java tool to convert Wikimedia dumps into Java Article POJOs for test or fake data.
Updated Jan 11, 2020 - Java
mirror of https://git.noxz.tech/wikid
Updated Jun 14, 2020 - C
WikiBank is a new, partially annotated resource for the multilingual frame-semantic parsing task.
multilingual
python
mongodb
dataset
wikipedia-dump
wikidata-dump
semantic-role-labeling
semantic-role
Updated Dec 2, 2019 - Python
wikititle - script for printing a list of all Wikipedia titles in a few languages
Updated Feb 11, 2018 - Shell
Uses Word2Vec, proposed by Google, to train models (vectors) for use in any word2vec application.
Updated Jan 15, 2018 - Python
A complete search engine built on top of a 75 GB Wikipedia corpus, with subsecond search latency. Results contain wiki pages ordered by TF-IDF relevance to the given search words. From optimized code to the k-way merge sort algorithm, this project addresses latency, indexing, and big data challenges.
Updated Sep 12, 2019 - Python
A search engine built on a 75 GB Wikipedia dump. It creates an index file and returns search results in real time.
Updated Nov 2, 2019 - Python