Skip to content
#

corpus

Here are 686 public repositories matching this topic...

adbar
adbar commented Jan 9, 2020

I have mostly tested trafilatura on a set of English, German and French web pages I had run into by surfing or during web crawls. There are definitely further web pages and cases in other languages for which the extraction doesn't work so far.

Corresponding bug reports can either be filed as a list in an issue like this one or in the code as XPath expressions in [xpaths.py](https://github.com

Improve this page

Add a description, image, and links to the corpus topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the corpus topic, visit your repo's landing page and select "manage topics."

Learn more