A short version of the documentation is available straight from GitHub (README.rst), while a more exhaustive one is in the docs folder and online at trafilatura.readthedocs.io.
Several problems could arise:
Non-idiomatic use of English (not quite fluent or natural)
RxNLP provides APIs for clustering sentences, extracting topics, counting words and n-grams, extracting text from HTML or a URL, computing similarity between texts, and more.
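To make the "counting words and n-grams" task concrete, here is a minimal sketch in plain Python. It does not use or imitate the RxNLP API; the function name count_ngrams and the simple regex tokenizer are assumptions chosen for illustration only.

```python
import re
from collections import Counter


def count_ngrams(text, n=2):
    """Count word n-grams in `text` (hypothetical helper, not an RxNLP call).

    Tokens are lowercased runs of word characters; punctuation is dropped.
    """
    tokens = re.findall(r"\w+", text.lower())
    # Zip n shifted copies of the token list to form sliding windows.
    windows = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(window) for window in windows)


counts = count_ngrams("The cat sat on the mat and the cat slept.", n=2)
print(counts.most_common(1))
```

With n=1 the same function yields plain word counts. A real service would likely use a more careful tokenizer than this regex.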
The goal is to create a solution that crawls articles from a news website (The Guardian), cleanses the response, stores it in a hosted MongoDB database (MongoDB Atlas), and then makes it searchable via an API.
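The cleanse, store, and search stages described above can be sketched with the Python standard library alone. This is a rough illustration under stated assumptions, not the project's implementation: the crawl step is omitted, the in-memory ArticleStore class is a hypothetical stand-in for the hosted MongoDB collection, the search is a plain substring match rather than a real API, and the example.com URLs are placeholders.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def cleanse(html):
    """Strip markup and collapse whitespace (the 'cleanse' stage)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


class ArticleStore:
    """Hypothetical in-memory stand-in for the hosted database."""

    def __init__(self):
        self._docs = []

    def insert(self, url, html):
        self._docs.append({"url": url, "text": cleanse(html)})

    def search(self, keyword):
        # Case-insensitive substring match; a real API would use an index.
        needle = keyword.lower()
        return [d["url"] for d in self._docs if needle in d["text"].lower()]


store = ArticleStore()
store.insert(
    "https://example.com/a",
    "<html><body><h1>Budget vote</h1><script>x=1</script>"
    "<p>Parliament passed the budget.</p></body></html>",
)
store.insert("https://example.com/b", "<p>Football results</p>")
print(store.search("budget"))
```

Keeping cleansing separate from storage means the stored documents hold only plain text, so the search layer never has to deal with markup.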