html2text

RxNLP APIs for clustering sentences, extracting topics, counting words & n-grams, extracting text from html or URL, computing similarity between texts and more.

nlp natural-language-processing text-mining nlp-apis mashape html2text topic-extraction sentence-clustering opinosis-summarization rxnlp-apis

Updated Jan 24, 2020

ThatXliner / unmarkd

An extremely configurable markdown reverser for Python3.

python html markdown parser flexible reverse-engineering python3 beautifulsoup html2text reverser markdown-reverser reverse-markdown

Updated Jun 8, 2023
Python

pH-7 / Html2Text

A very simple (but efficient) "HTML to plain text" converter ✍️

php converter php7 text plain-text html2text convertor text-converter email-text-parsing htmltotext symfony-mailer text-convertor

Updated Jun 11, 2023
PHP

zacanger / html2txt

html2text but in node

html markdown cli node html2text

Updated Jul 2, 2023
JavaScript

x28 / inscriptis-java

inscriptis - HTML to text conversion library for Java

java converter library html2text

Updated Aug 4, 2022
Java

dreipunktnull / twig-extensions

A collection of useful, generic twig extensions.

twig twig-extension symfony html2text

Updated Jun 22, 2018
PHP

puhoy / readability_cli

A CLI tool that fetches a webpage's main content and prints it as Markdown.

markdown html-to-markdown python3 readability html2text readability-lxml readability-cli fetch-webpages

Updated Oct 31, 2020
Python

erayon / PubMed

This project builds a robust classifier that decides, from a document's abstract, whether the document belongs to the cancer class.

html xml sklearn nltk xgboost beautifulsoup html2text svm-classifier

Updated Nov 7, 2017
HTML

LukaszNiewinski / Microservice-for-retrieving-img-and-text

Microservice for text and images collection for data science purposes.

python api docker flask service docker-compose scrapy html2text

Updated Nov 22, 2022
Python

rubix1138 / html2text

html2text Search Command for Splunk

python splunk html2text splunk-enterprise splunk-application splunk-searches

Updated Mar 4, 2019
Python

hcq0618 / html-files-to-markdown-files

Batch convert HTML files to Markdown files

html html2text mardown

Updated May 17, 2019
Python

gsdefender / packtpub_telegram_bot

Receive Packt Publishing Ltd. Free Learning updates in Telegram every day

telegram telegram-bot selenium packtpub html2text selenium-python

Updated May 16, 2020
Python

importcjj / go-readability

Go package that cleans an HTML page for better readability.

go html golang text extractor text-extraction readability html2text html-extractor

Updated Aug 1, 2023
HTML

AbdellatifCHE / Collect_Store_Search

The goal is to create a solution that crawls articles from a news website (The Guardian), cleans the responses, stores them in a hosted Mongo database (MongoDB Atlas), and makes them searchable via an API.

python mongodb pymongo nltk scrapy html2text lemmatization

Updated Mar 3, 2020
Python

MattJeanLouis / scrap_web

A web scraping project that uses Streamlit, BeautifulSoup, and html2text to extract, convert to Markdown, and display the content of all pages linked from a given URL. It provides an interactive summary of the visited URLs and shows the extracted content in an easy-to-read format.

markdown open-source interactive python3 web-application web-scraping data-extraction html2text beautifulsoup4 streamlit