# pdf-document-processor
Here are 117 public repositories matching this topic...
Convert PDF to HTML without losing text or format.
Updated Sep 4, 2020 - HTML
Read and extract text and other content from PDFs in C# (port of PdfBox)
pdf
csharp
pdfbox
netstandard
pdf-files
pdf-document
hocr
document-analysis
pdf-extractor
alto-xml
page-xml
layout-analysis
pdf-document-processor
Updated Sep 4, 2020 - C#
A small utility making use of the pypdf library to provide a (somewhat) lighter alternative to pdftk
Updated May 31, 2020 - Python
pdfCropMargins -- a program to crop the margins of PDF files
Updated Aug 30, 2020 - Python
DocNET is a fast PDF editing and reading library for modern .NET applications
pdf
csharp
jpeg
pdf-converter
netcore
netstandard
pdf-files
pdf-document
pdf-conversion
pdf-extractor
pdf-document-processor
Updated Aug 28, 2020 - C#
CCKS 2019 Evaluation Task 5: information extraction from public company announcements, 3rd place
Updated Sep 15, 2019 - Python
Converts PDFs to any format for analysis
Updated Jul 15, 2020 - Python
Python library to manipulate PDF page labels
Updated Feb 10, 2020 - Python
Utility to convert PDF into JPG files
Updated Jun 17, 2020 - Java
nlp
information-extraction
ibm-research
table-extraction
scientific-papers
pdf-document-processor
ibm-research-ai
Updated Jul 18, 2019 - Java
PDFViewer is a GUI tool, written using python3 and tkinter, which lets you view PDF documents.
pdf
tkinter
pdf-viewer
pdf-files
pdf-document
tkinter-graphic-interface
tkinter-gui
pdf-document-processor
tkinter-python
tkinter-library
Updated Nov 9, 2018 - Python
How do we process data in different formats such as docx and pdf, and generate insights that can be linked with structured data in a database? This pattern helps establish relations between structured and unstructured data to generate recommendations using Watson NLU and Watson Studio.
nlp
data-science
text-mining
watson
natural-language
jupyter-notebook
artificial-intelligence
cloud-computing
recommender-system
self-learning
ibm-cloud
watson-nlu
watson-natural-language
unstructured-data
pdf-document-processor
watson-studio
Updated May 27, 2020 - Jupyter Notebook
Prepare documents for distribution
Updated Jun 9, 2020 - Python
Parse and export pdf bank statements to QIF format.
Updated Dec 29, 2017 - Python
Family helper websites.
jquery
php
authentication
codeigniter
pdf-converter
dropzonejs
pdf-generation
pdf-document-processor
Updated Nov 28, 2017 - HTML
Code used in my Medium Story https://medium.com/@umerfarooq_26378/python-for-pdf-ef0fac2808b0
Updated Sep 24, 2019 - Jupyter Notebook
Extract essential data (e.g. GPA, skills, education, age, ...) from resume files in PDF format (under development)
Updated Jul 31, 2018 - Python
Full featured wrapper for leptonica 1.77.0
wrapper
library
cmake
computer-vision
csharp
dll
libraries
computer-graphics
image-processing
bytes
tesseract
clang
leptonica
image-manipulation
image-classification
image-recognition
pdf-files
image-segmentation
image-analysis
pdf-generation
marshaller
pix
cmake-gui
pdf-document-processor
uinteger
Updated Sep 12, 2019 - Visual Basic
Python script to merge and edit sensitive PDF files you don't want to upload to random sites you find on Google
Updated Feb 17, 2019 - Python
Python interface based on the Foxit Quick PDF Library
Updated Apr 4, 2020 - Python
Spire.PDF for Java is a PDF component for reading, writing, printing, and converting PDF documents in Java applications without using Adobe Acrobat.
Updated Mar 12, 2019
Proof of concept of training a simple Region Classifier using PdfPig and ML.NET (LightGBM). The objective is to classify each text block in a PDF document page as one of title, text, list, table, or image.
classifier
pdf
machine-learning
csharp
lightgbm
pdf-document
document-layout
layout-analysis
pdf-document-processor
document-layout-analysis
ml-net
pdfpig
publaynet
Updated Mar 16, 2020 - C#
Persian and Arabic fonts for TCPDF (PHP)
Updated May 23, 2019
Converts PDFs to any format for easy analysis
Updated Aug 29, 2019 - Python
Android port of pdf2htmlEX - Convert PDF to HTML without losing text or format.
Updated Jul 19, 2020 - Java
Edaily API server with configured JWT and GraphQL. 🤘
Updated Jun 10, 2020 - JavaScript
PDF Editor (remove JS, find/replace, redact) based on iTextSharp
Updated Mar 26, 2018 - C#
Simple python utilities to play around with PDF Files
python
pdf
player
pdf-converter
merge
python3
rotate
pdf-viewer
pdfkit
pdf-files
pdf-document
pypdf2
pdf-manipulation
python-utilities
pdf-merge
pdfmerger
mergepdf
pdf-document-processor
pdf-player
rotatepdf
Updated Jun 13, 2020 - Python
Is your feature request related to a problem? Please describe.
The problem is inefficiency when simply looking for a single operand and then stopping processing.
For example, if only looking for a single colored pixel in a page.
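The inefficiency can be illustrated with a small sketch. Everything here is hypothetical and not taken from the library in question: `process_all`, `FirstColorFinder`, and the tuple representation of operations are stand-ins for a processor that dispatches every content-stream operation to a handler. The only real-world detail assumed is that `rg` is the PDF operator that sets the non-stroking RGB color.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical stand-in for a parsed content-stream operation:
# an operator name plus its operands. Not the API of any real library.
Operation = Tuple[str, tuple]


def process_all(ops: Iterable[Operation], handler: Callable[[Operation], None]) -> int:
    """Dispatch every operation to the handler; return how many were dispatched."""
    dispatched = 0
    for op in ops:
        handler(op)
        dispatched += 1
    return dispatched


class FirstColorFinder:
    """Handler that records the first color-setting operation it sees."""

    def __init__(self) -> None:
        self.found = None
        self.calls_after_match = 0

    def __call__(self, op: Operation) -> None:
        if self.found is not None:
            # The answer is already known, but the loop offers no way to stop.
            self.calls_after_match += 1
            return
        name, operands = op
        if name == "rg":  # sets the non-stroking RGB color
            self.found = operands


# A tiny made-up page: save state, set color, draw and fill a rectangle, restore.
page_ops = [("q", ()), ("rg", (1.0, 0.0, 0.0)), ("re", (0, 0, 10, 10)), ("f", ()), ("Q", ())]
finder = FirstColorFinder()
total = process_all(page_ops, finder)
```

In this toy page, the handler is still invoked for the three operations that follow the match; on a real page with thousands of operations, that wasted work is what the request is about.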
Describe the solution you'd like
It would make sense to be able to set a stop flag on the processor and return out of the handler, which would cause the proc