Here are 201 public repositories matching this topic...
Fast, secure, efficient backup program
Deduplicating archiver with compression and authenticated encryption.
A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.
Updated Jul 30, 2020 · Python
Extremely fast tool to remove duplicates and other lint from your filesystem
Simple, configuration-driven backup software for servers and workstations
Updated Jul 24, 2020 · Python
Data deduplication engine, supporting optional compression and public key encryption.
Updated Apr 13, 2020 · Rust
A powerful duplicate file finder and an enhanced fork of 'fdupes'.
Straightforward fuzzy matching, information retrieval and NLP building blocks for JavaScript.
Updated Jul 16, 2020 · JavaScript
A toolkit for record linkage and duplicate detection in Python
Updated Jun 4, 2020 · Python
Cross-platform backup tool for Windows, macOS & Linux with fast, incremental backups, client-side end-to-end encryption, compression and data deduplication. CLI and GUI included.
A list of free data matching and record linkage software.
A pair of kernel modules which provide pools of deduplicated and/or compressed block storage.
Locality Sensitive Hashing using MinHash in Python/Cython to detect near duplicate text documents
Updated Jun 7, 2020 · Python
Quickly detect already witnessed data.
Userspace tools for managing VDO volumes.
Spark RDD with Lucene's query and entity linkage capabilities
Updated Jul 27, 2020 · Scala
Make it easier to compare and cross-reference the names of companies and people by applying strong normalisation.
Updated Mar 23, 2020 · Python
Tool for managing data-deduplication within extant compressed archive files, along with a relatively performant BK tree implementation for fuzzy image searching.
Updated Apr 11, 2019 · Python
Record Linkage ToolKit (Find and link entities)
Updated Jun 4, 2020 · Python
CLI utility to find duplicate files
Dedupe/batch geocode addresses and venues around the world with libpostal
Updated Jul 19, 2020 · Python
The Dropbox for IPFS (without the icky stuff)
Updated Oct 23, 2017 · Python
Benji Backup: Block-based deduplicating backup software for Ceph RBD images, iSCSI targets, image files and block devices
Updated Jul 27, 2020 · Python
Resources for tackling record linkage / deduplication / data matching problems
Fast multi-threaded content-dependent chunking deduplication for Buffers in C++ with a reference implementation in JavaScript. Ships with extensive tests, a fuzz test and a benchmark.
Updated Mar 1, 2020 · JavaScript
A simple command line interface to the datamade/dedupe library.
Updated Oct 22, 2019 · Jupyter Notebook
Implementation in Apache Spark of the EM algorithm to estimate parameters of Fellegi-Sunter's canonical model of record linkage.
Updated Jul 29, 2020 · Python