GitHub - boudinfl/pke: Python Keyphrase Extraction module

`pke` - python keyphrase extraction

pke is an open-source, Python-based keyphrase extraction toolkit. It provides an end-to-end keyphrase extraction pipeline in which each component can be easily modified or extended to develop new models. pke also makes it easy to benchmark state-of-the-art keyphrase extraction models, and ships with supervised models trained on the SemEval-2010 dataset.

Installation

To install pke from GitHub with pip:

pip install git+https://github.com/boudinfl/pke.git

pke relies on spaCy (>= 3.2.3) for text processing and requires a spaCy language model to be installed:

# download the English model
python -m spacy download en_core_web_sm

Minimal example

pke provides a standardized API for extracting keyphrases from a document. Start by typing the 5 lines below. To use another model, replace pke.unsupervised.TopicRank with another model (see the list of implemented models).

import pke

# initialize keyphrase extraction model, here TopicRank
extractor = pke.unsupervised.TopicRank()

# load the content of the document; here the document is expected to be a
# plain string, and preprocessing is carried out using spacy
extractor.load_document(input='text', language='en')

# keyphrase candidate selection, in the case of TopicRank: sequences of nouns
# and adjectives (i.e. `(Noun|Adj)*`)
extractor.candidate_selection()

# candidate weighting, in the case of TopicRank: using a random walk algorithm
extractor.candidate_weighting()

# N-best selection, keyphrases contains the 10 highest scored candidates as
# (keyphrase, score) tuples
keyphrases = extractor.get_n_best(n=10)

A detailed example is provided in the examples/ directory.
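Because every model exposes the same four calls, the steps above can be wrapped in a small helper that takes the model class as an argument. This is a minimal sketch, not part of pke: the name `extract_keyphrases` is hypothetical, and it assumes the chosen model works with the default arguments of `candidate_selection()` and `candidate_weighting()`.

```python
def extract_keyphrases(model_cls, text, n=10, language="en"):
    """Run the load / select / weight / n-best steps with one model class.

    `model_cls` is any class exposing the four methods used in the minimal
    example, e.g. pke.unsupervised.TopicRank (hypothetical helper, not a
    pke function).
    """
    extractor = model_cls()

    # load the document from a plain string
    extractor.load_document(input=text, language=language)

    # select candidates, then weight them, using the model's defaults
    extractor.candidate_selection()
    extractor.candidate_weighting()

    # list of (keyphrase, score) tuples, best first
    return extractor.get_n_best(n=n)
```

With pke installed, `extract_keyphrases(pke.unsupervised.TopicRank, text)` would mirror the minimal example, and passing `pke.unsupervised.MultipartiteRank` instead would swap the model without touching the rest of the code.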

Getting started

To get your hands dirty with pke, we invite you to try our tutorials out.

- Getting started with `pke` and keyphrase extraction
- Model parameterization
- Benchmarking models

Implemented models

pke currently implements the following keyphrase extraction models:

Unsupervised models
- Statistical models
  - FirstPhrases
  - TfIdf
  - KPMiner (El-Beltagy and Rafea, 2010)
  - YAKE (Campos et al., 2020)
- Graph-based models
  - TextRank (Mihalcea and Tarau, 2004)
  - SingleRank (Wan and Xiao, 2008)
  - TopicRank (Bougouin et al., 2013)
  - TopicalPageRank (Sterckx et al., 2015)
  - PositionRank (Florescu and Caragea, 2017)
  - MultipartiteRank (Boudin, 2018)
Supervised models
- Feature-based models
  - Kea (Witten et al., 2005)

Model performances

For comparison purposes, overall results of implemented models on commonly-used benchmark datasets are available in results. Code for reproducing these experiments is in the benchmarking notebook.
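To give an idea of what such a comparison involves, here is a minimal sketch of scoring one document's extracted keyphrases against reference keyphrases. It assumes exact matching on lowercased strings over the top-n predictions; the benchmarking notebook may use a different metric or normalization (for instance stemming), and the function name `precision_recall_f1` is hypothetical.

```python
def precision_recall_f1(predicted, gold, n=10):
    """Exact-match precision, recall and F1 over the top-n predictions.

    `predicted` is a ranked list of keyphrase strings, `gold` a collection
    of reference keyphrases (assumed evaluation protocol, not pke's own).
    """
    # normalize and de-duplicate the top-n predictions, keeping their order
    top = list(dict.fromkeys(p.lower().strip() for p in predicted[:n]))
    gold_set = {g.lower().strip() for g in gold}

    matched = sum(1 for p in top if p in gold_set)
    precision = matched / len(top) if top else 0.0
    recall = matched / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


predicted = ["keyphrase extraction", "graph", "random walk", "python"]
gold = ["Keyphrase extraction", "random walk", "topic clustering"]
p, r, f = precision_recall_f1(predicted, gold)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # P=0.500 R=0.667 F1=0.571
```

Since `get_n_best` returns (keyphrase, score) tuples, the ranked list could be obtained with `[k for k, _ in keyphrases]` before scoring.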

Citing pke

If you use pke, please cite the following paper:

@InProceedings{boudin:2016:COLINGDEMO,
  author    = {Boudin, Florian},
  title     = {pke: an open source python-based keyphrase extraction toolkit},
  booktitle = {Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations},
  month     = {December},
  year      = {2016},
  address   = {Osaka, Japan},
  pages     = {69--73},
  url       = {http://aclweb.org/anthology/C16-2015}
}
