#

newsgroups

Here are 14 public repositories matching this topic...

The task is to cluster a given collection of documents into well-defined, identifiable clusters. Clustering is an unsupervised and very challenging problem that requires you to choose many parameters. You are now aware of the two basic types of clustering algorithm, partitional and hierarchical, and you may apply either of them. The choice of features is also open to you. To evaluate the clustering results, you should apply one internal and one external clustering evaluation measure. Dataset: the dataset is a subset of the well-known NEWS20 dataset. It contains 50 textual documents. In unsupervised learning, the input is the only thing available for learning. You can set a baseline for this dataset by using tf*idf-based features from the text.
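The baseline described above can be sketched in plain Python. This is a minimal illustration, not the repository's code: the six-document toy corpus, its topic labels, and every function name are hypothetical, and the specific choices (spherical k-means with farthest-first seeding as the partitional algorithm, silhouette as the internal measure, purity as the external measure) are assumptions made for the example.

```python
import math
from collections import Counter


def normalize(vec):
    """Scale a sparse vector (dict term -> weight) to unit L2 length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def dot(a, b):
    """Dot product of two sparse vectors; equals cosine similarity for unit vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def tfidf_vectors(docs):
    """tf*idf with idf = log(N / df); terms found in every document get weight 0."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    df = Counter(t for tokens in tokenized for t in set(tokens))
    return [
        normalize({t: c * math.log(n_docs / df[t]) for t, c in Counter(tokens).items()})
        for tokens in tokenized
    ]


def spherical_kmeans(vectors, k, n_iter=20):
    """Partitional clustering on cosine similarity with deterministic seeding."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Farthest-first: pick the document least similar to all current seeds.
        centroids.append(max(vectors, key=lambda v: min(1 - dot(v, c) for c in centroids)))
    assignments = None
    for _ in range(n_iter):
        new = [max(range(k), key=lambda j: dot(v, centroids[j])) for v in vectors]
        if new == assignments:
            break
        assignments = new
        for j in range(k):
            total = Counter()
            for v, a in zip(vectors, assignments):
                if a == j:
                    for t, w in v.items():
                        total[t] += w
            if total:  # keep the old centroid if the cluster is empty
                centroids[j] = normalize(total)
    return assignments


def silhouette(vectors, assignments):
    """Internal measure: mean silhouette coefficient using cosine distance."""
    clusters = {}
    for i, a in enumerate(assignments):
        clusters.setdefault(a, []).append(i)
    scores = []
    for i, a in enumerate(assignments):
        own = [j for j in clusters[a] if j != i]
        if not own or len(clusters) < 2:
            scores.append(0.0)
            continue
        mean_dist = lambda idx: sum(1 - dot(vectors[i], vectors[j]) for j in idx) / len(idx)
        intra = mean_dist(own)
        nearest = min(mean_dist(m) for c, m in clusters.items() if c != a)
        denom = max(intra, nearest)
        scores.append((nearest - intra) / denom if denom else 0.0)
    return sum(scores) / len(scores)


def purity(assignments, labels):
    """External measure: fraction of documents in their cluster's majority class."""
    by_cluster = {}
    for a, y in zip(assignments, labels):
        by_cluster.setdefault(a, Counter())[y] += 1
    return sum(max(c.values()) for c in by_cluster.values()) / len(labels)


# Hypothetical toy corpus standing in for the 50 NEWS20 documents.
docs = [
    "the shuttle launch reached orbit",
    "nasa delayed the shuttle launch",
    "the satellite reached orbit",
    "the hockey team won the game",
    "the hockey game went to overtime",
    "the team won in overtime",
]
labels = ["space", "space", "space", "hockey", "hockey", "hockey"]

vectors = tfidf_vectors(docs)
assignments = spherical_kmeans(vectors, k=2)
print("clusters:  ", assignments)
print("silhouette:", round(silhouette(vectors, assignments), 3))
print("purity:    ", purity(assignments, labels))
```

The two toy topics share no informative terms, so the clusters here coincide with the labels. On the real 50-document subset the number of clusters, the tokenization, and the seeding would all need tuning, and a hierarchical algorithm could be swapped in for the k-means step without changing the two evaluation functions.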

  • Updated Feb 17, 2018
  • Python
