DBSCAN algorithm and clustering algorithm for data mining

Question

How do you implement DBSCAN algorithm on categorical data (mushroom data set)?

And what is a one pass clustering algorithm?

Could you provide pseudo code for a one pass clustering algorithm?

Anony-Mousse · Answer 1 · 2012-02-08 18:00:24Z

You can run DBSCAN with an arbitrary distance function without any changes to it. The indexing part will be more difficult, so you will likely only get O(n^2) complexity.

But if you look closely at DBSCAN, all it does is compute distances, compare them to a threshold, and count objects. This is a key strength of it, it can easily be applied to various kinds of data, all you need is to define a distance function and thresholds.

I doubt there is a one-pass version of DBSCAN, as it relies on pairwise distances. You can prune some of these computations (this is where the index comes into play), but essentially you need to compare every object to every other object, so it is in O(n log n) and not one-pass.

One-pass: I believe the original k-means was a one-pass algorithm. The first k objects are your initial means. For every new object, you choose the closes mean and update it (incrementally) with the new object. As long as you don't do another iteration over your data set, this was "one-pass". (The result will be even worse than lloyd-style k-means though).

Buhake Sindi · Answer 2 · 2011-04-21 13:08:27Z

Read the first k items and hold them. Compute the distances between them.

For each remaining item:

Find out which of the k items it is closest to, and the distance between these two items.
If this is longer than the closest distance between any two of the k items, you can swap the new item with one of these two and at least not decrease the closest distance between any two of the new k items. Do so so as to increase this distance as much as possible.

Suppose that the set of all items can be divided up into l <= k clusters so that the distance between any two points in the same cluster is smaller than the distance between any two points in different clusters. Then after running this algorithm, you will retain at least one point from each cluster.

Doesn't sound like en.wikipedia.org/wiki/DBSCAN at all to me. — Anony-Mousse, Feb 8 '12 at 17:55

asked	4 years ago
viewed	2198 times
active	3 years ago

current community

your communities

more stack exchange communities

DBSCAN algorithm and clustering algorithm for data mining

2 Answers 2

Your Answer

Not the answer you're looking for? Browse other questions tagged algorithm data-mining cluster-analysis dbscan or ask your own question.

Hot Network Questions

current community

your communities

more stack exchange communities

DBSCAN algorithm and clustering algorithm for data mining

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged algorithm data-mining cluster-analysis dbscan or ask your own question.

Related

Hot Network Questions