How do you implement the DBSCAN algorithm on categorical data (the mushroom data set)?
And what is a one-pass clustering algorithm?
Could you provide pseudocode for a one-pass clustering algorithm?
You can run DBSCAN with an arbitrary distance function without any changes to it. The indexing part will be more difficult, so you will likely only get

But if you look closely at DBSCAN, all it does is compute distances, compare them to a threshold, and count objects. This is one of its key strengths: it can easily be applied to many kinds of data, because all you need is to define a distance function and thresholds.

I doubt there is a one-pass version of DBSCAN, as it relies on pairwise distances. You can prune some of these computations (this is where the index comes into play), but essentially you need to compare every object to every other object, so it is in

One-pass: I believe the original k-means was a one-pass algorithm. The first k objects are your initial means. For every new object, you choose the closest mean and update it (incrementally) with the new object. As long as you don't do another iteration over your data set, this is "one-pass". (The result will be even worse than Lloyd-style k-means, though.)
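To illustrate the point that DBSCAN only needs a distance function and thresholds, here is a minimal brute-force sketch for categorical records. The distance is an assumption on my part: simple matching (normalized Hamming) distance, i.e. the fraction of attributes on which two records differ. The names `hamming`, `region_query`, `dbscan` and the toy `records` are hypothetical, and there is no index, so every object is compared to every other object.

```python
NOISE = -1


def hamming(a, b):
    """Assumed metric: fraction of categorical attributes on which a and b differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def region_query(data, i, eps, dist):
    """Brute-force eps-neighborhood of object i (includes i itself)."""
    return [j for j in range(len(data)) if dist(data[i], data[j]) <= eps]


def dbscan(data, eps, min_pts, dist=hamming):
    """Return one label per object: a cluster id >= 0, or NOISE (-1)."""
    labels = [None] * len(data)  # None means "not yet visited"
    cluster = -1
    for i in range(len(data)):
        if labels[i] is not None:
            continue
        neighbors = region_query(data, i, eps, dist)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        seeds = [j for j in neighbors if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: join, but do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(data, j, eps, dist)
            if len(j_neighbors) >= min_pts:  # j is also a core point
                seeds.extend(j_neighbors)
    return labels


# Hypothetical toy records shaped like mushroom attributes (one letter code per attribute)
records = [
    ("x", "s", "n", "t"),
    ("x", "s", "y", "t"),
    ("x", "s", "n", "t"),
    ("b", "y", "w", "f"),
    ("b", "y", "w", "f"),
    ("b", "y", "g", "f"),
    ("k", "f", "e", "n"),
]
labels = dbscan(records, eps=0.25, min_pts=3)
print(labels)  # → [0, 0, 0, 1, 1, 1, -1]
```

Here `eps=0.25` means "at most one of the four attributes differs". Swapping in another categorical distance only requires passing a different `dist` function; the clustering loop itself is unchanged.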
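The one-pass k-means described above can be sketched as follows, assuming numeric vectors and squared Euclidean distance. The function name `one_pass_kmeans` is hypothetical. Each object is read exactly once, assigned to the closest current mean, and that mean is updated incrementally with `m += (x - m) / n`. Earlier assignments are never revisited, which is why the result is usually worse than iterated (Lloyd-style) k-means.

```python
def one_pass_kmeans(points, k):
    """Single pass over `points`; returns (means, labels)."""
    if len(points) < k:
        raise ValueError("need at least k points")
    means = [list(p) for p in points[:k]]  # first k objects are the initial means
    counts = [1] * k
    labels = list(range(k))
    for p in points[k:]:
        # Closest mean by squared Euclidean distance
        best = min(
            range(k),
            key=lambda c: sum((m - x) ** 2 for m, x in zip(means[c], p)),
        )
        counts[best] += 1
        # Incremental mean update with the new object
        means[best] = [m + (x - m) / counts[best] for m, x in zip(means[best], p)]
        labels.append(best)
    return means, labels


pts = [(0, 0), (10, 10), (1, 0), (9, 10), (0, 1)]
means, labels = one_pass_kmeans(pts, 2)
print(labels)  # → [0, 1, 0, 1, 0]
```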
Read the first k items and hold them. Compute the distances between them. For each remaining item:
Suppose that the set of all items can be divided up into l <= k clusters so that the distance between any two points in the same cluster is smaller than the distance between any two points in different clusters. Then after running this algorithm, you will retain at least one point from each cluster.
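The per-item steps of this answer are missing, so the sketch below fills them with one plausible rule. This rule is my assumption, not the author's text: find the closest pair (a, b) among the k held items. If the new item is farther from every held item than a is from b, replace a with the new item; otherwise discard it. Under the stated cluster assumption this keeps a point from each cluster. If a cluster is unrepresented, two held items must share a cluster, so the closest pair is within one cluster, and an item from the missing cluster is farther than that from every held item, so it gets swapped in without removing the last representative of any cluster. The name `retain_representatives` and the example data are hypothetical.

```python
from itertools import combinations, islice


def retain_representatives(items, k, dist):
    """One pass over `items`, keeping k items (assumed replacement rule)."""
    if k < 2:
        raise ValueError("k must be at least 2")
    it = iter(items)
    held = list(islice(it, k))  # read the first k items and hold them
    if len(held) < k:
        return held
    # Distances between the held items
    d = [[dist(a, b) for b in held] for a in held]
    for x in it:  # each remaining item is read exactly once
        a, b = min(combinations(range(k), 2), key=lambda p: d[p[0]][p[1]])
        dx = [dist(x, h) for h in held]
        if min(dx) > d[a][b]:
            # x is farther from every held item than a is from b: swap x in for a
            held[a] = x
            for j in range(k):
                d[a][j] = d[j][a] = dx[j]
            d[a][a] = 0.0
    return held


# Three well-separated 1-D clusters around 0, 5 and 10, with k = 4
data = [0.0, 0.1, 0.2, 0.3, 5.0, 5.1, 10.0, 0.15, 10.2]
kept = retain_representatives(data, 4, lambda p, q: abs(p - q))
print(sorted(kept))
```

Each step costs O(k^2) to find the closest pair, but the data itself is scanned only once.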