Take the 2-minute tour ×

Programmers Stack Exchange is a question and answer site for professional programmers interested in conceptual questions about software development. It's 100% free, no registration required.

Clustering algorithm

up vote 5 down vote favorite

I have a set of n elements (1,000 <= n <= 100,000) and I can compute the grade of similarity between each pair, that is a value from 0 (very similar) to 1 (very different). I would like to cluster the elements based on their grade of similarity.

I thought about representing them as a graph, the elements are the vertices and the weighted edges are the similarity between them. I read about the MCL algorithm but I think it isn't the best approach since my graph is complete.

On the other hand, as there are a lot of elements, maybe computing the similarity between each pair is not the best practice (I want a fast algorithm). I also read something about leader clustering algorithms but, again, I am not sure if it is the best approach because, as far as I know, it is quite prone to fail due to its greediness (I would like something more robust).

Edit: I forgot to mention that I know a threshold for which when the comparison between two elements is higher than it, then I know that they belong to different clusters.

edited Mar 19 at 14:00

asked Mar 19 at 13:44

ibci
262

If I know Similarity(a,b) and Similarity(b,c), does that tell me (or allow me to bound) Similarity(a,c)? – Ben Aaronson Mar 19 at 13:49

No. Sometimes it does, but I never know when it is likely to occur – ibci Mar 19 at 14:03

The type of clustering algorithms you might want to use (given your requirements) is probably complete-linkage clustering which is necessary to avoid the chaining problem. If it is more complex than that, you may have to reconsider how that "similarity" is defined - e.g. if it is originally computed by a vector of scores, you may have to use that whole vector instead of a single similarity value. – rwong Mar 19 at 15:33

add a comment |

Your Answer

Sign up or log in

Post as a guest

Name

Post as a guest

Name

discard

By posting your answer, you agree to the privacy policy and terms of service.

Browse other questions tagged artificial-intelligence or ask your own question.

question feed

asked	10 days ago
viewed	93 times

current community

your communities

more stack exchange communities

Clustering algorithm

Your Answer

Browse other questions tagged artificial-intelligence or ask your own question.

Hot Network Questions

current community

your communities

more stack exchange communities

Clustering algorithm

Know someone who can answer? Share a link to this question via email, Google+, Twitter, or Facebook.

Your Answer

Sign up or log in

Post as a guest

Browse other questions tagged artificial-intelligence or ask your own question.

Related

Hot Network Questions