Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters).

learn more… | top users | synonyms

6
votes
2answers
58 views

Item-based collaborative filtering

I wrote the following code using Ruby to compute item-based collaborative filtering (using Jaccard coefficient of similarity). I would appreciate feedback about any potential issues with the code ...
7
votes
2answers
103 views

Performance Issues: Finding Nearest Polygon

Finding the nearest polygon out of a list of polygons seems to be quite a challenge on performance, due to lack of optimization. Consider the following image: I'll give the concept behind this. The ...
3
votes
1answer
62 views

Calculating the distance between one point, and many others

In my program, I have entities that I call "blobs", because they have a blobby shape. Blobs are polygons. If I have two blobs, then their information array would look like: ...
8
votes
2answers
275 views

Possible optimizations for calculating squared euclidean distance

I need to do a few hundred million euclidean distance calculations every day in a Python project. Here is what I started out with: ...
4
votes
2answers
97 views

K-means clustering in Python

The following code uses scikit-learn to carry out K-means clustering where \$K = 4\$, on an example related to wine marketing from the book DataSmart. That book uses excel but I wanted to learn Python ...
5
votes
3answers
107 views

Std lib-like C++ function to find nearest elements in a container

Initial problem For a project I found myself in the need to search for elements in a container that are the closest neighbours to another precise element I have. In my case it was points in any ...
7
votes
2answers
206 views

Given a page of content, determine shortest snippet containing all search phrases (no order required)

A recruiter gave me a homework problem as a part of the recruiting process and after receiving my submission he told me that he decided not to proceed with me. When I asked for the reason, he told me ...
6
votes
1answer
69 views

Calculating Euclidean norm for each vector in a sparse matrix

Below is a naive algorithm to find nearest neighbours for a point in some n-dimensional space. ...
4
votes
1answer
304 views

K-nearest neighbours in MATLAB

I implemented K-Nearest Neighbours algorithm, but my experience using MATLAB is lacking. I need you to check the small portion of code and tell me what can be improved or modified. I hope it is a ...
4
votes
1answer
83 views

KMeans in the shortest and most readable format

I'm learning Python (coming from Java) so I decided to write KMeans as a practice for the language. However I want to see how could one improve the code and making it shorter and yet readable. I ...
4
votes
1answer
76 views

Nearest pair of points

Given a set of 2 dimensional points, it returns the two nearest points. If more pairs have the same min-distance between them, then an arbitrary choice is made. This program expects the points to be ...
2
votes
1answer
915 views

String Matching and Clustering

I have a pretty simple problem. A large set of unique strings that are to various degrees not clean, and in reality they in fact represent the same underlying reality. They are not person names, but ...
6
votes
3answers
442 views

How to speed up this k-nearest neighbors code?

I have implemented kNN (k-nearest neighbors) as following, but it is very slow. I want to get an exact k-nearest-neighbor, not the approximate ones, so I didn't use the FLANN or ANN libraries. Could ...
4
votes
2answers
1k views

Is this the fastest way to find the closest point to a list of points using numpy?

I'm trying to find the closest point (Euclidean distance) from a user inputed point to a list of 50,000 points that I have. Note that the list of points changes all the time. and the closest distance ...
4
votes
3answers
342 views

Nearest Neighbour classification algorithm

The following code is from a university assignment of mine to write a classification algorithm (using nearest neighbour) to classify whether or not a given feature set (each feature is the frequency ...
4
votes
1answer
800 views

Object Orientation and Style - Document Clustering / NLP in Java

This is a solution from the CodeSprint 2012 coding competition. Any suggestions on ways to improve the object orientation or general style would be much appreciated. The code takes several text ...
5
votes
4answers
2k views

Grouping consecutive numbers into ranges in Python 3.2

The following is a function that I wrote to display page numbers as they appear in books. If you enter the list [1,2,3,6,7,10], for example, it would return: ...
10
votes
1answer
171 views

Is this a BSP tree? Am I missing something?

I've been practicing using BSP trees as I hear they are a good start to create procedural spaces such as houses. For ease-of-deployment purposes, I've tested this code in Lua and it seems to work. ...
1
vote
1answer
167 views

A sophisticated algorithm for geometric computation (distance between points) [closed]

I am confused by an algorithm for counting the number of pairs of N random points which has a distance closer than d = sqrt((p1.x-p2.x)^2 + (p1.y-p2.y)^2) The ...