Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters).
6
votes
1answer
51 views
Bitmap problem performance boost
Problem statement:
Input is a rectangular bitmap like this:
0001
0011
0110
The task is to find for each black (0) "pixel", the distance to the
...
3
votes
1answer
44 views
Construct KD-Tree in Java
I want to construct KD-Tree from unsorted List of points in the plane (i.e. 2-KD-Tree). To do so I used the method from Wikipedia: http://en.wikipedia.org/wiki/Kd-tree#Construction This is my code:
...
6
votes
2answers
66 views
Item-based collaborative filtering
I wrote the following code using Ruby to compute item-based collaborative filtering (using Jaccard coefficient of similarity).
I would appreciate feedback about any potential issues with the code ...
7
votes
2answers
115 views
Performance Issues: Finding Nearest Polygon
Finding the nearest polygon out of a list of polygons seems to be quite a challenge on performance, due to lack of optimization. Consider the following image:
I'll give the concept behind this. The ...
3
votes
1answer
85 views
Calculating the distance between one point, and many others
In my program, I have entities that I call "blobs", because they have a blobby shape. Blobs are polygons. If I have two blobs, then their information array would look like:
...
8
votes
2answers
344 views
Possible optimizations for calculating squared euclidean distance
I need to do a few hundred million euclidean distance calculations every day in a Python project.
Here is what I started out with:
...
4
votes
2answers
199 views
K-means clustering in Python
The following code uses scikit-learn to carry out K-means clustering where \$K = 4\$, on an example related to wine marketing from the book DataSmart. That book uses excel but I wanted to learn Python ...
5
votes
3answers
132 views
Std lib-like C++ function to find nearest elements in a container
Initial problem
For a project I found myself in the need to search for elements in a container that are the closest neighbours to another precise element I have. In my case it was points in any ...
8
votes
2answers
216 views
Given a page of content, determine shortest snippet containing all search phrases (no order required)
A recruiter gave me a homework problem as a part of the recruiting process and after receiving my submission he told me that he decided not to proceed with me. When I asked for the reason, he told me ...
6
votes
1answer
96 views
Calculating Euclidean norm for each vector in a sparse matrix
Below is a naive algorithm to find nearest neighbours for a point in some n-dimensional space.
...
4
votes
1answer
475 views
K-nearest neighbours in MATLAB
I implemented K-Nearest Neighbours algorithm, but my experience using MATLAB is lacking. I need you to check the small portion of code and tell me what can be improved or modified. I hope it is a ...
4
votes
1answer
96 views
KMeans in the shortest and most readable format
I'm learning Python (coming from Java) so I decided to write KMeans as a practice for the language. However I want to see how could one improve the code and making it shorter and yet readable. I ...
4
votes
1answer
76 views
Nearest pair of points
Given a set of 2 dimensional points, it returns the two nearest points. If more pairs have the same min-distance between them, then an arbitrary choice is made. This program expects the points to be ...
2
votes
1answer
1k views
String Matching and Clustering
I have a pretty simple problem. A large set of unique strings that are to various degrees not clean, and in reality they in fact represent the same underlying reality. They are not person names, but ...
6
votes
3answers
547 views
How to speed up this k-nearest neighbors code?
I have implemented kNN (k-nearest neighbors) as following, but it is very slow. I want to get an exact k-nearest-neighbor, not the approximate ones, so I didn't use the FLANN or ANN libraries. Could ...
4
votes
2answers
2k views
Is this the fastest way to find the closest point to a list of points using numpy?
I'm trying to find the closest point (Euclidean distance) from a user inputed point to a list of 50,000 points that I have. Note that the list of points changes all the time. and the closest distance ...
7
votes
3answers
2k views
Density-based clustering of image keypoints
I have implemented the DBSCAN algorithm for clustering image keypoints. I have been following the pseudocode on the wiki page pretty strictly, and it's working, but I get the feeling its a very naive ...
4
votes
3answers
350 views
Nearest Neighbour classification algorithm
The following code is from a university assignment of mine to write a classification algorithm (using nearest neighbour) to classify whether or not a given feature set (each feature is the frequency ...
4
votes
1answer
865 views
Object Orientation and Style - Document Clustering / NLP in Java
This is a solution from the CodeSprint 2012 coding competition. Any suggestions on ways to improve the object orientation or general style would be much appreciated. The code takes several text ...
5
votes
4answers
2k views
Grouping consecutive numbers into ranges in Python 3.2
The following is a function that I wrote to display page numbers as they appear in books.
If you enter the list [1,2,3,6,7,10], for example, it would return:
...
11
votes
1answer
183 views
Is this a BSP tree? Am I missing something?
I've been practicing using BSP trees as I hear they are a good start to create procedural spaces such as houses.
For ease-of-deployment purposes, I've tested this code in Lua and it seems to work. ...
1
vote
1answer
168 views
A sophisticated algorithm for geometric computation (distance between points) [closed]
I am confused by an algorithm for counting the number of pairs of N random points which has a distance closer than d = sqrt((p1.x-p2.x)^2 + (p1.y-p2.y)^2)
The ...