Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters).

learn more… | top users | synonyms

1
vote
0answers
54 views

Depth First Search for percolation to find clusters in Go game

I have some questions about Depth First Search and whether I implemented it correctly. Below is a more thorough discussion. The graph in question is a randomly colored square grid (I use 3 colors). ...
1
vote
0answers
14 views

Collaborative filtering to group similar users and products

I'm doing product recommendation module based on collaborative_filtering. The recommendation will be generated by users, ...
2
votes
0answers
80 views

C# port of data mining algorithm much slower than reference implementation

I was trying to implement the algorithm specified in this research paper (please ignore the math, since it's irrelevant to the question). This algorithm is very basic in formal concept analysis. The ...
3
votes
1answer
57 views

Implementation of KNN in R

I have implemented the K-Nearest Neighbor algorithm with Euclidean distance in R. It works fine but takes tremendously huge time than the library function (get.knn). Please point out the possibility ...
1
vote
1answer
70 views

Simple string-root detection in a string-family

(This problem is related to Simple string-split by root and sufix algorithm) There are many ways to find a "common root" of a list of similar strings, that begins with the same substring... The ...
7
votes
2answers
109 views

Finding the maximum pairwise difference in a collection of colors

Note that this problem is equivalent to finding the longest line segment defined by any two points in a collection of 3D coordinates, which may be an easier way to visualize the problem, and is almost ...
6
votes
3answers
213 views

Finding clusters in a matrix

I got asked at an interview to write a program that, given a NxM matrix with zeros and ones, prints out the list of clusters of 1s. The clusters are defined as patches of 1s connected horizontally, ...
2
votes
0answers
99 views

Discretization of continuous attributes for automatic classification [closed]

Background In machine learning, it's common to encounter the problem of making a decision as to which discrete category an object belongs to based on a set of continuous attributes. For example, we ...
4
votes
4answers
445 views

Aggregate array values into ranges

In five minutes I made a pretty ugly looking function. Can you help before I have to commit the code into history? Requirements: I would like a function that takes an array of numbers, and ...
2
votes
1answer
927 views

K-means clustering algorithm in python

Here is my implementation of the k-means algorithm in python. I would love to get any feedback on how it could be improved or any logical errors that you may see. I've left off a lot of the ...
8
votes
3answers
232 views

Breadth-first search for clusters of pixels in a given color range

I am a beginner in programming languages, so I apologise if my code is badly formatted or doesn't make any sense. My program gets an image and a RGB color range as input and it counts how many pixels ...
3
votes
1answer
377 views

Finding the shortest substring containing keywords

Problem: Write a function that takes a String document and a String[] keywords and returns the smallest substring of ...
7
votes
2answers
147 views

What are my highest activity streaks?

I have written the following query to figure out activity streaks on a per-user basis. I find it... Ugly... And would love to improve it! Limitations Those are explained as commented text at the ...
1
vote
0answers
48 views

A scalable function for get boundary vertices in a graph

Given a community division I need a list of vertices that have edges in more than one community, i.e., boundary vertices. I've tried this: ...
2
votes
2answers
43 views

Cluster date time values

I came up with this TSQL for SQL Server 2008: ...
1
vote
0answers
162 views

Optimize QuadTree to find K Nearest Neighbors

I'm looking a way to make my k nearest neighbors search more efficient. The context of the question is that I'm given a list of topics that have a unique ID (integer) and a (x,y) coordinate (floats) ...
6
votes
1answer
674 views

K-Means in Rust

I have implemented for learning purposes a simple K-Means clustering algorithm in Rust. For those who are not familiar: you are given N points, say in the plane, ...
4
votes
1answer
866 views

K-Mean with Numpy

I have implemented the K-Mean clustering Algorithm in Numpy: ...
9
votes
1answer
200 views

Bitmap problem performance boost

Problem statement: Input is a rectangular bitmap like this: 0001 0011 0110 The task is to find for each black (0) "pixel", the distance to the ...
2
votes
1answer
158 views

Neighbours from point connections

I am working with a mesh of triangles (in 3D, although I doubt it makes a difference). The mesh is given as list of lists, each list containing the indices of the three vertices of a triangle in said ...
3
votes
1answer
242 views

Construct KD-Tree in Java

I want to construct KD-Tree from unsorted List of points in the plane (i.e. 2-KD-Tree). To do so I used the method from Wikipedia: http://en.wikipedia.org/wiki/Kd-tree#Construction This is my code: ...
6
votes
2answers
134 views

Item-based collaborative filtering

I wrote the following code using Ruby to compute item-based collaborative filtering (using Jaccard coefficient of similarity). I would appreciate feedback about any potential issues with the code ...
8
votes
2answers
163 views

Finding the nearest polygon

Finding the nearest polygon out of a list of polygons seems to be quite a challenge on performance, due to lack of optimization. Consider the following image: I'll give the concept behind this. ...
3
votes
1answer
750 views

Calculating the distance between one point, and many others

In my program, I have entities that I call "blobs", because they have a blobby shape. Blobs are polygons. If I have two blobs, then their information array would look like: ...
11
votes
2answers
2k views

Possible optimizations for calculating squared euclidean distance

I need to do a few hundred million euclidean distance calculations every day in a Python project. Here is what I started out with: ...
6
votes
2answers
6k views

K-means clustering in Python

The following code uses scikit-learn to carry out K-means clustering where \$K = 4\$, on an example related to wine marketing from the book DataSmart. That book uses excel but I wanted to learn Python ...
6
votes
3answers
651 views

Std lib-like C++ function to find nearest elements in a container

Initial problem For a project I found myself in the need to search for elements in a container that are the closest neighbours to another precise element I have. In my case it was points in any ...
11
votes
2answers
545 views

Given a page of content, determine shortest snippet containing all search phrases (no order required)

A recruiter gave me a homework problem as a part of the recruiting process and after receiving my submission he told me that he decided not to proceed with me. When I asked for the reason, he told me ...
6
votes
1answer
367 views

Calculating Euclidean norm for each vector in a sparse matrix

Below is a naive algorithm to find nearest neighbours for a point in some n-dimensional space. ...
4
votes
1answer
7k views

K-nearest neighbours in MATLAB

I implemented K-Nearest Neighbours algorithm, but my experience using MATLAB is lacking. I need you to check the small portion of code and tell me what can be improved or modified. I hope it is a ...
4
votes
1answer
186 views

KMeans in the shortest and most readable format

I'm learning Python (coming from Java) so I decided to write KMeans as a practice for the language. However I want to see how could one improve the code and making it shorter and yet readable. I ...
4
votes
1answer
217 views

Nearest pair of points

Given a set of 2 dimensional points, it returns the two nearest points. If more pairs have the same min-distance between them, then an arbitrary choice is made. This program expects the points to be ...
4
votes
1answer
5k views

String Matching and Clustering

I have a pretty simple problem. A large set of unique strings that are to various degrees not clean, and in reality they in fact represent the same underlying reality. They are not person names, but ...
6
votes
3answers
3k views

Speeding up this k-nearest neighbors implementation

I have implemented kNN (k-nearest neighbors) as follows, but it is very slow. I want to get an exact k-nearest-neighbor, not the approximate ones, so I didn't use the FLANN or ANN libraries. ...
15
votes
1answer
7k views

Finding the closest point to a list of points

I'm trying to find the closest point (Euclidean distance) from a user-inputted point to a list of 50,000 points that I have. Note that the list of points changes all the time. and the closest distance ...
7
votes
3answers
4k views

Density-based clustering of image keypoints

I have implemented the DBSCAN algorithm for clustering image keypoints. I have been following the pseudocode on the wiki page pretty strictly, and it's working, but I get the feeling its a very naive ...
4
votes
3answers
459 views

Nearest Neighbour classification algorithm

The following code is from a university assignment of mine to write a classification algorithm (using nearest neighbour) to classify whether or not a given feature set (each feature is the frequency ...
7
votes
3answers
3k views

Finding points with the shortest distance

This takes in a user specified number of points then finds the two points with the shortest distance. ...
2
votes
1answer
863 views

Validating a clustering document

My goal in the method is to write a simple clustering program that determines if a clustering document is valid based on these rules: All clusterings start with ...
5
votes
1answer
1k views

Document clustering / NLP

This is a solution from the CodeSprint 2012 coding competition. Any suggestions on ways to improve the object orientation or general style would be much appreciated. The code takes several text ...
5
votes
4answers
3k views

Grouping consecutive numbers into ranges in Python 3.2

The following is a function that I wrote to display page numbers as they appear in books. If you enter the list [1,2,3,6,7,10], for example, it would return: ...
11
votes
1answer
233 views

Is this a BSP tree? Am I missing something?

I've been practicing using BSP trees as I hear they are a good start to create procedural spaces such as houses. For ease-of-deployment purposes, I've tested this code in Lua and it seems to work. ...
5
votes
1answer
246 views

Performing machine learning

I've written the code below to do some work on machine learning in R. I'm not overly happy with some bits of it, and I suspect I could improve it quite a bit. Bits I'm specifically interested in ...