Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters).
5
votes
2answers
76 views
Grouping rectangles horizontally and vertically
As you can see the below code for each method is that same, except for the properties it uses. For example X vs Y and ...
0
votes
0answers
15 views
Applying kmodes on every “column wise subset” of a dataframe
I want to apply kmodes for 2 clusters on every possible combination of columns from a dataframe. Finally, I want to compare the clusters with another column that ...
0
votes
0answers
61 views
K-Means Clustering - F# Learning Challenge
Inspired by this blog I went on implementing my own version as a F# learning challenge. It turned out to be quite different than the original (but somewhat faster for large samples).
The first code ...
14
votes
1answer
139 views
Dynamic Colour Binning: Grouping Similar Colours in Images
This is a piece of code that implements an image-processing algorithm I came up with. I call it Dynamic Colour Binning. It's a fairly academic exercise that was more about providing a learning ...
5
votes
1answer
99 views
k-means clustering algorithm implementation
Here is my personal implementation of the clustering k-means algorithm.
...
2
votes
1answer
96 views
Clustering similar tweets in a corpus
I am attempting to write a statistical program using an LDA model I've trained/created using Gensim. I am very new to Python and am a student level programmer. This current program is working and ...
4
votes
1answer
64 views
K-means clustering in Rust
I've implemented K-means clustering in Rust. It's my second Rust project (my first one is here: Randomly selecting an adjective and noun, combining them into a message)
I would like advice on ...
1
vote
1answer
41 views
Store and output hard-coded relationships among hosts
The following code has begun to smell, but I have not yet decided with what to replace it, other than, obviously, a database.
I made a very unsatisfactory workaround for my attempt to make ...
1
vote
1answer
65 views
DBSCAN in C++ for general and Android use
I've implemented a templated DBSCAN for general use. At the moment, it's going to be used on Android through the JNI. I used Wikipedia's pseudocode and a little bit of the DBSCAN paper for reference. ...
3
votes
1answer
101 views
PANDAS spatial clustering
I'am writing on a spatial clustering algorithm using pandas and scipy's kdtree. I profiled the code and the .loc part takes most time for bigger datasets. I wonder ...
2
votes
0answers
54 views
Cluster arrays according to similarity of key values
The below script will compare a set of arrays according to similarities between their key's values. For example, if the first 4 keys values of an array are equal to another array's first 4 keys values,...
8
votes
1answer
592 views
Implementing a fast DBScan in C#
I tried to implement a DBScan in C# using kd-trees. I followed the implementation from here.
...
12
votes
4answers
375 views
N closest points to the reference point
Here is working code to get N closest points to some reference point.
Please help to improve it, specifically by commenting on my use of std algorithms and ...
1
vote
0answers
169 views
Depth First Search for percolation to find clusters in Go game
I have some questions about Depth First Search and whether I implemented it correctly. Below is a more thorough discussion. The graph in question is a randomly colored square grid (I use 3 colors). ...
1
vote
0answers
24 views
Collaborative filtering to group similar users and products
I'm doing product recommendation module based on collaborative_filtering.
The recommendation will be generated by users, ...
2
votes
0answers
141 views
C# port of data mining algorithm much slower than reference implementation
I was trying to implement the algorithm specified in this research paper (please ignore the math, since it's irrelevant to the question). This algorithm is very basic in formal concept analysis. The ...
4
votes
1answer
1k views
Implementation of KNN in R
I have implemented the K-Nearest Neighbor algorithm with Euclidean distance in R. It works fine but takes tremendously huge time than the library function (get.knn). Please point out the possibility ...
3
votes
1answer
99 views
Simple string-root detection in a string-family
(This problem is related to Simple string-split by root and sufix algorithm)
There are many ways to find a "common root" of a list of similar strings, that begins with the same substring... The ...
7
votes
2answers
186 views
Finding the maximum pairwise difference in a collection of colors
Note that this problem is equivalent to finding the longest line segment defined by any two points in a collection of 3D coordinates, which may be an easier way to visualize the problem, and is almost ...
6
votes
3answers
386 views
Finding clusters in a matrix
I got asked at an interview to write a program that, given a NxM matrix with zeros and ones, prints out the list of clusters of 1s. The clusters are defined as patches of 1s connected horizontally, ...
2
votes
0answers
382 views
Discretization of continuous attributes for automatic classification [closed]
Background
In machine learning, it's common to encounter the problem of making a decision as to which discrete category an object belongs to based on a set of continuous attributes. For example, we ...
4
votes
4answers
1k views
Aggregate array values into ranges
In five minutes I made a pretty ugly looking function. Can you help before I have to commit the code into history?
Requirements:
I would like a function that takes an array of numbers, and ...
3
votes
2answers
9k views
K-means clustering algorithm in python
Here is my implementation of the k-means algorithm in python. I would love to get any feedback on how it could be improved or any logical errors that you may see. I've left off a lot of the ...
8
votes
3answers
250 views
Breadth-first search for clusters of pixels in a given color range
I am a beginner in programming languages, so I apologise if my code is badly formatted or doesn't make any sense.
My program gets an image and a RGB color range as input and it counts how many pixels ...
3
votes
1answer
1k views
Finding the shortest substring containing keywords
Problem: Write a function that takes a String document and a String[] keywords and returns the smallest substring of ...
7
votes
2answers
154 views
What are my highest activity streaks?
I have written the following query to figure out activity streaks on a per-user basis. I find it... Ugly... And would love to improve it!
Limitations
Those are explained as commented text at the ...
1
vote
0answers
59 views
A scalable function for get boundary vertices in a graph
Given a community division I need a list of vertices that have edges in more than one community, i.e., boundary vertices.
I've tried this:
...
2
votes
3answers
62 views
1
vote
0answers
282 views
Optimize QuadTree to find K Nearest Neighbors
I'm looking a way to make my k nearest neighbors search more efficient. The context of the question is that I'm given a list of topics that have a unique ID (integer) and a (x,y) coordinate (floats) ...
8
votes
1answer
826 views
K-Means in Rust
I have implemented for learning purposes a simple K-Means clustering algorithm in Rust. For those who are not familiar: you are given N points, say in the plane, ...
6
votes
1answer
3k views
9
votes
1answer
282 views
Bitmap problem performance boost
Problem statement:
Input is a rectangular bitmap like this:
0001
0011
0110
The task is to find for each black (0) "pixel", the distance to the
...
2
votes
1answer
184 views
Neighbours from point connections
I am working with a mesh of triangles (in 3D, although I doubt it makes a difference). The mesh is given as list of lists, each list containing the indices of the three vertices of a triangle in said ...
3
votes
1answer
499 views
Construct KD-Tree in Java
I want to construct KD-Tree from unsorted List of points in the plane (i.e. 2-KD-Tree). To do so I used the method from Wikipedia: http://en.wikipedia.org/wiki/Kd-tree#Construction This is my code:
<...
6
votes
2answers
167 views
Item-based collaborative filtering
I wrote the following code using Ruby to compute item-based collaborative filtering (using Jaccard coefficient of similarity).
I would appreciate feedback about any potential issues with the code ...
8
votes
2answers
182 views
Finding the nearest polygon
Finding the nearest polygon out of a list of polygons seems to be quite a challenge on performance, due to lack of optimization.
Consider the following image:
I'll give the concept behind this. ...
3
votes
1answer
1k views
Calculating the distance between one point, and many others
In my program, I have entities that I call "blobs", because they have a blobby shape. Blobs are polygons. If I have two blobs, then their information array would look like:
...
12
votes
2answers
3k views
Possible optimizations for calculating squared euclidean distance
I need to do a few hundred million euclidean distance calculations every day in a Python project.
Here is what I started out with:
...
7
votes
2answers
13k views
K-means clustering in Python
The following code uses scikit-learn to carry out K-means clustering where \$K = 4\$, on an example related to wine marketing from the book DataSmart. That book uses excel but I wanted to learn Python ...
6
votes
3answers
1k views
Std lib-like C++ function to find nearest elements in a container
Initial problem
For a project I found myself in the need to search for elements in a container that are the closest neighbours to another precise element I have. In my case it was points in any ...
14
votes
2answers
701 views
Given a page of content, determine shortest snippet containing all search phrases (no order required)
A recruiter gave me a homework problem as a part of the recruiting process and after receiving my submission he told me that he decided not to proceed with me. When I asked for the reason, he told me ...
6
votes
1answer
610 views
Calculating Euclidean norm for each vector in a sparse matrix
Below is a naive algorithm to find nearest neighbours for a point in some n-dimensional space.
...
4
votes
1answer
13k views
K-nearest neighbours in MATLAB
I implemented K-Nearest Neighbours algorithm, but my experience using MATLAB is lacking. I need you to check the small portion of code and tell me what can be improved or modified. I hope it is a ...
4
votes
1answer
208 views
KMeans in the shortest and most readable format
I'm learning Python (coming from Java) so I decided to write KMeans as a practice for the language. However I want to see how could one improve the code and making it shorter and yet readable. I ...
4
votes
1answer
389 views
Nearest pair of points
Given a set of 2 dimensional points, it returns the two nearest points. If more pairs have the same min-distance between them, then an arbitrary choice is made. This program expects the points to be ...
6
votes
1answer
8k views
String Matching and Clustering
I have a pretty simple problem. A large set of unique strings that are to various degrees not clean, and in reality they in fact represent the same underlying reality. They are not person names, but ...
6
votes
3answers
4k views
k-nearest neighbors using MATLAB with MEX
I have implemented kNN (k-nearest neighbors) as follows, but it is very slow. I want to get an exact k-nearest-neighbor, not the approximate ones, so I didn't use the FLANN or ANN libraries.
...
18
votes
1answer
12k views
Finding the closest point to a list of points
I'm trying to find the closest point (Euclidean distance) from a user-inputted point to a list of 50,000 points that I have. Note that the list of points changes all the time. and the closest distance ...
10
votes
3answers
5k views
Density-based clustering of image keypoints
I have implemented the DBSCAN algorithm for clustering image keypoints. I have been following the pseudocode on the wiki page pretty strictly, and it's working, but I get the feeling its a very naive ...
4
votes
3answers
573 views
Nearest Neighbour classification algorithm
The following code is from a university assignment of mine to write a classification algorithm (using nearest neighbour) to classify whether or not a given feature set (each feature is the frequency ...