Newest 'clustering' Questions

2

votes

1answer

33 views

KNN algorithm implemented in Python

This is the first time I tried to write some code in Python. I think it gives proper answers but probably some "vectorization" is needed ...

asked yesterday

Newbie

946

4

votes

0answers

27 views

Implementation of DBSCAN in C++

I've just finished my implementation of DBSCAN in C++ for a machine learning framework. I've tried to follow the pseudocode implementation on Wikipedia as best I could. I also found some ...

c++ performance algorithm clustering

asked 2 days ago

hateAngularJS

211

9

votes

2answers

319 views

+50

Selecting kids for a Christmas play with similar heights

I am doing this problem on SPOJ. The challenge is: My kid's kindergarten class is putting up a Christmas play. (I hope he gets the lead role.) The kids are all excited, but the teacher has a ...

c++ performance programming-challenge sorting clustering

asked Jan 20 at 9:27

Suraj Jain

966

1

vote

0answers

63 views

Implementation of a KNN in OCaml

I wrote the following implementation of the k-nearest neighbor algorithm (for a binary classification task). I am not familiar with OCaml's built in functions, I have the feeling that some of them ...

algorithm machine-learning clustering ocaml

asked Jan 16 at 19:23

RUser4512

451115

0

votes

0answers

25 views

Sentences Clustering - Affinity Propagation & Cosine Similarity - Python & SciKit

I am looking for advice regarding my code. I am interested in the correctness, legibility and minimality of the solution. ...

python machine-learning clustering

asked Jan 6 at 20:02

NMO

4516

5

votes

0answers

45 views

K-Means clustering in Python2

I've implemented the K-Means clustering algorithm in Python2, and I wanted to know what remarks you guys could make regarding my code. I've included a small test set with 2D-vectors and 2 classes, but ...

python performance python-2.7 reinventing-the-wheel clustering

asked Jan 6 at 17:05

BusyAnt

41813

3

votes

0answers

20 views

Closest points using Rabin randomizing approach

I was told to use the following Rabin algorithm to find the shortest distance between 2 points in 2D: Randomly choose sqrt(n) and brute force to find the closest ...

c++ performance algorithm random clustering

asked Dec 9 '16 at 16:17

dh16

1161

0

votes

0answers

55 views

Find k nearest points

I'm working on a problem to select the k nearest points for a given point. Any advice on bugs or improvements is appreciated, including general advice on how to find the nearest k points. My main idea is ...

python algorithm python-2.7 computational-geometry clustering

asked Dec 2 '16 at 8:35

Lin Ma

1,032212

2

votes

1answer

54 views

Zip code reduce function

My task is to write a function that would take an array of zip codes and spit out only the zip codes that do not qualify. A non-qualifying zip code will not exist in the database and does NOT have ...

php clustering geospatial

asked Nov 20 '16 at 20:35

kratos

1134

0

votes

1answer

129 views

Cosine similarity computation

I have a matrix of ~4.5 million vectors [4.5mil, 300] and I want to calculate the distance between a vector of length 300 and all the entries in the matrix. I got some great performance time ...

python python-3.x numpy clustering scipy

asked Nov 15 '16 at 23:11

ajaanbaahu

213

3

votes

1answer

154 views

OpenCV 3: Using k-Nearest Neighbors to analyse RGB image

I'm new to computer vision and numpy. I wrote a simple script to separate red, green and blue colors from the original image by using the kNN algorithm. After reading through some numpy tutorials, I'...

python python-3.x numpy clustering opencv

asked Oct 7 '16 at 11:55

dev-random

283

3

votes

1answer

494 views

Finding closest pair of 2D points, using divide-and-conquer

I'm learning C++ as well as algorithms. Here's my implementation of a solution to the closest pair problem. I tried to minimize memory by using only iterators. And points are being read from ...

c++ algorithm computational-geometry clustering divide-and-conquer

asked Oct 1 '16 at 1:01

Rafael Adel

166126

3

votes

0answers

65 views

Divide-and-conquer approach for finding the closest pair of points

This is an algorithm for finding the closest pair of points on a 2d plane by dividing the problem by half recursively, as illustrated here: ...

computational-geometry rust clustering divide-and-conquer

asked Sep 20 '16 at 22:09

qed

712520

5

votes

2answers

117 views

Calculating cooccurrence probabilities for pairs of words in a document

It is a 1.5 hour coding test, started the moment when the question was sent by email. My solution was done under the strict condition. I was not told anything before the test. The question is about ...

c++ algorithm interview-questions clustering natural-language-proc

asked Sep 7 '16 at 5:08

Student T

1737

5

votes

1answer

283 views

K-means clustering implemented in Python 3

Here is the classic K-means clustering algorithm implemented in Python 3. My main concern is time/memory efficiency and if there are version specific idioms that I could use to address issues of the ...

python performance algorithm python-3.x clustering

asked Sep 3 '16 at 8:49

Sorrop

924

3

votes

1answer

191 views

Clustering 16 million records in parallel

I have a dataset with 16 million rows, which may grow to upwards of 30 million. I am using parLapply to run across three cores in R. But it's taking two days to ...

time-limit-exceeded r clustering machine-learning geospatial

asked Aug 2 '16 at 13:40

user2757228

213

2

votes

0answers

137 views

Solving the Mining algorithm from HackerRank

I was working on this problem for a few hours last night and finally came up with a brute-force solution. The task is to report the minimum work necessary (sum of weight × distance) to relocate gold ...

c# algorithm programming-challenge time-limit-exceeded clustering

asked Jul 31 '16 at 18:08

jamkin89

111

4

votes

1answer

78 views

“Similar Destinations” challenge

I am currently solving the Similar Destinations challenge on HackerRank and am in need of some assistance in the code optimization/performance department. The task is to take a list of up to 1000 ...

java algorithm programming-challenge time-limit-exceeded clustering

asked Jul 31 '16 at 17:34

cottonman

515

6

votes

1answer

155 views

Similarity research : K-Nearest Neighbour(KNN) using a linear regression to determine the weights

I have a set of houses with categorical and numerical data. Later I will have a new house and my goal will be to find the 20 closest houses. The code is working fine, and the results are not so bad but ...

python time-limit-exceeded pandas clustering machine-learning

asked Jul 27 '16 at 18:15

mitsi

335

3

votes

2answers

45 views

Getting the smallest snippet from content containing all keywords

This returns the smallest snippet from the content containing all of the given keywords (in any order). This provides the correct solution but I would like to know if it can be made more efficient. <...

python performance algorithm strings clustering

asked Jul 23 '16 at 7:26

nirvana

161

3

votes

0answers

64 views

KNN pipeline w/ cross_validation_scores

Using the wine quality dataset, I'm attempting to perform a simple KNN classification (w/ a scaler, and the classifier in a pipeline). It works, but I've never used ...

python python-2.7 pandas clustering

asked Jul 22 '16 at 23:37

srytoomanyquestions

161

5

votes

1answer

196 views

Clustering nodes with Hamming distance < 3

I want to speed up the following code, which is from an algorithm class. I get a list of 200000 nodes where every node is a tuple of length 24 where every item is either a 1 or 0. These ...

python performance graph clustering edit-distance

asked Jun 28 '16 at 7:39

David Michael Gang

1956

3

votes

1answer

81 views

Predict new ratings for each user based on their pearson correlation with other users

I am new to R and programming. I have a set of ratings for 45000 users and 40 odd movies. I need to predict new ratings for each user based on their pearson correlation with other users. I also need ...

beginner matrix r clustering

asked Jun 27 '16 at 19:44

RUser

183

5

votes

2answers

103 views

Grouping rectangles horizontally and vertically

As you can see, the code below for each method is the same, except for the properties it uses. For example X vs Y and ...

c# computational-geometry clustering

asked Jun 16 '16 at 15:50

TheLethalCoder

222112

0

votes

0answers

21 views

Applying kmodes on every “column wise subset” of a dataframe

I want to apply kmodes for 2 clusters on every possible combination of columns from a dataframe. Finally, I want to compare the clusters with another column that ...

r clustering

asked Jun 11 '16 at 0:17

Churchill Nolan

1185

2

votes

1answer

174 views

K-Means Clustering - F# Learning Challenge

Inspired by this blog, I went on to implement my own version as an F# learning challenge. It turned out to be quite different from the original (but somewhat faster for large samples). The first code ...

f# clustering

asked Jun 8 '16 at 21:50

Henrik Hansen

99329

14

votes

1answer

311 views

Dynamic Colour Binning: Grouping Similar Colours in Images

This is a piece of code that implements an image-processing algorithm I came up with. I call it Dynamic Colour Binning. It's a fairly academic exercise that was more about providing a learning ...

python object-oriented image clustering opencv

asked May 16 '16 at 14:00

Marco Tompitak

1713

5

votes

1answer

980 views

k-means clustering algorithm implementation

Here is my personal implementation of the clustering k-means algorithm. ...

python reinventing-the-wheel numpy clustering machine-learning

asked May 13 '16 at 20:27

Daniyal Shahrokhian

285

2

votes

1answer

250 views

Clustering similar tweets in a corpus

I am attempting to write a statistical program using an LDA model I've trained/created using Gensim. I am very new to Python and am a student level programmer. This current program is working and ...

python sorting dictionary clustering natural-language-proc

asked Apr 22 '16 at 19:44

Kenneth Orton

536

5

votes

1answer

135 views

K-means clustering in Rust

I've implemented K-means clustering in Rust. It's my second Rust project (my first one is here: Randomly selecting an adjective and noun, combining them into a message) I would like advice on ...

beginner rust clustering

asked Apr 21 '16 at 9:22

lochsh

606

1

vote

1answer

46 views

Store and output hard-coded relationships among hosts

The following code has begun to smell, but I have not yet decided with what to replace it, other than, obviously, a database. I made a very unsatisfactory workaround for my attempt to make ...

python python-3.x networking clustering redis

asked Apr 4 '16 at 9:54

Nathan Basanese

1066

1

vote

1answer

87 views

DBSCAN in C++ for general and Android use

I've implemented a templated DBSCAN for general use. At the moment, it's going to be used on Android through the JNI. I used Wikipedia's pseudocode and a little bit of the DBSCAN paper for reference. ...

c++ performance algorithm android clustering

asked Apr 1 '16 at 23:38

Ben

61

3

votes

1answer

276 views

PANDAS spatial clustering

I'm writing a spatial clustering algorithm using pandas and scipy's kdtree. I profiled the code and the .loc part takes the most time for bigger datasets. I wonder ...

python performance geospatial pandas clustering

asked Jan 29 '16 at 16:47

user96102

161

2

votes

0answers

110 views

Cluster arrays according to similarity of key values

The script below will compare a set of arrays according to similarities between their keys' values. For example, if the first 4 keys' values of an array are equal to another array's first 4 keys' values,...

php array clustering

asked Jan 6 '16 at 9:57

Mohammad

1183

8

votes

1answer

878 views

Implementing a fast DBScan in C#

I tried to implement a DBScan in C# using kd-trees. I followed the implementation from here. ...

c# performance algorithm tree clustering

asked Oct 28 '15 at 3:49

John Tan

1552

15

votes

4answers

703 views

N closest points to the reference point

Here is working code to get N closest points to some reference point. Please help to improve it, specifically by commenting on my use of std algorithms and ...

c++ c++11 coordinate-system clustering

asked Oct 23 '15 at 23:35

Maxim Galushka

485213

1

vote

0answers

217 views

Depth First Search for percolation to find clusters in Go game

I have some questions about Depth First Search and whether I implemented it correctly. Below is a more thorough discussion. The graph in question is a randomly colored square grid (I use 3 colors). ...

python game clustering depth-first-search

asked Aug 17 '15 at 21:16

john mangual

16811

1

vote

0answers

28 views

Collaborative filtering to group similar users and products

I'm building a product recommendation module based on collaborative_filtering. The recommendation will be generated by users, ...

ruby clustering

asked Aug 7 '15 at 3:50

poc

36027

2

votes

0answers

150 views

C# port of data mining algorithm much slower than reference implementation

I was trying to implement the algorithm specified in this research paper (please ignore the math, since it's irrelevant to the question). This algorithm is very basic in formal concept analysis. The ...

c# performance matrix clustering data-mining

asked Jul 22 '15 at 4:59

sisck vabrigas

162

4

votes

1answer

2k views

Implementation of KNN in R

I have implemented the K-Nearest Neighbor algorithm with Euclidean distance in R. It works fine but takes tremendously more time than the library function (get.knn). Please point out the possibility ...

performance matrix r clustering data-mining

asked Jun 23 '15 at 4:01

Arighna

234

3

votes

1answer

102 views

Simple string-root detection in a string-family

(This problem is related to Simple string-split by root and sufix algorithm) There are many ways to find a "common root" of a list of similar strings that begin with the same substring... The ...

php strings clustering natural-language-proc

asked Jun 4 '15 at 21:49

Peter Krauss

1495

7

votes

2answers

228 views

Finding the maximum pairwise difference in a collection of colors

Note that this problem is equivalent to finding the longest line segment defined by any two points in a collection of 3D coordinates, which may be an easier way to visualize the problem, and is almost ...

c# performance computational-geometry clustering

asked Jun 1 '15 at 20:16

Oblivious Sage

1365

6

votes

3answers

730 views

Finding clusters in a matrix

I was asked at an interview to write a program that, given an NxM matrix with zeros and ones, prints out the list of clusters of 1s. The clusters are defined as patches of 1s connected horizontally, ...

c# interview-questions matrix clustering

asked May 3 '15 at 2:09

Alec Bryte

311

2

votes

0answers

641 views

Discretization of continuous attributes for automatic classification [closed]

Background In machine learning, it's common to encounter the problem of making a decision as to which discrete category an object belongs to based on a set of continuous attributes. For example, we ...

python performance clustering machine-learning

asked Apr 16 '15 at 16:35

BrassboundBatman

112

5

votes

4answers

2k views

Aggregate array values into ranges

In five minutes I made a pretty ugly looking function. Can you help before I have to commit the code into history? Requirements: I would like a function that takes an array of numbers, and ...

php array interval clustering

asked Feb 9 '15 at 20:10

William George

1264

4

votes

2answers

16k views

K-means clustering algorithm in python

Here is my implementation of the k-means algorithm in python. I would love to get any feedback on how it could be improved or any logical errors that you may see. I've left off a lot of the ...

python algorithm clustering

asked Feb 9 '15 at 14:34

aus_lacy

143116

8

votes

3answers

258 views

Breadth-first search for clusters of pixels in a given color range

I am a beginner in programming languages, so I apologise if my code is badly formatted or doesn't make any sense. My program gets an image and an RGB color range as input and it counts how many pixels ...

optimization c image clustering breadth-first-search

asked Jan 18 '15 at 19:01

greenro

433

3

votes

1answer

2k views

Finding the shortest substring containing keywords

Problem: Write a function that takes a String document and a String[] keywords and returns the smallest substring of ...

java strings clustering

asked Jan 14 '15 at 5:55

David Grinberg

22329

7

votes

2answers

164 views

What are my highest activity streaks?

I have written the following query to figure out activity streaks on a per-user basis. I find it... Ugly... And would love to improve it! Limitations Those are explained as commented text at the ...

sql sql-server t-sql stackexchange clustering

asked Dec 24 '14 at 23:23

Phrancis

13k434120

1

vote

0answers

63 views

A scalable function for get boundary vertices in a graph

Given a community division I need a list of vertices that have edges in more than one community, i.e., boundary vertices. I've tried this: ...

python graph complexity clustering

asked Dec 11 '14 at 16:17

Alan Valejo

1061
