I am learning how to cluster large datasets, and this may be a newbie question, but I haven’t found a suitable answer in the documentation or on this board.
My question is best illustrated by an example. Let’s import some data:
irisData = Import["http://aima.cs.berkeley.edu/data/iris.csv", "CSV"];
(* Column labels *)
pLabels = {"Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Type"};
How do I construct clusters based on only two columns, say “Sepal.Length” and “Sepal.Width”, and still retain the rest of the data as part of the clusters?
Ideally I would like a function:
findClusters[data,n,columnlist]
that would cluster the data into n clusters using only the columns in columnlist, such that each cluster in the output keeps the same row format as the input data.
I can find the clusters and plot them as:
clusters = {cluster1, cluster2, cluster3} = FindClusters[irisData[[All, 1 ;; 2]], 3];
ListPlot[{{#1, #2} & @@@ cluster1, {#1, #2} & @@@ cluster2, {#1, #2} & @@@ cluster3}, Frame -> True, AspectRatio -> 1, ImageSize -> 200, FrameLabel -> pLabels[[1 ;; 2]]]
But I’d like to use the clusters found by FindClusters above to explore, say, “Sepal.Length” vs. “Petal.Width”, or to do some other data analysis.
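One possible direction, offered only as a minimal sketch: FindClusters accepts an `elements -> values` form, which clusters on the elements but returns the corresponding values. Assuming that form is suitable here, the selected columns can serve as the elements and the full rows as the values. The helper name findClusters and its argument order are simply the hypothetical ones from the wish above, and columnlist is assumed to be a list of column indices rather than label strings.

```mathematica
(* Hypothetical helper: cluster on the columns in cols, return full rows *)
findClusters[data_, n_Integer, cols_List] :=
  FindClusters[data[[All, cols]] -> data, n]

(* Cluster on Sepal.Length and Sepal.Width; each row keeps all five columns *)
clusters = findClusters[irisData, 3, {1, 2}];

(* Reuse the same clusters to look at Sepal.Length vs Petal.Width *)
ListPlot[clusters[[All, All, {1, 4}]], Frame -> True, AspectRatio -> 1,
 ImageSize -> 200, FrameLabel -> pLabels[[{1, 4}]]]
```

Because every cluster is a list of complete rows, any other pair of columns (or the “Type” column) can be pulled out of clusters with Part in the same way.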