All Questions

2 votes
0 answers
116 views

Regression on Pandas DataFrame

I am working on the following assignment and I am a bit lost: Build a regression model that will predict the rating score of each product based on attributes which correspond to some very common ...
Stefano Pozzi
3 votes
0 answers
578 views

Compute distance matrix using DTW acceptable for scipy.cluster.hierarchy

I am new to both data science and Python. I have a dataset of time-dependent samples on which I want to run agglomerative hierarchical clustering. I have found that Dynamic Time Warping (DTW)...
user3933607
7 votes
1 answer
1k views

PANDAS nearest site algorithm

I have CSVs full of property transactions in the UK from 1995 to 2017, separated by year, such as "RS2015.csv". I have a second CSV with a list of wind turbines in the UK. Both have coordinates in WGS ...
CTaylor19
  • 173
6 votes
1 answer
604 views

Clustering points on a sphere

I have written a short Python program which does the following: loads a large data file (\$10^9+\$ rows) where each row is a point on a sphere. The code then loads a pre-determined triangular grid on ...
John
  • 329
7 votes
1 answer
687 views

Similarity research: K-Nearest Neighbour (KNN) using a linear regression to determine the weights

I have a set of houses with categorical and numerical data. Later I will have a new house, and my goal will be to find the 20 closest houses. The code is working fine, and the results are not so bad, but ...
mitsi
  • 173
5 votes
1 answer
2k views

KNN pipeline w/ cross_validation_scores

Using the wine quality dataset, I'm attempting to perform a simple KNN classification (with a scaler and the classifier in a pipeline). It works, but I've never used ...
chrymxbrwn
5 votes
1 answer
2k views

PANDAS spatial clustering

I am writing a spatial clustering algorithm using pandas and SciPy's kdtree. I profiled the code, and the .loc part takes the most time for bigger datasets. I wonder ...
user96102