Take the 2-minute tour ×
Programmers Stack Exchange is a question and answer site for professional programmers interested in conceptual questions about software development. It's 100% free, no registration required.

Suppose I have a list of objects sorted by a person based on that person's subjective taste. Let's assume that this person is being somewhat consistent in their sorting and that there are quantifiable attributes of these objects that I can identify. For example, a person might sort cheese based on how much they enjoy the taste, and people who disliked salt would rank salty cheese lower than cheeses that were low salt. One person might not care what the color of the cheese was or whether it came from cows, goats, or sheep, while another might rank sheep's milk cheeses higher than goat's milk cheeses.

I assume there is a procedure to identify what attributes of data elements in a list are and aren't relevant to the determining the order. What is the name for this problem, and what algorithms exist for solving it? (ideally in Python)

share|improve this question
    
So, based on a sorted data set, you want to know what sort criteria was applied. –  poke Dec 24 '13 at 22:38
    
I guess? Although it is not actually a direct sort. I'll come up with a better example in my question that makes it clearer. –  Jordan Reiter Dec 25 '13 at 0:50
    
Quick question: are you assuming the sort order is based on a single attribute or are multiple attributes allowed? If multiple attributes are being used, I see similarities to the use of Singular Value Decomposition as used in the Netflix prize (ala Simon Funk). The basic idea is that the user has some unknown linear weighting of the attributes that when you plug in the attributes of say the cheese into the weighted sum, out pops a number - in this case, your sorting number. Of course, with error, etc. This is a very high-powered approach, though, so trying to see if the approach fits. –  J Trana Dec 25 '13 at 5:05
    
@JTrana yeah this problem is similar. The idea is to be able to sort new lists based on the same criteria, so the things the people like best show up first in the list. –  Jordan Reiter Dec 27 '13 at 0:04

1 Answer 1

Since the sort isn’t necessarily perfect, the first thing that comes to mind is regression analysis.

For each property, you want to determine the strength of a correlation between the indices of the items (the response variable) and the values of that property. I think you can then assume that—or ask the user if—the sorted property is the one you found to have the strongest correlation.

If multiple properties might be considered in the sort, as they very well could be, you need to use multivariate regression. I guess you would consider the 2n possible subsets of all the properties, if n is small, or else only the size-k subsets.

share|improve this answer
    
This sounds right I think. I guess my concern is how computationally complex this might become, especially with a long list & a lot of variables. –  Jordan Reiter Dec 25 '13 at 3:33

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.