Tell me more ×
Code Review Stack Exchange is a question and answer site for peer programmer code reviews. It's 100% free, no registration required.

My aim is to sort a list of strings where words have to be sorted alphabetically. Words starting with s should be at the start of the list (they should be sorted as well), followed by the other words.

The below function does that for me.

def mysort(words):
    mylist1 = sorted([i for i in words if i[:1] == "s"])
    mylist2 = sorted([i for i in words if i[:1] != "s"])
    list = mylist1 + mylist2
    return list

I am looking for better ways to achieve this, or if anyone can find any issues with the code above.

share|improve this question

4 Answers

up vote 8 down vote accepted

Some notes about your code:

  • sorted([i for i in words if i[:1] == "s"]): No need to generate an intermediate list, drop those [ ] and you'll get a lazy generator expression instead.

  • mylist1. This my prefix looks pretty bad. Do other languages use them?

  • list: Red flag. Don't overshadow a built-in (at least not one so important) with a variable.

  • return list. You can directly write return mylist1 + mylist2, no gain in naming the value you're about to return (the name of the function should inform about that).

  • i[:1] == "s" -> i.startswith("s").

Now what I'd write. Using sorted with the argument key (Schwartzian transform) and taking advantage of the lexicographical order defined by tuples:

def mysort(words):
    return sorted(words, key=lambda word: (0 if word.startswith("s") else 1, word))

If you are familiar with the fact that False < True, it can be simplified a bit more:

sorted(words, key=lambda word: (not word.startswith("s"), word))

Here's another way using the abstraction partition:

words_with_s, other_words = partition(words, lambda word: word.startswith("s"))
sorted(words_with_s) + sorted(other_words)

Writing partition is left as an exercise for the reader :-)

share|improve this answer
Dang, I was just writing up the same thing! +1 :-) – tobias_k Jul 12 at 14:00
@tobias_k: sorry ;-) I was a bit suprised no one had proposed something similar yet, this is pretty standard stuff. – tokland Jul 12 at 14:01
1  
No problem, but two times the same answer within 2 minutes for a 8 hours old question... how probable is this? – tobias_k Jul 12 at 14:02

For this snippet, I would have probably written the more compact:

def mysort(words):
    return (sorted([i for i in words if i[0] == "s"]) +
            sorted([i for i in words if i[0] != "s"]))

(You're not supposed to align code like this in Python, but I tend to do it anyway.) Note that I wrote i[0] and not i[:1] - I think the former is clearer. Using str.startswith() is even better.

Also, it is generally considered bad practice to use list as a variable name.

However, your algorithm iterates the list at least three times: Once to look for the words starting with s, once to look for the words not starting with s and then finally O(n log n) times more to sort the two lists. If necessary, you can improve the algorithm to do just one pass before sorting, by populating both lists simultaneously:

def prefix_priority_sort(words, special_prefix = "s"):
    begins_with_special = []
    not_begin_with_special = []

    for word in words:
        if word.startswith(special_prefix):
            begins_with_special.append(word)
        else:
            not_begin_with_special.append(word)

    return sorted(begins_with_special) + sorted(not_begin_with_special)

However, the best way is to define your own comparator and pass it to the sorting function, like mariosangiorgio suggests. In Python 3, you need to pass in a key function, see the docs for details, or this article.

Depending on execution speed requirements, list sizes, available memory and so on, you might want to pre-allocate memory for the lists using [None] * size. In your case it is probably premature optimization, though.

share|improve this answer
4  
Note that i[0] will fail for empty strings, while i[:1] will not. Agreed that startswith is best. – tobias_k Jul 12 at 14:04
Your mysort needs another pair of parentheses, or a backslash before the newline. – Gareth Rees Jul 12 at 14:48
@tobias_k Ah, good point! – Lstor Jul 12 at 16:48
@GarethRees Thanks, fixed. – Lstor Jul 12 at 16:49
What makes you say you're not supposed to align code like that? PEP-8 does it. – flornquake Jul 16 at 0:53

Lastor already said what I was going to point out so I am not going to repeat that. I'll just add some other things.

I tried timing your solution with a bunch of other solutions I came up with. Among these the one with best time-memory combination should be the function mysort3 as it gave me best timing in nearly all cases. I am still looking about proper timing in Python. You can try putting in different test cases in the function tests to test the timing for yourself.

def mysort(words):
    mylist1 = sorted([i for i in words if i[:1] == "s"])
    mylist2 = sorted([i for i in words if i[:1] != "s"])
    list = mylist1 + mylist2
    return list

def mysort3(words):
    ans = []
    p = ans.append
    q = words.remove
    words.sort()
    for i in words[:]:
        if i[0] == 's':
            p(i)
            q(i)
    return ans + words

def mysort4(words):
    ans1 = []
    ans2 = []
    p = ans1.append
    q = ans2.append
    for i in words:
        if i[0] == 's':
            p(i)
        else:
            q(i)
    ans1.sort()
    ans2.sort()
    return ans1 + ans2

def mysort6(words):
    return ( sorted([i for i in words if i[:1] == "s"]) +
              sorted([i for i in words if i[:1] != "s"])
             )

if __name__ == "__main__":
    from timeit import Timer
    def test(f):
        f(['a','b','c','abcd','s','se', 'ee', 'as'])

    print Timer(lambda: test(mysort)).timeit(number = 10000)
    print Timer(lambda: test(mysort3)).timeit(number = 10000)
    print Timer(lambda: test(mysort4)).timeit(number = 10000)
    print Timer(lambda: test(mysort6)).timeit(number = 10000)
share|improve this answer
Thanks for the various options provided – misguided Jul 14 at 22:08

If you want to sort the list according to your own criterion probably the best choice is to write your own comparator.

A comparator function takes as parameters two values and has to return -1 if the first one is the smaller, 0 if they are equal and +1 if the second is the bigger.

In your case you should first check wether one of the two parameters starts with 's'. If it does manage it properly, otherwise fall back to standard string ordering.

This will turn your code in a single call to the sort function plus the definition of the comparison function.

If you want an example of how comparators can be implemented have a look at the answers of this StackOVerflow Q&A

share|improve this answer
+1. While not applicable for Python 3, it's still a very valid point. – Lstor Jul 12 at 8:02
Are you sure? IMO sorted along the key argument is on a higher level of abstraction than defining a comparator thus should be generally preferred. en.wikipedia.org/wiki/Schwartzian_transform – tokland Jul 12 at 14:24
@tokland Python 2 takes a comparator function, Python 3 takes a key function, if that's what you mean? – Lstor Jul 12 at 16:50
AFAIK sort (in-place) has always taken a comparator function and sorted (not in-place) a key function, but I am not 100% sure. – tokland Jul 12 at 18:08
@tokland Ah, you are more or less right regarding sorted. In Python 2.7, it takes a comparator function and a key function: docs.python.org/2.7/library/… – Lstor Jul 12 at 18:16

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.