Take the 2-minute tour ×
Code Review Stack Exchange is a question and answer site for peer programmer code reviews. It's 100% free, no registration required.

This little program is self-explanatory. I count letters in a string (can be any string), using a for loop to iterate through each letter. The problem is that this method is very slow and I want to avoid loops.

Any ideas? I thought that maybe if I remove checked letters from the string after each loop, then in some cases, where many letters repeat, that would make a difference.

def count_dict(mystring):
    d = {}
# count occurances of character
    for w in mystring: 
        d[w] = mystring.count(w)
# print the result
    for k in sorted(d):
        print (k + ': ' + str(d[k]))

mystring='qwertyqweryyyy'
count_dict(mystring)

The output:

e: 2
q: 2
r: 2
t: 1
w: 2
y: 5
share|improve this question

2 Answers 2

up vote 4 down vote accepted

Use the built in Counter in the collections module:

>>> from collections import Counter
>>> Counter('qwertyqweryyyy')
Counter({'y': 5, 'e': 2, 'q': 2, 'r': 2, 'w': 2, 't': 1})
share|improve this answer
1  
Wow, I did not know about Counter! Thanks for the hint. Suppose there wasn't a Counter() method. What would be an alternative? –  casper Jun 26 '13 at 6:44
1  
I've added that for you. –  Josay Jun 26 '13 at 21:43

Counter is definitely the way to go (and I've upvoted Jaime's answer).

If you want to do it yourself and iterate only once, this should work :

d={}
for l in s:
        d[l] = d.get(l,0) + 1

There might be a short/more pythonic way to do so but it works...

Edit : I must confess that Jaime's comment to this answer surprised me but I've just tested this code :

from profilehooks import profile

s="qwertyuiopasdfghjklzxcvbnm"

@profile
def function1(s):
        d={}
        for l in s:
                d[l] = d.get(l,0)+1
        return d

@profile
def function2(s):
        return dict((char_, s.count(char_)) for char_ in set(s))

for i in xrange(0,200):
        function1(s*i)
        function2(s*i)

And the results can hardly be contested :

*** PROFILER RESULTS ***
function2 (./fsdhfsdhjk.py:13)
function called 200 times

         10948 function calls in 0.161 seconds

   Ordered by: cumulative time, internal time, call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      200    0.083    0.000    0.161    0.001 fsdhfsdhjk.py:13(function2)
     5374    0.033    0.000    0.077    0.000 fsdhfsdhjk.py:15(<genexpr>)
     5174    0.044    0.000    0.044    0.000 {method 'count' of 'str' objects}
      200    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        0    0.000             0.000          profile:0(profiler)



*** PROFILER RESULTS ***
function1 (./fsdhfsdhjk.py:6)
function called 200 times

         517800 function calls in 2.891 seconds

   Ordered by: cumulative time, internal time, call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      200    1.711    0.009    2.891    0.014 fsdhfsdhjk.py:6(function1)
   517400    1.179    0.000    1.179    0.000 {method 'get' of 'dict' objects}
      200    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        0    0.000             0.000          profile:0(profiler)

TL;DR Jaime's solution (function2) is 18 times faster than mine (function1).

share|improve this answer
2  
This would be the right way of doing things in a language such as C. Because Python loops have so much overhead, it actually may turn out to be faster to do something like: char_counts = dict((char_, test_string.count(char_)) for char_ in set(test_string)) It does run multiple times over the string, but because the loops run in C, not in Python, it is faster. –  Jaime Jun 26 '13 at 23:12
    
I am really really impressed. I've updated my answer accordingly. –  Josay Jun 26 '13 at 23:32
    
I tested it on a 10**6 character string and there it was only 4x faster. It's one of the few things I don't like about Python, that sometimes optimizing code is not about writing the most efficient algorithm, but about figuring out which built-in functions run faster. –  Jaime Jun 27 '13 at 3:38
    
Yes, as function2 loops roughly (x+1) times on the string (x being the number of different characters), I can imagine that the performance gain, compared to function 1 looping only once, gets smaller as x and the string get bigger. Still, this is a damn long string :-) –  Josay Jun 27 '13 at 6:31
    
Thanks guys. This was very enlightening –  casper Jun 28 '13 at 10:13

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.