Counting letters in a string: Python optimisation

Question

This little program is self-explanatory. I count letters in a string (can be any string), using a for loop to iterate through each letter. The problem is that this method is very slow and I want to avoid loops.

Any ideas? I thought that maybe if I remove checked letters from the string after each loop, then in some cases, where many letters repeat, that would make a difference.

def count_dict(mystring):
    d = {}
# count occurances of character
    for w in mystring: 
        d[w] = mystring.count(w)
# print the result
    for k in sorted(d):
        print (k + ': ' + str(d[k]))

mystring='qwertyqweryyyy'
count_dict(mystring)

The output:

e: 2
q: 2
r: 2
t: 1
w: 2
y: 5

Jaime · Accepted Answer · 2013-06-25 22:53:21Z

up vote 4 down vote accepted

Use the built in Counter in the collections module:

>>> from collections import Counter
>>> Counter('qwertyqweryyyy')
Counter({'y': 5, 'e': 2, 'q': 2, 'r': 2, 'w': 2, 't': 1})

answered Jun 25 '13 at 22:53

Jaime
1,407210

1

Wow, I did not know about Counter! Thanks for the hint. Suppose there wasn't a Counter() method. What would be an alternative? – casper Jun 26 '13 at 6:44

1

I've added that for you. – Josay Jun 26 '13 at 21:43

add a comment |

Josay · Answer 2 · 2013-06-26 23:32:06Z

Counter is definitely the way to go (and I've upvoted Jaime's answer).

If you want to do it yourself and iterate only once, this should work :

d={}
for l in s:
        d[l] = d.get(l,0) + 1

There might be a short/more pythonic way to do so but it works...

Edit : I must confess that Jaime's comment to this answer surprised me but I've just tested this code :

from profilehooks import profile

s="qwertyuiopasdfghjklzxcvbnm"

@profile
def function1(s):
        d={}
        for l in s:
                d[l] = d.get(l,0)+1
        return d

@profile
def function2(s):
        return dict((char_, s.count(char_)) for char_ in set(s))

for i in xrange(0,200):
        function1(s*i)
        function2(s*i)

And the results can hardly be contested :

*** PROFILER RESULTS ***
function2 (./fsdhfsdhjk.py:13)
function called 200 times

         10948 function calls in 0.161 seconds

   Ordered by: cumulative time, internal time, call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      200    0.083    0.000    0.161    0.001 fsdhfsdhjk.py:13(function2)
     5374    0.033    0.000    0.077    0.000 fsdhfsdhjk.py:15(<genexpr>)
     5174    0.044    0.000    0.044    0.000 {method 'count' of 'str' objects}
      200    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        0    0.000             0.000          profile:0(profiler)



*** PROFILER RESULTS ***
function1 (./fsdhfsdhjk.py:6)
function called 200 times

         517800 function calls in 2.891 seconds

   Ordered by: cumulative time, internal time, call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      200    1.711    0.009    2.891    0.014 fsdhfsdhjk.py:6(function1)
   517400    1.179    0.000    1.179    0.000 {method 'get' of 'dict' objects}
      200    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        0    0.000             0.000          profile:0(profiler)

TL;DR Jaime's solution (function2) is 18 times faster than mine (function1).

This would be the right way of doing things in a language such as C. Because Python loops have so much overhead, it actually may turn out to be faster to do something like: char_counts = dict((char_, test_string.count(char_)) for char_ in set(test_string)) It does run multiple times over the string, but because the loops run in C, not in Python, it is faster. — Jaime, Jun 26 '13 at 23:12
I am really really impressed. I've updated my answer accordingly. — Josay, Jun 26 '13 at 23:32
I tested it on a 10**6 character string and there it was only 4x faster. It's one of the few things I don't like about Python, that sometimes optimizing code is not about writing the most efficient algorithm, but about figuring out which built-in functions run faster. — Jaime, Jun 27 '13 at 3:38
Yes, as function2 loops roughly (x+1) times on the string (x being the number of different characters), I can imagine that the performance gain, compared to function 1 looping only once, gets smaller as x and the string get bigger. Still, this is a damn long string :-) — Josay, Jun 27 '13 at 6:31

asked	1 year ago
viewed	5632 times
active	5 months ago

current community

your communities

more stack exchange communities

Counting letters in a string: Python optimisation

2 Answers 2

Your Answer

Not the answer you're looking for? Browse other questions tagged python strings or ask your own question.

Hot Network Questions

current community

your communities

more stack exchange communities

Counting letters in a string: Python optimisation

2 Answers 2

Did you find this question interesting? Try our newsletter

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged python strings or ask your own question.

Related

Hot Network Questions