Count the frequency of words in a text file

Question

#!/usr/bin/python
file=open("C:/python27/python operators.txt","r+")

wordcount={}

for word in file.read().split():
    if word not in wordcount:
        wordcount[word] = 1
    else:
        wordcount[word] += 1

for k,v in wordcount.items():
    print k, v

Hi, could you please add some text detailing what the purpose of your code is, and what you're looking for in terms of feedback? — Zak, 19 hours ago
Please do add an introductory paragraph at the top of your post. Right now your post is summarized as "..." on question list pages. You can improve that and make your question more attractive by adding an introductory paragraph at the top. — janos♦, 8 hours ago

Barry · Answer 1 · 2015-09-04 12:26:26Z

Use a Context Manager

When dealing with something like a file, it's safer to write:

with open(filename) as file:
    ## code ##

That will automatically handle file closing correctly in case of exceptions and the like. I also find it clearer. Also, why r+?
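As a minimal sketch of what the with statement buys you: the file gets closed even when the body raises. The temporary file and the deliberately raised RuntimeError are made up for illustration and are not part of the original script.

```python
import os
import tempfile

# Create a throwaway file so the sketch is self-contained (hypothetical data).
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open(path, "w") as out:
    out.write("a b a\n")

handle = None
try:
    with open(path) as f:
        handle = f  # keep a reference so we can inspect it afterwards
        raise RuntimeError("simulated failure while reading")
except RuntimeError:
    pass

# The with block closed the file on the way out, despite the exception.
closed_after_error = handle.closed
os.remove(path)
```

Without with, you would need an explicit try/finally around the read to get the same guarantee.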

Prefer Generators

file.read() reads the entire file into memory. Prefer to just go a line at a time by iterating through it.

In Python 2, wordcount.items() builds a full list of all the items. But you don't need them all at once; you just need to iterate through them. For that there's iteritems().
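One way to sketch the line-at-a-time idea with only basic data structures is a small generator. The helper name iter_words and the sample lines are assumptions for illustration; a list of strings stands in for the open file, since iterating a file object yields its lines in the same way.

```python
def iter_words(lines):
    """Yield words one at a time from any iterable of lines."""
    for line in lines:
        for word in line.split():
            yield word

# Stand-in for an open file: only one line is processed at a time.
sample_lines = ["to be or\n", "not to be\n"]

wordcount = {}
for word in iter_words(sample_lines):
    # dict.get supplies 0 for words not seen yet.
    wordcount[word] = wordcount.get(word, 0) + 1
```

Passing the file object itself (iter_words(f) inside a with block) would keep memory use proportional to the longest line rather than to the whole file.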

Use the tools at your disposal

You have:

if word not in wordcount:
    wordcount[word] = 1
else:
    wordcount[word] += 1

That's a lot of code for a simple operation. What are you doing here? You're counting the occurrences of each word. But there's an app for that, collections.Counter:

wordcount = collections.Counter()
for line in file:
    for word in line.split():
        wordcount[word] += 1

Furthermore, we also have Counter.update, which lets us do:

for line in file:
    wordcount.update(line.split())
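A short sketch of why this works, using made-up sample strings: Counter.update adds to the existing counts, whereas dict.update overwrites values, and a Counter reports 0 for keys it has not seen.

```python
import collections

wordcount = collections.Counter()
wordcount.update("to be or".split())
wordcount.update("not to be".split())  # counts accumulate across calls

# A plain dict behaves differently: update() replaces the stored value.
plain = {}
plain.update({"to": 1})
plain.update({"to": 1})

to_count = wordcount["to"]            # counted in both updates
plain_to = plain["to"]                # overwritten, not summed
missing_count = wordcount["missing"]  # unseen keys read as 0, no KeyError
```

That zero default is also what lets wordcount[word] += 1 work on a Counter without checking for the key first.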

Full solution:

#!/usr/bin/python
import collections

wordcount = collections.Counter()
with open("C:/python27/python operators.txt") as file:
    for line in file:
        wordcount.update(line.split())

for k,v in wordcount.iteritems():
    print k, v

I think he only wants to work with basic data structures. — CodeYogi, 4 hours ago

SuperBiasedMan · Answer 2 · 2015-09-04 13:35:03Z

You can avoid writing this:

if word not in wordcount:
    wordcount[word] = 1
else:
    wordcount[word] += 1

Use a defaultdict instead. It is a dictionary that creates a new key with a default value when you try to access a nonexistent one. You specify the default by passing a callable object; in your case, int would set the default value to 0.

It would simplify your code to this:

from collections import defaultdict

...

wordcount = defaultdict(int)

for word in file.read().split():
    wordcount[word] += 1
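To make the default-value behaviour concrete, here is a small sketch with made-up words. Looking up a missing key calls int(), which returns 0, and stores the result under that key.

```python
from collections import defaultdict

wordcount = defaultdict(int)

# The first access to a missing key calls int() and stores the result.
first_lookup = wordcount["spam"]
key_created = "spam" in wordcount

for word in "spam eggs spam".split():
    wordcount[word] += 1  # no membership test needed
```

Note the side effect: merely reading a missing key inserts it, which can matter if you later check membership.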

But actually, the collections module has an even more useful object for your purposes: a Counter. It's basically a dictionary specialised to do exactly what you want, which is to count occurrences of each value in an iterable. You could pass it the list that you're currently looping over directly and shorten your script to a few scant lines.

from collections import Counter

with open("C:/python27/python operators.txt") as f:
    wordcount = Counter(f.read().split())

Note that I used with, as suggested in another answer, and named the variable f instead of file: file is a built-in in Python 2, and using that name shadows it. Stick with f for files.

And lastly, you don't need to use 'r+' as your file mode unless you plan to write in it. The default mode is 'r' to just read the file, which is all you need here.
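A quick way to check the default for yourself, sketched here with a throwaway temporary file (an assumption for illustration, not the path from the question), is to inspect the mode attribute of the file object:

```python
import os
import tempfile

# Throwaway file so the sketch does not depend on any existing path.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

with open(path) as f:
    default_mode = f.mode   # no mode argument given

with open(path, "r+") as f:
    update_mode = f.mode    # opened for both reading and writing

os.remove(path)
```

Opening read-only also protects you from accidentally modifying the input file.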

asked	today
viewed	138 times
active	today
