2
\$\begingroup\$

Is this the most efficient solution or can the performance be improved? Just opens a file and prints random words (lines with newlines stripped out).

import random

filename = "20k.txt"

def main(filename):
    words_file = open(filename, 'r')

    words = words_file.read().split("\n")
    temp_words = []
    for word in words:
        temp_words.append(word.strip("\r"))
    words = temp_words

    while True:
        length = int(raw_input("Words:"))
        for i in range(0,length):
            print(words[random.randrange(0,len(words)+1)]),
        print("\n")


if __name__ == "__main__":
    main(filename)
\$\endgroup\$

2 Answers 2

3
\$\begingroup\$

First, you should always close your files. Or, use the with keyword, which takes care of this for you.

Then, the 'r' mode is already the default mode for opening a file. Also, consider storing your words in a set, instead of a list (gets rid of duplicate words, which might be wanted, or not if you want a higher probability for some words).

You should consider using random.choice to get a random element from an iterable (actually an iterable with len defined).

I would also separate the file reading into a separate method. and make it more explicit what the prompt actually means (the number of words to generate).

I would use str.join and give it a generator expression rather than multiple calls to print.

import random

def get_words(file_name):
    with open(filename) as word_file:
        return list(set(word.strip() for word in word_file))


def random_words(n, words)
    return " ".join(random.choice(words) for _ in range(n))


def main(file_name):
    words = get_words(file_name)
    while True:
        n = int(raw_input("Number of words:"))
        print(random_words(n, words))

if __name__ == "__main__":
    file_name = "20k.txt"
    main(file_name)
\$\endgroup\$
7
  • \$\begingroup\$ Right, I forgot about the close. Using a set is a better option, yes. Didn't know about random.choice. Thanks a lot. \$\endgroup\$ Commented Oct 4, 2016 at 16:19
  • \$\begingroup\$ Your code ran: TypeError: 'set' object does not support indexing \$\endgroup\$ Commented Oct 4, 2016 at 16:23
  • \$\begingroup\$ What does .strip() do without any arguments? \$\endgroup\$ Commented Oct 4, 2016 at 16:27
  • \$\begingroup\$ @SamuelShifterovich It strips away whitespace from the edges, including all kinds of line-breaks. So " asgsg \n\r\n".strip() == 'asgsg'. \$\endgroup\$
    – Graipher
    Commented Oct 4, 2016 at 16:28
  • \$\begingroup\$ Your answer contains words_file = open(filename) and with open(filename) as word_file:. I guess the first line is a mistake? \$\endgroup\$ Commented Oct 4, 2016 at 16:30
1
\$\begingroup\$

You would greatly improve performance if you stop reading the whole 20k lines file into memory. The difference for 100 executions (3 words per execution) is 0.018 vs 0.793 seconds.

I suggest you to use word_file.seek method instead. It manipulates stream position to read just the word you need, not the whole file.

To get random position, use random.randint(0, EOF).

Details are here: https://github.com/artgromov/CorrectHorse.

I have made pull request to your repo.

\$\endgroup\$
1
  • \$\begingroup\$ I should've specified it's Python 2.7 \$\endgroup\$ Commented Oct 5, 2016 at 13:20

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.