Grouping string to list of strings

Question

The following is a function that I have written to split a string into groups, based on whether there are consecutive repeated occurrences. For example, AAAABBBBAAB is grouped as [A+,B+,A+,B]. Is it possible to make the code below more Pythonic? If yes, how?

def create_groups(alphabets):
    """ function group the alphabets to list of A(+)s and B(+)s """
    index = 1
    current = alphabets[0]
    count = 0
    groups = []
    accumulate = False
    while index < len(alphabets):
        if current == alphabets[index]:
            count += 1
            accumulate = True
        else:
            accumulate = False
        if accumulate == False or index == len(alphabets)-1:
            group_indicator = current + '+' if count > 0 else current
            groups.append(group_indicator)
            current = alphabets[index]
            count = 0
        index += 1
    return groups

janos · Accepted Answer · 2014-08-24 14:27:20Z

First of all, your method is not really correct: for AAAABBBBAAB it returns [A+, B+, A+] instead of the required [A+, B+, A+, B]. That's because a final group consisting of a single letter is never added to the list of groups: the loop ends right after that letter becomes current.

In terms of being Pythonic, don't write this:

if accumulate == False:

write like this:

if not accumulate:

Also, instead of iterating over the "alphabet" using indexes, it would be more Pythonic to rewrite to iterate over each letter, in the style for letter in alphabet.

"alphabets" is not a good name. It seems letters would be better.

The algorithm can be simplified, and you could eliminate several intermediary variables:

def create_groups(letters):
    """Group the letters into a list of A(+)s and B(+)s."""
    prev = letters[0]
    count = 0
    groups = []
    for current in letters[1:] + '\0':
        if current == prev:
            count += 1
        else:
            group_indicator = prev + '+' if count > 0 else prev
            groups.append(group_indicator)
            count = 0
        prev = current
    return groups

In the for loop, I appended '\0' to the end, as a dirty trick to make the loop do one more iteration to append the last letter group to groups. For this to work, it must be a character that's different from the last letter in letters.
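If relying on a sentinel character feels too fragile, one alternative is to flush the last group after the loop instead. This is only a sketch: the name create_groups_no_sentinel is made up here, and returning an empty list for an empty string is an assumption, since the question doesn't say what should happen in that case.

```python
def create_groups_no_sentinel(letters):
    """Group consecutive repeats without appending a sentinel character."""
    if not letters:
        # Assumption: empty input yields an empty list.
        return []
    groups = []
    prev = letters[0]
    repeated = False  # True once prev has been seen at least twice in a row
    for current in letters[1:]:
        if current == prev:
            repeated = True
        else:
            groups.append(prev + '+' if repeated else prev)
            prev = current
            repeated = False
    # Flush the final group, which the loop itself never appends.
    groups.append(prev + '+' if repeated else prev)
    return groups


print(create_groups_no_sentinel("AAAABBBBAAB"))  # ['A+', 'B+', 'A+', 'B']
```

The cost is that the append expression appears twice; the benefit is that no character is reserved as "cannot appear in the input".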

The above is sort of a "naive" solution, in the sense that there is probably a Python library that can do this more easily. That is what @jonrsharpe suggested, but he didn't complete the solution of converting [['A', 'A', 'A', 'A'], ['B', 'B', 'B', 'B'], ['A', 'A'], ['B']] to the format that you need. Based on his solution, you could do something like this:

from itertools import groupby


def create_groups(letters):
    return [x + '+' if list(g)[1:] else x for x, g in groupby(letters, str)]

What I don't like about this is the way we put the letters in a list just to know if there are 2 or more of them (the list(g)[1:] step). There might be a better way.
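One possible way around building the list is to pull at most two items from each group iterator and check whether the second one exists. This is a sketch rather than a definitive answer; create_groups_lazy and _MISSING are names invented for this example.

```python
from itertools import groupby

# Unique marker meaning "the group had no second item".
_MISSING = object()


def create_groups_lazy(letters):
    """Build the same groups while reading at most two items per run."""
    groups = []
    for letter, run in groupby(letters):
        next(run)  # every group yielded by groupby has at least one item
        has_second = next(run, _MISSING) is not _MISSING
        groups.append(letter + '+' if has_second else letter)
    return groups


print(create_groups_lazy("AAAABBBBAAB"))  # ['A+', 'B+', 'A+', 'B']
```

It is longer than the one-liner, so whether it counts as "better" depends on how long the runs are expected to be.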

"he didn't complete the solution" - this isn't a code-writing service, I thought I'd let the OP have some of the fun! — jonrsharpe, 11 hours ago

jonrsharpe · Answer 2 · 2014-08-24 14:10:09Z

You can simplify your logic significantly using itertools.groupby:

>>> from itertools import groupby
>>> [list(g) for _, g in groupby("AAAABBBBAAB", ord)]
[['A', 'A', 'A', 'A'], ['B', 'B', 'B', 'B'], ['A', 'A'], ['B']]

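Here ord is the key function: consecutive characters with the same code point land in the same group, so identical characters are grouped together.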
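Since groupby's key argument defaults to the identity function, the key can also be left out when grouping the characters of a plain string. As a small illustration, the sketch below counts the length of each run; run_lengths is a hypothetical helper name, not part of itertools.

```python
from itertools import groupby


def run_lengths(text):
    """Return (character, run length) pairs for consecutive repeats."""
    # No key function: groupby compares the characters themselves.
    return [(char, sum(1 for _ in run)) for char, run in groupby(text)]


print(run_lengths("AAAABBBBAAB"))  # [('A', 4), ('B', 4), ('A', 2), ('B', 1)]
```

From these pairs, a run length greater than 1 tells you where a '+' is needed.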

