Python Removing last character _ from string using regex

Question

I know there are a bunch of other regex questions, but I was hoping someone could point out what is wrong with my regex. I have done some research, and it looks like it should work. I used Rubular to test it; yes, I know that is a Ruby regex tester, but from what I can see in the Python docs, the same rules I used should apply to Python.

Currently I have

import re

a = ["SDFSD_SFSDF234234","SDFSDF_SDFSDF_234324","TSFSD_SDF_213123"]
c = [re.sub(r'[A-Z]+', "", x) for x in a]

which returns

['SDFSD_SFSDF', 'SDFSDF_SDFSDF_', 'TSFSD_SDF_']

But I want it to return

['SDFSD_SFSDF', 'SDFSDF_SDFSDF', 'TSFSD_SDF']

I tried to use this regex

c = [re.sub(r'$?_[^A-Z_]+', "", x) for x in a]

but I am getting this error

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.6/re.py", line 151, in sub
    return _compile(pattern, 0).sub(repl, string, count)
  File "/usr/lib64/python2.6/re.py", line 245, in _compile
    raise error, v # invalid expression

Can anyone help me figure out what I am doing wrong?

That's not what your code returns: c should be ['_234234', '__234324', '__213123']. - arshajii, Jul 17 '13 at 21:59
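A quick way to check the comment above, as a sketch: the pattern [A-Z]+ deletes the uppercase letters, while the output listed in the question matches what you get by deleting everything that is not an uppercase letter or underscore, [^A-Z_]+. That second pattern is only a guess at what was intended; the question does not confirm it.

```python
import re

a = ["SDFSD_SFSDF234234", "SDFSDF_SDFSDF_234324", "TSFSD_SDF_213123"]

# Pattern from the question: deletes the uppercase letters, keeping digits and underscores
removed_letters = [re.sub(r'[A-Z]+', "", x) for x in a]
print(removed_letters)  # ['_234234', '__234324', '__213123']

# Assumed intended pattern: deletes everything that is not an uppercase letter or underscore
kept_letters = [re.sub(r'[^A-Z_]+', "", x) for x in a]
print(kept_letters)  # ['SDFSD_SFSDF', 'SDFSDF_SDFSDF_', 'TSFSD_SDF_']
```

The second result still carries the trailing underscore, which is exactly the problem the question is about.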

mr2ert · Accepted Answer · 2013-07-17 22:11:09Z

The error in:

c = [re.sub(r'$?_[^A-Z_]+', "", x) for x in a]

Is caused by the ?: it is preceded only by $, which is an anchor that matches a position rather than a character, so there is nothing for ? to match 0 or 1 times. If you change it to:

>>> [re.sub(r'_?[^A-Z_]+$', "", x) for x in a]
['SDFSD_SFSDF', 'SDFSDF_SDFSDF', 'TSFSD_SDF']

It works as you expect.

Another thing: $ is used to denote the end of the string, so it probably shouldn't be the first character.
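A small sketch that reproduces both halves of this answer: compiling the original pattern raises re.error (in CPython the message says "nothing to repeat", though the exact wording may vary between versions), while the corrected pattern, with $ moved to the end, strips an optional underscore plus the trailing run of non-letters.

```python
import re

a = ["SDFSD_SFSDF234234", "SDFSDF_SDFSDF_234324", "TSFSD_SDF_213123"]

# "$?" tries to repeat an anchor, which the re module rejects at compile time
try:
    re.compile(r'$?_[^A-Z_]+')
    compile_error = None
except re.error as err:
    compile_error = str(err)
print(compile_error)  # e.g. "nothing to repeat at position 1"

# Corrected pattern: optional "_" followed by non-letter characters, anchored at the end
c = [re.sub(r'_?[^A-Z_]+$', "", x) for x in a]
print(c)  # ['SDFSD_SFSDF', 'SDFSDF_SDFSDF', 'TSFSD_SDF']
```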

Zoran Pavlovic · Answer 2 · 2013-07-17 22:04:49Z

import re

a = ["SDFSD_SFSDF234234","SDFSDF_SDFSDF_234324","TSFSD_SDF_213123"]
c = [re.match(r'[A-Z_]+[A-Z]', x).group() for x in a]

print(c)

Results:

['SDFSD_SFSDF', 'SDFSDF_SDFSDF', 'TSFSD_SDF']

Please note that re.sub, which you use in your example, is a regex replace command, not a search. Your regex matches the part you want to keep, not the part you need to remove to get what you want.
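To illustrate the distinction, a sketch on one string from the question: re.sub deletes whatever the pattern describes, so the pattern must describe the unwanted suffix, whereas re.match returns whatever the pattern describes, so the pattern must describe the part to keep. Both patterns come from answers in this thread; the fallback to an empty string when re.match finds nothing is an added assumption.

```python
import re

s = "SDFSDF_SDFSDF_234324"

# re.sub: the pattern describes what to delete (optional "_" plus trailing non-letters)
via_sub = re.sub(r'_?[^A-Z_]+$', "", s)

# re.match: the pattern describes what to keep (letters/underscores ending in a letter)
m = re.match(r'[A-Z_]+[A-Z]', s)
via_match = m.group() if m else ""  # re.match returns None when nothing matches at the start

print(via_sub, via_match)  # SDFSDF_SDFSDF SDFSDF_SDFSDF
```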

yoni · Answer 3 · 2013-07-17 22:15:47Z

You could insert a lookahead into your regexp. Written as (?=...), it makes your regexp match only text that is followed by whatever pattern you put in place of the dots. So in your case you could choose to ignore the underscore unless it is followed by [A-Z]. Your regexp will look like this: r'[A-Z]+_(?=[A-Z])', so an underscore not followed by letters will be ignored.
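On its own, that pattern only finds letter runs followed by an underscore, so it does not produce the final strings. One way to put the lookahead idea to work, sketched here with re.findall (an approach the answer does not spell out), is to collect the letter runs plus only those underscores that are followed by a letter, then join the pieces. This assumes the strings consist of uppercase letters and underscores followed by a digit suffix; any letters appearing after the digits would also be kept.

```python
import re

a = ["SDFSD_SFSDF234234", "SDFSDF_SDFSDF_234324", "TSFSD_SDF_213123"]

# Keep runs of uppercase letters, and keep "_" only when the lookahead
# (?=[A-Z]) confirms a letter comes next; digits and a trailing "_" never match
pattern = re.compile(r'[A-Z]+|_(?=[A-Z])')
c = ["".join(pattern.findall(x)) for x in a]
print(c)  # ['SDFSD_SFSDF', 'SDFSDF_SDFSDF', 'TSFSD_SDF']
```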

Trevor Senior · Answer 4 · 2013-07-17 22:25:51Z

Without regex, using rstrip:

a = ["ends_with_underscore_", "does_not", "multiple_____"]
b = [x.rstrip("_") for x in a]
print(b)

>> ['ends_with_underscore', 'does_not', 'multiple']
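rstrip("_") alone will not remove the digit suffix in the question's strings. Assuming that suffix is made only of digits, a sketch that strips the digits first and then any underscores they leave behind:

```python
import string

a = ["SDFSD_SFSDF234234", "SDFSDF_SDFSDF_234324", "TSFSD_SDF_213123"]

# rstrip removes any trailing characters from the given set, not a fixed suffix:
# first strip trailing digits, then strip the underscores left at the end
c = [x.rstrip(string.digits).rstrip("_") for x in a]
print(c)  # ['SDFSD_SFSDF', 'SDFSDF_SDFSDF', 'TSFSD_SDF']
```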

iCodez · Answer 5 · 2013-07-17 23:13:31Z

>>> import re
>>> a = ["SDFSD_SFSDF234234","SDFSDF_SDFSDF_234324","TSFSD_SDF_213123"]
>>> c = [re.sub(r'_?\d+','',x) for x in a]
>>> c
['SDFSD_SFSDF', 'SDFSDF_SDFSDF', 'TSFSD_SDF']
>>>

It's short and simple. Basically, it says "remove every run of digits, together with the underscore immediately before it if there is one".
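Because this pattern has no $ anchor, it removes every run of digits, not only the one at the end. That makes no difference for the strings in the question, but on a hypothetical input with a digit in the middle, adding $ keeps that digit:

```python
import re

s = "SDF1_SDF_234"  # hypothetical input with a digit in the middle

# Unanchored: every digit run (with an optional preceding "_") is removed
print(re.sub(r'_?\d+', '', s))   # SDF_SDF

# Anchored with $: only the final digit run (and its "_") is removed
print(re.sub(r'_?\d+$', '', s))  # SDF1_SDF
```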
