How to input a regex in string.replace in python?

Question

I need some help on declaring a regex. My inputs are like the following:

this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>. 
and there are many other lines in the txt files
with<[3> such tags </[3>

The required output is:

this is a paragraph with in between and then there are cases ... where the number ranges from 1-100. 
and there are many other lines in the txt files
with such tags

I've tried this:

#!/usr/bin/python
import os, sys, re, glob
for infile in glob.glob(os.path.join(os.getcwd(), '*.txt')):
    for line in reader: 
        line2 = line.replace('<[1> ', '')
        line = line2.replace('</[1> ', '')
        line2 = line.replace('<[1>', '')
        line = line2.replace('</[1>', '')

        print line

I've also tried this (but it seems like I'm using the wrong regex syntax):

    line2 = line.replace('<[*> ', '')
    line = line2.replace('</[*> ', '')
    line2 = line.replace('<[*>', '')
    line = line2.replace('</[*>', '')

I dont want to hard-code the replace from 1 to 99 . . .

The accepted answer already covers your problem and solves it. Do you need anything else ? — HamZa, Jun 25 '13 at 14:02
What should be the result for where the<[99> number ranges from 1-100</[100>? — utapyngo, Jun 26 '13 at 5:19
it should also remove the number in the <...> tag, so the output should be where the number rangers from 1-100 ? — alvas, Jun 26 '13 at 6:12

ridgerunner · Accepted Answer · 2011-04-14 15:33:07Z

up vote 113 down vote accepted

+150

This tested snippet should do it:

import re
line = re.sub(r"</?\[\d+>", "", line)

Edit: Here's a commented version explaining how it works:

line = re.sub(r"""
  (?x) # Use free-spacing mode.
  <    # Match a literal '<'
  /?   # Optionally match a '/'
  \[   # Match a literal '['
  \d+  # Match one or more digits
  >    # Match a literal '>'
  """, "", line)

Regexes are fun! But I would strongly recommend spending an hour or two studying the basics. For starters, you need to learn which characters are special: "metacharacters" which need to be escaped (i.e. with a backslash placed in front - and the rules are different inside and outside character classes.) There is an excellent online tutorial at: www.regular-expressions.info. The time you spend there will pay for itself many times over. Happy regexing!

edited Apr 14 '11 at 15:33

answered Apr 14 '11 at 4:12

ridgerunner
17.8k33344

care to explain the regex? – alvas Apr 14 '11 at 6:35

yep it works!! thanks but can you explain the regex in brief? – alvas Apr 14 '11 at 6:39

3

Also don't neglect The Book on Regular Expressions - Mastering Regular Expressions, by Jeffrey Friedl – pcurry May 14 '13 at 5:05

@pcurry - I agree 110% - Most useful book I've ever read. – ridgerunner May 14 '13 at 5:48

add a comment |

Ignacio Vazquez-Abrams · Answer 2 · 2011-04-14 04:00:53Z

up vote 15 down vote

str.replace() does fixed replacements. Use re.sub() instead.

answered Apr 14 '11 at 4:00

Ignacio Vazquez-Abrams
389k47666845

2

Also worth noting that your pattern should look something like "</{0-1}\d{1-2}>" or whatever variant of regexp notation python uses. – bdares Apr 14 '11 at 4:05

What does fixed replacements mean? – avi Jul 3 at 11:35

add a comment |

kurumi · Answer 3 · 2011-04-14 04:06:17Z

don't have to use regular expression (for your sample string)

>>> s
'this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>. \nand there are many other lines in the txt files\nwith<[3> such tags </[3>\n'

>>> for w in s.split(">"):
...   if "<" in w:
...      print w.split("<")[0]
...
this is a paragraph with
 in between
 and then there are cases ... where the
 number ranges from 1-100
.
and there are many other lines in the txt files
with
 such tags

Ezequiel Marquez · Answer 4 · 2013-05-14 16:13:42Z

The easiest way

import re

txt='this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>.  and there are many other lines in the txt files with<[3> such tags </[3>'

out = re.sub("(<[^>]+>)", '', txt)
print out

Zac · Answer 5 · 2013-05-16 10:42:21Z

replace method of string objects does not accept regular expressions but only fixed strings (see documentation: http://docs.python.org/2/library/stdtypes.html#str.replace).

You have to use re module:

import re
newline= re.sub("<\/?\[[0-9]+>", "", line)

Boydon · Answer 6 · 2013-06-27 08:33:56Z

I would go like this (regex explained in comments):

import re

# If you need to use the regex more than once it is suggested to compile it.
pattern = re.compile(r"</{0,}\[\d+>")

# <\/{0,}\[\d+>
# 
# Match the character “<” literally «<»
# Match the character “/” literally «\/{0,}»
#    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «{0,}»
# Match the character “[” literally «\[»
# Match a single digit 0..9 «\d+»
#    Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
# Match the character “>” literally «>»

subject = """this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>. 
and there are many other lines in the txt files
with<[3> such tags </[3>"""

result = pattern.sub("", subject)

print(result)

If you want to learn more about regex I recomend to read Regular Expressions Cookbook by Jan Goyvaerts and Steven Levithan.

You could simply use * instead of {0,} – HamZa Jun 27 '13 at 10:39 — HamZa, Jun 27 '13 at 10:39
I think that {0,} is more readable. Just a matter of style – Boydon Jun 27 '13 at 11:31 — Boydon, Jun 27 '13 at 11:31

asked	4 years ago
viewed	63586 times
active	2 years ago

current community

your communities

more stack exchange communities

How to input a regex in string.replace in python?

6 Answers 6

Your Answer

Not the answer you're looking for? Browse other questions tagged python regex string replace or ask your own question.

Linked

Hot Network Questions

current community

your communities

more stack exchange communities

How to input a regex in string.replace in python?

6 Answers 6

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged python regex string replace or ask your own question.

Linked

Related

Hot Network Questions