5
votes
0 answers
247 views
+100

Writing a tokenizer in Python

I want to design a custom tokenizer module in Python that lets users specify what tokenizer(s) to use for the input. For instance, consider the following input: Q: What is a good way to achieve ...
3
votes
0 answers
205 views

How to create a custom function in a QSqlDatabase: regex in SQLite and PyQt

We define a Python REGEXP function for SQLite, following "Problem with regexp python and sqlite". How can we do the same thing in PyQt, i.e. with a QSqlDatabase? More precisely, we use ...
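For readers unfamiliar with the approach the excerpt refers to, here is a minimal sketch of registering a Python REGEXP function with the standard-library `sqlite3` module. It covers only the plain `sqlite3` side, not `QSqlDatabase`; the `words` table and its rows are hypothetical demo data.

```python
import re
import sqlite3


def regexp(pattern, value):
    """Return True when `pattern` matches somewhere in `value` (NULL never matches)."""
    if value is None:
        return False
    return re.search(pattern, value) is not None


conn = sqlite3.connect(":memory:")
# SQLite rewrites "X REGEXP Y" as a call to regexp(Y, X), so the pattern arrives first.
conn.create_function("REGEXP", 2, regexp)

# Hypothetical demo table, not taken from the question.
conn.execute("CREATE TABLE words (w TEXT)")
conn.executemany("INSERT INTO words VALUES (?)", [("apple",), ("banana",), ("cherry",)])

rows = conn.execute("SELECT w FROM words WHERE w REGEXP ? ORDER BY w", ("an+",)).fetchall()
matches = [w for (w,) in rows]
```

The function is registered per connection, which is presumably why the same trick does not carry over directly to a connection owned by Qt.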
2
votes
0 answers
454 views

python regex sub space

CODE: word = 'aiuhsdjfööäö ; sdfdfd' word1=re.sub('[^^äÄöÖåÅA-Za-z0-9\t\r\n\f()!{$}.+?|]',"""\[^^0-9\t\r\n\f(!){$}.+?|\]*""", word) ; print 'word= ', word ...
1
vote
0 answers
51 views

Matching string across multiple regular expressions

I have a PostgreSQL database with a table that contains about 50 million address strings. Example strings are NIAID, Opportunist Infect Res Branch, Treatment Res Programs, Div ...
1
vote
0 answers
63 views

OR'ed backreferences in regular expressions

Could someone please explain how this regular expression is supposed to work: ^(a)|\1$ ? I interpret it as: *1. start of string, followed by: *2. either: *2a. an a, or: *2b. the previously ...
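As a quick way to probe the pattern, here is a small sketch using Python's `re` module. It assumes Python's semantics, where `|` has the lowest precedence (so the pattern reads as `^(a)` OR `\1$`) and a backreference to a group that did not participate in the match fails; other engines, such as JavaScript, treat that case differently. The helper name `describe` is made up for illustration.

```python
import re

# The pattern from the question: alternation splits it into `^(a)` and `\1$`.
PATTERN = re.compile(r"^(a)|\1$")


def describe(text):
    """Return (whole match, group 1) for the first match, or None if nothing matches."""
    m = PATTERN.search(text)
    return None if m is None else (m.group(0), m.group(1))


# Only the left branch can ever succeed here: in the right branch,
# group 1 is always unset, so `\1` fails in Python.
starts_with_a = describe("ab")
ends_with_a = describe("ba")
```

Under these assumptions the expression behaves like a plain `^(a)` in Python.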
0
votes
0 answers
27 views

Django filter with weird regex

Going through some legacy code, I've stumbled upon this regex in a model filter: "[[:<:]](%s)[[:>:]]" % value. I get that the inner square brackets are matching literal square brackets, but I ...
0
votes
0 answers
61 views

Separate incoming data with regex

I have data that arrives one character at a time. For example: out = { and then, after some time: out = {"created_at":"Fri Apr 19 19:43:25 +0000 2013","id":325333836244" So what I want to do is to ...
0
votes
0 answers
56 views

Regular expressions in POS tagged NLTK corpus

I'm loading a POS-tagged corpus in NLTK, and I would like to find certain patterns involving POS tags. These patterns can be quite complex, including a lot of different combinations of POS tags. ...
0
votes
0 answers
28 views

How to search for a pattern after my last input with pexpect in an interactive scenario

I am using pexpect (a Python module) in my automation testing, but pexpect sometimes doesn't work the way I expect. My case is: telnet to a host, execute command1, and wait for a pattern ...
0
votes
0 answers
77 views

Regular expression parsing of HTML throws DjangoUnicodeDecodeError

I'm just trying to do some web scraping using regular expressions. Learning regular expressions took me an hour, but this Unicode curse took me a day. After scraping something, I want to save ...
0
votes
0 answers
37 views

Retrieve groups not filled by match.expand in Python

I'm building some sort of CMS for Python from the ground up (yes, I know there are a ton; I just want to make my own, partly because I want to improve my Python skills). My problem is when routing ...
0
votes
0 answers
45 views

Unicode string not working with urlparse in spite of conversion to UTF-8

I'm trying to find the substring 'foro.enfemenino.com' in the URL str2 = 'http://foro.enfemenino.com/forum/f166/__f22092_f166-Servicio-tecnico-philips-en-castelldefells-tel-900-100-137.html#25144' ...
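One way to check the host of a URL like this is sketched below. It assumes Python 3, where `urlparse` lives in `urllib.parse` (the question appears to use the older `urlparse` module), and the helper name `host_matches` is hypothetical.

```python
from urllib.parse import urlparse

# URL copied from the question excerpt.
url = (
    "http://foro.enfemenino.com/forum/f166/"
    "__f22092_f166-Servicio-tecnico-philips-en-castelldefells-tel-900-100-137.html#25144"
)


def host_matches(url, wanted_host):
    """Compare the URL's hostname with `wanted_host`, decoding UTF-8 bytes first."""
    if isinstance(url, bytes):
        # Decode once up front so that text and bytes are never mixed later on.
        url = url.decode("utf-8")
    return urlparse(url).hostname == wanted_host


found = host_matches(url, "foro.enfemenino.com")
```

Comparing the parsed `hostname` rather than searching the whole string avoids false positives when the name happens to appear in the path or fragment.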
0
votes
0 answers
93 views

How can I use regular expressions to remove docstrings from Python source code?

Assuming that a given Python source file contains perfectly indented code (using 4 spaces, not sure if that's relevant), how can I write a Python program that will use regular expressions to ...
0
votes
0 answers
78 views

Split string into regex matches

Given multiple regexps that each match a certain part of a string (e.g. (?P<g1>a+) and (?P<g2>b+)), how can I split a string (e.g. aabcdcb) into pieces like these: [{'g1': 'aa'}, {'g2': 'b'}, {'other': ...
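A minimal sketch of one possible approach: join the patterns into a single alternation, walk the matches with `re.finditer`, and label the gaps between matches. It assumes each pattern consists of exactly one named group, as in the question's examples; the function name `split_by_patterns` and the `other` label parameter are made up for illustration.

```python
import re


def split_by_patterns(text, patterns, other="other"):
    """Split `text` into {group_name: piece} dicts; unmatched runs are labelled `other`."""
    combined = re.compile("|".join(patterns))
    pieces = []
    pos = 0
    for m in combined.finditer(text):
        if m.start() > pos:
            # Text between the previous match and this one matched no pattern.
            pieces.append({other: text[pos:m.start()]})
        # lastgroup is the name of the named group that matched.
        pieces.append({m.lastgroup: m.group()})
        pos = m.end()
    if pos < len(text):
        pieces.append({other: text[pos:]})
    return pieces


pieces = split_by_patterns("aabcdcb", [r"(?P<g1>a+)", r"(?P<g2>b+)"])
```

With this sketch, the sample string yields four pieces: `aa` for `g1`, `b` for `g2`, the unmatched run `cdc`, and a final `b` for `g2`. Patterns that can match the empty string or that contain nested groups would need extra handling.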
0
votes
0 answers
53 views

Python Script to add Code statements in Java Methods

I am looking for a Python script that will let me use search and replace to add 2 lines of code to all Java methods in a file using Notepad++. Before: public void startProcessing() { ...
