3

I am working on a simple CSS parser in Python. I want to extract all values from this string: "1px solid rgb(255, 255, 255)". My current pattern (which is not working) is: "\S+[^rgb]+". When I use it with the string "1px solid rgb(255, 255, 255)", I get the following:

...
>>> re.findall("\S+[^rgb]+", string)
("1px solid", "rgb(255, 255, 255)")

And I want it to be

("1px", "solid", "rgb(255, 255, 255)")

P.S. Also, is there a better way to parse a CSS declaration? Currently my pattern is "[\s]?(\S+)[\s]?:[\s]?(.+)[\s]?;". Parsing "color: red;" gives me:

("color", "red")

2 Answers

2

You can try this:

(\S+)[ ]+(?:(\S+)[ ]+)?(rgb\([^)]+\))

http://regex101.com/r/vA4kH1

EDIT: Whatever you're trying to do, this is probably not the right way to handle it, because CSS syntax can be unpredictable. For something more sane, you can use tinycss, a CSS parser for Python:

http://pythonhosted.org/tinycss/

One last edit...

As per your solution, you're calling findall, which returns each match as a separate list item. You only need rgb() in the pattern once, and you can ignore the whitespace. This should work for the value pattern, and it is cleaner than what you have. Also note that you don't want to use "." inside your rgb() expression: regexes are greedy by default, so if you have rgb() 1px rgb() on the same line, "." will match as much as it can and swallow everything up to the last ")". Try this: r"(rgb\([^)]+\))|(\S+)"
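To illustrate the greediness point, here is a minimal sketch. The sample string is made up for the demonstration and is not from the question; it only assumes Python's standard re module.

```python
import re

# Hypothetical sample with two rgb() values on the same line.
text = "rgb(1, 2, 3) 1px rgb(4, 5, 6)"

# Greedy ".+" runs to the LAST ")" on the line, swallowing everything between.
greedy = re.findall(r"rgb\(.+\)", text)

# "[^)]+" cannot cross a ")", so each rgb(...) is matched on its own.
bounded = re.findall(r"rgb\([^)]+\)", text)

print(greedy)   # ['rgb(1, 2, 3) 1px rgb(4, 5, 6)']
print(bounded)  # ['rgb(1, 2, 3)', 'rgb(4, 5, 6)']
```

The negated character class is usually the simpler fix here; a lazy quantifier (".+?") would also stop at the first ")".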

  • I am not sure how it is supposed to work. It just extracts all (num, num, num) from the text
    – JadedTuna
    Oct 25, 2013 at 21:09
  • Oh, I thought you meant values as in the numeric values. What exactly do you mean by "values" for your sample string?
    – sdanzig
    Oct 25, 2013 at 21:11
  • Oh, sorry. My fault. Please check my modified question, where I wrote the output I actually need
    – JadedTuna
    Oct 25, 2013 at 21:12
  • Whoops, I ran into another problem. Whenever I try to use it with the string "1px rgb(255, 255, 255)" it gives me an empty list.
    – JadedTuna
    Oct 25, 2013 at 21:21
  • Unlucky. If I use more arguments (e.g. "1px solid blah rgb(255, 255, 255)"), it produces ["solid", "blah", "rgb(255, 255, 255)"] # No '1px' here.
    – JadedTuna
    Oct 25, 2013 at 21:29
1

Ok. I got it working (I hope). Here is the final code.


EDIT

After a long and boring read of the manual I finally got it working properly: "rgb\([^)]*\)|\S+"
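For reference, a small sketch of how that pattern behaves with re.findall on the strings mentioned in this thread. The names VALUE_RE and split_values are hypothetical helpers introduced only for this example.

```python
import re

# rgb(...) is listed first in the alternation so that it is tried
# before \S+ at the start of an rgb token.
VALUE_RE = re.compile(r"rgb\([^)]*\)|\S+")

def split_values(value):
    """Split a CSS value string into tokens, keeping rgb(...) whole."""
    return VALUE_RE.findall(value)

print(split_values("1px solid rgb(255, 255, 255)"))
# ['1px', 'solid', 'rgb(255, 255, 255)']
print(split_values("1px rgb(255, 255, 255)"))
# ['1px', 'rgb(255, 255, 255)']
print(split_values("1px solid blah rgb(255, 255, 255)"))
# ['1px', 'solid', 'blah', 'rgb(255, 255, 255)']
```

Note that this only special-cases rgb(...). Other functional values containing spaces, such as rgba(0, 0, 0, 0.5), would still be split at whitespace, which is one more reason to consider a real CSS parser.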

  • I don't understand why you repeated the rgb() expression, and have those .'s before and after. But yeah, for your case, definitely easier to match one token at a time. I actually did give an attempt to have a more flexible expression out of curiosity, but my effort fell flat: stackoverflow.com/questions/19600204/…
    – sdanzig
    Oct 25, 2013 at 22:14
  • @sdanzig, I repeat rgb twice to make it match rgb(...) both before and after other text (like "solid", "1px")
    – JadedTuna
    Oct 25, 2013 at 22:16
