As the first person here, it's my responsibility to post this.
You can't parse [X]HTML with regex.
Because HTML can't be parsed by regex.
Regex is not a tool that can be used
to correctly parse HTML. As I have
answered in HTML-and-regex questions
here so many times before, the use of
regex will not allow you to consume
HTML. Regular expressions are a tool
that is insufficiently sophisticated
to understand the constructs employed
by HTML. HTML is not a regular
language and hence cannot be parsed by
regular expressions. Regex queries are
not equipped to break down HTML into
its meaningful parts. so many times
but it is not getting to me. Even
enhanced irregular regular expressions
as used by Perl are not up to the task
of parsing HTML. You will never make
me crack. HTML is a language of
sufficient complexity that it cannot
be parsed by regular expressions. Even
Jon Skeet cannot parse HTML using
regular expressions. Every time you
attempt to parse HTML with regular
expressions, the unholy child weeps
the blood of virgins, and Russian
hackers pwn your webapp. Parsing HTML
with regex summons tainted souls into
the realm of the living. HTML and
regex go together like love, marriage,
and ritual infanticide. The
<center> cannot hold it is too
late. The force of regex and HTML
together in the same conceptual space
will destroy your mind like so much
watery putty. If you parse HTML with
regex you are giving in to Them and
their blasphemous ways which doom us
all to inhuman toil for the One whose
Name cannot be expressed in the Basic
Multilingual Plane, he comes.
HTML-plus-regexp will liquify the
nerves of the sentient whilst you
observe, your psyche withering in the
onslaught of horror. Rege̿̔̉x-based
HTML parsers are the cancer that is
killing StackOverflow it is too
late it is too late we cannot be
saved the transgression of a chi͡ld
ensures regex will consume all living
tissue (except for HTML which it
cannot, as previously prophesied)
dear lord help us how can anyone
survive this scourge using regex
to parse HTML has doomed humanity to
an eternity of dread torture and
security holes using regex as a
tool to process HTML establishes a
breach between this world and
the dread realm of c͒ͪo͛ͫrrupt
entities (like SGML entities, but
more corrupt) a mere glimpse of
the world of regex parsers for
HTML will instantly transport a
programmer's consciousness into
a world of ceaseless screaming,
he comes, the pestilent
slithy regex-infection
will devour your HTML parser,
application and existence for all time
like Visual Basic only worse he
comes he comes do not
fight he com̡e̶s, ̕h̵is
un̨ho͞ly radiańcé destro҉ying all
enli̍̈́̂̈́ghtenment, HTML tags
lea͠ki̧n͘g fr̶ǫm ̡yo͟ur eye͢s̸
̛l̕ik͏e liquid pain, the song
of re̸gular expression
parsing will extinguish
the voices of mortal man from the
sphere I can see it can you see
̲͚̖͔̙î̩́t̲͎̩̱͔́̋̀ it is beautiful
the final snuffing
of the lies of Man ALL IS
LOŚ͖̩͇̗̪̏̈́T ALL IS
LOST the pon̷y he comes he
c̶̮omes he
comes
the ichor permeates
all MY FACE MY FACE ᵒh god
no NO NOO̼OO NΘ
stop the
an*̶͑̾̾̅ͫ͏̙̤g͇̫͛͆̾ͫ̑͆l͖͉̗̩̳̟̍ͫͥͨe̠̅s
͎a̧͈͖r̽̾̈́͒͑e
not
rè̑ͧ̌aͨl̘̝̙̃ͤ͂̾̆ ZA̡͊͠͝LGΌ
ISͮ̂҉̯͈͕̹̘̱ TO͇̹̺ͅƝ̴ȳ̳ TH̘Ë͖́̉
͠P̯͍̭O̚N̐Y̡ H̸̡̪̯ͨ͊̽̅̾̎Ȩ̬̩̾͛ͪ̈́̀́͘
̶̧̨̱̹̭̯ͧ̾ͬC̷̙̲̝͖ͭ̏ͥͮ͟Oͮ͏̮̪̝͍M̲̖͊̒ͪͩͬ̚̚͜Ȇ̴̟̟͙̞ͩ͌͝S̨̥̫͎̭ͯ̿̔̀ͅ
---

Have you tried using an XML parser instead?
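
For anyone wondering what that looks like in practice, here is a rough sketch in Python using only the standard library's `html.parser`. The sample markup, the `TagCollector` class and the `collect` helper are all made up for illustration, and the regex is just one typical naive attempt, not the only way people get this wrong.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample markup: nested <div>s plus a self-closing tag.
SAMPLE = '<div id="outer"><div id="inner">text</div><br/><a href="x">link</a></div>'

# A typical naive attempt: grab "the contents of the div".
# The lazy match pairs the OUTER opening tag with the INNER closing tag,
# because a regex has no notion of nesting depth.
naive_inner = re.search(r'<div.*?>(.*?)</div>', SAMPLE).group(1)


class TagCollector(HTMLParser):
    """Illustrative helper: records opening tags and tracks nesting depth."""

    def __init__(self):
        super().__init__()
        self.open_tags = []      # (tag, attrs) for tags like <div ...>
        self.self_closing = []   # tag names for tags like <br/>
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        self.open_tags.append((tag, dict(attrs)))
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        self.depth -= 1

    def handle_startendtag(self, tag, attrs):
        # Called for XHTML-style self-closing tags; they do not change depth.
        self.self_closing.append(tag)


def collect(markup):
    """Feed markup through the parser and return the populated collector."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector


if __name__ == "__main__":
    print("regex thinks the div contains:", naive_inner)
    result = collect(SAMPLE)
    print("opening tags:", [tag for tag, _ in result.open_tags])
    print("self-closing tags:", result.self_closing)
    print("deepest nesting:", result.max_depth)
```

The parser keeps track of where it is in the tree, so nesting and self-closing tags come out right. The regex hands back a fragment that starts inside the outer `<div>` and stops at the wrong `</div>`. If your input really is well-formed XHTML, a proper XML parser such as `xml.etree.ElementTree` would be the closer match to the suggestion above.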