What is the best way to parse html in C#? [closed]

Question

I'm looking for a library or method for parsing an HTML file that offers more HTML-specific features than generic XML parsing libraries.

This is almost an exact duplicate: stackoverflow.com/questions/100358/looking-for-c-html-parser

Compile This · Accepted Answer · 2008-09-11 10:17:08Z

up vote 76 down vote accepted

I used the HTMLAgilityPack on a project for a previous employer and it was pretty effective. It wasn't foolproof, but it did handle most of the malformed tags, etc. that you find on the web these days.

Very handy library, thanks... And much easier for me to figure out than the mshtml. – Alex Baranosky Jan 9 '09 at 9:22

Is this still the best option, almost two years on from when you answered the question? I'll check it out all the same though. – Drew Noakes Aug 7 '10 at 12:54

I'm no longer using this project on a day-to-day basis, but it looks well maintained, with new features such as LINQ to Objects in beta, and under active development. Definitely still worth evaluating. – Compile This Aug 10 '10 at 13:46

Mark Cidade · Answer 2 · 2008-09-19 08:05:44Z

up vote 138 down vote

Html Agility Pack

This is an agile HTML parser that builds a read/write DOM and supports plain XPATH or XSLT (you actually don't HAVE to understand XPATH or XSLT to use it, don't worry...). It is a .NET code library that allows you to parse "out of the web" HTML files. The parser is very tolerant of "real world" malformed HTML. The object model is very similar to the one System.Xml offers, but for HTML documents (or streams).
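
As a rough illustration of the DOM-plus-XPath model described above, here is a minimal sketch. It assumes the HtmlAgilityPack NuGet package is referenced; the class name HapSketch, the method name ExtractLinks and the sample markup are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

static class HapSketch
{
    // Hypothetical helper: returns the href of every <a> element that has one.
    public static List<string> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // tolerant parse; does not require well-formed markup

        var links = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
            return links;

        foreach (HtmlNode anchor in anchors)
            links.Add(anchor.GetAttributeValue("href", ""));

        return links;
    }

    static void Main()
    {
        // Sample input with unclosed <p> tags, as often found on real pages.
        string html = "<html><body><p>First<p>Second <a href='/one'>one</a></body></html>";

        foreach (string link in ExtractLinks(html))
            Console.WriteLine(link);
    }
}
```

The null check matters in practice: code that iterates the result of SelectNodes directly will throw on pages where the XPath matches nothing.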

Excellent, that sounds more appropriate than Zeta – Rahul Sep 19 '08 at 8:09

The latest beta of HTML Agility Pack looks promising, too. – John Kaster Jan 13 '10 at 6:05

It's worth noting that it doesn't deal well with self-closing tags like <p> (which it interprets as empty) and really badly with optional end-tags like <li> (which it interprets as missing an end tag, and so nests consecutive li tags). – Eamon Nerbonne May 14 '11 at 16:48

plus 1 for Html Agility Pack. I have good experience using it – gyurisc Sep 2 '11 at 9:02

Erlend · Answer 3 · 2008-09-11 10:35:12Z

up vote 27 down vote

You could use TidyNet.Tidy to convert the HTML to XHTML, and then use an XML parser.

Another alternative would be to use the built-in engine mshtml:

using mshtml;
...
object[] oPageText = { html };
HTMLDocument doc = new HTMLDocumentClass();
IHTMLDocument2 doc2 = (IHTMLDocument2)doc;
doc2.write(oPageText);

This allows you to use JavaScript-like functions such as getElementById().

This is a really good solution. – Frank Krueger Sep 11 '08 at 11:06

Call me crazy, but I am having trouble figuring out how to use mshtml. Do you have any good links? – Alex Baranosky Jan 9 '09 at 5:52

@Alex you need to include Microsoft.mshtml; you can find a bit more info here: msdn.microsoft.com/en-us/library/aa290341(VS.71).aspx – Wilfred Knievel Jan 12 '10 at 23:17

I have a blog post about Tidy.Net and ManagedTidy; both are capable of parsing and validating (X)HTML files. If you do not need to validate anything, I'd go with the htmlagilitypack. jphellemons.nl/post/… – JP Hellemons Oct 25 '11 at 7:03

Rob Volk · Answer 4 · 2009-12-18 04:51:32Z

up vote 16 down vote

I found a project called Fizzler that takes a jQuery/Sizzle approach to selecting HTML elements. It's based on HTML Agility Pack. It's currently in beta and only supports a subset of CSS selectors, but it's pretty damn cool and refreshing to use CSS selectors over nasty XPath.

http://code.google.com/p/fizzler/
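
To give a feel for the CSS-selector style, here is a small sketch. It assumes both the HtmlAgilityPack and Fizzler.Systems.HtmlAgilityPack packages are referenced; FizzlerSketch, TextOf and the sample markup are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;
using Fizzler.Systems.HtmlAgilityPack; // adds QuerySelectorAll to HtmlNode

static class FizzlerSketch
{
    // Hypothetical helper: trimmed inner text of every node matching a CSS selector.
    public static List<string> TextOf(string html, string cssSelector)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode
                  .QuerySelectorAll(cssSelector)
                  .Select(node => node.InnerText.Trim())
                  .ToList();
    }

    static void Main()
    {
        string html = "<ul id='menu'>"
                    + "<li class='item'>Home</li>"
                    + "<li class='item active'>Docs</li>"
                    + "<li>Misc</li>"
                    + "</ul>";

        // Selects only the <li> elements carrying the "item" class.
        foreach (string text in TextOf(html, "#menu li.item"))
            Console.WriteLine(text);
    }
}
```

The equivalent XPath would need a contains() test on the class attribute, which is where the selector syntax tends to read more easily.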

thank you, this looks interesting! i've been surprised, what with jQuery's popularity, that it has been so hard to find a C# project inspired by it. Now if only I could find something where document manipulation and more advanced traversal was also part of the package... :) – Funka May 14 '10 at 1:33

I just used this today and I have to say, it is very easy to use if you know jQuery. – Chi Chan Oct 14 '10 at 20:56

musefan · Answer 5 · 2011-11-22 09:09:46Z

You can do a lot without going nuts on 3rd-party products and mshtml (i.e. interop). Use System.Windows.Forms.WebBrowser. From there, you can call "GetElementById" on an HtmlDocument or "GetElementsByTagName" on HtmlElements. If you want to actually interface with the browser (simulate button clicks, for example), you can use a little reflection (IMO a lesser evil than interop) to do it:

var wb = new WebBrowser();

... tell the browser to navigate (tangential to this question). Then, in the DocumentCompleted event handler, you can simulate clicks like this:

var doc = wb.Document;
var elem = doc.GetElementById(elementId);
object obj = elem.DomElement;
System.Reflection.MethodInfo mi = obj.GetType().GetMethod("click");
mi.Invoke(obj, new object[0]);

You can use similar reflection to submit forms, etc.

Enjoy.

Frank Schwieterman · Answer 6 · 2009-03-08 22:11:01Z

I've written some code that provides "LINQ to HTML" functionality. I thought I would share it here. It is based on Majestic 12. It takes the Majestic-12 results and produces LINQ XML elements. At that point you can use all your LINQ to XML tools against the HTML. As an example:

        IEnumerable<XNode> auctionNodes = Majestic12ToXml.Majestic12ToXml.ConvertNodesToXml(byteArrayOfAuctionHtml);

        foreach (XElement anchorTag in auctionNodes.OfType<XElement>().DescendantsAndSelf("a")) {

            if (anchorTag.Attribute("href") == null)
                continue;

            Console.WriteLine(anchorTag.Attribute("href").Value);
        }

I wanted to use Majestic-12 because I know it has a lot of built-in knowledge with regards to HTML that is found in the wild. What I've found though is that to map the Majestic-12 results to something that LINQ will accept as XML requires additional work. The code I'm including does a lot of this cleansing, but as you use this you will find pages that are rejected. You'll need to fix up the code to address that. When an exception is thrown, check exception.Data["source"] as it is likely set to the HTML tag that caused the exception. Handling the HTML in a nice manner is at times not trivial...

So now that expectations are realistically low, here's the code :)

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Majestic12;
using System.IO;
using System.Xml.Linq;
using System.Diagnostics;
using System.Text.RegularExpressions;

namespace Majestic12ToXml {
public class Majestic12ToXml {

    static public IEnumerable<XNode> ConvertNodesToXml(byte[] htmlAsBytes) {

        HTMLparser parser = OpenParser();
        parser.Init(htmlAsBytes);

        XElement currentNode = new XElement("document");

        HTMLchunk m12chunk = null;

        int xmlnsAttributeIndex = 0;
        string originalHtml = "";

        while ((m12chunk = parser.ParseNext()) != null) {

            try {

                Debug.Assert(!m12chunk.bHashMode);  // popular default for Majestic-12 setting

                XNode newNode = null;
                XElement newNodesParent = null;

                switch (m12chunk.oType) {
                    case HTMLchunkType.OpenTag:

                        // Tags are added as a child to the current tag, 
                        // except when the new tag implies the closure of 
                        // some number of ancestor tags.

                        newNode = ParseTagNode(m12chunk, originalHtml, ref xmlnsAttributeIndex);

                        if (newNode != null) {
                            currentNode = FindParentOfNewNode(m12chunk, originalHtml, currentNode);

                            newNodesParent = currentNode;

                            newNodesParent.Add(newNode);

                            currentNode = newNode as XElement;
                        }

                        break;

                    case HTMLchunkType.CloseTag:

                        if (m12chunk.bEndClosure) {

                            newNode = ParseTagNode(m12chunk, originalHtml, ref xmlnsAttributeIndex);

                            if (newNode != null) {
                                currentNode = FindParentOfNewNode(m12chunk, originalHtml, currentNode);

                                newNodesParent = currentNode;
                                newNodesParent.Add(newNode);
                            }
                        }
                        else {
                            XElement nodeToClose = currentNode;

                            string m12chunkCleanedTag = CleanupTagName(m12chunk.sTag, originalHtml);

                            while (nodeToClose != null && nodeToClose.Name.LocalName != m12chunkCleanedTag)
                                nodeToClose = nodeToClose.Parent;

                            if (nodeToClose != null)
                                currentNode = nodeToClose.Parent;

                            Debug.Assert(currentNode != null);
                        }

                        break;

                    case HTMLchunkType.Script:

                        newNode = new XElement("script", "REMOVED");
                        newNodesParent = currentNode;
                        newNodesParent.Add(newNode);
                        break;

                    case HTMLchunkType.Comment:

                        newNodesParent = currentNode;

                        if (m12chunk.sTag == "!--")
                            newNode = new XComment(m12chunk.oHTML);
                        else if (m12chunk.sTag == "![CDATA[")
                            newNode = new XCData(m12chunk.oHTML);
                        else
                            throw new Exception("Unrecognized comment sTag");

                        newNodesParent.Add(newNode);

                        break;

                    case HTMLchunkType.Text:

                        currentNode.Add(m12chunk.oHTML);
                        break;

                    default:
                        break;
                }
            }
            catch (Exception e) {
                var wrappedE = new Exception("Error using Majestic12.HTMLChunk, reason: " + e.Message, e);

                // the original html is copied for tracing/debugging purposes
                originalHtml = new string(htmlAsBytes.Skip(m12chunk.iChunkOffset)
                    .Take(m12chunk.iChunkLength)
                    .Select(B => (char)B).ToArray()); 

                wrappedE.Data.Add("source", originalHtml);

                throw wrappedE;
            }
        }

        while (currentNode.Parent != null)
            currentNode = currentNode.Parent;

        return currentNode.Nodes();
    }

    static XElement FindParentOfNewNode(Majestic12.HTMLchunk m12chunk, string originalHtml, XElement nextPotentialParent) {

        string m12chunkCleanedTag = CleanupTagName(m12chunk.sTag, originalHtml);

        XElement discoveredParent = null;

        // Get a list of all ancestors
        List<XElement> ancestors = new List<XElement>();
        XElement ancestor = nextPotentialParent;
        while (ancestor != null) {
            ancestors.Add(ancestor);
            ancestor = ancestor.Parent;
        }

        // Check if the new tag implies a previous tag was closed.
        if ("form" == m12chunkCleanedTag) {

            discoveredParent = ancestors
                .Where(XE => m12chunkCleanedTag == XE.Name)
                .Take(1)
                .Select(XE => XE.Parent)
                .FirstOrDefault();
        }
        else if ("td" == m12chunkCleanedTag) {

            discoveredParent = ancestors
                .TakeWhile(XE => "tr" != XE.Name)
                .Where(XE => m12chunkCleanedTag == XE.Name)
                .Take(1)
                .Select(XE => XE.Parent)
                .FirstOrDefault();
        }
        else if ("tr" == m12chunkCleanedTag) {

            discoveredParent = ancestors
                .TakeWhile(XE => !("table" == XE.Name
                                    || "thead" == XE.Name
                                    || "tbody" == XE.Name
                                    || "tfoot" == XE.Name))
                .Where(XE => m12chunkCleanedTag == XE.Name)
                .Take(1)
                .Select(XE => XE.Parent)
                .FirstOrDefault();
        }
        else if ("thead" == m12chunkCleanedTag
                  || "tbody" == m12chunkCleanedTag
                  || "tfoot" == m12chunkCleanedTag) {


            discoveredParent = ancestors
                .TakeWhile(XE => "table" != XE.Name)
                .Where(XE => m12chunkCleanedTag == XE.Name)
                .Take(1)
                .Select(XE => XE.Parent)
                .FirstOrDefault();
        }

        return discoveredParent ?? nextPotentialParent;
    }

    static string CleanupTagName(string originalName, string originalHtml) {

        string tagName = originalName;

        tagName = tagName.TrimStart(new char[] { '?' });  // for nodes <?xml >

        if (tagName.Contains(':'))
            tagName = tagName.Substring(tagName.LastIndexOf(':') + 1);

        return tagName;
    }

    static readonly Regex _startsAsNumeric = new Regex(@"^[0-9]", RegexOptions.Compiled);

    static bool TryCleanupAttributeName(string originalName, ref int xmlnsIndex, out string result) {

        result = null;
        string attributeName = originalName;

        if (string.IsNullOrEmpty(originalName))
            return false;

        if (_startsAsNumeric.IsMatch(originalName))
            return false;

        //
        // transform xmlns attributes so they don't actually create any XML namespaces
        //
        if (attributeName.ToLower().Equals("xmlns")) {

            attributeName = "xmlns_" + xmlnsIndex.ToString();
            xmlnsIndex++;
        }
        else {
            if (attributeName.ToLower().StartsWith("xmlns:")) {
                attributeName = "xmlns_" + attributeName.Substring("xmlns:".Length);
            }   

            //
            // trim trailing \"
            //
            attributeName = attributeName.TrimEnd(new char[] { '\"' });

            attributeName = attributeName.Replace(":", "_");
        }

        result = attributeName;

        return true;
    }

    static Regex _weirdTag = new Regex(@"^<!\[.*\]>$");       // matches "<![if !supportEmptyParas]>"
    static Regex _aspnetPrecompiled = new Regex(@"^<%.*%>$"); // matches "<%@ ... %>"
    static Regex _shortHtmlComment = new Regex(@"^<!-.*->$"); // matches "<!-Extra_Images->"

    static XElement ParseTagNode(Majestic12.HTMLchunk m12chunk, string originalHtml, ref int xmlnsIndex) {

        if (string.IsNullOrEmpty(m12chunk.sTag)) {

            if (m12chunk.sParams.Length > 0 && m12chunk.sParams[0].ToLower().Equals("doctype"))
                return new XElement("doctype");

            if (_weirdTag.IsMatch(originalHtml))
                return new XElement("REMOVED_weirdBlockParenthesisTag");

            if (_aspnetPrecompiled.IsMatch(originalHtml))
                return new XElement("REMOVED_ASPNET_PrecompiledDirective");

            if (_shortHtmlComment.IsMatch(originalHtml))
                return new XElement("REMOVED_ShortHtmlComment");

            // Nodes like "<br <br>" will end up with a m12chunk.sTag==""...  We discard these nodes.
            return null;
        }

        string tagName = CleanupTagName(m12chunk.sTag, originalHtml);

        XElement result = new XElement(tagName);

        List<XAttribute> attributes = new List<XAttribute>();

        for (int i = 0; i < m12chunk.iParams; i++) {

            if (m12chunk.sParams[i] == "<!--") {

                // an HTML comment was embedded within a tag.  This comment and its contents
                // will be interpreted as attributes by Majestic-12... skip these attributes
                // by advancing i until the parameter that closes the comment is reached
                for (; i < m12chunk.iParams; i++) {

                    if (m12chunk.sParams[i] == "--" || m12chunk.sParams[i] == "-->")
                        break;
                }

                continue;
            }

            if (m12chunk.sParams[i] == "?" && string.IsNullOrEmpty(m12chunk.sValues[i]))
                continue;

            string attributeName = m12chunk.sParams[i];

            if (!TryCleanupAttributeName(attributeName, ref xmlnsIndex, out attributeName))
                continue;

            attributes.Add(new XAttribute(attributeName, m12chunk.sValues[i]));
        }

        // If attributes are duplicated with different values, we complain.
        // If attributes are duplicated with the same value, we remove all but 1.
        var duplicatedAttributes = attributes.GroupBy(A => A.Name).Where(G => G.Count() > 1);

        foreach (var duplicatedAttribute in duplicatedAttributes) {

            if (duplicatedAttribute.GroupBy(DA => DA.Value).Count() > 1)
                throw new Exception("Attribute value was given different values");

            attributes.RemoveAll(A => A.Name == duplicatedAttribute.Key);
            attributes.Add(duplicatedAttribute.First());
        }

        result.Add(attributes);

        return result;
    }

    static HTMLparser OpenParser() {
        HTMLparser oP = new HTMLparser();

        // The code+comments in this function are from the Majestic-12 sample documentation.

        // ...

        // This is optional, but if you want high performance then you may
        // want to set chunk hash mode to FALSE. This would result in tag params
        // being added to string arrays in HTMLchunk object called sParams and sValues, with number
        // of actual params being in iParams. See code below for details.
        //
        // When TRUE (and its default) tag params will be added to hashtable HTMLchunk (object).oParams
        oP.SetChunkHashMode(false);

        // if you set this to true then original parsed HTML for given chunk will be kept - 
        // this will reduce performance somewhat, but may be desireable in some cases where
        // reconstruction of HTML may be necessary
        oP.bKeepRawHTML = false;

        // if set to true (it is false by default), then entities will be decoded: this is essential
        // if you want to get strings that contain final representation of the data in HTML, however
        // you should be aware that if you want to use such strings into output HTML string then you will
        // need to do Entity encoding or same string may fail later
        oP.bDecodeEntities = true;

        // we have option to keep most entities as is - only replace stuff like &nbsp; 
        // this is called Mini Entities mode - it is handy when HTML will need
        // to be re-created after it was parsed, though in this case really
        // entities should not be parsed at all
        oP.bDecodeMiniEntities = true;

        if (!oP.bDecodeEntities && oP.bDecodeMiniEntities)
            oP.InitMiniEntities();

        // if set to true, then in case of Comments and SCRIPT tags the data set to oHTML will be
        // extracted BETWEEN those tags, rather than include complete RAW HTML that includes tags too
        // this only works if auto extraction is enabled
        oP.bAutoExtractBetweenTagsOnly = true;

        // if true then comments will be extracted automatically
        oP.bAutoKeepComments = true;

        // if true then scripts will be extracted automatically: 
        oP.bAutoKeepScripts = true;

        // if this option is true then whitespace before start of tag will be compressed to single
        // space character in string: " ", if false then full whitespace before tag will be returned (slower)
        // you may only want to set it to false if you want exact whitespace between tags, otherwise it is just
        // a waste of CPU cycles
        oP.bCompressWhiteSpaceBeforeTag = true;

        // if true (default) then tags with attributes marked as CLOSED (/ at the end) will be automatically
        // forced to be considered as open tags - this is no good for XML parsing, but I keep it for backwards
        // compatibility for my stuff as it makes it easier to avoid checking for same tag which is both closed
        // or open
        oP.bAutoMarkClosedTagsWithParamsAsOpen = false;

        return oP;
    }
}
}

btw HtmlAgilityPack has worked well for me in the past, I just prefer LINQ.
What's the performance like when you add the LINQ conversion? Any idea how it compares with HtmlAgilityPack?
I never did a performance comparison. These days I use HtmlAgilityPack, much less hassle. Unfortunately the code above has lots of special cases I didn't bother to write tests for, so I can't really maintain it.

Grimtron · Answer 7 · 2008-09-19 08:11:52Z

up vote 7 down vote

The Html Agility Pack has been mentioned before - if you are going for speed, you might also want to check out the Majestic-12 HTML parser. Its handling is rather clunky, but it delivers a really fast parsing experience.

Murph · Answer 8 · 2008-09-11 10:53:06Z

up vote 5 down vote

I'm not sure about "best" but I'd start here:

Html Agility Pack

This will probably give you what you need.

Frank Krueger · Answer 9 · 2008-09-11 11:12:13Z

up vote 3 down vote

I think @Erlend's use of HTMLDocument is the best way to go. However, I have also had good luck using this simple library:

SgmlReader

Scott Hanselman · Answer 10 · 2008-09-19 08:07:35Z

up vote 2 down vote

Take a look at Chris Lovett's SGML Reader inside DasBlog. It'll turn HTML into an XML document and let you get the elements that way.

Chris · Answer 11 · 2008-09-21 00:32:50Z

Look for the HtmlAgilityPack. It's an open-source library that parses HTML and will fix errors (e.g. unclosed tags). Once the document is loaded, you can use XPath via the XPathNavigator class to select the specific content you want. Additionally, it can convert HTML to well-formed X(HT)ML. At work, this library is a VERY critical part of our software. We run hundreds of thousands of ugly third-party HTML documents and XML feeds through it daily, to ensure they're well-formed before we attempt to parse data out of them.

Heinnge · Answer 12 · 2011-02-02 19:05:58Z

Has anyone been using Fizzler? I just found out about it recently; it uses htmlagilitypack and supports jQuery-style selectors. Trust me, if you are familiar with jQuery, you won't look for another parser!

I think I first read about it here: Looking for C# HTML parser

majmun · Answer 13 · 2011-06-06 14:46:34Z

No 3rd party lib, WebBrowser class solution that can run on Console, and Asp.net

using System;
using System.Collections.Generic;
using System.Text;
using System.Windows.Forms;
using System.Threading;

class ParseHTML
{
    public ParseHTML() { }
    private string ReturnString;

    public string doParsing(string html)
    {
        Thread t = new Thread(TParseMain);
        // WebBrowser requires an STA thread; the ApartmentState property setter is obsolete
        t.SetApartmentState(ApartmentState.STA);
        t.Start((object)html);
        t.Join();
        return ReturnString;
    }

    private void TParseMain(object html)
    {
        WebBrowser wbc = new WebBrowser();
        wbc.DocumentText = "feces of a dummy";        //;magic words        
        HtmlDocument doc = wbc.Document.OpenNew(true);
        doc.Write((string)html);
        this.ReturnString = doc.Body.InnerHtml + " do here something";
        return;
    }
}

usage:

string myhtml = "<HTML><BODY>This is a new HTML document.</BODY></HTML>";
Console.WriteLine("before:" + myhtml);
myhtml = (new ParseHTML()).doParsing(myhtml);
Console.WriteLine("after:" + myhtml);

Mark Ingram · Answer 14 · 2008-09-11 09:47:26Z

up vote 1 down vote

The trouble with parsing HTML is that it isn't an exact science. If it were XHTML that you were parsing, things would be a lot easier (as you mention, you could use a general XML parser). Because HTML isn't necessarily well-formed XML, you will run into lots of problems trying to parse it. It almost needs to be done on a site-by-site basis.
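
The difference is easy to demonstrate with the XML parser that ships with .NET. The sketch below uses only System.Xml.Linq; the class name, method name and sample strings are made up for the example.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

static class XmlVersusHtml
{
    // Returns true if a generic XML parser accepts the markup as-is.
    public static bool IsWellFormedXml(string markup)
    {
        try
        {
            XDocument.Parse(markup);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    static void Main()
    {
        // XHTML-style markup: every element is explicitly closed.
        string xhtml = "<html><body><p>one</p><p>two<br/></p></body></html>";

        // Typical HTML: <p> and <br> left unclosed, which browsers accept.
        string tagSoup = "<html><body><p>one<p>two<br></body></html>";

        Console.WriteLine(IsWellFormedXml(xhtml));   // prints True
        Console.WriteLine(IsWellFormedXml(tagSoup)); // prints False
    }
}
```

The second string is perfectly ordinary HTML, yet the XML parser rejects it at the first mismatched end tag, which is why the HTML-aware libraries mentioned in the other answers exist.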

Isn't parsing well-formed HTML, as specified by the W3C, as exact a science as parsing XHTML? – J. Pablo Fernández Dec 8 '09 at 12:56

It should be, but people don't do it. – DMan Feb 16 '10 at 3:54

@J. Pablo Not nearly as easy though (and hence the reason for a library :p)... for instance, <p> tags do not need to be explicitly closed under HTML4/5. Yikes! – user166390 Dec 22 '10 at 4:13

Rahul · Answer 15 · 2008-09-19 08:03:00Z

I've used ZetaHtmlTidy in the past to load random websites and then hit against various parts of the content with xpath (eg /html/body//p[@class='textblock']). It worked well but there were some exceptional sites that it had problems with, so I don't know if it's the absolute best solution.

Corin Blaikie · Answer 16 · 2008-09-11 09:39:18Z

up vote 0 down vote

You could use an HTML DTD and the generic XML parsing libraries.

Can you clarify this? – Luke Sep 11 '08 at 9:44

Very few real-world HTML pages will survive an XML parsing library. – Frank Krueger Sep 11 '08 at 11:07

Ruben Bartelink · Answer 17 · 2009-11-12 14:53:50Z

up vote 0 down vote

Use WatiN if you need to see the impact of JS on the page [and you're prepared to start a browser]

Mikos · Answer 18 · 2010-01-03 09:04:29Z

up vote 0 down vote

Depending on your needs you might go for the more feature-rich libraries. I tried most/all of the solutions suggested, but what stood out head & shoulders was Html Agility Pack. It is a very forgiving and flexible parser.

P M · Answer 19 · 2010-03-22 20:29:03Z

Try this script.

http://www.biterscripting.com/SS_URLs.html

When I use it with this url,

script SS_URLs.txt URL("http://stackoverflow.com/questions/56107/what-is-the-best-way-to-parse-html-in-c")

It shows me all the links on the page for this thread.

http://sstatic.net/so/all.css
http://sstatic.net/so/favicon.ico
http://sstatic.net/so/apple-touch-icon.png
.
.
.

You can modify that script to check for images, variables, whatever.

Jonathan Wood · Answer 20 · 2010-12-23 18:19:39Z

I wrote some classes for parsing HTML tags in C#. They are nice and simple if they meet your particular needs.

You can read an article about them and download the source code at http://www.blackbeltcoder.com/Articles/strings/parsing-html-tags-in-c.

There's also an article about a generic parsing helper class at http://www.blackbeltcoder.com/Articles/strings/a-text-parsing-helper-class.

asked	4 years ago
viewed	135364 times
active	1 year ago

locked by Kev♦ Nov 15 '11 at 17:09

closed as not constructive by Kev♦ Nov 15 '11 at 17:09

20 Answers
