Check if a string has HTML in C#

Question

I need a function to determine whether a string has HTML in it, so that I know whether I'm dealing with plain text or HTML.

It seems simple enough in C# using HtmlAgilityPack: recursively walk the tree of nodes, and if any of them is an element node (or a comment), we say "Yes, it's HTML".

using System.Linq;
using HtmlAgilityPack;

public static class HTMLUtility
{
    public static bool ContainsHTMLElements(string text)
    {
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(text);

        bool foundHTML = NodeContainsHTML(doc.DocumentNode);
        return foundHTML;
    }

    // True if this node, or any node beneath it, is an element or a comment.
    private static bool NodeContainsHTML(HtmlNode node)
    {
        return node.NodeType == HtmlNodeType.Element
            || node.NodeType == HtmlNodeType.Comment
            || node.ChildNodes.Any(n => NodeContainsHTML(n));
    }
}

Am I missing anything? Thanks!

Can text be just any HTML element, or is it supposed to be an entire HTML document? — Ron Beyer, Nov 17 '16 at 20:14
It could be any text that may or may not have HTML inside of it. If "text" has the value "blah blah blah <div> Hello world! </div>", then the function should return true. — unnknown, Nov 17 '16 at 20:17
If so then for any valid html it should be enough to check whether the first non-empty character is a <. — t3chb0t, Nov 17 '16 at 20:32
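
For comparison, here is a minimal sketch of the first-character check suggested in the last comment. The class and method names (FirstCharHeuristic, StartsWithTag) are hypothetical and not from the thread. It only looks at the first non-whitespace character, so it reports false for the asker's own example, where the markup follows leading plain text.

```csharp
using System;

public static class FirstCharHeuristic
{
    // Hypothetical helper: true when the first non-whitespace character is '<'.
    // It does not parse anything, so markup that appears later is not detected.
    public static bool StartsWithTag(string text)
    {
        string trimmed = (text ?? string.Empty).TrimStart();
        return trimmed.Length > 0 && trimmed[0] == '<';
    }

    public static void Main()
    {
        // Markup at the start of the string is detected.
        Console.WriteLine(StartsWithTag("  <div> Hello world! </div>"));              // True

        // The asker's example begins with plain text, so the check misses the <div>.
        Console.WriteLine(StartsWithTag("blah blah blah <div> Hello world! </div>")); // False
    }
}
```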

RobH · Accepted Answer · 2016-11-18 15:42:25Z

I've had to do this exact thing before. Your way is absolutely fine, but I went the other way round and checked whether it was all text:

private static bool HtmlIsJustText(HtmlNode rootNode)
{
    return rootNode.Descendants().All(n => n.NodeType == HtmlNodeType.Text);
}

Then you have your public method as:

public static bool ContainsHTMLElements(string text)
{
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(text);
    return !HtmlIsJustText(doc.DocumentNode);
}

I think that makes the code slightly more concise.
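
As a rough usage sketch, the two methods above can be dropped into a small console program. This assumes the HtmlAgilityPack NuGet package is referenced. The wrapper class name HtmlCheckDemo and the sample strings are made up for illustration, apart from the "blah blah blah" input, which comes from the comments on the question.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is installed

public static class HtmlCheckDemo // hypothetical wrapper class for this sketch
{
    // True when every node under the root is a text node.
    private static bool HtmlIsJustText(HtmlNode rootNode)
    {
        return rootNode.Descendants().All(n => n.NodeType == HtmlNodeType.Text);
    }

    public static bool ContainsHTMLElements(string text)
    {
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(text);
        return !HtmlIsJustText(doc.DocumentNode);
    }

    public static void Main()
    {
        // Text followed by an element: the <div> is a non-text descendant.
        Console.WriteLine(ContainsHTMLElements("blah blah blah <div> Hello world! </div>")); // True

        // No markup at all: the only descendant is a text node.
        Console.WriteLine(ContainsHTMLElements("just plain text")); // False
    }
}
```

Because Descendants() also yields comment nodes, a string containing only an HTML comment should be treated as HTML here too, which matches the behaviour of the recursive version in the question.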

I'd argue that your class should really be called HtmlUtility, as per the naming guidelines:

The PascalCasing convention, used for all identifiers except parameter names, capitalizes the first character of each word (including acronyms over two letters in length), as shown in the following examples:

PropertyDescriptor

HtmlTag
