Sign up ×
Code Review Stack Exchange is a question and answer site for peer programmer code reviews. It's 100% free, no registration required.

I have to make project for school in C# which has to use database and web service. I have made a program which starts a web service and gets a function from there which is used for connecting to a website. I am using HTML Agility Pack for parsing info from that website. Website info is then written in textbox and stored in database.

Here is one function from project, and the rest is here.

    public void HtmlParser()
    {
        Encoding enc = Encoding.UTF8;

        HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();

        htmlDoc.Load(client.Request(), enc);

        HtmlNode root = htmlDoc.DocumentNode;
        HtmlNodeCollection jobNode = root.SelectNodes("//div[@class='jobBox']");
        HtmlNodeCollection headerNode = root.SelectNodes("//div[@class='jobBox']/h1");
        HtmlNodeCollection contentNode = root.SelectNodes("//div[@class='jobBox']/div[@class='content']");
        HtmlNodeCollection linkNode = root.SelectNodes("//div[@class='jobBox']/a");

        jobs = new List<string>();
        headers = new List<string>();
        content = new List<string>();
        links = new List<string>();
        full_links = new List<Uri>();

        foreach (HtmlNode node in jobNode)
        {
            jobs.Add(node.InnerText);
        }
        foreach (HtmlNode node in headerNode)
        {
            headers.Add(node.InnerText.Trim());
        }
        foreach (HtmlNode node in contentNode)
        {
            content.Add(node.InnerText.Trim());
        }

        foreach (HtmlNode node in linkNode)
        {
            links.Add(node.GetAttributeValue("href", null).Trim());
            Uri temp = new Uri(link + node.GetAttributeValue("href", null).Trim());
            full_links.Add(temp);
        }

        htmlDoc.Save("file.htm", enc);

        StringParser();
    }

Some variables are in Croatian, so don't get confused. I am looking for helpful reviews and tips on how I can make my code better.

share|improve this question

1 Answer 1

up vote 4 down vote accepted

One comment I'd have is that there is a LOT going on in that method... but for the most part each section is the same...

  • GetNodeCollection
  • Loop Through it
  • Add it to a list

You could probably pull that out into a method to shorten your code.

public List<string> ParseSection(HtmlNode root, string nodePath, bool isLink = false)
{
    HtmlNodeCollection nodes = root.SelectNodes(nodePath);
    List<string> parsedNodes = new List<string>();
    foreach (HtmlNode node in nodes)
    {
        if(isLink)
        {
            parsedNodes.Add(node.GetAttributeValue("href", null).Trim());
        }
        else
        {
            parsedNodes.Add(node.InnerText.Trim());
        }
    }
    return parsedNodes;
}

public void HtmlParser()
{
    Encoding enc = Encoding.UTF8;
    HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
    htmlDoc.Load(client.Request(), enc);
    HtmlNode root = htmlDoc.DocumentNode;
    HtmlNodeCollection linkNode = root.SelectNodes();

    List<string> jobs = ParseSection(root, "//div[@class='jobBox']");
    List<string> headers = ParseSection(root, "//div[@class='jobBox']/h1");
    List<string> content = ParseSection(root, "//div[@class='jobBox']/div[@class='content']");
    List<string> links = ParseSection(root, "//div[@class='jobBox']/a", true); //TRUE
    List<Uri> fullLinks = links.Select(l => new Uri(l));

    htmlDoc.Save("file.htm", enc);
    StringParser();
}
share|improve this answer
    
yea there are a lot of thing going on in that function :D thanks for tips, ill try to change something –  imot01 May 30 '13 at 9:48

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.