
I am trying to parse [www.neiu.edu/~neiutemp/PhoneBook/alpha.htm] using the TFHpple parser (Hpple), and I want the first TD (first column) from every TR (row) in a table. All the TDs have the same attributes, so I can't tell them apart by attribute.
I am able to download all of the HTML, but I fail to get the first TD from each TR. After step // 3 in the code below, tutorialsNodes is empty. The output of

NSLog(@"Nodes are : %@",[tutorialsNodes description]);

is

Practice1[62351:c07] Nodes are : ().

I can't see what's wrong. Any help would be appreciated. My code to parse this URL:

NSURL *tutorialsUrl = [NSURL URLWithString:@"http://www.neiu.edu/~neiutemp/PhoneBook/alpha.htm"];
NSData *tutorialsHtmlData = [NSData dataWithContentsOfURL:tutorialsUrl];

// 2
TFHpple *tutorialsParser = [TFHpple hppleWithHTMLData:tutorialsHtmlData];

// 3
NSString *tutorialsXpathQueryString = @"//TR/TD";
NSArray *tutorialsNodes = [tutorialsParser searchWithXPathQuery:tutorialsXpathQueryString];
NSLog(@"Nodes are : %@",[tutorialsNodes description]);
// 4
NSMutableArray *newTutorials = [[NSMutableArray alloc] initWithCapacity:0];
for (TFHppleElement *element in tutorialsNodes) {
    // 5
    Tutorial *tutorial = [[Tutorial alloc] init];
    [newTutorials addObject:tutorial];

    // 6
    tutorial.title = [[element firstChild] content];

    // 7
    tutorial.url = [element objectForKey:@"href"];

    NSLog(@"title is: %@",[tutorial.title description]);
}

// 8
_objects = newTutorials;
[self.tableView reloadData];
    
Xcode doesn't parse HTML. Is this for OS X or iOS? You should rephrase this with "using Cocoa/Cocoa Touch" accordingly. –  user529758 Apr 16 '13 at 21:08
    
I've rewritten the question. –  Mitra Patel May 6 '13 at 20:24

1 Answer

Accepted answer (2 votes)

This should work if you use @"//tr/td" instead of @"//TR/TD".
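
The likely reason, as far as I understand it: Hpple sits on top of libxml2's HTML parser, which normalizes element names to lower case, while XPath name tests are case-sensitive. So @"//TR/TD" matches nothing even when the page itself is written in capitals. Here is a minimal sketch for checking that against an inline snippet instead of the live page. The markup string is made up for illustration, and it assumes Hpple is already in your project:

```objective-c
#import <Foundation/Foundation.h>
#import "TFHpple.h"

int main(void) {
    @autoreleasepool {
        // Hypothetical markup, written in upper case like the page in the question.
        NSString *html = @"<TABLE><TR><TD>Abay</TD><TD>H-Abay</TD></TR></TABLE>";
        NSData *data = [html dataUsingEncoding:NSUTF8StringEncoding];
        TFHpple *parser = [TFHpple hppleWithHTMLData:data];

        // Run the same query twice, differing only in case.
        NSArray *upper = [parser searchWithXPathQuery:@"//TR/TD"];
        NSArray *lower = [parser searchWithXPathQuery:@"//tr/td"];

        NSLog(@"//TR/TD matched %lu nodes", (unsigned long)[upper count]);
        NSLog(@"//tr/td matched %lu nodes", (unsigned long)[lower count]);
    }
    return 0;
}
```

If the explanation above holds, the upper-case query should come back empty and the lower-case one should find both cells.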

Looking at your HTML, though, the author of that page apparently doesn't know how to spell CSS, so you have font tags buried throughout the source. Your next line of code, which is obviously taken from the excellent Hpple tutorial by Matt Galloway on Ray Wenderlich's site, says:

tutorial.title = [[element firstChild] content];

But that won't work here because, for most of your entries, the firstChild is not the text but a font tag. So you could check whether it is a font tag, like so:

TFHppleElement *subelement = [element firstChild];
if ([[subelement tagName] isEqualToString:@"font"])
    subelement = [subelement firstChild];
tutorial.title = [subelement content];

Or you could simply search for @"//tr/td/font" rather than @"//tr/td". There are lots of approaches here. The trick (as with all HTML parsing) is to make it reasonably robust, so that minor cosmetic tweaks to the page don't break it.
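
Since the question asks only for the first column, one more option is a positional predicate: @"//tr/td[1]" selects the first td child of each tr. The sketch below combines that with a slightly more forgiving version of the font check, which steps through any number of nested font wrappers and skips cells that end up with no text. CellText is a hypothetical helper name, and the inline markup is invented for illustration (only the names come from the page), so treat it as a starting point rather than something tested against the real page:

```objective-c
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Hypothetical helper: steps down through leading <font> wrappers and
// returns the content of whatever node lies beneath them (nil if none).
static NSString *CellText(TFHppleElement *cell) {
    TFHppleElement *node = [cell firstChild];
    while (node != nil && [[node tagName] isEqualToString:@"font"]) {
        node = [node firstChild];
    }
    return [node content];
}

int main(void) {
    @autoreleasepool {
        // Made-up markup: one cell wrapped in <font>, one plain, one empty row.
        NSString *html =
            @"<table>"
             "<tr><td><font size=\"2\">Abay, Hiwet B</font></td><td>H-Abay</td></tr>"
             "<tr><td>Abbruscato, Terence</td><td>T-Abbruscato</td></tr>"
             "<tr><td></td><td></td></tr>"
             "</table>";
        NSData *data = [html dataUsingEncoding:NSUTF8StringEncoding];
        TFHpple *parser = [TFHpple hppleWithHTMLData:data];

        // td[1] keeps only the first <td> child of each <tr>.
        NSArray *firstCells = [parser searchWithXPathQuery:@"//tr/td[1]"];
        for (TFHppleElement *cell in firstCells) {
            NSString *title = CellText(cell);
            if ([title length] == 0) {
                continue; // empty cell, or a wrapper this helper does not handle
            }
            NSLog(@"first column: %@", title);
        }
    }
    return 0;
}
```

The loop over font tags is a design choice for robustness: if the page nests a second font tag or drops the wrapper from some rows, the same helper still reaches the text.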

And obviously, your HTML doesn't have URLs there, so that code isn't applicable here.

Anyway, I hope this is enough to get you going.


You report having issues, so I thought I'd just supply a more complete code sample:

NSURL *tutorialsUrl = [NSURL URLWithString:@"http://www.neiu.edu/~neiutemp/PhoneBook/alpha.htm"];
NSData *tutorialsHtmlData = [NSData dataWithContentsOfURL:tutorialsUrl];

TFHpple *tutorialsParser = [TFHpple hppleWithHTMLData:tutorialsHtmlData];

NSString *tutorialsXpathQueryString = @"//tr/td";
NSArray *tutorialsNodes = [tutorialsParser searchWithXPathQuery:tutorialsXpathQueryString];

if ([tutorialsNodes count] == 0)
    NSLog(@"nothing there");
else
    NSLog(@"There are %lu nodes", (unsigned long)[tutorialsNodes count]); // count is an NSUInteger, so cast it and use %lu rather than %d

NSMutableArray *newTutorials = [[NSMutableArray alloc] initWithCapacity:0];
for (TFHppleElement *element in tutorialsNodes) {

    Tutorial *tutorial = [[Tutorial alloc] init];
    [newTutorials addObject:tutorial];

    TFHppleElement *subelement = [element firstChild];
    if ([[subelement tagName] isEqualToString:@"font"])
        subelement = [subelement firstChild];
    tutorial.title = [subelement content];

    NSLog(@"title is: %@", [tutorial.title description]);
}

That yields the following output:

2013-05-10 19:39:42.027 hpple-test[33881:c07] There are 10773 nodes
2013-05-10 19:39:42.028 hpple-test[33881:c07] title is: A
2013-05-10 19:39:46.027 hpple-test[33881:c07] title is: (null)
2013-05-10 19:39:46.698 hpple-test[33881:c07] title is: (null)
2013-05-10 19:39:47.333 hpple-test[33881:c07] title is: (null)
2013-05-10 19:39:47.827 hpple-test[33881:c07] title is: (null)
2013-05-10 19:39:48.358 hpple-test[33881:c07] title is: (null)
2013-05-10 19:39:49.133 hpple-test[33881:c07] title is: (null)
2013-05-10 19:39:49.775 hpple-test[33881:c07] title is: Abay, Hiwet B
2013-05-10 19:39:50.326 hpple-test[33881:c07] title is: H-Abay
2013-05-10 19:39:50.992 hpple-test[33881:c07] title is: 773-442-5140
2013-05-10 19:39:51.597 hpple-test[33881:c07] title is: (null)
2013-05-10 19:39:52.092 hpple-test[33881:c07] title is: Controller
2013-05-10 19:39:52.598 hpple-test[33881:c07] title is: E
2013-05-10 19:39:53.149 hpple-test[33881:c07] title is: 223
2013-05-10 19:39:55.040 hpple-test[33881:c07] title is: Abbruscato, Terence 
2013-05-10 19:39:55.806 hpple-test[33881:c07] title is: T-Abbruscato
2013-05-10 19:39:56.525 hpple-test[33881:c07] title is: 773-442-5339
...
    
Thank You so much Rob. It works. But in: NSLog(@"title is: %@",[tutorial.title description]);, i got : 2013-05-10 14:23:15.375 Practice1[78352:c07] title is: (null) as an output. I have used tr/td/font as xpathQueryString. Any more suggestions please!! –  Mitra Patel May 10 '13 at 19:31
    
@MitraPatel I've updated my answer with a more complete code sample. –  Rob May 10 '13 at 23:45
    
I got the same. Thanks a lot. –  Mitra Patel May 11 '13 at 3:30
    
@MitraPatel Glad it worked out. –  Rob May 11 '13 at 12:54
