HTML parsing refers to the process of converting an HTML document to a tree-based Document Object Model with the goal of extracting information.
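
As a rough illustration of that definition, the sketch below builds a very small element tree with Python's standard-library html.parser module and then pulls link targets and link text out of it. The Node and TreeBuilder classes and the parse() helper are hypothetical names made up for this example, not part of any library, and the tree is far simpler than a real DOM: it does none of the error recovery a browser-grade parser performs. The sample markup is taken from one of the question excerpts listed below.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag, so they must not stay "open".
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Node:
    """One element in the simplified tree (hypothetical helper, not a real DOM API)."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []  # child Node objects and text strings, in document order

    def text(self):
        """Concatenate all text found below this node."""
        return "".join(c if isinstance(c, str) else c.text() for c in self.children)

    def find_all(self, tag):
        """Yield every descendant element with the given tag name."""
        for child in self.children:
            if isinstance(child, Node):
                if child.tag == tag:
                    yield child
                yield from child.find_all(tag)


class TreeBuilder(HTMLParser):
    """Turns the parser's start/end/data events into a tree of Node objects."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node  # descend into the new element

    def handle_endtag(self, tag):
        # Walk up to the nearest open element with this name; ignore stray end tags.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        self.current.children.append(data)


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()  # flush any buffered text
    return builder.root


doc = parse("<html><body><a href='example.com'>Hello, <span>World</span></a></body></html>")
links = [(a.attrs.get("href"), a.text()) for a in doc.find_all("a")]
print(links)
```

Once the document is a tree, extraction becomes a traversal (here, find_all("a")) instead of string matching, which is the main reason parser-based approaches are usually suggested over regular expressions for this kind of task.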


0
votes
1 answer
20 views

How to Keep HTML Formatting Intact When Parsing with DOM - (No Tag Stripping)

Using DOMDocument, I'm trying to read a portion of an HTML file and display it on a different HTML page using the code below. The DIV portion that I'm trying to access has several <p> ...
0
votes
0 answers
10 views

Changing input values after row has been removed

I'm trying to create a tool where the user clicks a button and it adds a label, an input and a remove button. I have it working for the most part, but a couple of things are still missing. If the ...
0
votes
3 answers
35 views

How to change/delete a string in a page called using PHP?

I'm currently using this function: function callframe(){ $ch = curl_init("file.html"); curl_setopt($ch, CURLOPT_HEADER, 0); echo curl_exec($ch); curl_close($ch); } Then I call ...
0
votes
0 answers
27 views

HTML form to iframe external booking system

I've got a simple booking form and want to transfer the data into a different 'booking' page in an iframe as we use an external booking system - but I'm having some trouble working this out. I have ...
0
votes
1 answer
19 views

Questions about HTML parsing

This is a program we've written for HTML parsing. It works perfectly. We found a demo program on the net and modified it for our needs, but we don't understand how it works. import urllib from ...
0
votes
2 answers
299 views

JUnit testing for HTML parsing

I'm trying to set up unit tests on a web crawler and am rather confused as to how I would test them. (I've only done unit testing once and it was on a calculator program.) Here are two example ...
0
votes
2 answers
44 views

Find word but not in a link

I need a regular expression which will find the target word or words in HTML (so in amongst tags) but NOT in an anchor or script tag. I have experimented for ages and came up with this ...
0
votes
0 answers
36 views

Script tag doesn't follow tree structure with jQuery parseHTML

I'm parsing some HTML through jQuery's parseHTML() function: <code> <b>test</b><script>lol</script> </code> The problem is that when it comes to <script> ...
0
votes
1 answer
20 views

From HTML to XML Java API

I want to use my own converter from an HTML table to an XLS table, but I don't know where to start. Google doesn't show me comprehensive results. I know about Apache Tika and POI, but do they have ...
0
votes
1 answer
128 views

Access documents on secure web server

I'm trying to build an iPad app to download and display documents (pdf, ppt, doc, etc.) from a web server. Currently it does this by parsing the HTML structure (using hpple) on the server. For ...
0
votes
1 answer
32 views

PHP Simple HTML DOM If Website is Down

I'm parsing some news from one of the Turkish Health Ministry's websites. But if, for instance, the website is down or can't be loaded, my website doesn't load the content after the part that I parse ...
0
votes
0 answers
25 views

Web scraping without manual entry? [closed]

I've searched and searched but can't seem to find anything on this so I'm hoping I can get some insight on this. I work a 9 to 5 job where I have to do TONS of research about specific things. For ...
0
votes
1 answer
36 views

How to navigate websites using Jsoup in Java

How can I navigate (like web crawling) in Jsoup to a different link? For this example I have done the basics to get the title, the links and the text. But I want to be able to use one of those child ...
0
votes
1 answer
28 views

Parsing HTML with DOM

I am trying to parse HTML, for example: <html> <head> </head> <body> <a href='example.com'>Hello, <span>World</span></a> <ul> ...
0
votes
1 answer
24 views

file_get_html() returns garbage

I am using a simple_html_dom parser. The following code is returning garbage output: $opts = array( 'http'=>array( 'method'=>"GET", 'header'=> ...
