HTML parsing refers to the process of converting an HTML document to a tree-based Document Object Model with the goal of extracting information.
-1
votes
0 answers
19 views
ERROR: 'Namespace for prefix 'collapse;table-layout' has not been declared.' while parsing html
I built an application that can download HTML and display it like a browser does. After that, the user can parse the table on the HTML page. I parse the table using jsoup. Jsoup can parse HTML in a String, so I need to ...
0
votes
2 answers
26 views
XPath: From a node, select all the deepest descendants from the second child on
<?xml version="1.0"?>
<shipTo country="US">
<name><strong>Alice Smith</strong></name>
<street>123 Maple Street</street>
...
-3
votes
0 answers
30 views
Detect Rowspan and Colspan in HTML Table with jsoup [closed]
I want to extract a table from HTML using jsoup. I need to parse it and get the highlighted elements (the pink ones). This is my table:
http://s22.postimg.org/4yid6co3l/image.jpg
How do I write the ...
0
votes
0 answers
13 views
Checking URL for specific text or code
I have a list of nearly 1 million URLs containing a specific link to a page, or at least I believe so. To verify this I need to scan the pages for the link. I also sometimes check other stuff like "does ...
1
vote
2 answers
43 views
Regex to pull match from ordered list
Given this string of text:
$myString = '<details class="myEl" open="open">
<summary>In this article</summary>
<ol>
<li><a ...
1
vote
0 answers
8 views
using lxml with beautiful soup
I'm having trouble making lxml work with Beautiful Soup. I'm running OS X 10.8.4. To install lxml, I ran port install py25-lxml and it installed fine. Now I'm getting this error when I try to use lxml ...
-4
votes
0 answers
18 views
Grab data from auth pages (simple html or js, ajax data) [closed]
I want to regularly get data from a site that requires authentication (I have an account). If it were simple HTML, I would use Python with regular expressions to get the data, but I also need to parse JS (or AJAX) forms.
Which methods I ...
0
votes
1 answer
20 views
Page fetched programmatically is different from the normal Google page?
We want to fetch the current Google page programmatically. We have tried many techniques in different programming languages, but we cannot get the correct (current) Google page.
Java code example
...
-3
votes
2 answers
75 views
Scrape links from HTML [closed]
I have always used preg_match to scrape URLs from HTML files, but I want to extract only URLs that have .mp3 as their extension. I was told to try DOM and I have been trying to fix some code but ...
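For illustration only, here is a minimal Python sketch of the parser-based approach this question is moving toward, using the standard library's html.parser instead of PHP's DOM extension. The names Mp3LinkCollector and find_mp3_links and the sample markup are assumptions made for the example, not taken from the question.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class Mp3LinkCollector(HTMLParser):
    """Collect href values of <a> tags whose URL path ends in .mp3."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Check the path only, so a query string cannot hide the extension.
        if href and urlparse(href).path.lower().endswith(".mp3"):
            self.links.append(href)


def find_mp3_links(html):
    collector = Mp3LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# Hypothetical sample markup.
sample = (
    '<a href="/audio/track1.mp3">One</a>'
    '<a href="/page.html">Page</a>'
    '<a href="http://example.com/b.MP3?dl=1">Two</a>'
)
print(find_mp3_links(sample))
```

Filtering on the parsed href value, rather than on the raw markup, avoids having to account for attribute order or quoting style in a pattern.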
-1
votes
1 answer
22 views
Using SimpleXML to read parts of an HTML page returns warnings / notices of non-object, and I need to fix this
Okay, so I've been working on this little script which is basically scraping a page from ted.com. Everything works as I want it to (meaning I can print out all the values that I am interested in). The ...
0
votes
3 answers
58 views
PHP Simple HTML DOM Parser, find text inside tags that have no class nor id
I have a http://www.statistics.com/index.php?page=glossary&term_id=703
Specifically in this part:
<b>Additive Error:</b>
<p> Additive error is the error that is added to the ...
1
vote
4 answers
68 views
Extract Text from within tags using RegExp PHP
I am trying to extract some strings from the source code of a web page which looks like this:
<p class="someclass">
String1<br />
String2<br />
String3<br />
</p>
I'm ...
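As a rough sketch of a parser-based alternative to a regular expression for markup of this shape, the Python below collects the text between <br /> tags inside the paragraph. It uses only the standard library; the class name ParagraphLines and the helper lines_in_paragraph are hypothetical, and the sample is the fragment shown in the question.

```python
from html.parser import HTMLParser


class ParagraphLines(HTMLParser):
    """Split the text of <p class="..."> into pieces at each <br>."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.inside = False   # True while inside the target paragraph
        self.lines = []
        self._current = []

    def _flush(self):
        text = "".join(self._current).strip()
        if text:
            self.lines.append(text)
        self._current = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "p" and self.css_class in classes:
            self.inside = True
        elif tag == "br" and self.inside:
            self._flush()   # <br> and <br /> both arrive here

    def handle_endtag(self, tag):
        if tag == "p" and self.inside:
            self._flush()
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self._current.append(data)


def lines_in_paragraph(html, css_class):
    parser = ParagraphLines(css_class)
    parser.feed(html)
    parser.close()
    return parser.lines


sample = """<p class="someclass">
String1<br />
String2<br />
String3<br />
</p>"""
print(lines_in_paragraph(sample, "someclass"))
```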
0
votes
1 answer
43 views
Simple HTML Parser
<strong class="tb-rmb-num"><em class="tb-rmb">¥</em>39.00</strong>
I'm trying to retrieve the number only without the currency sign
My current code is
$ret = ...
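One way to think about this, sketched in Python with the standard library rather than with the PHP parser the question uses: keep only the text that sits directly inside the <strong> element and skip anything inside nested tags such as the <em> holding the currency sign. PriceExtractor and extract_price are hypothetical names, and the depth counting assumes the fragment contains no void elements such as <br>.

```python
from html.parser import HTMLParser


class PriceExtractor(HTMLParser):
    """Collect text directly inside <strong class="tb-rmb-num">."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # 0 = outside target, 1 = directly inside, >1 = nested
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1
        elif tag == "strong":
            classes = (dict(attrs).get("class") or "").split()
            if "tb-rmb-num" in classes:
                self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 1:   # ignore text of nested elements like <em>
            self.parts.append(data)


def extract_price(html):
    parser = PriceExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


sample = '<strong class="tb-rmb-num"><em class="tb-rmb">¥</em>39.00</strong>'
print(extract_price(sample))
```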
0
votes
0 answers
71 views
Using TWebBrowser for parsing HTML data
I have an app that uses the TWebBrowser component, and I have a link to a site that the app opens with TWebBrowser inside a "for" statement, for example like this:
adoquery.sql.text = "select top 10 id from ...
0
votes
1 answer
43 views
match numbers in multiple lines
I have an HTML text like this
<tr>
<td><strong>Turnover</strong></td>
<td width="20%" class="currency">£348,191</td>
...
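A hedged sketch of how a row like this could be read without a multi-line regular expression: parse the rows into lists of cell text, then look up the row by its label. This is Python with the standard library only. RowCollector and amount_for are hypothetical names, and the closing </tr> in the sample is an assumption because the excerpt is truncated.

```python
import re
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Turn each <tr> into a list of stripped cell texts."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def amount_for(html, label):
    """Return the integer in the cell next to `label`, or None."""
    collector = RowCollector()
    collector.feed(html)
    collector.close()
    for row in collector.rows:
        if len(row) >= 2 and row[0] == label:
            digits = re.sub(r"\D", "", row[1])   # drop "£" and ","
            return int(digits) if digits else None
    return None


sample = """<tr>
<td><strong>Turnover</strong></td>
<td width="20%" class="currency">£348,191</td>
</tr>"""
print(amount_for(sample, "Turnover"))
```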
0
votes
3 answers
38 views
How to parse a full HTML page in Android
I am calling an HTML page via a web service. I need to get the whole source code of the HTML page.
My problem is that when I convert the HTTP response to a string, I get only part of the HTML page. How ...
0
votes
4 answers
41 views
Using regexes to find result from HTML table
I am stuck on a regular expression problem.
I have a huge HTML file and I need to extract some text (Model No.) from it.
<table>......
<td colspan="2" align="center" ...
-1
votes
2 answers
48 views
PHP foreach is very slow [closed]
I have this code, which does exactly what I want: it lists all the images available on any web page. But it takes more than one minute, and the load on the server also increases because of ...
0
votes
0 answers
14 views
How to load images in a WebView from a blog one by one
I have an Android app in which I have a WebView at the top of the layout and previous/next buttons below it. I know how to load a URL in a WebView. I have a blog (Blogspot) which contains ...
0
votes
0 answers
26 views
What's the best practice for HTML tags in forms?
I do not want to allow HTML tags in any input fields on my ASP.NET MVC4 page.
By default, a System.Web.HttpRequestValidationException is thrown.
The simplest way seems to be to show an error on the ...
0
votes
3 answers
36 views
PHP Regular expression: Get all urls with question mark
I have this regular expression:
preg_match_all("/<a\s.*?href\s*=\s*['|\"](.*?)(?=#|\"|')/si", $data, $matches);
to find all URLs. It works fine, BUT how can I modify it to find URLs with ...
0
votes
0 answers
12 views
C# HtmlAgilityPack HTML parsing issue
I have this html
<div class="postrow firs">
<h2 class="title icon">
This is the title
</h2>
<div class="content">
<div ...
0
votes
0 answers
27 views
Changing input values after row has been removed
I'm trying to create a tool where the user clicks on a button and it adds a label, an input and a remove button. I have it all working for the most part, but there are only a couple of things missing. If the ...
0
votes
0 answers
34 views
html form to iframe external booking system
I've got a simple booking form and want to transfer the data into a different 'booking' page in an iframe as we use an external booking system - but I'm having some trouble working this out. I have ...
0
votes
1 answer
25 views
Questions about HTML parsing
This is a program we've written for html parsing.
It works perfectly.
We found a demo program on the net, and we modified it for our needs.
But we don't understand how it works.
import urllib
from ...
0
votes
3 answers
36 views
How to change/delete string using php called page?
I'm using the following function:
function callframe(){
$ch = curl_init("file.html");
curl_setopt($ch, CURLOPT_HEADER, 0);
echo curl_exec($ch);
curl_close($ch);
}
Then I call ...
0
votes
0 answers
38 views
script tag doesn't follow tree structure jquery parseHTML
I'm parsing some HTML through jQuery's parseHTML() function:
<code>
<b>test</b><script>lol</script>
</code>
The problem is that when it comes to <script> ...
-2
votes
1 answer
28 views
From html to xml java api [closed]
I want to write my own converter from an HTML table to an XLS table, but I don't know where to start. Google doesn't show me comprehensive results. I know about Apache Tika and POI, but do they have ...
0
votes
1 answer
31 views
How to Keep HTML Formatting Intact When Parsing with DOM - (No Tag Stripping)
Employing DOMDocument, I'm trying to read a portion of an HTML file and display it on a different HTML page using the code below. The DIV portion that I'm trying to access has several <p> ...
0
votes
1 answer
34 views
PHP Simple HTML DOM If Website is Down
I'm parsing some news from one of the Turkish Health Ministry's websites. But if, for instance, the website is down or can't be loaded, my website doesn't load the content after the part that I parse ...
0
votes
0 answers
31 views
Web scraping without manual entry? [closed]
I've searched and searched but can't seem to find anything on this so I'm hoping I can get some insight on this.
I work a 9 to 5 job where I have to do TONS of research about specific things. For ...
1
vote
1 answer
57 views
How to navigate websites using Jsoup in Java
How can I navigate (like web crawling) in Jsoup to a different link?
For this example I have done the basics to get the title, get links and get texts. But I want to be able to use one of those child ...
0
votes
2 answers
46 views
find word but not in a link
I need a regular expression that will find the target word or words in HTML (that is, in amongst tags) but NOT in an anchor or script tag.
I have experimented for ages and came up with this
...
0
votes
1 answer
30 views
Parsing HTML with DOM
I am trying to parse HTML, for example:
<html>
<head>
</head>
<body>
<a href='example.com'>Hello, <span>World</span></a>
<ul>
...
0
votes
1 answer
32 views
file_get_html() returns garbage
I am using the simple_html_dom parser.
The following code is returning garbage output:
$opts = array(
'http'=>array(
'method'=>"GET",
'header'=>
...
0
votes
0 answers
42 views
Parsing links using ColdFusion
I have a utility that lets users store custom fields (first name, favorite color, address, URL, etc).
One of those fields is used to store the contact's work/personal URL.
In addition, users can send ...
0
votes
1 answer
58 views
HTML Parsing and removing anchor tags while preserving inner html using Jsoup
I have to parse some HTML and remove the anchor tags, but I need to preserve the innerHTML of the anchor tags.
For example, if my html text is:
String html = "<div> <p> some text <a ...
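To make the idea of "unwrapping" concrete, here is a small Python sketch using only the standard library, not Jsoup: it re-emits every piece of the markup except the <a> start and end tags, so the anchors' inner HTML survives. AnchorUnwrapper and unwrap_anchors are hypothetical names, the sample string is made up because the question's own example is truncated, and end tags are re-emitted in lowercase.

```python
from html.parser import HTMLParser


class AnchorUnwrapper(HTMLParser):
    """Re-serialize markup, dropping <a> tags but keeping their content."""

    def __init__(self):
        # Keep entities such as &amp; untouched so they can be re-emitted.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied verbatim.
        if tag != "a":
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag != "a":
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")


def unwrap_anchors(html):
    parser = AnchorUnwrapper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


# Hypothetical input.
sample = '<div> <p> some text <a href="http://example.com">a <b>link</b></a> here </p> </div>'
print(unwrap_anchors(sample))
```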
0
votes
3 answers
75 views
Using regex to find keyword in http response
I asked a similar question earlier for which Nokogiri was recommended as a solution. I've used Nokogiri and it certainly works fine.
But due to certain reasons, I must use regex to extract a keyword ...
3
votes
1 answer
34 views
Find All text within 1 level in HTML using Beautiful Soup - Python
I need to use Beautiful Soup to accomplish the following
Example HTML
<div id = "div1">
Text1
<div id="div2">
Text2
<div id="div3">
Text3
</div>
</div>
...
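The "one level only" idea can be sketched without Beautiful Soup, using Python's standard html.parser and a stack of open <div> elements: each text chunk is credited only to the innermost open div. DirectText and direct_text are hypothetical names. The sample closes the quotes and the outer div, which is an assumption about the intended markup, and text inside non-div children would still be credited to the enclosing div in this simplified version.

```python
from html.parser import HTMLParser


class DirectText(HTMLParser):
    """Map each <div id="..."> to the text that sits directly inside it."""

    def __init__(self):
        super().__init__()
        self.stack = []   # ids of the currently open <div> elements
        self.text = {}    # div id -> list of direct text chunks

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            div_id = dict(attrs).get("id")
            self.stack.append(div_id)
            self.text.setdefault(div_id, [])

    def handle_endtag(self, tag):
        if tag == "div" and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        chunk = data.strip()
        if self.stack and chunk:
            # Credit the text to the innermost open div only.
            self.text[self.stack[-1]].append(chunk)


def direct_text(html):
    parser = DirectText()
    parser.feed(html)
    parser.close()
    return {div_id: " ".join(chunks) for div_id, chunks in parser.text.items()}


sample = """<div id="div1">
Text1
<div id="div2">
Text2
<div id="div3">
Text3
</div>
</div>
</div>"""
print(direct_text(sample))
```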
0
votes
1 answer
26 views
JavaScript changing tag attributes not being executed in the right order?
I have an html file that consists of a table with fields inside. Some fields are marked by xxx, which should be converted to textareas. That has been successfully done as:
function raplace() {
...
-1
votes
2 answers
42 views
Understanding HTML code for a specific web page [closed]
I am trying to understand the code of the following URL - http://www.chilis.com/EN/Pages/menuitem.aspx
I am seeing some weird things on the page that I cannot understand. I was hoping someone could explain it ...
-5
votes
2 answers
59 views
Multiline Regular Expression (Regex) with C# [closed]
I've been trying to do this for some time. This is part of the text, and I want to get some values from it, to be exact:
<td align="Left">
<a href="randomUrl">Something</a> //Need ...
1
vote
0 answers
23 views
How to use Perl HTML::TableExtract with rowspans and also spans within cells
I apologize for the length here, but I thought it made sense to include my small amount of progress in addition to a description of my problem!
I want to extract data from some html pages that have ...
1
vote
1 answer
40 views
PHP: Regular expression / preg_match() until EOL
An HTML page contains a line like this:
<p><strong>State:</strong> <a href="/state/show/Ohio">Ohio</a></p>
What I'm looking for is a regex which gets the content ...
3
votes
3 answers
69 views
preg_replace all <img> parameters
I've upgraded a WYSIWYG editor from an old version to the newest. There is a difference in how image dimensions are saved. The old version of the editor used to add width and height parameters to the ...
2
votes
1 answer
39 views
How to add target blank to anchor tag to open in new window using preg_match or preg_replace
How to add target blank to anchor tag to open in new window using preg_match or preg_replace?
$contentss = ...
0
votes
1 answer
28 views
Detect the actual content in a web page (ignore header, footer, navigation etc.)
Looking for a way (client-side or server-side) to detect the actual content part of a web page and remove its header, footer & navigation. Something similar to the way Amazon's "Send to ...
0
votes
0 answers
31 views
How to parse img tag and ul tag in Java using htmlparser?
I want to parse the following with htmlparser. I wrote code for the title and it's working fine. I tried the same for the following tags but nothing is working. Please help; I am doing this kind of programming for the first ...
1
vote
2 answers
52 views
Regex formula assistance
I'm trying to find a regex formula for these HTML nodes:
first: Need the inner html value
<span class="profile fn">Any Name Here</span>
second: need the title value
<abbr ...
0
votes
2 answers
38 views
Change html tag with same dimensions
I have a long table with different entries, some of which are marked as "xxx".
When the HTML code is generated, the row with the xxx looks like this (generated by a plugin):
<td ...