The parsing tag has no wiki summary.
1
vote
2 answers
27 views
Load data from HTML tables into OpenRefine?
One answer to the question "Wikipedia table to JSON (or other machine-readable format)" suggested using OpenRefine to handle HTML tables.
All I could find regarding OpenRefine and HTML tables were ...
1
vote
1 answer
31 views
What data sources for cloud coverage with forecasts are available, and how can they be parsed?
I need to find a good, reliable data source for cloud coverage (for use in GIS) that includes a forecast for a few days, and I need to be able to parse it.
Do you know a good one? I know that NOAA provides weather data, but ...
5
votes
3 answers
121 views
What does OpenRefine offer that other data-parsing tools don't?
I see OpenRefine mentioned a lot here, but I don't see it doing much that R and others can't. What capabilities does it offer that I'm not seeing in the promo page that R or other data packages ...
3
votes
3 answers
77 views
How does one parse weather data?
Weather data is often cited as a "huge success" for the open data community. Where does this data sit, and how does one parse it into something readable, like a 5-day forecast?
9
votes
5 answers
190 views
Extracting tables from multiple PDFs
What's the best practice for extracting tables from a large number of PDFs, which may be formatted differently?
For example, I have a series of PDFs like this one, and I would like to extract the ...
11
votes
7 answers
267 views
Good tools to parse repetitive unstructured data
I'm looking to parse a large number of lines of repetitive but unstructured data. This is a task that happens at least once every project, in my experience, so I'm looking for a tool to transform ...
3
votes
4 answers
91 views
What servers and technologies can I use to extract data from Wikipedia's infoboxes (e.g., ATC codes for drugs)?
Wikipedia is a significant source of data. Data from Wikipedia may be available in various formats via other servers (e.g., a SPARQL endpoint).
For a task of extracting all pairs of "ATC code" - ...