Analyzing (un)structured data to convert it into a structured, normalized format.
2 votes · 2 answers · 75 views
Resources for writing a parser combinator library
I often use parser combinator libraries, but I've never written one. What are good resources for getting started? In case it matters, I'm using Julia, a functional (but not lazy) language.
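A parser combinator library is small at its core. Here is a minimal sketch of that core in Python rather than Julia. It assumes a parser is just a function from (text, position) to either (value, new_position) or None on failure. All names (char, seq, alt, many, fmap, number) are hypothetical and not taken from any particular library:

```python
from typing import Any, Callable, Optional, Tuple

# A parser takes (text, pos) and returns (value, new_pos) on success or None on failure.
Parser = Callable[[str, int], Optional[Tuple[Any, int]]]


def char(c: str) -> Parser:
    """Match exactly one character c."""
    def parse(text: str, pos: int):
        if pos < len(text) and text[pos] == c:
            return c, pos + 1
        return None
    return parse


def seq(*parsers: Parser) -> Parser:
    """Run parsers one after another; fail if any of them fails."""
    def parse(text: str, pos: int):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse


def alt(*parsers: Parser) -> Parser:
    """Ordered choice: return the result of the first alternative that succeeds."""
    def parse(text: str, pos: int):
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None
    return parse


def many(p: Parser) -> Parser:
    """Zero or more repetitions of p; never fails."""
    def parse(text: str, pos: int):
        values = []
        while True:
            result = p(text, pos)
            if result is None or result[1] == pos:  # stop on failure or on no progress
                return values, pos
            value, pos = result
            values.append(value)
    return parse


def fmap(f: Callable[[Any], Any], p: Parser) -> Parser:
    """Transform the value produced by p."""
    def parse(text: str, pos: int):
        result = p(text, pos)
        if result is None:
            return None
        value, new_pos = result
        return f(value), new_pos
    return parse


digit = alt(*(char(d) for d in "0123456789"))
# seq yields [first_digit, [more_digits...]]; fmap turns that into an int.
number = fmap(lambda v: int(v[0] + "".join(v[1])), seq(digit, many(digit)))
```

For example, `number("123abc", 0)` returns `(123, 3)`. Everything else, such as error messages, position tracking and backtracking control, is layered on top of this same function-returning-function shape.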
1 vote · 2 answers · 153 views
Should I use a formal grammar for my interpreted scripting language
I have a scripting engine that I just published as an open-source project. It had been sitting on my hard drive for about a year. My engine is of course by no means complete, but it does work for ...
12 votes · 3 answers · 292 views
Implementing the Visitor Pattern for an Abstract Syntax Tree
I'm in the process of creating my own programming language, which I'm doing for learning purposes. I already wrote the lexer and a recursive descent parser for a subset of my language (I currently support ...
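One common way to do visitor dispatch in Python is sketched below. It is loosely modeled on the stdlib `ast.NodeVisitor` convention of looking up a `visit_<ClassName>` method. The node classes `Num` and `BinOp` and both visitors are hypothetical stand-ins for a real AST:

```python
import operator
from dataclasses import dataclass
from typing import Any


@dataclass
class Num:
    value: int


@dataclass
class BinOp:
    op: str
    left: Any
    right: Any


class Visitor:
    """Dispatch to visit_<ClassName>, falling back to generic_visit."""

    def visit(self, node):
        method = getattr(self, "visit_" + type(node).__name__, self.generic_visit)
        return method(node)

    def generic_visit(self, node):
        raise NotImplementedError(f"no visit_{type(node).__name__} method")


class Evaluator(Visitor):
    OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

    def visit_Num(self, node):
        return node.value

    def visit_BinOp(self, node):
        return self.OPS[node.op](self.visit(node.left), self.visit(node.right))


class Printer(Visitor):
    def visit_Num(self, node):
        return str(node.value)

    def visit_BinOp(self, node):
        return f"({self.visit(node.left)} {node.op} {self.visit(node.right)})"


tree = BinOp("+", Num(1), BinOp("*", Num(2), Num(3)))
```

The design trade-off is the usual one for visitors. Adding a new operation (type checker, code generator) means adding one visitor class without touching the node classes. Adding a new node type means adding a `visit_` method to every visitor that must handle it.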
1 vote · 2 answers · 75 views
How are line/column position data dealt with in parser combinator libraries?
I'm building a parser using a parser combinator library. I need to keep track of where AST nodes started and ended in the textual input -- line and column numbers.
How is this problem approached ...
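Some libraries carry a position in the parser state; Parsec, for example, exposes a `SourcePos` via `getPosition`. A cheaper alternative, sketched here as an assumption rather than what any specific library does, is to store only start/end character offsets on AST nodes and convert an offset to line/column on demand. The class name `LineIndex` is hypothetical:

```python
import bisect
from typing import List, Tuple


class LineIndex:
    """Map character offsets to 1-based (line, column) pairs."""

    def __init__(self, text: str) -> None:
        # Offset at which each line starts; line 1 starts at offset 0.
        self.starts: List[int] = [0]
        for i, ch in enumerate(text):
            if ch == "\n":
                self.starts.append(i + 1)

    def position(self, offset: int) -> Tuple[int, int]:
        # The containing line is the last line start <= offset.
        line = bisect.bisect_right(self.starts, offset) - 1
        return line + 1, offset - self.starts[line] + 1
```

Building the index is a single pass over the input, and each lookup is a binary search. The parser itself then only needs to thread an integer offset through, which most combinator libraries already do.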
6 votes · 3 answers · 546 views
Generic file parser design in Java using the Strategy pattern
I am working on a product in which one of the modules is responsible for parsing XML files and dumping the required content into a database. Even though the present requirement is only to parse XML ...
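In a language with first-class functions, a Strategy can be as small as a function looked up by file type. The question is about Java, but this sketch uses Python. It assumes each format parser returns a list of flat string records. The registry, the `.json` parser and the record shape are all hypothetical, and writing the records to the database is left out:

```python
import json
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Callable, Dict, List

Record = Dict[str, str]
ParserFn = Callable[[str], List[Record]]

# Registry of strategies, keyed by file extension.
PARSERS: Dict[str, ParserFn] = {}


def register(ext: str):
    """Decorator that registers a parser function for one extension."""
    def deco(fn: ParserFn) -> ParserFn:
        PARSERS[ext] = fn
        return fn
    return deco


@register(".xml")
def parse_xml(text: str) -> List[Record]:
    # Assumed shape: each child of the root element is one record, stored as attributes.
    return [dict(child.attrib) for child in ET.fromstring(text)]


@register(".json")
def parse_json(text: str) -> List[Record]:
    # Assumed shape: a JSON array of flat objects.
    return [{k: str(v) for k, v in item.items()} for item in json.loads(text)]


def parse_file(path: str, text: str) -> List[Record]:
    """Pick the strategy from the file extension; callers never see which one ran."""
    ext = Path(path).suffix.lower()
    if ext not in PARSERS:
        raise ValueError(f"no parser registered for {ext!r}")
    return PARSERS[ext](text)
```

Supporting a new format then means writing one function and registering it. The code that stores records in the database does not change.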
0 votes · 2 answers · 106 views
Creating a text input simplification tool
I have been working for several months on a web-based tool that will help me at work. I work at a call center, and the CRM software we use is wretched; we basically have to ...
9 votes · 4 answers · 644 views
Using a “dead man's switch” to manage time-sensitive code
In our software environment, we often run A/B tests, as is probably good practice. However, our environment is set up such that, in very short order, the code starts to become very crufty with dead ...
4 votes · 1 answer · 729 views
First and Follow Sets for a Grammar
I'm studying for a Compiler Construction module I'm doing and I have a sample question as follows:
Calculate the FIRST and FOLLOW sets for the following grammar:
S -> uBDz
B -> Bv
B -> w
D ...
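The grammar above is cut off at D, so this sketch uses a separate, hypothetical expression grammar instead. It computes FIRST and FOLLOW as a fixpoint: keep applying the defining rules until no set changes. ε-productions are written as empty lists. Because it is a fixpoint rather than a recursive descent, the same loop also copes with left-recursive rules such as B -> Bv:

```python
from typing import Dict, List, Set, Tuple

EPS = "ε"
Grammar = Dict[str, List[List[str]]]  # nonterminal -> list of production bodies


def first_follow(grammar: Grammar, start: str) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    nonterminals = set(grammar)
    first: Dict[str, Set[str]] = {a: set() for a in nonterminals}
    follow: Dict[str, Set[str]] = {a: set() for a in nonterminals}
    follow[start].add("$")  # end-of-input marker

    def first_of(symbols: List[str]) -> Set[str]:
        """FIRST of a symbol sequence; contains EPS if the whole sequence can vanish."""
        result: Set[str] = set()
        for sym in symbols:
            if sym not in nonterminals:  # terminal: it starts the sequence
                result.add(sym)
                return result
            result |= first[sym] - {EPS}
            if EPS not in first[sym]:
                return result
        result.add(EPS)
        return result

    changed = True
    while changed:
        changed = False
        for a, bodies in grammar.items():
            for body in bodies:
                f = first_of(body)
                if not f <= first[a]:
                    first[a] |= f
                    changed = True
                for i, b in enumerate(body):
                    if b not in nonterminals:
                        continue
                    # FOLLOW(b) gets FIRST of what comes after b, plus FOLLOW(a) if that can vanish.
                    rest = first_of(body[i + 1:])
                    new = rest - {EPS}
                    if EPS in rest:
                        new |= follow[a]
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return first, follow


GRAMMAR: Grammar = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["id"], ["(", "E", ")"]],
}
```

For this grammar the results are FIRST(E) = {id, (}, FIRST(E') = {+, ε}, FOLLOW(E) = {$, )} and FOLLOW(T) = {+, $, )}. These match what working the rules by hand gives.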
3 votes · 1 answer · 202 views
Showing a grammar is ambiguous
I have the following question taken from a compilers course exam:
Show that the following grammar is ambiguous.
S = XcY
X = a
Y = b | Z
Z = bW
W = d | ϵ
I drew the following tree:
Am I correct ...
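A grammar is ambiguous if some string has two distinct leftmost derivations, or equivalently two parse trees. With the grammar above, the string acb is such a witness, because Y can produce b either directly or through Z with W → ϵ:

$$
\begin{aligned}
S &\Rightarrow XcY \Rightarrow acY \Rightarrow acb && (Y \to b)\\
S &\Rightarrow XcY \Rightarrow acY \Rightarrow acZ \Rightarrow acbW \Rightarrow acb && (Y \to Z,\ Z \to bW,\ W \to \epsilon)
\end{aligned}
$$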
1 vote · 2 answers · 127 views
How to extract operators from the grammar productions for conflict resolution in LALR parser?
Is there a standardized or widely accepted algorithm for picking operators to resolve shift/reduce conflicts in an LALR parser? The question may be naive; my problem is not with implementing my solution, but ...
6 votes · 2 answers · 193 views
Language parsing to find important words
I'm looking for some input and theory on how to approach a lexical topic.
Let's say I have a collection of strings, each of which may be a single sentence or several sentences. I'd like to ...
1 vote · 2 answers · 97 views
Extracting useful information from free text
We filter and analyse seats for events. Apparently writing a domain query language for the floor people isn't an option. I'm using C# 4.0 and .NET 4.0, and have relatively free rein to use ...
3 votes · 1 answer · 152 views
How to add precedence to LALR parser like in YACC?
Please note: I am asking about writing an LALR parser, not about writing rules for an LALR parser.
What I need is...
...to mimic YACC precedence definitions. I don't know how it is implemented, and below I ...
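As documented for YACC and Bison, precedence works roughly as follows. Declaration lines get increasing precedence levels. A rule takes the precedence of its `%prec` token, or otherwise of the last terminal in its right-hand side. On a shift/reduce conflict, a higher-precedence lookahead token means shift and a lower one means reduce. At equal precedence, associativity decides: left means reduce, right means shift, nonassoc means error. With no precedence information, the default is shift, and the conflict is reported. The sketch below encodes only that decision. The function names are hypothetical, and calling it from an actual LALR table builder, wherever a state has both a shift and a reduce on the same token, is left out:

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Prec = Tuple[int, str]  # (level, associativity); a higher level binds tighter


def build_precedence(decls: List[Tuple[str, List[str]]]) -> Dict[str, Prec]:
    """Mimic successive %left/%right/%nonassoc lines: later lines get higher levels."""
    table: Dict[str, Prec] = {}
    for level, (assoc, tokens) in enumerate(decls, start=1):
        for tok in tokens:
            table[tok] = (level, assoc)
    return table


def rule_precedence(rhs: Iterable[str], terminals: Set[str], table: Dict[str, Prec],
                    prec_token: Optional[str] = None) -> Optional[Prec]:
    """Precedence of a rule: its %prec token if given, else its last terminal."""
    if prec_token is not None:
        return table.get(prec_token)
    for sym in reversed(list(rhs)):
        if sym in terminals:
            return table.get(sym)
    return None


def resolve_shift_reduce(rule_prec: Optional[Prec], lookahead: str,
                         table: Dict[str, Prec]) -> str:
    """Decide between reducing by a rule and shifting the lookahead token."""
    tok_prec = table.get(lookahead)
    if rule_prec is None or tok_prec is None:
        return "shift"  # no precedence info: default to shift (and report the conflict)
    if tok_prec[0] > rule_prec[0]:
        return "shift"
    if tok_prec[0] < rule_prec[0]:
        return "reduce"
    # Same level: associativity decides.
    return {"left": "reduce", "right": "shift", "nonassoc": "error"}[tok_prec[1]]
```

With `%left '+' '-'`, `%left '*' '/'`, `%right '^'`, the state after `E + E` with lookahead `*` shifts, because multiplication binds tighter. The same state with lookahead `+` reduces, because `+` is left-associative.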
2 votes · 1 answer · 198 views
LL(∞) and left-recursion [closed]
I want to understand the relation between LL/LR grammars and the left-recursion problem (I partially know the answer to each question, but I ask them as if I knew nothing, because I am a little ...
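Some standard background on the left-recursion part, independent of the truncated question. A top-down parser that expands a left-recursive rule A → Aα must expand A again before consuming any input, so no finite amount of lookahead helps. Bottom-up LR parsers have no such problem. Immediate left recursion is usually removed with the following rewrite, where β does not begin with A:

$$
A \to A\alpha \mid \beta
\quad\Longrightarrow\quad
A \to \beta A', \qquad A' \to \alpha A' \mid \epsilon
$$

For example, $E \to E + T \mid T$ becomes $E \to T E'$ with $E' \to + T E' \mid \epsilon$. This grammar accepts the same language, but the resulting trees lean right instead of left.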
3 votes · 1 answer · 196 views
Extracting color profile information from JPEG files
I'm trying to look up how to read a JPEG's color profile, and to my surprise there's very little openly available, specific how-to information on that, but rather lots of general explanation on ...
8 votes · 4 answers · 238 views
How should I implement a command processing application?
I want to make a simple, proof-of-concept application (REPL) that takes a number and then processes commands on that number.
Example:
I start with 1. Then I write "add 2", and it gives me 3. Then I ...
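For a command language this small, a dispatch table from command name to function is usually enough, and no grammar or parser generator is needed. In this sketch only `add` comes from the question. The other commands, the `quit` keyword and the integer-only argument are hypothetical:

```python
import operator

# Hypothetical command set: only "add" appears in the question.
COMMANDS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "div": operator.truediv,
}


def process(line: str, value):
    """Apply one '<command> <integer>' line to the current value and return the new value."""
    parts = line.split()
    if len(parts) != 2 or parts[0] not in COMMANDS:
        raise ValueError(f"unrecognized command: {line!r}")
    name, arg = parts
    return COMMANDS[name](value, int(arg))


def repl(start=1):
    """Read commands until EOF or 'quit', printing the running value after each one."""
    value = start
    print(value)
    while True:
        try:
            line = input("> ").strip()
        except EOFError:
            break
        if line == "quit":
            break
        try:
            value = process(line, value)
        except (ValueError, ZeroDivisionError) as err:
            print(err)
            continue
        print(value)
```

Keeping `process` separate from the input loop makes the command logic testable without a terminal. Adding a command is one new dictionary entry.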
11 votes · 5 answers · 312 views
How can I best manage making open source code releases from my company's confidential research code?
My company (let's call them Acme Technology) has a library of approximately one thousand source files that originally came from its Acme Labs research group, incubated in a development group for a ...
4 votes · 1 answer · 173 views
Scripting custom drawing in a Delphi application with IF/THEN/ELSE statements?
I'm building a Delphi application which displays a blueprint of a building, including doors, windows, wiring, lighting, outlets, switches, etc. I have implemented a very lightweight script of my own ...
7 votes · 2 answers · 259 views
Persisting natural language processing parsed data
I've recently started experimenting with natural language processing (NLP) using Stanford's CoreNLP, and I'm wondering what some of the standard ways are to store NLP-parsed data for something like a ...
1 vote · 1 answer · 110 views
Does JAXP natively parse HTML?
So, I whip up a quick test case in Java 7 to grab a couple of elements from random URIs, and see if the built-in parsing stuff will do what I need.
Here's the basic setup (with exception handling etc ...
4 votes · 4 answers · 713 views
Can the CSV format be defined by a regex?
A colleague and I recently argued over whether a pure regex is capable of fully capturing the CSV format, such that it can parse all files with any given escape char, quote char, ...
2 votes · 3 answers · 268 views
How do I translate user input into a fictitious language?
For experimental reasons, I am trying to convert user input into a fictitious language. All of the translation can be 1:1.
I would prefer if I could accomplish this with PHP.
Should I use gettext ...
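If "1:1" means letter for letter, a translation table does the whole job without gettext. PHP's `strtr($string, $from, $to)` works the same way as Python's `str.translate`. In this Python sketch the fictional alphabet is a hypothetical placeholder (a reversed alphabet), and characters outside the table pass through unchanged:

```python
PLAIN = "abcdefghijklmnopqrstuvwxyz"
FICTIONAL = "zyxwvutsrqponmlkjihgfedcba"  # hypothetical 1:1 substitute alphabet

# Build tables for both directions, preserving case.
TO_FICTIONAL = str.maketrans(PLAIN + PLAIN.upper(), FICTIONAL + FICTIONAL.upper())
FROM_FICTIONAL = str.maketrans(FICTIONAL + FICTIONAL.upper(), PLAIN + PLAIN.upper())


def to_fictional(text: str) -> str:
    return text.translate(TO_FICTIONAL)


def from_fictional(text: str) -> str:
    return text.translate(FROM_FICTIONAL)
```

If the mapping is word for word rather than letter for letter, the same idea applies with a dictionary lookup per word instead of a character table.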
0 votes · 1 answer · 112 views
How do I parse a header with two different versions [ID3] while avoiding code duplication?
I really hope you can give me some interesting viewpoints for my situation, because I am not satisfied with my current approach.
I am writing an MP3 parser, starting with an ID3v2 parser.
Right now ...
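One way to avoid duplicating the parser per version is to move the version differences into data. ID3v2.2 frames use 3-byte IDs and 3-byte sizes with no flags. ID3v2.3 uses 4-byte IDs and sizes plus 2 flag bytes. ID3v2.4 keeps that layout but stores frame sizes as synchsafe integers (7 bits per byte). This sketch covers only the tag header and frame headers. It ignores extended headers and unsynchronisation, and the names are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameLayout:
    id_len: int           # bytes in the frame ID
    size_len: int         # bytes in the frame size field
    flags_len: int        # bytes of frame flags
    synchsafe_size: bool  # whether the size uses only 7 bits per byte


# Per-version differences live in this table, so one function parses all versions.
LAYOUTS = {
    2: FrameLayout(id_len=3, size_len=3, flags_len=0, synchsafe_size=False),
    3: FrameLayout(id_len=4, size_len=4, flags_len=2, synchsafe_size=False),
    4: FrameLayout(id_len=4, size_len=4, flags_len=2, synchsafe_size=True),
}


def synchsafe(data: bytes) -> int:
    """Decode a synchsafe integer: only the low 7 bits of each byte are used."""
    value = 0
    for b in data:
        value = (value << 7) | (b & 0x7F)
    return value


def parse_tag_header(data: bytes):
    """Return (major, revision, flags, size) from the 10-byte ID3v2 tag header."""
    if len(data) < 10 or data[:3] != b"ID3":
        raise ValueError("not an ID3v2 tag")
    return data[3], data[4], data[5], synchsafe(data[6:10])


def parse_frame_header(data: bytes, offset: int, major: int):
    """Return (frame_id, size, flags, offset_after_header) for the frame at offset."""
    layout = LAYOUTS[major]
    pos = offset
    frame_id = data[pos:pos + layout.id_len].decode("latin-1")
    pos += layout.id_len
    raw_size = data[pos:pos + layout.size_len]
    pos += layout.size_len
    size = synchsafe(raw_size) if layout.synchsafe_size else int.from_bytes(raw_size, "big")
    flags = data[pos:pos + layout.flags_len]
    pos += layout.flags_len
    return frame_id, size, flags, pos
```

Supporting another version then means adding a table entry, not another copy of the parsing code.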
2 votes · 2 answers · 141 views
Parsing terminology: comments + whitespace vs. actual code
In languages like C/C++, whitespace and comments are ignored, and only the actual code gets to the compiler.
Is there an accepted way of naming these two things?
comments & spacing
...
5 votes · 4 answers · 564 views
Programming Language Parser (in Java) - What would be a better design alternative for a special case?
Background
I'm currently designing my own programming language as a research project.
I have most of the grammar done and written down as a context-free grammar, and it should work as is. Now ...
2 votes · 1 answer · 280 views
Picture Parsing
If I open a picture file, let's say one with a PNG extension, I see a bunch of code. Now let's say I want to extract some information from the picture automatically. So the question is: what is the first ...
6 votes · 2 answers · 177 views
Parsing multiple file formats/protocols
We are starting a project where we will need to write parsers for a bunch of binary file formats, each of them representing very similar data (time-value series from different measurement devices).
...
1 vote · 2 answers · 292 views
How to create a Semantic Network like WordNet based on Wikipedia?
I am an undergraduate student and I have to create a Semantic Network based on Wikipedia. This Semantic Network would be similar to WordNet (except that it is based on Wikipedia and is concerned with ...
5 votes · 4 answers · 285 views
How are comments expressed in programming language grammars?
I'm learning how to build parsers using grammars, but I got stuck trying to express comments, because they can appear almost anywhere.
This indicates that comments can be stripped from the token ...
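A common approach is to let the lexer consume comments and whitespace and never emit them as tokens, so the grammar proper never mentions them. A minimal regex-based tokenizer in that style follows. The token set and the C-style `//` and `/* */` comment syntax are assumptions for illustration:

```python
import re

# Order matters: COMMENT is tried before OP so that "//" and "/*" are not read as division.
TOKEN_SPEC = [
    ("COMMENT", r"//[^\n]*|/\*.*?\*/"),
    ("WS", r"\s+"),
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=;()]"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC), re.DOTALL)


def tokenize(text: str):
    """Return (kind, text) pairs, silently dropping comments and whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if m.lastgroup not in ("COMMENT", "WS"):
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

If comments must survive, for example for a formatter or doc generator, a common variation is to attach them to the neighboring token as "trivia" instead of discarding them.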
0 votes · 2 answers · 179 views
Using a parser to locate faulty code
Lately I've been working a lot in PHP and have run into an abnormally large number of parsing errors. I realize these are my own fault and a result of sloppy initial coding on my part, but it's ...
6 votes · 5 answers · 2k views
Getting data from a webpage in a stable and efficient way
Recently I've learned that using a regex to parse the HTML of a website to get the data you need isn't the best course of action.
So my question is simple: what, then, is the best / most efficient and ...
6 votes · 6 answers · 762 views
Best way to parse a file
I'm trying to find a better solution for writing parsers for some of the well-known file formats out there, such as EDIFACT and TRADACOMS.
If you aren't familiar with these standards then check out this ...
5 votes · 1 answer · 441 views
What is this algorithm for converting strings into numbers called?
I've been doing some work in Parsec recently, and for my toy language I wanted fractional numbers in multiple bases to be expressible. After digging around in Parsec's source a bit, I found their ...
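The usual name for accumulating digits as value = value × base + digit is Horner's method (Horner's rule). The excerpt stops before showing what Parsec actually does, so this is only an illustration of the technique, not a transcription of Parsec's code. It applies Horner's rule left to right on the integer part and right to left on the fractional part:

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def digit_value(ch: str, base: int) -> int:
    """Value of one digit character, validated against the base."""
    d = DIGITS.find(ch.lower())
    if d < 0 or d >= base:
        raise ValueError(f"invalid digit {ch!r} for base {base}")
    return d


def parse_number(text: str, base: int) -> float:
    """Parse e.g. 'ff.8' in base 16 using Horner's rule on both halves."""
    int_part, _, frac_part = text.partition(".")
    value = 0
    for ch in int_part:             # left to right: value = value * base + d
        value = value * base + digit_value(ch, base)
    frac = 0.0
    for ch in reversed(frac_part):  # right to left: frac = (frac + d) / base
        frac = (frac + digit_value(ch, base)) / base
    return value + frac
```

Processing the fraction from the right means each digit is divided by the base once per step, mirroring the multiplication used for the integer part. For exact results, `fractions.Fraction` could replace the float.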
3 votes · 4 answers · 945 views
What is the simplest human readable configuration file format?
The current configuration file is as follows:
mainwindow.title = 'test'
mainwindow.position.x = 100
mainwindow.position.y = 200
mainwindow.button.label = 'apply'
mainwindow.button.size.x = 100
...
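The dotted-key format shown is already easy to read and parse. As an illustration of how little code it needs, here is a sketch that turns it into nested dictionaries. The value rule (single-quoted strings, everything else an integer) is inferred from the sample, and `#` comment support is an added assumption; neither may match the real file:

```python
def parse_value(text: str):
    """Single-quoted values become strings; anything else is assumed to be an integer."""
    if len(text) >= 2 and text[0] == text[-1] == "'":
        return text[1:-1]
    return int(text)


def parse_config(text: str) -> dict:
    """Turn 'a.b.c = value' lines into nested dicts: {'a': {'b': {'c': value}}}."""
    root: dict = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):  # blank lines and '#' comments (assumed syntax)
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"line {lineno}: expected 'key = value'")
        *parents, leaf = key.strip().split(".")
        node = root
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = parse_value(value.strip())
    return root
```

Applied to the sample, `parse_config(...)["mainwindow"]["position"]` would be `{"x": 100, "y": 200}`.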
0 votes · 2 answers · 214 views
How do I capture information from a website that doesn't provide an API?
Do you know any good tutorials, frameworks, or anything else that can help me write code that captures information from a website that doesn't have a public API, or hasn't been written in a RESTful way?
...
0 votes · 3 answers · 202 views
What is a good parsing reference?
I am working on a project that needs parsing and text processing functionality.
I searched the web and found that my best choice for parsing is Python.
What is a good, fast, and ...
1 vote · 1 answer · 190 views
Extracting text from various file formats
I want to extract text from various files. I used Apache POI to parse Microsoft documents. That works, and now I want to parse PDFs and extract text from them.
Is there a Java API that I could ...
3 votes · 1 answer · 304 views
How can I test a parser for a bespoke XML schema?
I'm parsing a bespoke XML format into an object graph using .NET 4.0. My parser is using the System.XML namespace internally, I'm then interrogating the relevant properties of XmlNodes to create my ...
5 votes · 5 answers · 888 views
Are separate parsing and lexing passes good practice with parser combinators?
When I began to use parser combinators my first reaction was a sense of liberation from what felt like an artificial distinction between parsing and lexing. All of a sudden everything was just ...
22 votes · 6 answers · 931 views
What are the arguments against parsing the Cthulhu way?
I have been assigned the task of implementing a Domain Specific Language for a tool that may become quite important for the company. The language is simple but not trivial, it already allows nested ...
12 votes · 12 answers · 2k views
How to write a command interpreter/parser?
Problem: Run commands in the form of a string.
Command example:
/user/files/ list all;
equivalent to:
/user/files/ ls -la;
Another one:
post tw fb "HOW DO YOU STOP THE TICKLE MONSTER?;"
...
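For commands like these, a shell-style word splitter plus an alias table covers the examples without a full grammar. Python's `shlex.split` already honors double quotes. In this sketch the alias table and the rule that an unquoted trailing `;` simply ends the command are assumptions drawn from the two examples:

```python
import shlex

# Hypothetical alias table: the word sequence on the left is rewritten to the one on the right.
ALIASES = {
    ("list", "all"): ["ls", "-la"],
}


def expand_aliases(tokens):
    """Replace any alias word sequence with its expansion."""
    out, i = [], 0
    while i < len(tokens):
        for pattern, replacement in ALIASES.items():
            if tuple(tokens[i:i + len(pattern)]) == pattern:
                out.extend(replacement)
                i += len(pattern)
                break
        else:
            out.append(tokens[i])
            i += 1
    return out


def parse_command(line: str):
    """Split one command into words, honoring double quotes, and expand aliases."""
    line = line.strip()
    if line.endswith(";"):  # assumed: an unquoted trailing ';' just terminates the command
        line = line[:-1]
    return expand_aliases(shlex.split(line))
```

After parsing, the first word (or the leading path) can select a handler from a dictionary, as in any command dispatcher.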
2 votes · 1 answer · 231 views
Idea of an algorithm to detect a website's navigation structure?
Currently I am in the process of developing an importer of any existing, arbitrary (static) HTML website into the upcoming release of our CMS.
While downloading the files is already solved, ...
3 votes · 3 answers · 309 views
Any good reason to open files in text mode?
(Almost-)POSIX-compliant operating systems and Windows are known to distinguish between 'binary mode' and 'text mode' file I/O. While the former mode doesn't transform any data between the actual file ...
2 votes · 3 answers · 908 views
What programming language is most suitable for handling unstructured data?
I'm trying to automate the application of metadata to a huge amount of text, but I'm not sure which language would make this task easier (if any).
What programming language is most suitable ...
0 votes · 2 answers · 1k views
Fastest C++ XML parsing library
I have thousands of .xml files ranging from 1 MB to 45 MB in size (no DTDs). I need to parse and further manipulate these XML files before generating separate .xml files with the results of my regex.
What the ...
6 votes · 3 answers · 383 views
C++ XML Parsing: Suggestions on Approach for Parsing and Storing data
I am looking into developing a C++ application to parse XML (using the rapidxml framework), and I would like some advice on how to approach this.
The file I want to parse is an XML game file that ...
6 votes · 2 answers · 975 views
Algorithm for formatting SQL code
I need a tool (for in-house use) that will format SQL code (SQL Server/MySQL).
There are various third-party tools and websites that do it, but not exactly the way I need.
So I want to write my ...
12 votes · 3 answers · 590 views
How should I specify a grammar for a parser?
I have been programming for many years, but one task that still takes me inordinately long is to specify a grammar for a parser, and even after this excessive effort, I'm never sure that the grammar ...
3 votes · 3 answers · 554 views
Parsing scripts that use curly braces
To give an idea of what I'm doing: I am writing a Python parser that will parse DirectX .x text files.
The problem I have deals with how the files are formatted. Although I'm writing it in Python, I'm ...
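Because curly braces can nest arbitrarily deep, they are usually handled with a stack (or recursion) rather than a single regex. This sketch splits the text on `{` and `}` and keeps a stack of open blocks. It assumes the name of a nested block is whatever follows the last `;` before its `{`, loosely in the style of .x data such as `Mesh { 3; 0.0; Material { ... } }`. It does not handle the file header line, template declarations or GUIDs, and the `Block` class is hypothetical:

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    header: str
    content: List[str] = field(default_factory=list)      # raw data chunks inside the block
    children: List["Block"] = field(default_factory=list)  # nested blocks


def parse_blocks(text: str) -> Block:
    """Build a tree of brace-delimited blocks; raise SyntaxError on unbalanced braces."""
    root = Block("<root>")
    stack = [root]
    pending = ""  # text seen since the last brace
    for piece in re.split(r"([{}])", text):
        if piece == "{":
            # Data before the last ';' belongs to the current block; the rest names the child.
            data, _, header = pending.rpartition(";")
            if data.strip():
                stack[-1].content.append(data.strip() + ";")
            child = Block(header.strip())
            stack[-1].children.append(child)
            stack.append(child)
            pending = ""
        elif piece == "}":
            if len(stack) == 1:
                raise SyntaxError("unmatched '}'")
            if pending.strip():
                stack[-1].content.append(pending.strip())
            stack.pop()
            pending = ""
        else:
            pending = piece
    if len(stack) != 1:
        raise SyntaxError("unclosed '{'")
    return root
```

One simplification to note: the relative order of data chunks and child blocks within a block is not preserved, because they are kept in separate lists.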
7 votes · 1 answer · 192 views
Should I let my users write BnfExpressions to extend my grammar?
Preface
I'm designing a templating language (please skip the don't/why?? speech). One of the major goals of this language is to be extensible. There are two main elements in my language: "Tags" and ...