Analyzing (un)structured data to convert it into a structured, normalized format.
43
votes
12answers
10k views
Should I use a parser generator or should I roll my own custom lexer and parser code?
What are the specific advantages and disadvantages of each approach to working on a programming language grammar?
Why/When should I roll my own? Why/When should I use a generator?
33
votes
6answers
2k views
Why was strict parsing not chosen for HTML?
I am largely unaware of the detailed computing history surrounding the internet, but I have always wondered why strict parsing was not chosen when HTML was created. Browsers accept any rubbish ...
27
votes
2answers
2k views
Do modern languages still use parser generators?
I was reading up on the GCC compiler suite on Wikipedia when this came up:
GCC started out using LALR parsers generated with Bison, but gradually switched to hand-written ...
26
votes
1answer
1k views
The Inglish parser (for The Hobbit 1982)
I was fascinated to read about the text adventure game The Hobbit, which featured an incredibly robust parser called "Inglish":
...Inglish allowed one to type advanced sentences such as "ask Gandalf ...
23
votes
6answers
1k views
What are the arguments against parsing the Cthulhu way?
I have been assigned the task of implementing a Domain Specific Language for a tool that may become quite important for the company. The language is simple but not trivial, it already allows nested ...
22
votes
5answers
3k views
Name for this type of parser, OR why it doesn't exist
Conventional parsers consume their entire input and produce a single parse tree. I'm looking for one that consumes a continuous stream and produces a parse forest [edit: see discussion in comments ...
17
votes
4answers
1k views
Generic rule parser for RPG board game rules - how to do it?
I want to build a generic rule parser for pen and paper style RPG systems. A rule can usually involve 1 to N entities, 1 to N rolls of a die, and calculating values based on multiple attributes of an ...
16
votes
12answers
5k views
How to write a command interpreter/parser?
Problem: Run commands in the form of a string.
command example:
/user/files/ list all;
equivalent to:
/user/files/ ls -la;
another one:
post tw fb "HOW DO YOU STOP THE TICKLE MONSTER?;"
...
14
votes
3answers
6k views
Implementing the Visitor Pattern for an Abstract Syntax Tree
I'm in the process of creating my own programming language, which I do for learning purposes. I already wrote the lexer and a recursive descent parser for a subset of my language (I currently support ...
13
votes
5answers
434 views
How can I best manage making open source code releases from my company's confidential research code?
My company (let's call them Acme Technology) has a library of approximately one thousand source files that originally came from its Acme Labs research group, incubated in a development group for a ...
12
votes
5answers
3k views
Are separate parsing and lexing passes good practice with parser combinators?
When I began to use parser combinators my first reaction was a sense of liberation from what felt like an artificial distinction between parsing and lexing. All of a sudden everything was just ...
12
votes
2answers
918 views
What does scannerless parsing have to do with the “Dangling Else Problem”?
I do not understand this sentence from the Wikipedia article on the Dangling Else problem:
[The Dangling Else problem] is a problem that often comes up in compiler construction, especially ...
11
votes
2answers
2k views
What's the simplest example out there to explain the difference between Parse Trees and Abstract Syntax Trees?
To my understanding, a parser creates a parse tree, and then discards it thereafter. However, it can also pop out an abstract syntax tree, which the compiler supposedly makes use of.
I'm under the ...
10
votes
5answers
6k views
Can the csv format be defined by a regex?
A colleague and I have recently argued over whether a pure regex is capable of fully encapsulating the csv format, such that it is capable of parsing all files with any given escape char, quote char, ...
10
votes
3answers
1k views
How should I specify a grammar for a parser?
I have been programming for many years, but one task that still takes me inordinately long is to specify a grammar for a parser, and even after this excessive effort, I'm never sure that the grammar ...
10
votes
3answers
1k views
Writing a Compiler Compiler - Insight on Use and Features
This is part of a series of questions which focuses on the sister project to the Abstraction Project, which aims to abstract the concepts used in language design in the form of a framework. The ...
9
votes
5answers
2k views
unit tests for a csv parser
What tests should I use to unit test a csv parser?
I have a simple csv parser in C#, and I want to be sure I have good unit test coverage of all the common (and uncommon) edge cases. What tests ...
9
votes
5answers
16k views
Getting data from a webpage in a stable and efficient way
Recently I've learned that using a regex to parse the HTML of a website to get the data you need isn't the best course of action.
So my question is simple: What then, is the best / most efficient and ...
9
votes
5answers
2k views
Coming up with tokens for a lexer
I'm writing a parser for a markup language that I have created (writing in python, but that's not really relevant to this question -- in fact if this seems like a bad idea, I'd love a suggestion for a ...
9
votes
3answers
7k views
Generic file parser design in Java using the Strategy pattern
I am working on a product in which the responsibility of one of the modules is to parse XML files and dump the required content in a database. Even though the present requirement is only to parse XML ...
8
votes
6answers
3k views
Techniques for parsing XML
I've always found XML somewhat cumbersome to process. I'm not talking about implementing an XML parser: I'm talking about using an existing stream-based parser, like a SAX parser, which processes the ...
8
votes
4answers
808 views
Using a “dead man's switch” to manage time-sensitive code
In our software environment, we often run a/b tests, as is probably good practice. However, our environment is set up such that, in very short order, the code starts to become very crufty with dead ...
8
votes
4answers
266 views
How should I implement a command processing application?
I want to make a simple, proof-of-concept application (REPL) that takes a number and then processes commands on that number.
Example:
I start with 1. Then I write "add 2", it gives me 3. Then I ...
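A minimal sketch of the kind of command loop described in this excerpt, assuming each command is a verb followed by one integer. Only "add" appears in the question; the "sub" and "mul" verbs and the function names `apply_command` and `run` are hypothetical, added for illustration.

```python
def apply_command(value, line):
    """Apply one command such as 'add 2' to value and return the new value."""
    # Dispatch table: verb -> binary operation. Only "add" comes from the
    # question; "sub" and "mul" are assumed extras.
    ops = {
        "add": lambda a, b: a + b,
        "sub": lambda a, b: a - b,
        "mul": lambda a, b: a * b,
    }
    parts = line.split()
    if len(parts) != 2 or parts[0] not in ops:
        raise ValueError(f"unrecognized command: {line!r}")
    # int() raises ValueError if the argument is not an integer literal.
    return ops[parts[0]](value, int(parts[1]))


def run(start, lines):
    """Feed a sequence of command lines through apply_command."""
    value = start
    for line in lines:
        value = apply_command(value, line)
    return value
```

Starting from 1, `run(1, ["add 2"])` returns 3, matching the example in the question. A dispatch table keeps parsing and execution separate, so adding a verb does not touch the parsing code.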
8
votes
3answers
3k views
What issues tend to arise when working with HL7 messages?
I'm testing a product for health care businesses, and we're working with HL7 messages. I saw people groaning on another question about the issues with HL7 but not mentioning specifics. Can someone ...
8
votes
2answers
849 views
Persisting natural language processing parsed data
I've recently started experimenting with natural language processing (NLP) using Stanford's CoreNLP, and I'm wondering what are some of the standard ways to store NLP parsed data for something like a ...
8
votes
3answers
530 views
What is the correct algorithm to invert italics in mixed text?
The motivation for this question is described in the section below.
There are many ways to make text italic, so perhaps there is more than one good "swap italics algorithm".
The problem reveals some ...
7
votes
3answers
877 views
What is a real-world use case for a Chomsky Type-1 (context-sensitive) grammar?
I have been having some fun lately exploring the development of language parsers in the context of how they fit into the Chomsky Hierarchy.
What is a good real-world (i.e. not theoretical) example of a ...
7
votes
2answers
297 views
Parsing multiple file formats/protocols
We are starting a project where we will need to write parsers for a bunch of binary file formats, each of them representing very similar data (time-value series from different measurement devices).
...
7
votes
2answers
720 views
Language parsing to find important words
I'm looking for some input and theory on how to approach a lexical topic.
Let's say I have a collection of strings, which may just be one sentence or potentially multiple sentences. I'd like to ...
7
votes
2answers
2k views
Algorithm for formatting SQL code
I need a tool (for in-house use) that will format SQL code (SQL Server/MySQL).
There are various third-party tools and websites that do it, but not exactly how I need it.
So I want to write my ...
7
votes
1answer
249 views
Should I let my users write BnfExpressions to extend my grammar?
Preface
I'm designing a templating language (please skip the don't/why?? speech). One of the major goals of this language is to be extensible. There are 2 main elements in my language. "Tags" and ...
6
votes
4answers
521 views
How are comments expressed in programming language grammars?
I'm learning how to build parsers using grammars, but I got stuck trying to express comments, because they can appear almost anywhere.
This indicates that comments can be stripped from the token ...
6
votes
6answers
3k views
Best way to parse a file
I'm trying to find a better way to write a parser for some of the well-known file formats out there, such as EDIFACT and TRADACOMS.
If you aren't familiar with these standards then check out this ...
6
votes
6answers
3k views
What is the simplest human readable configuration file format? [closed]
Current configuration file is as follows:
mainwindow.title = 'test'
mainwindow.position.x = 100
mainwindow.position.y = 200
mainwindow.button.label = 'apply'
mainwindow.button.size.x = 100
...
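A minimal sketch of reading a dotted-key format like the one shown above into nested dictionaries. It assumes one `key = value` pair per line and values that are Python-style literals (quoted strings or integers); the function name `parse_config` is hypothetical, and the question's actual requirements are truncated in this excerpt.

```python
import ast


def parse_config(text):
    """Parse 'a.b.c = value' lines into nested dicts."""
    root = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in line: {raw!r}")
        # Split the dotted key into parent sections and the final name.
        *parents, leaf = key.strip().split(".")
        node = root
        for name in parents:
            node = node.setdefault(name, {})
        # literal_eval accepts quoted strings and numbers without
        # executing arbitrary code.
        node[leaf] = ast.literal_eval(value.strip())
    return root
```

With the lines from the question, `parse_config(...)["mainwindow"]["position"]["x"]` yields the integer 100. The sketch does not handle a key that is used both as a value and as a section.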
6
votes
2answers
429 views
In layman's terms, what is left recursion?
According to one page on code.google.com, "left recursion" is defined as follows:
Left recursion just refers to any recursive nonterminal that, when it produces a sentential form containing ...
6
votes
3answers
394 views
Clarification about Grammars, Lexers and Parsers
Background info (May Skip): I am working on a task we have been set at uni in which we have to design a grammar for a DSL we have been provided with. The grammar must be in BNF or EBNF. As well as ...
6
votes
2answers
352 views
Slight extension for SQL prepared statements syntax. Need advice
In my database abstraction library I am extending the SQL prepared statement syntax to give the parser a hint about the expected literal type. I consider this an essential improvement; you can read my reasoning here. ...
6
votes
2answers
539 views
Parser combinator that looks like BNF
Is it possible to construct a parser combinator library that reads like a BNF grammar? I don't know of any, so I started wondering if there are reasons it's impossible or undesirable to do so. It ...
6
votes
3answers
711 views
C++ XML Parsing: Suggestions on Approach for Parsing and Storing data
I am looking into developing a C++ application to parse XML (using the rapidxml framework), and I would like some advice on how to approach this.
The file I want to parse is an XML game file that ...
6
votes
2answers
394 views
Idea for a domain specific language or DLR port?
I have my undergraduate final year project coming up and am very interested in lexers, parsers, compilers and so on. I would like to use the DLR (.NET 4.0 dynamic language runtime) for my ...
5
votes
3answers
2k views
Parsing scripts that use curly braces
To give an idea of what I'm doing, I am writing a Python parser that will parse DirectX .x text files.
The problem I have deals with how the files are formatted. Although I'm writing it in Python, I'm ...
5
votes
2answers
1k views
What is this algorithm for converting strings into numbers called?
I've been doing some work in Parsec recently, and for my toy language I wanted multi-based fractional numbers to be expressible. After digging around in Parsec's source a bit, I found their ...
5
votes
1answer
215 views
Does a GPL Bison grammar infect my application?
I am thinking about using a GPL Bison grammar for my own compiler.
Will the grammar "infect" my parser such that it needs to be open source?
The grammar, that is, the input to Bison, is under the GPL.
5
votes
4answers
1k views
Programming Language Parser (in Java) - What would be a better design alternative for a special case?
Background
I'm currently designing my own programming language as a research project.
I have most of the grammar done and written down as a context-free grammar, and it should work as is. Now ...
5
votes
1answer
496 views
Scripting custom drawing in Delphi application with IF/THEN/ELSE statements?
I'm building a Delphi application which displays a blueprint of a building, including doors, windows, wiring, lighting, outlets, switches, etc. I have implemented a very lightweight script of my own ...
4
votes
4answers
723 views
What is the responsibility or benefit of a Tokenizer?
Suppose I had a grammar like:
object
{ members }
members
pair
pair
string : value
value
number
string
string
" chars "
chars
char
char chars
number
digit
...
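One way to see what a tokenizer contributes for a grammar like this: it turns the character-level rules (`string`, `chars`, `number`, `digit`) into single tokens, so the parser only deals with `object`, `members`, `pair` and `value`. The sketch below is illustrative only. It assumes strings contain no escape sequences and numbers are plain digit runs, since the grammar in the excerpt is truncated; the token names and the `tokenize` function are hypothetical.

```python
import re

# One named alternative per token kind; whitespace is matched and dropped.
TOKEN_RE = re.compile(r"""
    (?P<LBRACE>\{)
  | (?P<RBRACE>\})
  | (?P<COLON>:)
  | (?P<STRING>"[^"]*")
  | (?P<NUMBER>\d+)
  | (?P<WS>\s+)
""", re.VERBOSE)


def tokenize(text):
    """Return a list of (kind, lexeme) pairs for the input text."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if match.lastgroup != "WS":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

For the input `{ "a" : 12 }` this produces five tokens (LBRACE, STRING, COLON, NUMBER, RBRACE), and the parser never has to look at individual characters or whitespace.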
4
votes
3answers
3k views
What programming language is most suitable for handling unstructured data?
I'm trying to automate the application of metadata to a huge amount of text, but I'm not sure what language would make this task easier (if there is one).
What programming language is most suitable ...
4
votes
2answers
283 views
Should I commit my generated parser to source control?
I'm using a parser generator to build a compiler. Should I commit the source files produced by the parser generator?
I want to commit them to avoid a dependency on the parser generator during the ...
4
votes
2answers
580 views
Learning YACC nowadays, does it make sense? [closed]
I have a huge project that uses YACC, and I need to fix a bug in it.
I could ask the person who wrote it to fix it, but I'm interested in how compilers work. Does it make sense to learn ...
4
votes
1answer
477 views
Why did GCC switch from Bison to a recursive descent parser for C++ and C?
Was there a language change that required it or some practical reason why Bison was no longer appropriate or optimal?
I saw on Wikipedia that they switched, referring to the GCC 3.4 and GCC 4.1 ...