Analyzing (un)structured data to convert it into a structured, normalized format.
0
votes
0 answers
45 views
Matching groups of similar lines on a generic matching algorithm
I have to write a program that searches through a file of lines and finds lines that match within a degree of tolerance but are not necessarily identical. For example, the following lines would match:
...
2
votes
1 answer
110 views
Recursively parse without resorting to ugly design patterns
I'm currently building a crochet pattern parser in Java, and I've hit upon some trouble. I'll call the language used for input Crochet Pattern Code (CPC).
I have a rather large writeup on the ...
1
vote
0 answers
59 views
Calculating uncompressed file size without uncompressing file in zlib
I am writing a Python program which parses zip (currently only zlib, using DEFLATE compression) files and verifies the correctness of their headers and data. One of the things I'm trying to achieve is ...
2
votes
2 answers
117 views
When to use ANTLR and when to use a parsing library
I've always wanted to learn how to write a compiler - I've decided to use ANTLR, and am currently reading through the book (it's very good, by the way).
I'm pretty new to this, so go easy, but the gist ...
0
votes
0 answers
58 views
Writing Z80 table based assembler/disassembler
I have a long-term project: a DIY computer with various processors. One of my goals is to make not only the hardware, but the software too.
So I started with an assembler/disassembler for Linux, though there is a lot ...
1
vote
1 answer
55 views
Can a table of contents be parsed using some formal grammar?
A table of contents can look like:
Preface
Table of Content
Chapter 1 ...
1.1 ...
1.1.1 ...
1.1.2 ....
1.2 ...
Summary
Exercises
Chapter 2 ...
...
Appendix ...
A ...
A.1 ...
A.2 ...
B ...
References
...
2
votes
0 answers
63 views
Loop Unfolding and Named Significant Bits
I've been writing a Parser Compiler for the last seven or so years, and I recently got to the point (yet again, never satisfied) of structuring the portion dealing with the portions of the language ...
0
votes
3 answers
216 views
Why does Double.parseDouble("ABC") not return Double.NaN?
This code:
Double.parseDouble("ABC")
throws a NumberFormatException.
Why is it wrong to expect a Double.NaN (NaN is literally Not-A-Number)?
A working example is this:
public static void ...
-2
votes
2 answers
431 views
What is the simplest and universal algorithm for parsing C++ code? [closed]
I need to do some project-specific automatic checking of source code written in C++.
Limitations:
Algorithm and its implementation should be simple, easily maintainable, extendable and ...
1
vote
0 answers
258 views
Best practices to parse a log file using Python
I'm writing a Python tool to parse a log file from a game server. The log file has the format:
ms:classname::id::method::arg1::arg2....
There are a lot of classes, and a lot of methods for each class, ...
2
votes
0 answers
268 views
Haskell, Rust, or D for POSIX shell implementation? [closed]
I am planning on writing my own Bourne shell. It will be full-featured, capable of being used as a system /bin/sh.
The shell will be implemented very differently from other Bourne shells, however. ...
1
vote
0 answers
59 views
Pre-Compilation Processor:
What I want to do:
Parse source code, search for a beginning and closing tag of my own definition (one that does not conflict with any defined patterns in the programming language), and then replace ...
3
votes
2 answers
112 views
Process to generate hierarchical structure based on relational data?
I have a CSV of employee IDs, names, and a reference column with the ID of their direct manager, something like this:
emp_id, emp_name, mgr_id
1,The Boss,,
2,Manager Joe,1
3,Manager Sally,1
...
-2
votes
2 answers
91 views
File quantity limit in a directory on a Linux file server, and why?
What is a good limit to use on the quantity of files in a directory, and why?
EDIT:
Why shouldn't someone create a system that puts hundreds of thousands of files in the same directory?
Why I ask:
...
0
votes
2 answers
107 views
BNF parsing rule for left associativity
Can someone please assist me with the following question?
Write a BNF rule to parse into
C -> E
C -> E && E
C -> E && E && E
so that C generates as many E ...
22
votes
5 answers
3k views
Name for this type of parser, OR why it doesn't exist
Conventional parsers consume their entire input and produce a single parse tree. I'm looking for one that consumes a continuous stream and produces a parse forest [edit: see discussion in comments ...
2
votes
2 answers
226 views
Building a string parser for user command and control?
My goal is to build a command parser that has basic syntax and multiple possible branches at each point. These commands come from users of the system and are text input (no GUI). The basic syntax is ...
6
votes
2 answers
446 views
In layman's terms, what is left recursion?
According to one page on code.google.com, "left recursion" is defined as follows:
Left recursion just refers to any recursive nonterminal that, when it produces a sentential form containing ...
0
votes
1 answer
53 views
What is latent parsing in NLP?
I'm reading a paper describing an NLP project, and I can hardly grasp the concept of the term "latent parsing".
Original paper: http://web.stanford.edu/~angeli/papers/2013-acl-temporal.pdf
The figure in the ...
4
votes
1 answer
530 views
Why did GCC switch from Bison to a recursive descent parser for C++ and C?
Was there a language change that required it or some practical reason why Bison was no longer appropriate or optimal?
I saw on Wikipedia that they switched, referring to the GCC 3.4 and GCC 4.1 ...
3
votes
3 answers
712 views
How exactly is an Abstract Syntax Tree created?
I think I understand the goal of an AST, and I've built a couple of tree structures before, but never an AST. I'm mostly confused because the nodes are text and not numbers, so I can't think of a nice ...
3
votes
1 answer
70 views
How to parse different number types with LALR(1)
Consider an LALR(1) parser for a file format that allows integer numbers and floating point numbers.
As usual, something like 42 shall be a valid integer and a valid float (with some automagic ...
0
votes
1 answer
37 views
Rule order for Parsing Lists with LALR(1)
When creating the grammar for parsing a list (something like "ITEM*") with an LALR(1) parser, this can basically be done in two ways:
list
: list ITEM
|
;
or
list
: ITEM list
|
...
28
votes
2 answers
2k views
Do modern languages still use parser generators?
I was reading about the GCC compiler suite on Wikipedia here, when this came up:
GCC started out using LALR parsers generated with Bison, but gradually switched to hand-written ...
0
votes
2 answers
94 views
Large CSVs to HTML report
I am working on a web front end + front end services.
I receive good-sized CSV files (10k lines). My service processes them and condenses them into one larger CSV file (up to 300k lines).
This ...
3
votes
2 answers
128 views
Need overview of concepts and tools to translate a DSL to regular expressions
I'm looking for a little guidance. Until this morning, this was all over my head. After spending today researching Wikipedia, StackOverflow, etc., I'd say I've got my nose above the water. I'm tasked ...
-1
votes
1 answer
48 views
How to keep AST for feature access?
Consider code such as this (let's say it is C++):
Foo::Bar.get().X
How should one keep the AST for this: as a tree with the root at the left, Foo(Bar(get(X))), or with the root at the right, (((Foo)Bar)get)X? Or maybe as a ...
1
vote
3 answers
349 views
Parsing mathematical expressions with two values that have parentheses and minus signs
I'm trying to parse equations like the ones below, which only have two values or the square root of a certain value, from a text file:
100+100
-100-100
-(100)+(-100)
sqrt(100)
by the minus ...
0
votes
0 answers
33 views
Parsing a website's source [duplicate]
I want to create an application and maybe upload it to the Play Store, but I am not sure whether what my app does is legal.
I am downloading a page's source from a website to get some ...
0
votes
2 answers
167 views
High-level strategy for distinguishing a regular string from invalid JSON (i.e. JSON-like string detection)
Disclaimer On Absence of Code:
I have no code to post because I haven't started writing; was looking for more theoretical guidance as I doubt I'll have trouble coding it but am pretty befuddled on ...
4
votes
4 answers
775 views
What is the responsibility or benefit of a Tokenizer?
Suppose I had a grammar like:
object
{ members }
members
pair
pair
string : value
value
number
string
string
" chars "
chars
char
char chars
number
digit
...
1
vote
2 answers
224 views
Lexer/Parser for multidimensional Languages
How does a lexer/parser work in a 2D programming language like Funciton in order to transform such unusual source code into the correct AST?
0
votes
3 answers
267 views
Do markup languages have an equivalent of the concept of `semantics` that you can find in C or C++?
Maybe I'm missing something, but do so-called markup languages have an equivalent of the concept of semantics that you can find in C or C++?
Judging from how you parse the language, you don't really have ...
2
votes
1 answer
214 views
How and when should I design a simple mark-up language parser? [closed]
I want to write a simple markup language with its rendering engine.
First, I am not completely sure when I should try this... I am only 12... But I am competent in C++ having learned through the Web ...
0
votes
2 answers
137 views
Implementing a first basic interpreter: what should I learn first? [duplicate]
I'm about to implement my own very simple programming language, and an interpreter to execute code in that language.
The language will be very basic. Example code:
var x = 3
if x > 2 print x
if x ...
1
vote
2 answers
231 views
Traversing an AST using Visitors
I'm writing a compiler for a C-like language, and I'm looking for an elegant way to traverse my abstract syntax tree. I'm trying to implement the Visitor pattern, although I'm not convinced that I'm ...
0
votes
0 answers
40 views
Dedupe while or after write
I have a summary tool written in Python that reads input files and writes them into a summary file. I have the following stipulations:
No duplicates.
If it exists, add a count to it.
Is it ...
0
votes
2 answers
90 views
Storing tokens during lexing stage
I am currently implementing a lexer that breaks XML files up into tokens. I'm considering ways of passing the tokens on to a parser to create a more useful data structure out of said tokens - my ...
-2
votes
1 answer
293 views
What is the right way to parse HTML? [closed]
I've heard that parsing HTML the Cthulhu way is not very good. But what are the right ways to parse HTML? Or is it possible to parse it at all?
0
votes
0 answers
55 views
Using JavaScript to find the correct offset in bundled files
I am currently making multiple parsers using PEGjs and have implemented my own partial preprocessor, which uses a RegExp to find and replace '#include' directives with the desired files, resulting in a ...
6
votes
2 answers
357 views
Slight extension for SQL prepared statements syntax. Need advice
In my database abstraction library I am extending the SQL prepared statement syntax to hint to the parser the expected literal type. I consider it an essential improvement; you can read my reasoning here. ...
0
votes
1 answer
129 views
Making a sldprt to PDB file converter?
I wanted to create a parser that can read a SolidWorks file and turn it into a Protein Data Bank file. This has already been done in a program called DiamondCAD. ...
1
vote
1 answer
218 views
About AST construction in an LL(1) non-recursive parser
I have implemented an LL(1) parser using a non-recursive approach with an explicit stack.
The following algorithm is from the Dragon Book:
set ip to point to the first symbol of w;
set X to the top stack ...
4
votes
1 answer
265 views
How to line-break an email address?
After some discussion, I have come across a rather complicated situation. Say I intend to display an email address. I have, obviously, limited space available on the screen - be that browser or ...
1
vote
1 answer
1k views
Using Python to parse log files? [closed]
My first useful projects as a programmer have been Python scripts that parse out relevant information from log files and do some analysis. I've bumped around and found my way to some functional ...
0
votes
0 answers
287 views
Parsing Razor-style Templates
I want to build a template engine (ITT not another template engine...) based on Razor.
I've been at it for quite a long time not getting anywhere and quite frankly I'm at my limit. I've tried rolling ...
-1
votes
1 answer
167 views
Suggestion on how to fill a web form (several times) [closed]
I need to fill a form using data from a CSV file. I was planning to use CURL+PHP to do it, but then I realized the form has several steps (one on each page), plus it uses JavaScript to fill hidden ...
1
vote
2 answers
189 views
Parser and interpreter knowledge as a way to gauge programmer ability [closed]
This is only anecdotal evidence, but from my past encounters with programmers at various workplaces, the programmers who understand the fundamentals of parsing and interpreting seem to be overall ...
6
votes
3 answers
422 views
Clarification about Grammars, Lexers and Parsers
Background info (May Skip): I am working on a task we have been set at uni in which we have to design a grammar for a DSL we have been provided with. The grammar must be in BNF or EBNF. As well as ...
1
vote
0 answers
48 views
Compound container for storing output from a parser
Consider a hierarchical structure like this
directory
[
[item_0;item_1;...;item_n]
[item_0;item_1;inner_directory[...];item_m]
other_directory
[...]
]
...