Analyzing (un)structured data to convert it into a structured, normalized format.
-3 votes, 0 answers, 27 views
How to detect repetitive blocks in HTML text [on hold]
Is there an algorithm to detect repetitive blocks in HTML text (perhaps using the order of tags)?
For example, here:
<table>
<tbody>
<tr>
<td>Shady ...
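One possible approach compares the shape of sibling elements. Build a tag tree with Python's standard `html.parser`, describe each element by its tag plus its children's shapes (ignoring text and attributes), and report shapes that occur more than once under the same parent. This sketch assumes well-formed markup where every tag is closed. The names `repeated_blocks` and `min_count` are illustrative and do not come from the question.

```python
from collections import Counter
from html.parser import HTMLParser


class Node:
    def __init__(self, tag):
        self.tag = tag
        self.children = []

    def signature(self):
        # The "shape" of an element: its tag plus the shapes of its children.
        # Text content and attributes are deliberately ignored.
        return (self.tag, tuple(child.signature() for child in self.children))


class TreeBuilder(HTMLParser):
    """Builds a tag tree; assumes well-formed markup (every tag is closed)."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest open element with this tag, if any.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break


def repeated_blocks(html, min_count=2):
    """Return (parent_tag, child_tag, count) for sibling shapes seen min_count+ times."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    found = []

    def walk(node):
        counts = Counter(child.signature() for child in node.children)
        for sig, count in counts.items():
            if count >= min_count:
                found.append((node.tag, sig[0], count))
        for child in node.children:
            walk(child)

    walk(builder.root)
    return found
```

For a table with two identical rows, `repeated_blocks("<table><tbody><tr><td>a</td><td>b</td></tr><tr><td>c</td><td>d</td></tr></tbody></table>")` reports the repeated `tr` under `tbody` and the repeated `td` cells inside each row. Signatures are recomputed on every visit. That cost is acceptable for a sketch but is worth caching for large pages.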
3 votes, 1 answer, 42 views
quantitatively comparing AST shapes
How could one compare the shape of abstract syntax trees of similar source code programs (C, C++, Go, or anything compiled with GCC...)?
I guess that plagiarism detection on source code would use ...
3 votes, 6 answers, 198 views
In which layer I should implement file parsing?
In a simple multi-layer architecture, in which layer should I implement something like file parsing?
For example: I have a file and I have to extract specific information into an object.
I think ...
1 vote, 1 answer, 69 views
I need to parse and verify syntax of an expression [closed]
I'm trying to create a parser that constructs objects from a command line.
My first approach used StreamTokenizer, but I'm lost in too many conditionals, and I think my code is very ugly and ...
1 vote, 1 answer, 44 views
How should I handle the fetching of cached data in iOS
I'm developing an app at work. This is my first big application, and in my smaller projects I didn't use caching at all.
What's currently happening
When the user logs on for their very first time ...
1 vote, 2 answers, 88 views
Is recursive-descent parsing a panacea for DoS threats posed by 'Evil' regexes? Or does evilness stem from the grammar?
ReDoS attacks exploit characteristics of some (otherwise useful) regular expressions ... essentially causing an explosion of possible paths through the graph defined by the NFA.
So does using a ...
5 votes, 1 answer, 210 views
How should I test the HTML output my class creates?
As a learning project, I am trying to create something similar to the WebGrid that comes with ASP.NET MVC. Now this component MyGrid<T> looks like this:
public class MyGrid<T> where T : ...
2 votes, 2 answers, 114 views
Incorporating functions into a Shunting-Yard algorithm implementation
tl;dr What would be a simple way of incorporating functions into a Shunting-Yard algorithm implementation?
If only expressions like function(arg1, arg2, arg3) were allowed (where function is some ...
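A common extension of Shunting-Yard is sketched below in Python. It works under several assumptions. An identifier immediately followed by `(` is pushed onto the operator stack as a function. A comma flushes operators down to the nearest `(`. When `)` closes the argument list, the function name under it is moved to the output. Functions have fixed arity, which is looked up in a hypothetical `FUNCTIONS` table. There is no handling of unbalanced parentheses or unary minus.

```python
import operator
import re

# Binary operators: precedence and associativity (an assumed, conventional table).
OPERATORS = {"+": (1, "left"), "-": (1, "left"),
             "*": (2, "left"), "/": (2, "left"),
             "^": (3, "right")}
APPLY = {"+": operator.add, "-": operator.sub, "*": operator.mul,
         "/": operator.truediv, "^": operator.pow}
# Hypothetical function table: name -> (fixed arity, implementation).
FUNCTIONS = {"max": (2, max), "min": (2, min)}


def tokenize(text):
    # Numbers, identifiers, operators, parentheses and commas; other characters are skipped.
    return re.findall(r"\d+(?:\.\d+)?|[A-Za-z_]\w*|[-+*/^(),]", text)


def to_rpn(text):
    output, stack = [], []
    tokens = tokenize(text)
    for i, tok in enumerate(tokens):
        if tok[0].isdigit():
            output.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            if i + 1 < len(tokens) and tokens[i + 1] == "(":
                stack.append(tok)       # function name waits on the operator stack
            else:
                output.append(tok)      # plain variable
        elif tok == ",":
            # Argument separator: flush the operators of the current argument.
            while stack and stack[-1] != "(":
                output.append(stack.pop())
        elif tok in OPERATORS:
            prec, assoc = OPERATORS[tok]
            while stack and stack[-1] in OPERATORS:
                top = OPERATORS[stack[-1]][0]
                if top > prec or (top == prec and assoc == "left"):
                    output.append(stack.pop())
                else:
                    break
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()                 # discard "("
            if stack and stack[-1] not in OPERATORS and stack[-1] != "(":
                output.append(stack.pop())  # the function this "(" belonged to
    while stack:
        output.append(stack.pop())
    return output


def eval_rpn(tokens):
    # Evaluates numbers, operators and FUNCTIONS only (no variables).
    stack = []
    for tok in tokens:
        if tok in FUNCTIONS:
            arity, fn = FUNCTIONS[tok]
            args = stack[-arity:]
            del stack[-arity:]
            stack.append(fn(*args))
        elif tok in OPERATORS:
            right, left = stack.pop(), stack.pop()
            stack.append(APPLY[tok](left, right))
        else:
            stack.append(float(tok))
    return stack.pop()
```

With this sketch, `to_rpn("max(1, 2+3) * 4")` yields `1 2 3 + max 4 *`, and evaluating that gives 20.0. Variable-arity functions would need an extra argument counter per open `(`.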
-1 votes, 1 answer, 55 views
Global counter using iOS and Parse? [closed]
I'm very new to Parse and trying to set up what is basically a voting app where I can collect data on how many times all users have pressed a button. I found some information on Atomic Increment ...
2 votes, 2 answers, 107 views
Algorithm to go from infix notation to a tree
I've been trying to figure out an algorithm to go from an infix equation to a syntax tree, like so:
(1+3)*4+5
      +
    *   5
  +   4
1   3
However, I don't just want it to handle operators, I ...
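A standard way to get exactly that tree is a recursive-descent parser with one function per precedence level. This minimal Python sketch assumes only integers, `+ - * /` and parentheses, and it uses `(operator, left, right)` tuples as tree nodes. Function calls or other constructs would be added as further cases in `factor`.

```python
import re


def tokenize(text):
    return re.findall(r"\d+|[-+*/()]", text)


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    # expr := term (("+" | "-") term)*   -- lowest precedence, left-associative
    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = (op, node, self.term())
        return node

    # term := factor (("*" | "/") factor)*
    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            node = (op, node, self.factor())
        return node

    # factor := NUMBER | "(" expr ")"
    def factor(self):
        tok = self.take()
        if tok == "(":
            node = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"unexpected token {tok!r}")
        return int(tok)


def parse(text):
    parser = Parser(text)
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing token {parser.peek()!r}")
    return tree
```

Here `parse("(1+3)*4+5")` returns `("+", ("*", ("+", 1, 3), 4), 5)`, which is the tree drawn in the question.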
2 votes, 4 answers, 103 views
Clean Abstract Syntax Tree
I'm writing a toy compiler for fun.
Basically, my problem is that I don't want to clutter the AST with stuff like debug information (symbol tokens, locations of tokens, etc.) as well as data that the ...
2 votes, 1 answer, 77 views
Mapping different XML and CSV feeds
Not sure if this is the right venue to ask this, but here goes.
A little background:
I'm trying to build an e-commerce app that would allow sellers from other venues, like Amazon and Newegg, to ...
1 vote, 2 answers, 121 views
Language/Programming term for paired delimiters [closed]
Can someone help me find the language and/or programming term for delimiters (?) that must be paired?
Quotes, parentheses, angle brackets, square brackets, etc. are often used to symbolize these ...
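The usual names are "balanced" or "matching" delimiters. Formally, strings of correctly nested pairs form a Dyck language. A stack is the standard way to check them. The following is a minimal Python sketch. The pair table is an assumption. Quotes are left out because their opening and closing characters are identical, so they need a toggle rather than a stack. `<` would also be misread in text such as `a < b`.

```python
# Closing delimiter -> the opener it must match (an assumed set of pairs).
PAIRS = {")": "(", "]": "[", "}": "{", ">": "<"}
OPENERS = set(PAIRS.values())


def is_balanced(text):
    stack = []
    for ch in text:
        if ch in OPENERS:
            stack.append(ch)
        elif ch in PAIRS:
            # A closer must match the most recently opened delimiter.
            if not stack or stack[-1] != PAIRS[ch]:
                return False
            stack.pop()
    # Anything still open means the text is unbalanced.
    return not stack
```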
2 votes, 2 answers, 164 views
“Hand-written” recursive descent parser with “catch-all” rule
I'm trying to write a (scannerless) recursive descent parser with a "catch all" rule for the following "Mustache template" grammar (simplified here):
content : (variable_tag | section_tag | ...
1 vote, 1 answer, 71 views
File validation rules
I have an application that can accept CSV files to run some operations. The files look like:
CREATE USER:username,last_name,first_name,age
user1,Smith,John,23
user2,Poppins,Mary,257
There are a ...
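One way to keep such rules manageable is to read the column names from the header line and attach a validation rule to each column by name. The `RULES` table in this minimal Python sketch is a hypothetical example that accepts an age between 1 and 149, because the question's actual rules are not shown. The function collects every error instead of stopping at the first.

```python
import csv

# Hypothetical per-column rules; columns without a rule are accepted as-is.
RULES = {
    "age": lambda v: v.isdigit() and 0 < int(v) < 150,
}


def validate(text):
    lines = text.strip().splitlines()
    # Header looks like "COMMAND:col1,col2,...".
    command, _, columns = lines[0].partition(":")
    fields = columns.split(",")
    errors = []
    for lineno, row in enumerate(csv.reader(lines[1:]), start=2):
        if len(row) != len(fields):
            errors.append((lineno, f"expected {len(fields)} fields, got {len(row)}"))
            continue
        for name, value in zip(fields, row):
            rule = RULES.get(name)
            if rule and not rule(value):
                errors.append((lineno, f"invalid {name}: {value!r}"))
    return command, errors
```

Applied to the sample file above, the sketch returns the command `"CREATE USER"` and flags line 3 for the age `257`.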
2 votes, 0 answers, 63 views
Truncating HTML content at specific content blocks
I have HTML content in my DB and I would like to present a list of these individual items but truncate each of them so they're not fully displayed. I would like to keep truncated items ...
0 votes, 1 answer, 148 views
Is it bad to implement a language in two other languages? [closed]
OK, so I have some understanding of parsers and compilers, at least the basics of how they work, and I've written a calculator and a really small toy language that compiles to another high-level ...
2 votes, 0 answers, 50 views
Argument over performance using Convert.ChangeType in Web Applications [duplicate]
A debate has been going on at work about using Convert.ChangeType.
A couple of fundamental assumptions to this discussion are delineated below:
1. The discussion is within the context of web ...
1 vote, 2 answers, 74 views
Passing context around AST nodes
I have various objects inside my AST, such as IfBlock, FunctionBlock, LogicExpression, etc. All of those objects share a context, which is basically a hashmap with some variables. It's a very simple ...
1 vote, 1 answer, 102 views
Writing a parser on top of an XML-based AST: am I doing it right?
I have a sort of AST defined in XML that I'm trying to parse and evaluate. The XML tree contains the tokens and all the information I need. However, I'm finding it difficult to do it "properly". ...
1 vote, 1 answer, 90 views
Run a C++ program under lots of different data maps
I want to run a C++ program to process a lot of data from different XML files and output results. I run the program once per file and potentially have around 50 different files.
The trouble is each ...
0 votes, 0 answers, 74 views
Matching groups of similar lines on a generic matching algorithm
I have to write a program to search through a file containing lines and find lines that match to a degree of tolerance but are not necessarily the same. So for example the following lines would match:
...
3 votes, 1 answer, 177 views
Recursively parse without resorting to ugly design patterns
I'm currently building a crochet pattern parser in Java, and I've hit upon some trouble. I'll call the language used for input Crochet Pattern Code (CPC).
I have a rather large writeup on the ...
1 vote, 0 answers, 196 views
Calculating uncompressed file size without uncompressing file in zlib
I am writing a Python program that parses zip files (currently only zlib, using DEFLATE compression) and verifies the correctness of their headers and data. One of the things I'm trying to achieve is ...
2 votes, 2 answers, 323 views
When to use ANTLR and when to use a parsing library
I've always wanted to learn how to write a compiler. I've decided to use ANTLR, and am currently reading through the book (it's very good, by the way).
I'm pretty new to this, so go easy, but the gist ...
0 votes, 0 answers, 201 views
Writing a Z80 table-based assembler/disassembler
I have a long-term project: a DIY computer with various processors. One of my goals is to make not only the hardware but the software too.
So I started with an assembler/disassembler for Linux, though there is a lot ...
1 vote, 1 answer, 56 views
Can a table of contents be parsed using some formal grammar?
A table of contents can look like:
Preface
Table of Content
Chapter 1 ...
1.1 ...
1.1.1 ...
1.1.2 ....
1.2 ...
Summary
Exercises
Chapter 2 ...
...
Appendix ...
A ...
A.1 ...
A.2 ...
B ...
References
...
2 votes, 0 answers, 67 views
Loop Unfolding and Named Significant Bits
I've been writing a Parser Compiler for the last seven or so years, and I recently got to the point (yet again, never satisfied) of structuring the portion dealing with the portions of the language ...
0 votes, 3 answers, 411 views
Why doesn't Double.parseDouble("ABC") return Double.NaN?
This code:
Double.parseDouble("ABC")
throws a NumberFormatException.
Why is it wrong to expect Double.NaN (NaN is literally Not-a-Number)?
A working example is this:
public static void ...
-1 votes, 2 answers, 921 views
What is the simplest and universal algorithm for parsing C++ code? [closed]
I need to do some project-specific automatic checking of source code written in C++.
Limitations:
Algorithm and its implementation should be simple, easily maintainable, extendable and ...
1 vote, 0 answers, 1k views
Best practices to parse a log file using Python
I'm writing a Python tool to parse a log file from a game server. The log file has this format:
ms:classname::id::method::arg1::arg2....
There are a lot of classes, and a lot of methods for each class, ...
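A common structure for this kind of log is to split each line once into its fields, then dispatch on `(classname, method)` through a lookup table instead of a long if/elif chain. This minimal Python sketch assumes the millisecond field is an integer and that every line has at least a classname, an id and a method. `Player`, `move`, `handles` and `on_player_move` are hypothetical names.

```python
from collections import namedtuple

LogEntry = namedtuple("LogEntry", "ms classname id method args")


def parse_line(line):
    # "ms:classname::id::method::arg1::arg2..." - the first separator is a single colon.
    ms, rest = line.rstrip("\n").split(":", 1)
    classname, obj_id, method, *args = rest.split("::")
    return LogEntry(int(ms), classname, obj_id, method, args)


# (classname, method) -> handler function; filled in by the decorator below.
HANDLERS = {}


def handles(classname, method):
    def register(fn):
        HANDLERS[(classname, method)] = fn
        return fn
    return register


@handles("Player", "move")  # hypothetical class and method names
def on_player_move(entry):
    x, y = map(float, entry.args)
    return f"player {entry.id} moved to ({x}, {y})"


def dispatch(line):
    entry = parse_line(line)
    handler = HANDLERS.get((entry.classname, entry.method))
    return handler(entry) if handler else None  # unknown pairs are ignored
```

Adding support for a new class or method then means writing one decorated function. The parsing code stays unchanged.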
1 vote, 0 answers, 68 views
Pre-Compilation Processor:
What I want to do:
Parse source code, search for a beginning and closing tag of my own definition (one that does not conflict with any defined patterns in the programming language), and then replace ...
3 votes, 2 answers, 309 views
Process to generate hierarchical structure based on relational data?
I have a CSV of employee IDs, names, and a reference column with the ID of each employee's direct manager, something like this:
emp_id, emp_name, mgr_id
1,The Boss,,
2,Manager Joe,1
3,Manager Sally,1
...
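One straightforward process makes two passes over the rows. The first pass creates a node per employee. The second attaches each node to its manager's list of reports, and rows with an empty `mgr_id` become roots. Because of the two passes, a manager may appear after the people who report to them. This is a minimal Python sketch using the standard `csv` module. The node layout and the extra employee in the usage example are assumptions, and an unknown `mgr_id` would raise `KeyError`.

```python
import csv
import io


def build_tree(csv_text):
    # skipinitialspace handles the "emp_id, emp_name, mgr_id" header spacing.
    rows = list(csv.DictReader(io.StringIO(csv_text), skipinitialspace=True))
    nodes = {row["emp_id"]: {"id": row["emp_id"], "name": row["emp_name"], "reports": []}
             for row in rows}
    roots = []
    for row in rows:
        node = nodes[row["emp_id"]]
        if row["mgr_id"]:
            nodes[row["mgr_id"]]["reports"].append(node)
        else:
            roots.append(node)  # no manager: top of the hierarchy
    return roots
```

For example, with the three rows shown above plus a hypothetical `4,Worker Ann,3`, the single root is The Boss, with Manager Joe and Manager Sally as reports and Worker Ann under Sally.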
-2 votes, 2 answers, 104 views
File quantity limit in a directory on a Linux file server, and why?
What is a good limit to use on the quantity of files in a directory, and why?
EDIT:
Why shouldn't someone create a system that puts hundreds of thousands of files in the same directory?
Why I ask:
...
0 votes, 2 answers, 185 views
BNF parsing rule for left associativity
Can someone please assist me with the following question.
Write a BNF rule to parse into
C -> E
C -> E && E
C -> E && E && E
so that C generates as many E ...
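The usual shape of such a rule is given here as a sketch, not as the expected homework answer. The left-recursive rule `C ::= C "&&" E | E` derives E, E && E, E && E && E, and so on, and its parse trees lean to the left. A recursive-descent parser cannot call C first without looping forever, so the same language is usually implemented as `C ::= E ("&&" E)*`, with a loop that builds the left-leaning tree. This Python sketch assumes each operand E is a single, already-split token.

```python
def parse_conjunction(tokens):
    """Parse  C ::= E ("&&" E)*  and build a left-leaning tree."""
    pos = 0

    def operand():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] == "&&":
            raise SyntaxError(f"expected an operand at token {pos}")
        tok = tokens[pos]
        pos += 1
        return tok

    tree = operand()
    while pos < len(tokens) and tokens[pos] == "&&":
        pos += 1
        tree = ("&&", tree, operand())  # fold left: ((E && E) && E)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree
```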
25 votes, 5 answers, 3k views
Name for this type of parser, OR why it doesn't exist
Conventional parsers consume their entire input and produce a single parse tree. I'm looking for one that consumes a continuous stream and produces a parse forest [edit: see discussion in comments ...
2 votes, 2 answers, 285 views
Building a string parser for user command and control?
My goal is to build a command parser that has basic syntax and multiple possible branches at each point. These commands come from users of the system and are text input (no GUI). The basic syntax is ...
7 votes, 2 answers, 659 views
In layman's terms, what is left recursion?
According to one page on code.google.com, "left recursion" is defined as follows:
Left recursion just refers to any recursive nonterminal that, when it produces a sentential form containing ...
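Left recursion is a problem for top-down parsers. Take the rule `expr ::= expr "-" NUM | NUM`. A recursive-descent function for `expr` would begin by calling itself at the same input position, so it never consumes a token. The minimal Python sketch below shows that failure and then the usual rewrite into a loop, which keeps subtraction left-associative. Tokens are assumed to be pre-split strings.

```python
def naive_expr(tokens, pos=0):
    # Literal translation of  expr ::= expr "-" NUM
    # It calls itself at the SAME position before consuming anything,
    # so it recurses until Python raises RecursionError.
    left, pos = naive_expr(tokens, pos)
    # Never reached: the "-" NUM part of the rule would be handled here.
    return left - int(tokens[pos + 1]), pos + 2


def expr(tokens):
    # Rewritten rule:  expr ::= NUM ("-" NUM)*
    # The loop folds to the left, so 10 - 3 - 2 is read as (10 - 3) - 2.
    value, pos = int(tokens[0]), 1
    while pos < len(tokens) and tokens[pos] == "-":
        value -= int(tokens[pos + 1])
        pos += 2
    return value
```

A right-recursive rewrite such as `expr ::= NUM "-" expr | NUM` would also terminate. However, it would compute 10 - (3 - 2) = 9 instead of 5, which is why the loop form is preferred here.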
0 votes, 1 answer, 63 views
What is latent parsing in NLP?
I'm reading a paper describing NLP work, and I can hardly grasp the concept behind one term, "latent parsing".
Original paper: http://web.stanford.edu/~angeli/papers/2013-acl-temporal.pdf
The figure in the ...
7 votes, 1 answer, 971 views
Why did GCC switch from Bison to a recursive descent parser for C++ and C?
Was there a language change that required it or some practical reason why Bison was no longer appropriate or optimal?
I saw on Wikipedia that they switched, referring to the GCC 3.4 and GCC 4.1 ...
7 votes, 3 answers, 5k views
How exactly is an Abstract Syntax Tree created?
I think I understand the goal of an AST, and I've built a couple of tree structures before, but never an AST. I'm mostly confused because the nodes are text and not numbers, so I can't think of a nice ...
3 votes, 1 answer, 74 views
How to parse different number types with LALR(1)
Consider a LALR(1) parser for a file format that allows integer numbers and floating point numbers.
As usual, something like 42 shall be a valid integer and a valid float (with some automagic ...
0 votes, 1 answer, 42 views
Rule order for Parsing Lists with LALR(1)
When creating the grammar for parsing a list (something like “ITEM*”) with a LALR(1) parser, this basically can be done in two ways:
list
    : list ITEM
    |
    ;
or
list
    : ITEM list
    |
...
31 votes, 2 answers, 3k views
Do modern languages still use parser generators?
I was researching the GCC compiler suite on Wikipedia when this came up:
GCC started out using LALR parsers generated with Bison, but gradually switched to hand-written ...
0 votes, 2 answers, 147 views
Large CSVs to HTML report
I am working on a web front end + front end services.
I receive good-sized CSV files (10k lines). My service processes them and condenses them into one larger CSV file (up to 300k lines).
This ...
3 votes, 2 answers, 143 views
Need overview of concepts and tools to translate a DSL to regular expressions
I'm looking for a little guidance. Until this morning, this was all over my head. After spending today researching Wikipedia, StackOverflow, etc., I'd say I've got my nose above the water. I'm tasked ...
-1 votes, 1 answer, 53 views
How to keep AST for feature access?
Consider code such as this (let's say it is C++):
Foo::Bar.get().X
How should one store the AST for this: as a "tree" with the root at the left, Foo(Bar(get(X))), or with the root at the right, (((Foo)Bar)get)X? Or maybe as a ...
1 vote, 3 answers, 639 views
Parsing mathematical expressions with two values that have parentheses and minus signs
I'm trying to parse equations like the ones below, which only have two values or the square root of a certain value, from a text file:
100+100
-100-100
-(100)+(-100)
sqrt(100)
by the minus ...
0 votes, 2 answers, 262 views
High-level strategy for distinguishing a regular string from invalid JSON (i.e., JSON-like string detection)
Disclaimer on absence of code:
I have no code to post because I haven't started writing; I'm looking for more theoretical guidance, as I doubt I'll have trouble coding it but am pretty befuddled on ...
4 votes, 4 answers, 1k views
What is the responsibility or benefit of a Tokenizer?
Suppose I had a grammar like:
object
    { members }
members
    pair
pair
    string : value
value
    number
    string
string
    " chars "
chars
    char
    char chars
number
    digit
...
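A tokenizer turns characters into a flat list of (kind, text) pairs. The parser can then match grammar symbols such as string, number, `{` and `:` directly, instead of deciding at every character where a string ends or whether whitespace matters. The following is a minimal Python sketch for the grammar above. It assumes strings contain no escape sequences and numbers are plain decimals. The token kind names are illustrative.

```python
import re

# Order matters: alternatives are tried left to right at each position.
TOKEN_SPEC = [
    ("STRING", r'"[^"]*"'),          # simplification: no escape sequences
    ("NUMBER", r"-?\d+(?:\.\d+)?"),
    ("PUNCT", r"[{}:,]"),
    ("SKIP", r"\s+"),                # whitespace is dropped here, once
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

With this split, the parser for `object`, `members` and `pair` only ever looks at token kinds and never has to handle whitespace or character classes itself.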