The text-processing tag has no wiki summary.
3
votes
3 answers
473 views
Why is quantity in software still written as “1 result(s)”?
Lately, I've been noticing that a lot of software, be it a website, a client application, or a video game, often writes a representation of quantity as follows: "1 result(s)". Now, I can understand why ...
-1
votes
1 answer
32 views
How do I mine concepts and relations from a text document?
I am doing a project on text mining for ontology learning. I have to extract concepts and relations from text. I am trying to implement this in Java. Even though I am able to extract concepts and ...
0
votes
0 answers
36 views
Text tag extraction algorithms
I want smart ways for an application to generate tags (or main keywords) for texts. I don't want a simple approach that removes stop words and then selects keywords by their reputation.
One example that I ...
3
votes
1 answer
56 views
How advanced are author-recognition methods?
If a computer program analyses texts written by an author, how much can it tell today about that author, given texts long enough to be statistically significant?
Can ...
1
vote
1 answer
160 views
Custom Alphabetic Sorting of Array in Java
I have a requirement to read a text file with lines in tag=value format and then output the file with specific tags listed first and the rest sorted alphabetically. The incoming file is randomly ...
0
votes
2 answers
273 views
How does Facebook strip HTML/apostrophes for XSS but also display it?
I'm not quite sure if this is a question for programmers.se rather than stackoverflow, but here goes. So Facebook [or any other large company], when given something like an apostrophe or HTML, can ...
2
votes
3 answers
215 views
Domain-specific language for text search/processing?
I work for an organization that does a lot of work with government data. We have a couple of different projects where we've abstracted out common text search/manipulation operations into reusable ...
3
votes
1 answer
177 views
How to process an endless XML data stream
There is an endless data stream of XML messages (and "heartbeats"), that I receive via a telnet connection and through a site-to-site VPN IPsec tunnel.
I'm still pondering. What is the best/most ...
6
votes
1 answer
295 views
Finding occurrences of useful words and phrases in strings
I am building an app that analyzes people's posts by pulling their Tweets and Facebook posts. I need to process all the posts and find useful phrases. By useful, I mean any word or ...
8
votes
4 answers
249 views
How should I implement a command processing application?
I want to make a simple, proof-of-concept application (REPL) that takes a number and then processes commands on that number.
Example:
I start with 1. Then I write "add 2", it gives me 3. Then I ...
-1
votes
2 answers
171 views
Can non-IT people learn and take advantage of regular expressions? [closed]
Oftentimes, non-IT people have to deal with massive amounts of text data: clean it, filter it, modify it. Normal office tools like Excel often lack the features to perform complex search-and-replace operations ...
2
votes
1 answer
675 views
Best Practice - XML To Excel
I have to read a big XML file with a lot of information. Afterwards I extract the needed information (~20 points (columns) / ~80 relevant data items (rows, some of them with sub-datasets)) and write them out in ...
6
votes
1 answer
186 views
Tools for modelling data and workflows using structured text files
Consider a case where I want to try out an idea for an application, but I want to avoid investing a lot of effort in coding the UI, workflows, database schema, etc. before I see that it's going to be useful to ...
3
votes
3 answers
545 views
Separating words in a string
How do I separate words in a string?
Below is a random sample of words in a string extracted from a text file with over a million words.
Here's the string:
"intervene Pockets ...
-1
votes
1 answer
196 views
Algorithm to garble text based upon a weight [closed]
Say I have a weighted range of 0 - 10, with 0 being not garbled at all and 10 being 100% garbled.
I'm looking for an algorithm that will garble plain text based upon this weight. The garbling doesn't need to be ...
4
votes
1 answer
392 views
Text comparison algorithm using java-diff-utils
One of the features in our project is a comparison algorithm that takes two versions of a text and reports the percentage change between them. While I was researching, I came across Google ...
1
vote
4 answers
127 views
Possible applications of algorithm devised for differentiating between structured vs random text
I have written a program that can rapidly (within 5 sec on a desktop with 2 GB RAM and a 2.33 GHz CPU) differentiate between structured text (e.g. English text) and random alphanumeric strings. It can also ...
2
votes
6 answers
1k views
Which programming language for text editing?
I need a programming language for text editing and processing (replace, formatting, regular expressions, string comparison, word processing, text analysis, etc.). Which programming language is more ...
0
votes
3 answers
221 views
What is a good parsing reference? [closed]
I am working on a project that needs parsing and text processing functionality.
I searched the web about parsing and found that my best choice for parsing is Python.
What is a good, fast, and ...
3
votes
2 answers
266 views
Available options for classifying words in text?
I am researching ways to classify words in text and I'm wondering what options there are and which are best suited to this job. I'm mostly interested in keywords which are most often nouns.
So far I ...
3
votes
2 answers
583 views
Product classifying algorithm - text classification - C# - algorithm suggestions
Alright people. Finally, with the help of the stackoverflow community, I have gathered product pages from 20 commercial product-selling websites, with the following features:
Product URL
Product Price
Product ...
4
votes
5 answers
2k views
How can I extract words from a sentence and determine what part of speech each is? [closed]
I want to write something that takes a sentence and identifies each word it contains and defines what part of speech each word is.
For example
Hello World, I am a sentence
would return this
...
0
votes
1 answer
742 views
How to load text files into memory-mapped files
I have a number of large text files that I need to manipulate in a highly performant manner. I've decided to look into using Memory Mapped files in C# (.NET 4). However, I can't find any examples of ...
3
votes
2 answers
210 views
Best algorithm to correlate similar articles
What is the best way to correlate and group similar articles?
I mean something like Google News, which groups under a single topic different articles from different sources.
I'm not interested in ...
2
votes
3 answers
1k views
What programming language is most suitable for handling unstructured data?
I'm trying to automate the application of metadata to a huge amount of text, but I'm not sure what language would make this task easier (if there is one).
What programming language is most suitable ...
1
vote
2 answers
2k views
Fastest C++ XML parsing library
I have thousands of .xml files ranging in size from 1 MB to 45 MB (no DTDs). I need to parse and further manipulate these XML files before generating separate .xml files with the results of my regex.
What the ...
2
votes
2 answers
204 views
Inserting copyright notice
What is the easiest way to insert a copyright notice into lots of PHP files?
It's not possible to do it manually.
0
votes
2 answers
195 views
Optimizing sorting large amounts of text stored in a database
How would you store text information: in a relational database, or maybe using NoSQL? The problem is that the text should be divided into various parts, each of which satisfies some requirement, ...