Newest 'data-processing' Questions

3

votes

4answers

90 views

How to read 4GB file on 32bit system

In my case I have different files; let's assume that I have a >4GB file with data. I want to read that file line by line and process each line. One of my restrictions is that the software has to be run on 32bit ...

asked 6 hours ago

bioky
161
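
A minimal sketch of the line-by-line reading described in the question above. The question does not name a language, so Python is an assumption here, and `process_lines` and `handle_line` are hypothetical names. The point is that iterating over the file object streams one line at a time, so memory use does not grow with file size. Whether a given 32-bit runtime can open files larger than 4 GB depends on the platform's large-file support.

```python
def process_lines(path, handle_line):
    """Stream a text file line by line and return how many lines were seen.

    Only one line is held in memory at a time, so the file size does not
    need to fit into the process address space.
    """
    count = 0
    # errors="replace" is an assumption: it keeps the loop going on bad bytes.
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            handle_line(line.rstrip("\n"))  # hypothetical per-line callback
            count += 1
    return count
```

Usage would look like `process_lines("data.txt", print)`, where `data.txt` is a placeholder path.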

1

vote

1answer

23 views

Rounding with awk -0.0

I am using awk to round floating values in a csv file using (in a pipe) awk '{$0=sprintf("%.2f",$1)}1' This works basically fine, but has the problem that it produces both "0.00" and "-0.00" ...

csv awk data-processing

asked Aug 2 at 23:29

user1583209
697
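
The "-0.00" effect from the question above is not specific to awk: a small negative value rounds to zero but keeps its sign when formatted. A sketch in Python (an assumption, since the question uses awk; `round2` is a hypothetical helper) that formats to two decimals and drops the sign when the rounded result is zero:

```python
def round2(value):
    """Format a number with two decimals, printing negative zero as "0.00"."""
    text = "%.2f" % value
    # A small negative input such as -0.001 formats as "-0.00".
    # If the formatted text is numerically zero, strip the leading sign.
    if float(text) == 0:
        text = text.lstrip("-")
    return text
```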

1

vote

2answers

39 views

read in first 3 (out of 8) columns using lapply() in R

I know there have been similar questions answered but I cannot apply them to the following: I have text files I am trying to read into R: filelist = list.files(pattern = paste0("*_",str_sub(stock1, ...

r import lapply read.table data-processing

asked Jul 20 at 3:17

Rime
506

0

votes

0answers

19 views

clustering missing value indicator values to capture missing value patterns

I am doing some data preparation with Python using Pandas and I am working with a dataset that has about 80 variables with missing values and I want to capture any patterns of missingness to cut down ...

python pandas data-processing

asked Jul 19 at 13:13

statsnewb
5619

-2

votes

1answer

23 views

How to send JSON data via curl?

I have some simple code, like this: import json from bottle import route, request,run @route('/process_json',methods='POST') def data_process(): data = json.loads(request.data) username = ...

python json curl data-processing

asked Jul 14 at 10:28

user3330256
61

0

votes

1answer

19 views

Mapping financial data from multiple vendors to match internal formats and naming convention

I have a concern which I believe might be a good subject for the archives, as I imagine many people may encounter a similar problem at some point in their careers. I'm looking for any/all suggestions, ...

sql-server database excel data-processing data-cleansing

asked Jul 11 at 16:14

user3610077
32

2

votes

1answer

52 views

Extracting an html table in another language using R

I am using R to extract HTML Tables from a website. However, the language for the HTML Table is in Hindi and the text is displayed as unicodes. Any way where I can set/install the font family and get ...

r data-processing

asked Jul 7 at 7:08

user2866631
386

-2

votes

0answers

18 views

Parse large data files with javascript

I want to extract data from large plaintext files which come from electronic structure simulations for a browser-based visualization. The files are extremely inhomogeneous since log-messages and actual ...

javascript parsing data-processing

asked Jul 1 at 15:04

sonium
483

1

vote

4answers

401 views

data processing pipeline tool for research

I'm wondering if there is a tool for automating complex data processing pipelines on large datasets. Sort of like shell command piping (e.g. cmd1 | cmd2 | cmd3 > file), but supporting more than ...

multithreading nlp pipeline research data-processing

asked Jun 29 at 19:26

user3788259
61

0

votes

0answers

21 views

Node.js data processing distribution

I'm in need of a strategy to distribute data processing using node.js. I'm trying to figure out if using a worker pool and isolate groups of tasks in these workers is the best way, or using a ...

multithreading node.js parallel-processing etl data-processing

asked Jun 27 at 5:31

Skoog
1479

0

votes

0answers

33 views

What are the available missing value treatment methods in Weka?

Currently I am only able to find three types of missing values treatment methods under the "Preprocess" stage in Weka. They are the "ReplaceMissingValues", "ReplaceMissingWithUserConstant" and ...

machine-learning data-mining weka missing-data data-processing

asked Jun 26 at 7:26

user3778195
11

1

vote

0answers

25 views

Is there a way to check if an integer's string representation contains a zero by using bitwise operations?

In C# I've been trying to come up with an interesting way to basically accomplish the following, but without using the string representation. private static bool HasZeroDigit(int value) { string ...

language-agnostic bit-manipulation data-processing

asked Jun 23 at 22:30

Michael J. Gray
3,59941941
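
For reference, the string-free baseline for the question above is plain decimal arithmetic, not bitwise operations: decimal digits do not line up with bit positions, so this sketch deliberately uses division and modulo instead. It is written in Python as an assumption (the question uses C#), and `has_zero_digit` is a hypothetical name mirroring the question's `HasZeroDigit`.

```python
def has_zero_digit(value):
    """Return True if the decimal representation of an integer contains a 0."""
    value = abs(value)  # the sign does not affect the digits
    if value == 0:
        return True  # "0" itself is a zero digit
    while value > 0:
        if value % 10 == 0:  # inspect the lowest decimal digit
            return True
        value //= 10  # drop the lowest decimal digit
    return False
```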

0

votes

3answers

56 views

Low level file processing in ruby/python

So I hope this question hasn't already been answered, but I can't seem to figure out the right search term. First some background: I have text data files that are tabular and can easily climb into ...

python r dataset fortran data-processing

asked Jun 18 at 20:21

lswim
3671310

0

votes

1answer

81 views

data processing pipeline python

I am working on the following problem. Let's say I have data (say image values RGB as integers) in a file per line. I want to read 10000 of these lines and make a frame object (image frame containing ...

python pipeline data-processing

asked Jun 5 at 1:29

kaaliakahn
31

1

vote

1answer

25 views

Generate popular subjects from collection of post titles

I have a content aggregator website. I'd like to process the post titles to generate a list of the most popular post subjects. A subject could be "software development" however an important point is ...

php algorithm data-mining data-processing

asked May 29 at 23:27

iamjonesy
4,1331771143

3

votes

1answer

75 views

Custom Floating Point Representation

I'm trying to write a parser that will read a particular file type, and I need to map the different data types to C# equivalents. Most of them aren't that difficult, but I'm having trouble wrapping my ...

c# .net parsing floating-point data-processing

asked May 28 at 2:43

Abion47
144113

1

vote

1answer

52 views

Pandas Dataframe selecting groups with minimal cardinality

I have a problem where I need to take groups of rows from a data frame where the number of items in a group exceeds a certain number (cutoff). For those groups, I need to take some head rows and the ...

python pandas dataframes data-processing

asked May 18 at 6:28

Run2
728

0

votes

2answers

50 views

Lexicon dictionary for synonym words

There are a few dictionaries available for natural language processing, like positive and negative word dictionaries. Is there any dictionary available which contains a list of synonyms for all ...

dictionary nlp stanford-nlp data-processing text-classification

asked May 17 at 10:27

Programming_crazy
556114

0

votes

1answer

132 views

Data processing with adding columns dynamically in Python Pandas Dataframe

I have the following problem. Let's say this is my CSV:

id f1 f2 f3
1 4 5 5
1 3 1 0
1 7 4 4
1 4 3 1
1 1 4 6
2 2 6 0
..........

So, I have rows which can be grouped by id. I want to ...

python pandas dataframes data-processing

asked May 10 at 10:40

Run2
728

11

votes

3answers

426 views

Plotting many lines as a heatmap

I have a large number (~1000) of files from a data logger that I am trying to process. If I wanted to plot the trend from a single one of these log files I could do it using ...

matlab plot data-processing logfile-analysis

asked Apr 29 at 5:30

Hugoagogo
618415

0

votes

1answer

27 views

How to use supervised machine learning methods working on variant input dimensions?

So basically I am dealing with a training and test data set (a bunch of arrays) with unequal length like these: a: {true, [1,3, 4, 5, 5, 8 ,10 ,10]} b: {true, [1,3, 25, 18 ,1 ,10]} c: {false, [1, 8 ...

machine-learning data-processing lcs supervised-learning

asked Apr 28 at 3:13

computereasy
809413

1

vote

3answers

104 views

Need Better Algorithm to Scrub SQL Server Table with Java

I need to scrub an SQL Server table on a regular basis, but my solution is taking ridiculously long (about 12 minutes for 73,000 records). My table has 4 fields: id1 id2 val1 val2 For every group of ...

java sql-server query-performance data-processing data-scrubbing

asked Apr 14 at 19:54

Palladium
277

0

votes

1answer

57 views

How to write awk command to group line data and dump to file

One data file consists of multiple line data. A quick look of data file is like: ./gc_string/datadata.distr 10 1273377106 2 ./gc_string/datadata.distr 10 -540812264 2 ...

linux bash awk data-processing

asked Apr 12 at 0:42

shijie xu
70110

1

vote

0answers

38 views

MATLAB remove lead and lag data from variable

I am working in a MATLAB loop with data variables (in column form) that are pulled in from excel files. Each iteration of the loop opens a new file and repeats its process. Inside each file, I have ...

matlab data-processing

asked Apr 8 at 18:40

Nate B.
235

0

votes

2answers

63 views

Data normalization for new inputs into a trained neural network

I have a backpropagation neural network that I have created and coded in Q with a Kdb+ database. I am pre-processing data into the network with normalization into the form of [0,1], the network is ...

neural-network normalization theory data-processing neural-network-tuning

asked Apr 3 at 9:22

teabigs
1

0

votes

1answer

55 views

Can Datomic simplify querying data contained in dynamically accessed HTML documents?

I need to write an API which would provide access to data being served as HTML documents from a web server. I need for my users to be able to perform queries over the data. Say on a web site there is ...

clojure screen-scraping data-processing datomic

asked Apr 1 at 20:10

user7610
820820

0

votes

2answers

123 views

How to check whether there is any error in DhtmlxGrid using Dataprocessor?

I want to send data from DhtmlxGrid in my MVC project. I have set some basic validation on grid cells which is working fine. But before submitting I want to check if there is any error in the grid. ...

validation dhtmlx data-processing

asked Apr 1 at 5:27

Anupam Roy
181111

4

votes

1answer

41 views

PHP Data Processing Failing With Ambiguous Error

The user requests a Product Category and says what quantity they want of it, ie Sugar 7 lbs. In the search results, from the database, I have the following items (which are each different products): ...

php data-processing

asked Feb 26 at 21:36

Jen Born
191111

0

votes

1answer

80 views

Processing lots of data in python, should I use multiple threads/processes?

I am writing a program to process a huge file (~1.5GB). I am running Python 2.7 on a Windows 7 computer with a pretty good cpu (8 cores). Would it be more efficient in any way to use multiple threads ...

python multithreading python-2.7 multiprocessing data-processing

asked Feb 21 at 23:43

ethg242
998

0

votes

1answer

77 views

When to apply Data whitening

Data Whitening (features scaling and mean normalization) is very useful when we use features that represent different characteristics and are on very different scales (eg number of rooms in a house ...

machine-learning data-mining data-processing

asked Feb 18 at 10:10

teaLeef
3587

1

vote

1answer

93 views

Can I speed up a large data set operation in SQLite / Python?

I have a data set in the size range 1-5 billion 'box' objects stored in an SQLite database file in the format: [x1,y1,z1,x2,y2,z2,box_id] and currently I have an operation in a python script that ...

python sqlite large-data data-processing

asked Jan 29 at 11:20

Jenny_Winters
153110

2

votes

1answer

112 views

I Need To Search a “dirty” text file in R and count the instances of a certain character

The data is called homicides.txt I need to make a function count <- function(cause=Null) which returns a certain integer There are only a few acceptable causes, which if not present the function is ...

string r parsing counting data-processing

asked Jan 28 at 1:13

user3242673
111

0

votes

1answer

97 views

What is a Data warehouse in this use case

I'm trying to figure out the difference (between tools/services/programs) between Data Warehouse, Clustered Data Processing and the tools/infrastructure for querying a Data Warehouse So Let's say I ...

hadoop mapreduce hive data-warehouse data-processing

asked Jan 27 at 4:32

uhsarp
4881025

0

votes

1answer

27 views

Speeding up document processing and loading into database

I have a few million documents. What I am trying to do is simple, process the documents to extract the information I need and load it into a database. I am doing it in Python and using SQLAlchemy. ...

python relational-database data-processing data-loading

asked Jan 20 at 19:20

y2p
1,06241533

0

votes

0answers

91 views

Appropriate data processing design pattern?

I'm looking for an appropriate design pattern to accomplish the following: I want to extract some information from some "ComplexDataObject" (e.g. an Image) and save the relevant information in a more ...

python design-patterns data-processing

asked Jan 20 at 9:13

Mikael Call
5517

2

votes

2answers

396 views

Aggregate Functions over a List in JAVA

I have a list of Java objects and I need to reduce it by applying aggregate functions, like a select over a database. NOTE: The data were calculated from multiple databases and service calls. I expect ...

java data mapreduce data-processing

asked Jan 9 at 12:40

Diego D
550620

1

vote

2answers

65 views

OrderBy when a parent-value maybe null

Assume I want to order table q in T, by column q.As.OrderByDescending(p => p.Beginning).FirstOrDefault().B.C. However, q.As.OrderByDescending(p => p.Beginning).FirstOrDefault() or a.B may be ...

c# linq linq-to-sql order data-processing

asked Jan 8 at 15:21

DatVM
2,15032561

1

vote

3answers

456 views

Hibernate out of memory exception while processing large collection of elements

I am trying to process collection of heavy weight elements (images). Size of collection varies between 8000 - 50000 entries. But for some reason after processing 1800-1900 entries my program falls ...

java performance hibernate out-of-memory data-processing

asked Jan 1 at 14:13

nikopol86
21929

16

votes

3answers

450 views

How to smooth a curve in the right way?

Let's assume we have a dataset which might be given approximately by: import numpy as np; x = np.linspace(0,2*np.pi,100); y = np.sin(x) + np.random.random(100) * 0.2. Therefore we have a variation of ...

python numpy scipy signal-processing data-processing

asked Dec 16 '13 at 19:06

varantir
447416
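
One simple baseline for the smoothing question above is a moving average. This is only a sketch of that one technique, not the accepted answer to the question, and it uses the standard library instead of NumPy so it runs anywhere; `moving_average` and `window` are hypothetical names. The output is shorter than the input by `window - 1` points because only full windows are averaged.

```python
def moving_average(values, window):
    """Smooth a sequence by averaging each run of `window` consecutive points."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])  # sum of the first full window
    smoothed = [running / window]
    for i in range(window, len(values)):
        # Slide the window: add the new point, drop the oldest one.
        running += values[i] - values[i - window]
        smoothed.append(running / window)
    return smoothed
```

A wider `window` removes more noise but also flattens real features of the curve, which is the trade-off the question title alludes to.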

0

votes

1answer

26 views

Running code (loop) server side and retrieving output later on

I am trying to do a simple program that keeps track of some internet data. I can get the data from a public JSON object, so that's not really the problem. I would like to automate the process as much ...

multithreading data-processing

asked Dec 8 '13 at 4:18

Fiire
483210

0

votes

0answers

53 views

Trying to process raw string (rank of countries by GDP) with python for other uses

I'm pretty new to this so sorry if this is a dumb question. I'm trying to sort some data. Here's a rank of countries by GDP for example, that I'd like to find percentages of, add up certain amounts ...

python data-processing

asked Nov 29 '13 at 15:25

user3049865
1

0

votes

1answer

138 views

calculate min/avg/max/std-dev for ICMP time stamp data from hping [closed]

What's the best way to calculate min/avg/max/std-dev for some random data in shell? What if one has several columns per line, and needs to calculate the statistics for each one? Sample input (based ...

perl data awk ping data-processing

asked Nov 25 '13 at 7:41

cnst
1,604728
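
A sketch of the per-column statistics asked about above, in Python as an assumption (the question is tagged perl and awk). `column_stats` is a hypothetical name, the input is assumed to be whitespace-separated numeric columns with the same number of fields on every line, and the population standard deviation (`statistics.pstdev`) is an assumed choice, since the question does not say which variant it wants.

```python
import statistics

def column_stats(lines):
    """Return a (min, avg, max, std_dev) tuple for each whitespace-separated column."""
    rows = [[float(field) for field in line.split()]
            for line in lines if line.strip()]  # skip blank lines
    results = []
    for column in zip(*rows):  # transpose rows into columns
        results.append((
            min(column),
            statistics.mean(column),
            max(column),
            statistics.pstdev(column),  # population std-dev (assumption)
        ))
    return results
```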

2

votes

2answers

668 views

solutions for cleaning/manipulating big data (currently using Stata)

I'm currently using a 10% sample of a very large dataset (10 vars, over 300m rows) which amounts to over 200 GB of data when stored in .dta format for the full dataset. Stata is able to handle ...

sql r bigdata stata data-processing

asked Nov 21 '13 at 17:39

user3018549
314

1

vote

2answers

435 views

Conditional merge for CSV files using python (pandas)

I am trying to merge >=2 files with the same schema. The files will contain duplicate entries but rows won't be identical, for example: file1: store_id,address,phone 9191,9827 Park st,999999999 ...

python csv pandas data-processing

asked Nov 19 '13 at 0:04

zengr
14.5k1964132

0

votes

2answers

137 views

Is a relational database appropriate for SAS like processing?

Currently I have a program that processes raw data in SAS, running queries like the following: /*this code joins the details onto the spine, selecting the details that have the lowest value2 that ...

sql relational-database sas data-processing

asked Nov 1 '13 at 3:05

dwjohnston
6821225

1

vote

1answer

241 views

C# Signal Processing Plotting Rapid Data

I have a circuit that sends me two different data from sensors. Data is coming as packets. First data is '$' to separate one packet to another. After '$' it sends 16 bytes microphone data and 1 byte ...

c# plot signal-processing zedgraph data-processing

asked Oct 11 '13 at 8:42

Blast
13812

0

votes

2answers

83 views

Tools to do data processing from Java

I've got a legacy system that uses SAS to ingest raw data from the database, cleanse and consolidate it, and then score the outputted documents. I'm wanting to move to a Java or similar object ...

java sql hadoop bigdata data-processing

asked Oct 11 '13 at 1:37

dwjohnston
6821225

1

vote

2answers

129 views

Convert python dictionary to flowchart

I have a program that will generate a very large dictionary-style list that would look something like this: {"a":"b", "b":"c", "C":"d", "d":"b", "d":"e"} I would like to create a program using ...

python list dictionary bigdata data-processing

asked Sep 28 '13 at 2:30

TheDoctor
2841213

0

votes

1answer

265 views

How to handle time series data with other attributes in machine learning?

I'm working on a binary classification problem in which each data instance has several time series of different metrics, and there are also some other attributes. How to deal with the time series, treat ...

machine-learning data-mining data-processing

asked Sep 17 '13 at 15:13

user1552372
314

0

votes

1answer

114 views

How do I perform koyck lag transformations in PMML?

I'm using PMML to transfer my models (that I develop in R) between different platforms. One issue I often face is that given input data I need to do a lot of pre-processing. Most times this is rather ...

r data-processing pmml

asked Sep 9 '13 at 16:44

Dr. Mike
867320
