1
vote
3 answers
36 views

multi-column factorize in pandas

The pandas factorize function assigns each unique value in a series to a sequential, 0-based index, and calculates which index each series entry belongs to. I'd like to accomplish the equivalent of ...
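The single-column behaviour described in this excerpt can be sketched in plain Python. This is only an illustration of the idea, not pandas' implementation, and the helper name `factorize` below is a stand-in written for this example; the multi-column variant the question goes on to ask about is elided in the excerpt and is not attempted here.

```python
def factorize(values):
    """Return (codes, uniques): 0-based codes in order of first appearance."""
    index_of = {}  # value -> sequential code; dicts keep insertion order
    codes = []
    for value in values:
        if value not in index_of:
            index_of[value] = len(index_of)  # next unused 0-based index
        codes.append(index_of[value])
    return codes, list(index_of)


codes, uniques = factorize(["b", "a", "b", "c", "a"])
print(codes)    # [0, 1, 0, 2, 1]
print(uniques)  # ['b', 'a', 'c']
```

Each entry of `codes` says which position in `uniques` the corresponding input value maps to, which is the pairing the excerpt describes.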
1
vote
1 answer
33 views

Fill in missing pandas data with previous non-missing value, grouped by key

I am dealing with pandas DataFrames like this:

   id    x
0   1   10
1   1   20
2   2  100
3   2  200
4   1  NaN
5   2  NaN
6   1  300
7   1  NaN

I would like to replace each NaN 'x' with the ...
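Going by the question title (fill with the previous non-missing value, grouped by key), the intended result can be sketched in plain Python on the sample data above. The function name `fill_forward_by_key` is hypothetical, and the choice to leave a value as NaN when its key has no earlier non-missing value is an assumption, since the excerpt is cut off before it says.

```python
import math


def fill_forward_by_key(keys, values):
    """Replace each NaN with the last non-NaN value seen for the same key."""
    last_seen = {}  # key -> most recent non-missing value
    filled = []
    for key, value in zip(keys, values):
        if isinstance(value, float) and math.isnan(value):
            # Assumption: stay NaN if this key has no earlier value.
            value = last_seen.get(key, value)
        else:
            last_seen[key] = value
        filled.append(value)
    return filled


nan = float("nan")
ids = [1, 1, 2, 2, 1, 2, 1, 1]
xs = [10, 20, 100, 200, nan, nan, 300, nan]
print(fill_forward_by_key(ids, xs))
# [10, 20, 100, 200, 20, 200, 300, 300]
```

In pandas itself, a grouped forward fill is the usual way to express the same thing; the loop above only spells out the logic.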
1
vote
1 answer
21 views

notepad++: keep regex (multi occurrence per line) and line structure, remove other characters

I have a 130k line text file with patent information and I just want to keep the dates (regex "[0-9]{4}-[0-9]{2}-[0-9]{2} ") for subsequent work in Excel. For this purpose I need to keep the line ...
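The question is about Notepad++, but the same idea (keep every date match on a line, drop everything else, leave the line breaks alone) can be sketched in Python as an alternative. The function name `keep_dates` and the sample text are made up for illustration, and the trailing space in the question's regex is left out here on the assumption that it is not needed when matches are re-joined with a single space.

```python
import re

# Date pattern from the question, without its trailing space (assumption).
DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def keep_dates(text):
    """Keep only the date matches on each line; preserve the line structure."""
    return "\n".join(
        " ".join(DATE.findall(line))  # all matches on this line, in order
        for line in text.split("\n")
    )


sample = "filed 2001-02-03 granted 2004-05-06 x\nno date here\n1999-12-31 end"
print(keep_dates(sample))
```

Lines without any date come out empty rather than being removed, so row numbers still line up with the original file when the result is opened in Excel.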
3
votes
3 answers
80 views

Performing Operations on a Subset Using Data Table

I have a survey data set in wide form. For a particular question, a set of variables was created in the raw data to represent the fact that the survey question was asked on a particular ...
0
votes
2 answers
36 views

Completely stripping certain HTML Tags in Django forms

I have a ModelForm that posts news items to a database, and it uses a javascript textarea to allow the authorized poster to insert certain pieces of HTML to style text, like bold and italics. However, ...
0
votes
2 answers
68 views

How to remove hyperlinks, email ids, etc from a text document using regex?

I have some text documents which contain: Different types of email addresses: I mean public domains such as gmail, yahoo, etc. and private emails as well, such as [email protected]... Different ...
0
votes
0 answers
36 views

'cleaning' data for automated SQL insertion via php

I'm inserting data in an SQL table, via php, that is being pulled from a third party data source. Occasionally this third party source will contain some character like a single quote that will cause ...
2
votes
2 answers
143 views

How can I subset rows in a data frame in R based on a vector of values?

I have two data sets that are supposed to be the same size but aren't. I need to trim the values from A that are not in B and vice versa in order to eliminate noise from a graph that's going into a ...
0
votes
0 answers
37 views

Data-cleansing trigger help for MySQL, please

I am extremely new to the MySQL environment and databases in general. I am currently working on a project for work and am having a great deal of difficulty trying to create a MySQL trigger. I'm sure ...
0
votes
1 answer
51 views

Data cleaning of dollar values and percentage in R

I've been searching for a number of packages in R to help me in converting dollar values to nice numerical values. I don't seem to be able to find one (in plyr package for example). The basic thing ...
0
votes
1 answer
171 views

Looking for dictionary words in text file using dictionary in python

I read "how to check dictionary words" and got the idea to check my text file using dictionaries. I have read the pyenchant instructions, and I thought that if I use get_tokenizer to give me ...
1
vote
1 answer
69 views

Google refine cross-reference between row and column

I'm not sure if this can be achieved in Google Refine at all. But basically, I have data like this. The first table is the table of all the users. The second table shows all the friends. However, ...
0
votes
3 answers
388 views

Word 2007 - Macro to clean text

I'm new to VBA and am trying to write a macro that will format some text for me. I can't seem to figure it out. This is what the original data looks like: This is sentence one of paragraph one. ...
0
votes
1 answer
102 views

Code for “finding and deleting” complete strings but not substrings in R?

I am trying to find a way of quickly cleaning large datasets based on the occurrence of certain strings. I have a data.frame that looks like this: created_at actor_attributes_email type 3/11/12 ...
2
votes
1 answer
113 views

fingerprinting entire iPhone music library with echoprint

I'm wondering how intensive it would be to fingerprint an iPhone 4+'s entire music library with echoprint. How long should I expect it to take to analyze 2-3k songs? Is this even reasonable?
