python strings to int/float in an efficient way [on hold]

Question

I have an array (in numpy, or in pandas) containing (non-unique) strings. Some of them are ints written as strings, some consist of both digits and letters. What I would like to do is map these strings onto (some) int or float values, in order to process them further.

I don't mean a simple int(string, base). I mean a procedure that would, say, go through all the strings and then decide, "Aha, so let's assign such-and-such an int/float key to this string."

What's the most efficient way of doing that?

How would getting an int or float from a string containing digits and letters work? Ignore the letters? Parse them in some way? You haven't told us enough to answer this. Also, you should show us your current code and where it fails (doesn't produce the result you need, or throws an exception).
@Lattyware At a stretch, "some of them are ints written as strings" could even cover "twelve" :)
It's not clear from your question whether you're asking how to convert a string to an int or how to get a unique integer for each arbitrary string. For example, let's say you have ['1', 'a5', 'cde9', '1', 'cde9']. Do you want the result to be [1, 5, 9, 1, 9] or [0, 1, 2, 0, 2]?
@SimonRighley - Sorry, the edits are still unclear. Can you give a concrete example?

Joe Kington · Accepted Answer · 2013-06-26 17:14:51Z

It sounds like you have a pandas DataFrame with various strings that you want to convert to indexed values such that each unique string has a unique integer value.

numpy.unique does what you need. (You already mentioned that you were using numpy, so I'm going to post a numpy solution.)

For example:

import numpy as np
import pandas

df = pandas.DataFrame(dict(x=['1', 'a5', 'cde9', '1', 'cde9']))

unique_vals, df['keys'] = np.unique(df.x, return_inverse=True)

print(df)
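
To make the result concrete, here is a minimal sketch of what the snippet above yields for the sample column. It assumes a reasonably recent numpy and pandas. The pandas.factorize call is an alternative that the answer does not mention; it numbers values in order of first appearance rather than in sorted order.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": ["1", "a5", "cde9", "1", "cde9"]})

# np.unique returns the sorted unique strings; with return_inverse=True it
# also returns, for every row, the index of that row's string in the sorted
# array. Those indices serve as the integer keys.
unique_vals, inverse = np.unique(df["x"].to_numpy(dtype=str), return_inverse=True)
df["keys"] = inverse

# Alternative (not part of the answer above): pandas.factorize assigns codes
# in order of first appearance instead of sorted order.
codes, uniques = pd.factorize(df["x"])
df["codes"] = codes

print(unique_vals.tolist())  # ['1', 'a5', 'cde9']
print(df["keys"].tolist())   # [0, 1, 2, 0, 2]
```

For this sample the two approaches happen to give the same numbering, because the strings first appear in sorted order. Either way, unique_vals[key] (or uniques[code]) recovers the original string.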

Brian · Answer 2 · 2013-06-26 17:24:44Z

In case anyone viewing this has a similar need but with a normal list of strings like:

x = ['1', 'a5', 'cde9', '1', 'cde9']

You can use a dictionary comprehension to build a dictionary mapping strings to a unique id like so:

x_set = set(x)
# Named str_to_id and i (rather than dict and id) to avoid shadowing the built-ins.
str_to_id = {z: i for z, i in zip(x_set, range(len(x_set)))}

set(x) gets you the unique values in x and range(len(x_set)) provides unique ids from 0 through len(x_set)-1. Use any sequence of ids you want.

Example:

>>> x = ['1', 'a5', 'cde9', '1', 'cde9']
>>> x_set = set(x)
>>> x_set
set(['1', 'cde9', 'a5'])
>>> str_to_id = {z: i for z, i in zip(x_set, range(len(x_set)))}
>>> str_to_id
{'1': 0, 'cde9': 1, 'a5': 2}
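
Because a set has no guaranteed order, the ids produced this way can differ from one run to the next. If stable ids matter, one option is to number the strings in order of first appearance. This is a minimal sketch assuming Python 3.7+, where dicts keep insertion order; encode is a hypothetical helper name, not something from the answer.

```python
def encode(strings):
    """Map each distinct string to an int id in order of first appearance."""
    mapping = {}
    # setdefault stores len(mapping) as the id only the first time s is seen;
    # on later occurrences it returns the id that was already stored.
    keys = [mapping.setdefault(s, len(mapping)) for s in strings]
    return mapping, keys


x = ['1', 'a5', 'cde9', '1', 'cde9']
mapping, keys = encode(x)
print(mapping)  # {'1': 0, 'a5': 1, 'cde9': 2}
print(keys)     # [0, 1, 2, 0, 2]
```

This returns both the mapping and the encoded list in a single pass, so the original list never has to be looked up a second time.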

put on hold as unclear what you're asking by Lattyware, Gerrat, Henry Keiter, C. Ross, Graviton 22 hours ago