I'm trying to extract the first (or only) floating-point number or integer from strings like these:
import numpy as np
str1 = np.asarray('92834.1alksjdhaklsjh')
str2 = np.asarray('-987___-')
str3 = np.asarray('-234234.alskjhdasd')
If parsed correctly, these should give:
var1 = 92834.1 #float
var2 = -987 #int
var3 = -234234.0 #float
Using boolean masking on numpy arrays, I came up with something like this for any of the str_ variables, e.g.:
>>> ma1 = np.asarray([not str.isalpha(c) for c in str1.tostring()], dtype=bool)
array([ True, True, True, True, True, True, True, False, False,
False, False, False, False, False, False, False, False, False,
False, False], dtype=bool)
>>> str1[ma1]
IndexError: too many indices for array
I've read just about everything I can find about indexing with boolean arrays, but I can't get it to work.
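A likely cause of the IndexError, offered as a sketch: np.asarray on a Python str produces a 0-d array that holds the whole string as one element, not an array of characters, so a 20-element boolean mask has no axis to index. Splitting the string into single characters first gives a 1-d array the mask can apply to. The helper name strip_letters is hypothetical, and it uses np.char.isalpha in place of the str.isalpha list comprehension.

```python
import numpy as np

def strip_letters(s):
    """Drop alphabetic characters from s using a boolean mask (hypothetical helper)."""
    # np.asarray(s) would be 0-d (the whole string is a single element), so a
    # per-character mask has "too many indices" for it. A list of characters
    # gives a 1-d array of length len(s) that the mask can index.
    chars = np.array(list(s), dtype=str)
    mask = ~np.char.isalpha(chars)  # True where the character is not a letter
    return ''.join(chars[mask])

print(np.asarray('92834.1alksjdhaklsjh').ndim)  # 0
print(strip_letters('92834.1alksjdhaklsjh'))    # 92834.1
```

Note that this only removes letters: for '-987___-' the result is still '-987___-', because '_' and '-' are not alphabetic, so it does not isolate the number on its own.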
It's simple enough that I don't think hunkering down to figure out a regex for it is worth it, but complex enough that it's been giving me trouble.
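If a regex feels like overkill, a plain left-to-right scan can pick out the first number. This is a minimal sketch, assuming a number is an optional sign followed by digits with an optional '.', with at least one digit somewhere; a token containing '.' becomes a float (so '-234234.' gives -234234.0, matching var3 above), anything else an int. The function name first_number is hypothetical.

```python
DIGITS = '0123456789'

def first_number(s):
    """Return the first int or float in s, or None (hypothetical helper).

    Assumed grammar: optional sign, digits, optional '.', digits, with at
    least one digit overall. A '.' in the token makes the result a float.
    """
    n = len(s)
    for start in range(n):
        i = start
        if s[i] in '+-':  # optional sign
            i += 1
        int_start = i
        while i < n and s[i] in DIGITS:
            i += 1
        has_int = i > int_start
        is_float = False
        if i < n and s[i] == '.':
            j = i + 1
            while j < n and s[j] in DIGITS:
                j += 1
            # Accept the dot only if a digit appears on at least one side of it.
            if has_int or j > i + 1:
                i = j
                is_float = True
        if has_int or is_float:
            token = s[start:i]
            return float(token) if is_float else int(token)
    return None

print(first_number('92834.1alksjdhaklsjh'))  # 92834.1
print(first_number('-987___-'))              # -987
print(first_number('-234234.alskjhdasd'))    # -234234.0
```

The scan restarts at each position until it finds a digit, so leading letters or stray signs are skipped; a '-' directly before a digit is always read as a sign, which is an assumption you may want to revisit.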
To strip out the letters:
''.join([c for c in s if not c.isalpha()])
But please note this in no way extracts the first float/int if digits appear in more than one place in the string.
Try this regex: ^.*?([+-]?\d*\.?\d+)
Does it work for you? -234234 is an int, not a float. You asked to extract either integers or floats. If you only need floats, use Kasra's version.
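For comparison, a sketch of the regex route with Python's re module, using the ^.*?([+-]?\d*\.?\d+) pattern from the answer above and converting the captured text to int or float depending on whether it contains a dot. As that answer points out, this pattern returns -234234 (an int) for str3, because \d+ must follow the dot. ALT_PATTERN is a hypothetical variant, not from the thread, that also accepts a trailing dot so str3 comes out as -234234.0; first_number_re is likewise a hypothetical name.

```python
import re

# Pattern from the answer: lazy prefix, optional sign, optional integer
# digits, optional dot, then at least one digit.
ANSWER_PATTERN = re.compile(r'^.*?([+-]?\d*\.?\d+)')

# Hypothetical variant: digits with an optional dot and optional trailing
# digits, or a dot followed by digits. Keeps '-234234.' as a float.
ALT_PATTERN = re.compile(r'([+-]?(?:\d+\.?\d*|\.\d+))')

def first_number_re(s, pattern=ANSWER_PATTERN):
    """Return the first number matched by pattern as int or float, or None."""
    m = pattern.search(s)
    if m is None:
        return None
    token = m.group(1)
    # A dot in the matched text means float; otherwise int.
    return float(token) if '.' in token else int(token)

for s in ('92834.1alksjdhaklsjh', '-987___-', '-234234.alskjhdasd'):
    print(first_number_re(s))  # 92834.1, then -987, then -234234

print(first_number_re('-234234.alskjhdasd', ALT_PATTERN))  # -234234.0
```

Which pattern is right depends on whether a trailing dot should count as "float"; the question's expected var3 suggests it should, the answer's reading says it should not.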