python pandas overflow error dataFrame

Question

I am rather new to Python and I am using the pandas library to work with data frames. I created a function "elevate_power" that takes a data frame with one column of floating-point values (for example, x) and a degree (for example, lambda), and outputs a data frame in which each column contains a power of the original column (for example, the output is x, x^2, x^3).

The problem is that when the degree is above 30, I get an overflow error. Is there a way around this problem?

I am not particularly worried about precision, so I would not mind losing some.

However, (and this is important), I need the output to be type float because I then call some numpy subroutines that give me errors if I change the type.

I have tried several tricks: for example I tried using decimal inside the function but then I cannot get the format back to floats, which is a problem because then I get errors when I call dot product and linear algebra solvers from numpy.

Any suggestions will be greatly appreciated.

This is the test code (which I ran with a low degree value so it won't crash):

import pandas as pd
import numpy as np

def elevate_power(column, degree):
    # Work on a copy so the caller's data frame is left untouched.
    df = pd.DataFrame(column).copy()
    # The original column of values; every power is computed from it.
    base = df.iloc[:, 0]
    for power in range(2, degree + 1):
        # Name each new column after the power it holds.
        name = 'power_' + str(power)
        df[name] = base ** power
    return df

test = pd.Series([1., 2., 3.])
test2 = pd.DataFrame(test)
degree = 5
result = elevate_power(test2, degree)
print(result)
print(np.dot(result['power_2'], result['power_3']))

The printout is :

   0  power_2  power_3  power_4  power_5
0  1        1        1        1        1
1  2        4        8       16       32
2  3        9       27       81      243

276.0

For the overflow, I recommend using the identity exp(ln(a^b)) = exp(b * ln(a)), which holds for a > 0. Compute the matrix b * ln(a) first, and take the elementwise exponential afterwards only if you really have to.
– tschm, Commented Mar 18, 2016 at 14:52
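
The comment above can be sketched as follows. This is only an illustration of the log-space idea, not code from the thread: the helper name log_powers is hypothetical, and the sketch assumes every input value is strictly positive so that the logarithm is defined.

```python
import numpy as np

def log_powers(values, degree):
    """Return the matrix whose (i, j) entry is (j + 1) * ln(values[i]).

    Hypothetical helper; assumes every value is strictly positive.
    """
    values = np.asarray(values, dtype=float)
    exponents = np.arange(1, degree + 1)
    # Broadcasting: a column of logarithms times a row of exponents.
    return np.log(values)[:, None] * exponents[None, :]

# Small case: exponentiating recovers the ordinary powers.
small = np.exp(log_powers([1., 2., 3.], 5))

# Large case: the logarithms stay finite even where the powers themselves
# would exceed the float64 range (about 1.8e308).
large_logs = log_powers([10.], 400)
```

Whether staying in log space is practical depends on what the later linear algebra needs, since a dot product of powers is not a dot product of their logarithms.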

tschm · Accepted Answer · 2016-03-18 14:49:43Z

How about computing each column directly with np.power?

import pandas as pd
import numpy as np
series = [1., 2., 3.]
degree = 5

a = pd.DataFrame({"power_" + str(power): np.power(series, power) for power in range(1, degree+1)})
print(a)
print(a.dtypes)

This results in float columns for me.

6 Comments

user3177938 Over a year ago

Thanks, that works perfectly well! I can increase the degree, and there is no overflow error. Actually, I don't really understand why the overflow error disappeared this way, but I am happy it works. Thank you very much.
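
The thread never establishes why the error went away, so the following is only a sketch of how the different numeric types behave on overflow, which may or may not be what happened in the original function. It uses plain Python and NumPy calls; the variable names are illustrative.

```python
import numpy as np

# A pure-Python float power raises OverflowError once the result
# exceeds the float64 range.
try:
    2.0 ** 2000
    python_float_raised = False
except OverflowError:
    python_float_raised = True

# NumPy float64 does not raise by default: it returns inf and emits a
# RuntimeWarning, which is silenced here.
with np.errstate(over='ignore'):
    numpy_float = np.power(np.float64(2.0), 2000)

# NumPy int64 arrays wrap around silently, so the result is simply wrong.
numpy_int = np.power(np.array([3], dtype=np.int64), 50)[0]
wrapped = int(numpy_int) != 3 ** 50
```

Note that with float columns a very large degree still produces inf rather than a usable number, so the absence of an exception does not mean the values are meaningful.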

user3177938 Over a year ago

One last question, though: is it possible to do it in a loop? I noticed the column order of the data frame is power_1, power_10, power_11, ..., power_2, power_21 instead of power_1, power_2, power_3, ..., power_10. That means I would need to do some kind of sort now, because I need the columns in the order they came in.

tschm Over a year ago

Well, you can get away with str(power).zfill(2)
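
A small sketch of why zero-padding helps, assuming the columns end up sorted by name as described in the previous comment. The list names here are illustrative only.

```python
# Unpadded names sort lexicographically, so power_10 lands before power_2.
plain = ["power_" + str(p) for p in range(1, 13)]
plain_sorted = sorted(plain)

# Padding to two digits makes lexicographic order match numeric order
# (for degrees up to 99; use zfill(3) beyond that).
padded = ["power_" + str(p).zfill(2) for p in range(1, 13)]
padded_sorted = sorted(padded)

print(plain_sorted[:4])   # ['power_1', 'power_10', 'power_11', 'power_12']
print(padded_sorted[:4])  # ['power_01', 'power_02', 'power_03', 'power_04']
```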

tschm Over a year ago

Are you doing this to fit polynomials of degree n to your data? If so, have a look at np.vander, which will construct this matrix for you.
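
As a rough sketch of the np.vander suggestion, using the sample values from the question: the target values y below are made up purely for illustration, and the least-squares call is one possible way to use the matrix, not something taken from the thread.

```python
import numpy as np

x = np.array([1., 2., 3.])
degree = 5

# increasing=True orders the columns as x^0, x^1, ..., x^degree.
V = np.vander(x, degree + 1, increasing=True)

# Dropping the constant column leaves x^1 ... x^degree, as in the question.
powers = V[:, 1:]

# Hypothetical targets lying exactly on the line y = 1 + 2x.
y = 1.0 + 2.0 * x

# Fit a degree-1 polynomial by least squares on the Vandermonde matrix.
coeffs, *_ = np.linalg.lstsq(np.vander(x, 2, increasing=True), y, rcond=None)
print(coeffs)  # approximately [1. 2.]
```

Vandermonde matrices become badly conditioned as the degree grows, so for degrees around 30 the fit itself may be numerically unreliable even when nothing overflows.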

user3177938 Over a year ago

The .zfill(2) did the trick. Thanks again! Yes, I'm fitting polynomials. I just looked up the vander function you mentioned, and it is indeed what I am doing. Thanks for pointing that out.
