I am quite new to Python and I have trouble finding the roots of a function.
The ultimate goal is to find $$y=(a_1,p_0,\theta)$$ such that they minimize the function $$-\sum\limits_{j=1}^{n} \ln(p_{1j}(a_1,p_0,\theta)).$$
For each possibility of \$a_1, p_0, \theta\$, I compute \$p_{1j}\$ in the following way. I look for \$W_j\$ which solves:
$$f_2({W_j}) = \sum\limits_{i=1}^{m}\frac{1}{ \left(\dfrac{p_0}{1-p_0}\right)^{1-a_1\over a_1} \left( \dfrac{1-\exp(-\theta R_{ij})-W_j\theta}{W_j\theta-1+\exp(\theta)}\right)^{1\over a_1}}-1 = 0 $$
Then I compute \$p_{1j}\$ which equals:
$$p_{1j}(a_1,p_0,\theta)=\frac{1}{ \left(\dfrac{p_0}{1-p_0}\right)^{1-a_1\over a_1} \left( \dfrac{1-\exp(-\theta R_{1j})-W_j\theta}{W_j\theta-1+\exp(\theta)}\right)^{1\over a_1}} $$
I do this procedure for each \$j\$ and can then compute the function I want to minimize.
Here is what the data look like, each line is a different \$j\$:
R1 R2 R3 R4 R5 R6 n
1 30.3 4.1 10.2 2.5 10.8 6
0.9 10.4 4.1 6.3 3.3 NaN 5
So the sum until \$m\$ can go from 1 to 5 until 1 to 6 depending on \$j\$. In reality the data is much bigger (the number of \$j\$ is like 25,000 and \$m\$ goes up to 24).
from __future__ import division
import numpy as np
from itertools import izip
import datetime
import pandas
import scipy
from scipy import optimize
import timing
import matplotlib.pyplot as plt
# import data from csv and take a look
# Function f2
def f2(x, a1, p0, atheta, odd):
term = 1/(1+(((p0/(1-p0))**(1-a1))*((1-np.exp(-atheta*odd)-x)/(x-1+np.exp(atheta))))**(1/a1))
return np.sum(term) - 1
# find w and return the function to minimize
def f(y):
a1, p0, atheta = y[:3]
ns = data['n'].values.astype('int')
vars = data.loc[:, ('R1', 'R2', 'R3', 'R4', 'R5', 'R6')].values
lf = 0
for n, var in izip(ns, vars):
if n == 6:
odd = var
elif n<6:
odd = var[:n]
A = min(1-np.exp(atheta),1-np.exp(-atheta*min(odd)))
B = max(1-np.exp(atheta),1-np.exp(-atheta*min(odd)))
print (a1,p0,atheta)
odd= np.array(odd, dtype=np.float)
if -f2(1-np.exp(-atheta*min(odd)), a1, p0, atheta, odd)>0 :
break
else:
sol = scipy.optimize.brentq(f2, A, B, args=(a1,p0,atheta,odd))
p1 = 1/(1+(((p0/(1-p0))**(1-a1))*((1-np.exp(-atheta*odd[0])-sol)/(sol-1+np.exp(atheta))))**(1/a1))
lf += np.log(p1)
print -lf
return -lf
# bounds of a1, p0, atheta
bnds= ((0,None),(0,1),(None,None))
# this is the minimisation. It gives back p0, a1 and atheta
def main():
x0 = [1,0.5,-0.01]
result = scipy.optimize.minimize(f, x0, method = 'SLSQP', bounds=bnds)
return result
if __name__ == "__main__":
print main()
When \$a_1\$ is close to zero \$f_2\$ behaves strangely because its denominator is very big. The same when \$\theta\$ is big or \$p0\$ is close to 1.
How can I do the same thing in a better way?