Is this kind of vectorized operation the most efficient way to do this in MATLAB? Do you have any critiques of my code? Am I doing something wrong (I tested it several times, and I think it works)? Notice that I use J to store the history of the cost function so I can see how well it is converging (by plotting a graph, for instance).
main function
function [theta, J_history] = logRegGradientDescent(X, y, theta, alpha, num_iters)
% Batch gradient descent for logistic regression.
%   X         - design matrix, one training example per row
%   y         - column vector of targets (0/1), one per example
%   theta     - initial parameter vector
%   alpha     - learning rate
%   num_iters - number of gradient steps to perform
% Returns the optimized theta and J_history, a num_iters-by-1 vector of
% the cost after each iteration (useful to plot and check convergence).

% Number of training examples. NOTE: this was missing in the original;
% inside a function, m is undefined unless computed here.
m = length(y);

% Preallocate as a column vector. zeros(num_iters) would allocate a
% num_iters-by-num_iters square matrix, which is wasteful and wrong.
J_history = zeros(num_iters, 1);

for iter = 1:num_iters
    % Gradient of the logistic cost: (1/m) * X' * (h(X*theta) - y)
    dLogisticCostFunction = (1/m) * X' * (logisticFunction(X, theta) - y);
    % Learning step: move against the gradient.
    theta = theta - alpha * dLogisticCostFunction;
    % Record the cost for convergence analysis.
    J_history(iter) = logRegCostFunction(X, y, theta);
end
end
logistic function
function h = logisticFunction(X, theta)
% Sigmoid (logistic) hypothesis, applied element-wise.
%   X     - design matrix with one example per row (or a single row/scalar)
%   theta - parameter column vector
% Returns h = 1 ./ (1 + exp(-X*theta)), a column vector with one
% prediction in (0, 1) per row of X.
z = X * theta;           % linear scores, one per example
h = 1 ./ (1 + exp(-z));  % squash each score into (0, 1)
end
logistic cost function
function J = logRegCostFunction(X, y, theta)
% Logistic-regression cost (negative average log-likelihood):
%   J = -(1/m) * sum( y.*log(h) + (1-y).*log(1-h) )
% where h = logisticFunction(X, theta). This convex cost goes to +inf
% when the prediction confidently contradicts the label (y=1, h->0 or
% y=0, h->1). Returns a SCALAR.
%
% Fixes relative to the original:
%   - sum(...) was missing, so J was an m-by-1 vector, not a scalar,
%     and J_history(iter) = J would fail for m > 1;
%   - the y=0 term used (y-1) instead of (1-y), flipping its sign so
%     that y=0 examples contributed +log(1-h) instead of -log(1-h).

% m is the number of training examples (rows of y), not features.
m = length(y);
h = logisticFunction(X, theta);
% Cost contribution when y = 1: -log(h)
ify1 = y .* log(h);
% Cost contribution when y = 0: -log(1 - h)
ify0 = (1 - y) .* log(1 - h);
% Average the per-example costs into a scalar.
J = -sum(ify1 + ify0) / m;
end