Gradient Descent problem in Titanic dataset



Hi Everyone,

I am trying to implement gradient descent without regularization on the Titanic dataset, using Andrew Ng's machine learning course as a reference.

Here is the cost function he gives for logistic regression:

J(theta) = -y*log(h(x)) - (1-y)*log(1-h(x)) (summed over all records)

Gradient descent algorithm:

thetaj = thetaj - (alpha/#records) * (h(x) - y) * xj (summed over all records; j = 1…all features)

h(x) = 1/(1 + e^(-z)), where z = np.dot(thetaTranspose, X)

Following is my implementation in python3 -

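import numpy as np
from scipy.special import expit  # assumed source of expit; the import was not posted

iteration = 1000
thetanew = np.random.randint(0,10,size=(titanicTrain_gradient_X().shape[0],1))
theta = thetanew
errorLog = np.empty(iteration)
alpha = 0.0001
epsilon = 0.1
for i in range(iteration):
    # np.dot(theta transposed, X) is assumed on the lines where the posted call was cut off
    thetaX = np.dot(thetanew.transpose(),titanicTrain_gradient_X())
    eThetaX = expit(-thetaX)
    denom = np.add(1,eThetaX)
    hx = np.divide(1,denom)
    diff = np.subtract(hx,titanicTrain_gradient_Y.transpose())
    derivative = np.dot(diff,titanicTrain_gradient_X().transpose())
    theta = np.subtract(thetanew, np.multiply((alpha/titanicTrain_gradient_X().shape[1]),derivative.transpose()))
    thetaXNew = np.dot(theta.transpose(),titanicTrain_gradient_X())
    eThetaXNew = expit(-thetaXNew)
    denomNew = np.add(1,eThetaXNew)
    hxNew = np.divide(1,denomNew)
    # To avoid divide by zero error in log
    hxNewLogSafe = np.subtract(hxNew,epsilon)
    # print (hxNew)
    # The posted line ends after the first log term; the second term and the
    # divisor below are assumed from the cost formula above.
    costerror = np.divide(np.add(np.multiply(titanicTrain_gradient_Y.transpose(),np.log(hxNewLogSafe)),
                                 np.multiply(np.subtract(1,titanicTrain_gradient_Y.transpose()),np.log(np.subtract(1,hxNewLogSafe)))),
                          -titanicTrain_gradient_X().shape[1])
    # print (costerror)
    errorLog[i] = costerror.sum()
    thetanew = theta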

However, when I plot the cost error against the iteration number, I don't get a consistent curve across runs of this code. Sometimes the error increases with each iteration and sometimes it decreases. Below are some plots of this:
[Plots 1 and 2: cost error vs. iteration]

Can anyone suggest what is going wrong here? The cost is supposed to decrease with every iteration. I tried different values of alpha (0.1, 0.01, 0.001, 0.0001, etc.) but it made no difference.


I just noticed that when I initialize theta with random values in (0, 1), I get a consistent curve (but it does NOT converge). When I increase the range beyond 1, the same issue comes back. Any thoughts?
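On the sensitivity to the initial range: with large initial thetas and features that are not scaled, theta^T x can grow large enough that h(x) rounds to exactly 0 or 1. Subtracting a fixed epsilon of 0.1 then pushes any h(x) below 0.1 negative, so the log returns nan. The following is a minimal sketch of that effect with made-up values, together with clipping as one common alternative. The helper names `sigmoid` and `safe_log_loss` and the three z values are illustrative assumptions, not part of the posted code.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def safe_log_loss(h, y, eps=1e-12):
    """Mean cross-entropy with h clipped into [eps, 1 - eps] before the log."""
    h = np.clip(h, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(h) + (1.0 - y) * np.log(1.0 - h)))

# theta^T x for three hypothetical records: very negative, zero, very positive
z = np.array([-40.0, 0.0, 40.0])
y = np.array([1.0, 1.0, 0.0])
h = sigmoid(z)

# Shifting h down by 0.1 makes the first entry negative, so its log is nan.
with np.errstate(invalid="ignore", divide="ignore"):
    shifted_log = np.log(h - 0.1)

# Clipping keeps every log argument strictly inside (0, 1).
loss = safe_log_loss(h, y)
print(shifted_log, loss)
```

Clipping only guards the log; it does not address why h(x) saturates in the first place, which is more a matter of feature scale and initial theta.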


Hi @harshitmohan!

You are right. In the actual algorithm, the curves will differ from run to run, and the initial cost depends entirely on the random initial values. However, when implementing this in production, a random seed (say, 10) is set so that the results remain consistent.

Again, the results may vary for GBM trees, as random number generation happens at the splits too. To get consistent results, you can also set a random seed with np.random.seed(13). Let us know what the results look like with this change. Thanks!
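As a small illustration of the seeding suggestion, the sketch below draws the initial theta twice with the same seed and checks that both draws match. The feature count of 5 is a placeholder, not the real number of columns.

```python
import numpy as np

n_features = 5  # placeholder; use the real number of feature rows/columns

np.random.seed(13)
theta_a = np.random.randint(0, 10, size=(n_features, 1))

np.random.seed(13)  # resetting the seed restarts the same random sequence
theta_b = np.random.randint(0, 10, size=(n_features, 1))

same_start = np.array_equal(theta_a, theta_b)
print(same_start)
```

Seeding makes runs repeatable; it does not by itself make the cost decrease.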


Thanks @Shaz13. I missed this point and you really brought me peace by pointing it out! Thank you so much for that.
Now I see a consistent curve but the cost function is ever increasing no matter what alpha or initial theta I choose.
Here are the columns for my titanicTrain_gradient_X -

['Fare', 'Pclass__1', 'Pclass__2', 'Sex__female', 'SibSp__0', 'SibSp__1',
'SibSp__2', 'SibSp__3', 'SibSp__4', 'Parch__0', 'Parch__1', 'Parch__2',
'Parch__3', 'Embarked__C', 'Embarked__Q']

I also imputed missing values with the median and did outlier treatment. What could be going wrong here? Any suggestions?


Glad you found it helpful, @harshitmohan.
It's really hard to debug this. As this is a raw implementation, a single line can make a big difference in the results. For starters, I advise you to implement it in modules. For example, for thetaX, write a function and call it in the for loop. This way you can debug faster. Also, try running the cost calculation on only the first 5-10 rows, so that you can work out the values by hand and check whether the functions return the same or nearly the same numbers.
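One way this modular approach could look is sketched below, on five made-up records with X laid out as (features, records) like in the original post. The function names are hypothetical. The hand check uses the fact that with theta = 0 every h(x) is 0.5, so the mean cost must equal log(2). The last function mimics the posted pattern of computing 1 / (1 + expit(-z)); assuming `expit` is `scipy.special.expit`, which already returns 1 / (1 + exp(-z)), that expression gives 2/3 at z = 0 and never drops below 0.5.

```python
import numpy as np

def hypothesis(theta, X):
    """h = 1 / (1 + exp(-theta^T X)); theta is (n, 1), X is (n, m)."""
    return 1.0 / (1.0 + np.exp(-(theta.T @ X)))

def cost(theta, X, y):
    """Mean cross-entropy over the m records; y is (1, m)."""
    h = hypothesis(theta, X)
    return float(-np.mean(y * np.log(h) + (1 - y) * np.log(1 - h)))

def gradient_step(theta, X, y, alpha):
    """One batch update: theta - (alpha / m) * X (h - y)^T."""
    m = X.shape[1]
    diff = hypothesis(theta, X) - y          # shape (1, m)
    return theta - (alpha / m) * (X @ diff.T)

def hypothesis_double_sigmoid(theta, X):
    """Mimics 1 / (1 + expit(-z)), with expit written out in NumPy."""
    inner = 1.0 / (1.0 + np.exp(theta.T @ X))  # equals expit(-z)
    return 1.0 / (1.0 + inner)

# Five made-up records: a row of ones (intercept) and one binary feature.
X = np.array([[1.0, 1.0, 1.0, 1.0, 1.0],
              [0.0, 1.0, 0.0, 1.0, 1.0]])
y = np.array([[0.0, 1.0, 0.0, 1.0, 0.0]])

theta = np.zeros((2, 1))
cost_at_zero = cost(theta, X, y)             # by hand: log(2)
h_double_at_zero = hypothesis_double_sigmoid(theta, X)

costs = [cost_at_zero]
for _ in range(50):
    theta = gradient_step(theta, X, y, alpha=0.1)
    costs.append(cost(theta, X, y))
print(cost_at_zero, costs[-1])
```

If a hand-computed value such as log(2) does not come out of the cost function, the mismatch points at the function to inspect first.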


Thanks for the tips, Shaz13. I figured out the issue and now gradient descent is converging nicely. However, when I test my algorithm I get only 69% accuracy, whereas a simple logistic regression model gives 79%. I haven't done any regularization yet.
Is it normal for gradient descent (GD) to underperform compared to a Python library implementation of logistic regression?
Just to highlight, the Titanic dataset is too small to sufficiently train any algorithm. Could this be another reason for the lower-than-expected performance of GD?
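A gap like this often comes from the setup rather than from gradient descent itself. If the library model is scikit-learn's LogisticRegression (an assumption, since the library is not named), it fits an intercept and applies L2 regularization by default. The column list above shows no constant column and includes an unscaled Fare next to the 0/1 dummies. The sketch below uses synthetic data, not the Titanic file, to show how much a fixed budget of 1000 iterations can depend on feature scaling. Rows are records here, and all names and probabilities are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_gd(X, y, alpha, iterations):
    """Batch gradient descent; X is (m, n) and includes a column of ones."""
    theta = np.zeros(X.shape[1])
    for _ in range(iterations):
        theta -= (alpha / len(y)) * (X.T @ (sigmoid(X @ theta) - y))
    return theta

def log_loss(X, y, theta):
    h = np.clip(sigmoid(X @ theta), 1e-12, 1 - 1e-12)
    return float(-np.mean(y * np.log(h) + (1 - y) * np.log(1 - h)))

# Synthetic stand-in: a binary feature that drives the label, plus a
# fare-like feature on a much larger scale that carries no signal.
rng = np.random.default_rng(0)
m = 500
female = rng.integers(0, 2, size=m).astype(float)
fare = rng.exponential(scale=30.0, size=m)
y = (rng.random(m) < np.where(female == 1, 0.75, 0.2)).astype(float)

ones = np.ones(m)
X_raw = np.column_stack([ones, fare, female])
fare_std = (fare - fare.mean()) / fare.std()
X_scaled = np.column_stack([ones, fare_std, female])

# Raw fare forces a tiny alpha for stability; standardizing allows a larger one.
theta_raw = fit_gd(X_raw, y, alpha=0.0001, iterations=1000)
theta_scaled = fit_gd(X_scaled, y, alpha=0.1, iterations=1000)

cost_raw = log_loss(X_raw, y, theta_raw)
cost_scaled = log_loss(X_scaled, y, theta_scaled)
print(cost_raw, cost_scaled)
```

With the small alpha the intercept and the binary weight barely move in 1000 steps, so the unscaled run stops far from the minimum. Whether this explains the 69% vs. 79% gap on the real data would need checking against the actual preprocessing.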