Possible typo in the code of the article "Build a Recurrent Neural Network from Scratch in Python – An Essential Read for Data Scientists"

I was trying to implement an RNN for time-series forecasting by following your very well-written article (https://www.analyticsvidhya.com/blog/2019/01/fundamentals-deep-learning-recurrent-neural-networks-scratch-python/) and noticed a possible error in this line of Step 2.1:

loss = loss / float(y.shape[0])

where I believe it should have been:

loss = loss / float(Y.shape[0])

Similarly, in Step 2.2, the line:

val_loss = val_loss / float(y.shape[0])

I believe should have been:

val_loss = val_loss / float(Y_val.shape[0])
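To illustrate why the divisor matters, here is a small sketch. It assumes, as the variable names suggest, that y is the target of a single training sequence (for example y = Y[i] inside the training loop), so y.shape[0] is 1 and dividing by it leaves the summed loss unchanged, whereas Y.shape[0] is the number of sequences. The targets, the predictions, and the squared-error / 2 per-record loss below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.standard_normal((100, 1))                # hypothetical targets: one row per sequence
preds = Y + 0.1 * rng.standard_normal((100, 1))  # hypothetical predictions

loss = 0.0
for i in range(Y.shape[0]):
    y = Y[i]                                        # target of one sequence, shape (1,)
    loss += float(np.sum((y - preds[i]) ** 2 / 2))  # assumed per-record loss

print(y.shape[0])  # 1: dividing by this leaves the summed loss unchanged
print(Y.shape[0])  # 100: dividing by this gives the mean loss per sequence
mean_loss = loss / float(Y.shape[0])
```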

I hope I got it right!

Respectfully yours,


Thanks for starting this thread. I have several questions about the code in the above-mentioned article (here: https://www.analyticsvidhya.com/blog/2019/01/fundamentals-deep-learning-recurrent-neural-networks-scratch-python/). The first concerns this snippet:

dV_t = np.dot(dmulv, np.transpose(layers[t]['s']))
dsv = np.dot(np.transpose(V), dmulv)
ds = dsv
dadd = add * (1 - add) * ds

In the snippet above, I understand that the term add * (1 - add) is meant to be the derivative of the sigmoid function. However, the derivative of a sigmoid S(x) is S(x)(1 - S(x)), and here add is the pre-activation output of the hidden layer, i.e. the value before the sigmoid is applied. Why do they use add rather than s (the output after the sigmoid activation)?
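A quick numerical check supports this reading: with s = sigmoid(add), a central finite difference of the sigmoid matches s * (1 - s) but not add * (1 - add). The names add and s mirror the article's variables; the input values are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

add = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])  # pre-activation values (arbitrary)
s = sigmoid(add)                             # post-activation values

eps = 1e-6
numerical = (sigmoid(add + eps) - sigmoid(add - eps)) / (2 * eps)  # central difference

from_s = s * (1 - s)          # derivative written in terms of the sigmoid OUTPUT
from_add = add * (1 - add)    # the expression used in the quoted snippet

print(np.max(np.abs(numerical - from_s)))    # tiny: the two agree
print(np.max(np.abs(numerical - from_add)))  # large: the two disagree
```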

A related question about the same snippet concerns the use of add, which is simply the last pre-activation value left over from the preceding loop, instead of the value for time step t in the current loop. I am not sure I am explaining this clearly.
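A minimal sketch of the pattern this question points to, assuming a single sigmoid hidden layer with weight names U and W as in the article: once a forward loop finishes, a variable like add holds only the final step's value, so per-step values have to be cached (here in a layers list of dicts, echoing the article's layers[t]['s']) and looked up by index. The extra "add" key in each dict is hypothetical, added for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
T, H = 4, 3
U = rng.standard_normal((H, 1))
W = rng.standard_normal((H, H))
x = rng.standard_normal((T, 1))

prev_s = np.zeros(H)
layers = []
for t in range(T):
    add = U @ x[t] + W @ prev_s
    s = sigmoid(add)
    # Cache this step's values so the backward pass can look them up by t
    layers.append({"add": add, "s": s, "prev_s": prev_s})
    prev_s = s

# After the loop, `add` is only the LAST step's pre-activation...
assert np.array_equal(add, layers[-1]["add"])
# ...so the sigmoid derivative for each step should come from the cache
dsig = [layer["s"] * (1 - layer["s"]) for layer in layers]
```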

Next, in the part where backpropagation through time is performed:

for i in range(t-1, max(-1, t-bptt_truncate-1), -1):
    ds = dsv + dprev_s
    dadd = add * (1 - add) * ds

    dmulw = dadd * np.ones_like(mulw)
    dmulu = dadd * np.ones_like(mulu)

    dW_i = np.dot(W, layers[t]['prev_s'])
    dprev_s = np.dot(np.transpose(W), dmulw)

    new_input = np.zeros(x.shape)
    new_input[t] = x[t]
    dU_i = np.dot(U, new_input)
    dx = np.dot(np.transpose(U), dmulu)

    dU_t += dU_i
    dW_t += dW_i

Why is the input always x[t] and not x[i]? As I understand it, during backpropagation shouldn't the input be the one from the earlier time step i being visited? I think the same applies to the line:

dW_i = np.dot(W, layers[t]['prev_s'])
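For comparison, here is a minimal sketch of BPTT in which each step i uses its own input x[i], its own previous state, and a sigmoid derivative computed from its own post-activation, with a finite-difference gradient check. It assumes a single sigmoid hidden layer, a linear output read only at the last time step, squared-error loss, and no truncation; the shapes (x is (T, D), U is (H, D)) are a simplification of my own rather than the article's new_input construction, and forward, bptt, and numerical_grad are hypothetical helper names.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, U, W, V):
    """Run the RNN over x (shape (T, D)) and cache every hidden state.

    s[0] is the initial state; s[t + 1] is the state after time step t.
    """
    T, H = x.shape[0], W.shape[0]
    s = np.zeros((T + 1, H))
    for t in range(T):
        s[t + 1] = sigmoid(U @ x[t] + W @ s[t])
    return s, V @ s[T]  # linear output at the final step only

def loss_fn(x, y, U, W, V):
    _, y_hat = forward(x, U, W, V)
    return 0.5 * float(np.sum((y_hat - y) ** 2))

def bptt(x, y, U, W, V):
    """Gradients of the squared-error loss w.r.t. U, W, V (no truncation)."""
    T = x.shape[0]
    s, y_hat = forward(x, U, W, V)
    dy = y_hat - y                 # dL/dy_hat
    dV = np.outer(dy, s[T])
    ds = V.T @ dy                  # dL/ds at the last step
    dU, dW = np.zeros_like(U), np.zeros_like(W)
    for i in range(T - 1, -1, -1):             # walk backwards through time
        da = ds * s[i + 1] * (1 - s[i + 1])    # sigmoid' from step i's OUTPUT
        dU += np.outer(da, x[i])               # input of step i, x[i]
        dW += np.outer(da, s[i])               # previous state of step i
        ds = W.T @ da                          # hand the gradient to step i - 1
    return dU, dW, dV

def numerical_grad(f, P, eps=1e-6):
    """Central finite differences of f() w.r.t. array P (perturbed in place, then restored)."""
    g = np.zeros_like(P)
    for idx in np.ndindex(P.shape):
        old = P[idx]
        P[idx] = old + eps
        f_plus = f()
        P[idx] = old - eps
        f_minus = f()
        P[idx] = old
        g[idx] = (f_plus - f_minus) / (2 * eps)
    return g

rng = np.random.default_rng(0)
T, D, H = 6, 1, 4
x = rng.standard_normal((T, D))
y = rng.standard_normal(1)
U = 0.5 * rng.standard_normal((H, D))
W = 0.5 * rng.standard_normal((H, H))
V = 0.5 * rng.standard_normal((1, H))

dU, dW, dV = bptt(x, y, U, W, V)
errors = {}
for name, P, G in [("U", U, dU), ("W", W, dW), ("V", V, dV)]:
    num = numerical_grad(lambda: loss_fn(x, y, U, W, V), P)
    errors[name] = float(np.max(np.abs(num - G)))
print(errors)  # max |analytic - numerical| per parameter
```

Truncation (the article's bptt_truncate) would simply stop the backward loop after a fixed number of steps; the per-step indexing stays the same.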

Lastly, when I run the algorithm repeatedly, the predictions change from run to run. Some variation is, of course, expected because the weights are initialized randomly, but the change is not mild: both the phase and the amplitude of the predicted wave change. Has anyone here tried running the algorithm several times (say, 5 or 6)?
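Run-to-run differences are easier to study when each run is reproducible. A minimal sketch, assuming the weights are drawn from a random generator: seeding it (here with NumPy's default_rng) makes a given run repeatable, and looping over a handful of seeds shows the spread. init_weights and predict are hypothetical helpers, the uniform(0, 1) ranges are an assumption, and training is omitted, so this only shows the variation in the untrained starting point.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_weights(seed, hidden_dim=8, input_dim=1):
    """Hypothetical initializer: draws U, W, V from a seeded generator."""
    rng = np.random.default_rng(seed)
    U = rng.uniform(0, 1, (hidden_dim, input_dim))
    W = rng.uniform(0, 1, (hidden_dim, hidden_dim))
    V = rng.uniform(0, 1, (1, hidden_dim))
    return U, W, V

def predict(x, U, W, V):
    """Hypothetical forward pass: sigmoid hidden layer, linear output at the last step."""
    s = np.zeros(W.shape[0])
    for t in range(x.shape[0]):
        s = sigmoid(U @ x[t] + W @ s)
    return (V @ s).item()

x = np.sin(np.linspace(0, 2 * np.pi, 50)).reshape(-1, 1)  # one sine-wave window

# Same seed -> identical weights -> identical prediction
assert predict(x, *init_weights(0)) == predict(x, *init_weights(0))

# Different seeds -> different starting points; compare several runs
preds = [predict(x, *init_weights(seed)) for seed in range(6)]
print(np.mean(preds), np.std(preds))
```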

Could anyone please point out what I am misunderstanding, if anything?

Thanks for your time.

© Copyright 2013-2020 Analytics Vidhya