Let's suppose my data has 1000 sentences; to keep it simple, all are of equal length, so each has 10 timesteps. With a batch size of 100, that gives 10 batches, and the hidden state is reset before each new batch (stateful: I reset it manually after every 100 lines of training; stateless: Keras does it by itself). If I reset the model's hidden and cell states for every batch, how do the hidden states learned on the first batch help the model learn after the reset? In other words, how does iterative training help? As I understand it, no weights / hidden states / cells / gates are being reused in the 2nd batch, which means the second batch starts learning fresh, from square zero. If that is the case, how does learning improve over batches, and then over epochs?
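For concreteness, here is a minimal NumPy sketch of the setup I mean (a plain vanilla RNN rather than Keras, with made-up sizes for the input and hidden dimensions). The hidden state `h` is re-initialized to zeros at the start of every batch, exactly as in the scenario above, while the weight matrices `Wx` and `Wh` are created once and carried across batches:

```python
import numpy as np

rng = np.random.default_rng(0)

# Setup from the question: 1000 sentences, all of length 10,
# batch size 100 -> 10 batches per epoch.
n_sentences, timesteps, batch_size = 1000, 10, 100
input_size, hidden_size = 4, 8  # hypothetical dimensions

# Weights: created once, shared across all batches.
Wx = rng.normal(size=(input_size, hidden_size)) * 0.1
Wh = rng.normal(size=(hidden_size, hidden_size)) * 0.1

def run_batch(batch, Wx, Wh):
    """Run one batch through a vanilla RNN, starting from h = 0."""
    h = np.zeros((batch.shape[0], hidden_size))  # hidden state reset here
    for t in range(timesteps):
        h = np.tanh(batch[:, t, :] @ Wx + h @ Wh)
    return h

data = rng.normal(size=(n_sentences, timesteps, input_size))
for i in range(0, n_sentences, batch_size):
    h_final = run_batch(data[i:i + batch_size], Wx, Wh)
    # In real training, a gradient step would update Wx and Wh here;
    # in this sketch the weights are left untouched.
```

This is only meant to pin down what gets reset (the state `h`) versus what persists between batches (the parameters `Wx`, `Wh`), which is the distinction my question hinges on.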