Seeking Ideas for Data Preparation for a Unique Dataset

I am working on a publicly available dataset that contains mortgage payment details for various customers over multiple years. Each customer therefore has multiple rows: some variables (such as the values recorded at origination) stay constant across all of a customer's rows, while others, such as balance_time, time and LTV_time, change from row to row.

How should this data be prepared for building a model? One option is to keep one row per customer and add a separate column for each time-varying variable at each time point, but the number of rows in which values change varies from customer to customer. This would also produce far too many columns.

The other alternative is to use only the latest row for each customer, but that risks losing information.
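
A possible middle ground between the two options above is to collapse each customer's rows into a single row of summary features (latest value, mean, maximum, change over time), so the column count stays fixed however many rows a customer has. Here is a minimal pandas sketch on made-up toy values. The customer identifier `id` and the origination variable `FICO_orig` are hypothetical names chosen for illustration; only balance_time, time and LTV_time come from the description above.

```python
import pandas as pd

# Toy panel: one row per customer per period (values are invented).
# `id` and `FICO_orig` are hypothetical column names.
panel = pd.DataFrame({
    "id":           [1, 1, 1, 2, 2],
    "time":         [1, 2, 3, 1, 2],
    "balance_time": [100.0, 95.0, 90.0, 200.0, 198.0],
    "LTV_time":     [80.0, 78.0, 75.0, 90.0, 92.0],
    "FICO_orig":    [700, 700, 700, 650, 650],
})

# Sort so that "first" and "last" mean earliest and latest period.
panel = panel.sort_values(["id", "time"])
grouped = panel.groupby("id")

# One row per customer, with a fixed set of summary columns.
features = grouped.agg(
    n_periods=("time", "count"),          # how many rows the customer has
    FICO_orig=("FICO_orig", "first"),     # constant origination value
    balance_last=("balance_time", "last"),
    LTV_last=("LTV_time", "last"),
    LTV_mean=("LTV_time", "mean"),
    LTV_max=("LTV_time", "max"),
)

# Change in LTV between the first and last observed period.
features["LTV_change"] = grouped["LTV_time"].last() - grouped["LTV_time"].first()

print(features)
```

Which summaries are worth keeping depends on the model; the ones shown are only examples, and this still discards the exact path each variable took over time.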

Please advise.

Link to where the data is available is

The mortgage tab/dataset is the one I am referring to.


It really depends on what you are trying to predict here. Do you want to predict the probability of a default on the mortgage payment?

Yes. I would also like to do a survival analysis.
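
For both goals, one possible setup is to keep the data in its long, one-row-per-period form and derive two things from it: a per-customer duration and event flag for survival analysis, and a per-period default rate as a rough empirical hazard. The sketch below uses made-up toy values and assumes a customer identifier `id` and a 0/1 column `default_time` marking default in that period; both names are assumptions, not taken from the description above.

```python
import pandas as pd

# Toy panel (values are invented). `id` and `default_time` are assumed names.
panel = pd.DataFrame({
    "id":           [1, 1, 1, 2, 2],
    "time":         [1, 2, 3, 1, 2],
    "LTV_time":     [80.0, 78.0, 75.0, 90.0, 92.0],
    "default_time": [0, 0, 0, 0, 1],
}).sort_values(["id", "time"])

# Index each customer's rows 1, 2, 3, ... in case `time` is a calendar
# stamp rather than the number of periods since first observation.
panel["period"] = panel.groupby("id").cumcount() + 1

# (a) Survival format: one row per customer with the number of observed
# periods and whether a default was ever recorded (0 = censored).
survival = panel.groupby("id").agg(
    duration=("period", "max"),
    event=("default_time", "max"),
)

# (b) Empirical discrete-time hazard: share of customers observed in each
# period who default in that period.
hazard = panel.groupby("period")["default_time"].mean()

print(survival)
print(hazard)
```

The long table itself, with `default_time` as the target and the time-varying columns as predictors, can then be fed to a classifier for default probability, while the `survival` table has the duration/event shape that survival models usually expect.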

© Copyright 2013-2019 Analytics Vidhya