I am working on a publicly available data set that contains mortgage payment details for various customers over multiple years. Each customer therefore has multiple rows: some variables are constant across all of a customer's rows (for example, values recorded at the time of origination), while others, such as balance_time, time and LTV_time, change from row to row.
How should this data be prepared for building a model? One way is to add a separate column for each time-varying variable at each observation, so that every customer becomes a single row. However, the number of rows (observations) varies from customer to customer, and this approach would also produce too many columns.
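To make this first option concrete, here is a minimal sketch on toy data. I am assuming pandas and a customer identifier column called `id`; that name and all the values are made up for illustration and are not taken from the actual file.

```python
import pandas as pd

# Toy panel data. `id` is a hypothetical customer identifier; values are invented.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2],
    "time": [25, 26, 27, 25, 26],
    "balance_time": [100.0, 98.0, 95.0, 200.0, 199.0],
    "LTV_time": [80.0, 79.0, 77.0, 90.0, 91.0],
})

# Number each customer's observations 0, 1, 2, ... in time order.
df = df.sort_values(["id", "time"])
df["obs"] = df.groupby("id").cumcount()

# One row per customer, one column per (variable, observation number).
wide = df.pivot(index="id", columns="obs", values=["balance_time", "LTV_time"])

# Flatten the two-level column index into names like "balance_time_0".
wide.columns = [f"{var}_{obs}" for var, obs in wide.columns]

print(wide)
```

Customers with fewer observations end up with missing values in the later columns, and the number of columns grows with the longest history, which is the problem described above.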
The alternative is to keep only the latest row for each customer, but that risks losing the information contained in the earlier rows.
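A minimal sketch of this second option, again on toy data and again assuming pandas and a hypothetical `id` column (neither the name nor the values come from the actual file):

```python
import pandas as pd

# Toy panel data. `id` is a hypothetical customer identifier; values are invented.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2],
    "time": [25, 26, 27, 25, 26],
    "balance_time": [100.0, 98.0, 95.0, 200.0, 199.0],
    "LTV_time": [80.0, 79.0, 77.0, 90.0, 91.0],
})

# Keep only the most recent observation of each customer.
latest = (
    df.sort_values(["id", "time"])
      .groupby("id")
      .tail(1)
      .set_index("id")
)

# Count how many earlier observations are discarded by doing so.
rows_dropped = len(df) - len(latest)

print(latest)
print("rows dropped:", rows_dropped)
```

This gives exactly one row per customer with a fixed set of columns, but every earlier observation is thrown away.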
The data is available at http://www.creditriskanalytics.net/datasets-private.html
The mortgage tab/dataset on that page is the one I am referring to.