I am working on retail data where I have to cluster customers based on their behavior. The data set has approximately 56800 rows, of which 34228 have a missing value for transaction date and time. None of the other columns have missing values. Please suggest how to proceed with this problem.


That column has a very high share of missing values (34228 of 56800, about 60%). I would reconsider using this variable, or try to obtain the data some other way.

Dates are usually handled by converting them to numbers. That is, calculate the number of seconds (or minutes) that have passed since a base date, such as 1-Jan-1900, and then work with that value; estimating a number is much easier than estimating a date. Example:
library(lubridate)
bdate <- mdy_hms("01/01/1900 00:00:01")
# as.numeric, not as.integer: seconds since 1900 exceed R's 32-bit integer range
myDF$Date.InSeconds <- as.numeric(difftime(myDF$MYDATE, bdate, units = "secs"))
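
To make the "estimating a number" step concrete, here is a minimal sketch on toy data. It uses base R only so it runs without extra packages. The data frame and the columns CustomerID, Date.Missing, Date.Imputed and Date.Back are made up for illustration, and median imputation is just one simple assumption, not a recommendation; with about 60% of values missing, any such estimate should be treated with caution.

```r
# Toy data standing in for the real table (hypothetical values and column names).
myDF <- data.frame(
  CustomerID = 1:5,
  MYDATE = as.POSIXct(
    c("2019-03-01 10:15:00", NA, "2019-03-02 18:40:00", NA, "2019-03-05 09:00:00"),
    format = "%Y-%m-%d %H:%M:%S", tz = "UTC"
  )
)

# Share of rows with no transaction timestamp.
missing_share <- mean(is.na(myDF$MYDATE))

# Base date, the same one used in the answer above.
bdate <- as.POSIXct("1900-01-01 00:00:01", format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Seconds elapsed since the base date; missing timestamps stay NA.
myDF$Date.InSeconds <- as.numeric(difftime(myDF$MYDATE, bdate, units = "secs"))

# Keep a flag so estimated rows can be told apart from observed ones later.
myDF$Date.Missing <- is.na(myDF$Date.InSeconds)

# Assumption: fill the gaps with the median of the observed values.
med <- median(myDF$Date.InSeconds, na.rm = TRUE)
myDF$Date.Imputed <- ifelse(myDF$Date.Missing, med, myDF$Date.InSeconds)

# Convert back to a date-time to sanity-check the result.
myDF$Date.Back <- bdate + myDF$Date.Imputed
```

Keeping the Date.Missing flag as its own column means the clustering step can use the fact that a timestamp was absent, or the estimated rows can be excluded to compare results.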


