Regarding Date Variable in Redate your Data

date
categorical
hackathon

#1

Hi

This is regarding the Re-date Your Data contest: http://datahack.analyticsvidhya.com/contest/re-date-your-data-learning-contest

The contest provides two files, train.csv and test.csv, and both contain a date variable, “Earliest_Start_Date”.

In both the train and test CSV files, I notice that after a certain number of rows the date column turns into integers, with values like 42006, 41986, 42111, etc.

What does this mean? How can a date column have values in any format other than dd-mm-yyyy?
And how do you deal with such a date column? Do I convert it into an integer? What do values like 42004, 41989, etc. mean, and why do they occur in the date column?


#2

That is a formatting issue in the original dataset. Check out the note at this link, http://datahack.analyticsvidhya.com/contest/re-date-your-data-learning-contest#seca, about how to resolve it in Excel itself.
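
If you would rather handle this in code than in Excel, here is a minimal Python sketch. It assumes the integers are Excel serial day numbers (days counted from 1899-12-30 in Excel's default 1900 date system). That is an inference from the values and is not confirmed by the contest notes. The helper name `parse_mixed_date` and the dd-mm-yyyy format for the text rows are also assumptions for illustration.

```python
from datetime import date, datetime, timedelta

# Assumption: integer values are Excel serial day numbers (1900 date system).
# For serials after February 1900, day 0 corresponds to 1899-12-30.
EXCEL_EPOCH = date(1899, 12, 30)


def parse_mixed_date(value) -> date:
    """Parse a cell that is either an Excel serial number or a dd-mm-yyyy string.

    Hypothetical helper, not part of the contest material.
    """
    text = str(value).strip()
    if text.isdigit():
        # Serial number: add that many days to the Excel epoch.
        return EXCEL_EPOCH + timedelta(days=int(text))
    # Otherwise assume the text format mentioned in the question.
    return datetime.strptime(text, "%d-%m-%Y").date()


# Values quoted in the question, plus one made-up text date for comparison.
for raw in ["42006", "41986", "42111", "25-12-2014"]:
    print(raw, "->", parse_mixed_date(raw).isoformat())
# 42006 -> 2015-01-02
# 41986 -> 2014-12-13
# 42111 -> 2015-04-17
# 25-12-2014 -> 2014-12-25
```

It is worth spot-checking a few converted rows against the text rows, since Excel may have read day and month in a different order than dd-mm-yyyy when it converted those cells.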