Hi everyone! I’m new to Data Science and this community.
I have metrics (say, Lines of Code) for a project that ran for about 15 years (Jan 2000 to Oct 2015). I also have a ‘date’ column in the format YYYY-MM-DD. How do I build the testing and training data sets (TATDS, for brevity) in R, using:
i) a 50:50 ratio, i.e. divide into TATDS at exactly the middle date of the project, about 7.5 years after the start
ii) an 80:20 ratio, i.e. the training and testing data should comprise the first 80% and last 20% of the data respectively.
Your help is much appreciated!!
Hi @shaw38,

Let me call your dataset “df” (assuming it is a data frame whose rows are already sorted by date, oldest first).

i) split_data <- seq_len(floor(nrow(df) / 2))  # row indices of the first half
training_data <- df[split_data, ]
testing_data <- df[-split_data, ]

ii) split_data <- seq_len(floor(nrow(df) * 0.8))  # row indices of the first 80%
training_data <- df[split_data, ]
testing_data <- df[-split_data, ]
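
Note that both snippets above split by row count, which only matches a split by calendar time if the rows are sorted by date and evenly spaced. If you want i) to cut at the actual middle date, something like the sketch below might be closer. The example data (one row per month, a made-up “loc” column) and the names mid_date, train_50, test_50, train_80 and test_80 are my own assumptions for illustration; only the ‘date’ column comes from your description.

```r
# Hypothetical example data: one row per month, Jan 2000 to Oct 2015.
df <- data.frame(
  date = seq(as.Date("2000-01-01"), as.Date("2015-10-01"), by = "month")
)
df$loc <- seq_len(nrow(df)) * 100  # made-up Lines of Code values

# Make sure 'date' is a Date (YYYY-MM-DD parses by default) and sort oldest first.
df$date <- as.Date(df$date)
df <- df[order(df$date), ]

# i) 50:50 by calendar time: cut at the midpoint between the first and last date.
mid_date <- min(df$date) + (max(df$date) - min(df$date)) / 2
train_50 <- df[df$date <= mid_date, ]
test_50  <- df[df$date >  mid_date, ]

# ii) 80:20 by row count: the oldest 80% of rows for training, the rest for testing.
n_train  <- floor(0.8 * nrow(df))
train_80 <- df[seq_len(n_train), ]
test_80  <- df[-seq_len(n_train), ]
```

With your real data you would skip the first block and start from the as.Date() line. If the observations are unevenly spaced in time, i) and a plain row-count split can give quite different halves, so it is worth checking nrow() of each part.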

Hope this helps.