Keep only the rows with latest dates in dataframe

data_frame
machine_learning

#1

Hi,
I have a dataframe with the following values:

Sheet        Updated date    name
SOURCE       12/11/2018      test
SOURCE_RDH   12/11/2018      test1
SOURCE       12/13/2018      test
SOURCE_RDH   12/13/2018      test1
SOURCE       12/15/2018      test

Now I need to keep only the row with the latest date for each sheet:

Sheet        Updated date    name
SOURCE       12/15/2018      test
SOURCE_RDH   12/13/2018      test1

Any leads?

Thanks in advance
Sachin


#2

Hi sachin,
My idea is:

  1. Convert the “Updated date” column to datetime type with pd.to_datetime()
  2. Sort by that datetime column (ascending or descending both work)
  3. Group by “name”
  4. Take the last/first row of each group, depending on the sort order chosen in step 2
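
The four steps above could be sketched roughly as follows. This is only a minimal example built on the sample data from the question; the variable names `df` and `latest` and the explicit `format="%m/%d/%Y"` are my assumptions, not something the original poster specified.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Sheet": ["SOURCE", "SOURCE_RDH", "SOURCE", "SOURCE_RDH", "SOURCE"],
    "Updated date": ["12/11/2018", "12/11/2018", "12/13/2018",
                     "12/13/2018", "12/15/2018"],
    "name": ["test", "test1", "test", "test1", "test"],
})

# Step 1: convert the string column to datetime
# (assumes month/day/year, as in the sample)
df["Updated date"] = pd.to_datetime(df["Updated date"], format="%m/%d/%Y")

# Steps 2-4: sort ascending, group by "name", keep the last row of each group
latest = (
    df.sort_values("Updated date")
      .groupby("name")
      .tail(1)
      .reset_index(drop=True)
)

print(latest)
```

With a descending sort you would use `.head(1)` instead of `.tail(1)`. If the rows should be grouped per sheet rather than per name, grouping by "Sheet" should work the same way on this sample.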

#3

Hi @sachin123456,

I think @kagglehan has answered your question; I’d like to add one thing to the above answer. Suppose you want the dataframe to contain only dates after a certain point, say after 2/15/2018. Then you can create a loop that compares each row’s date with that cutoff and, if the date is later than 2/15/2018, stores the row in a new dataframe.

(PS: comparing dates works correctly only after you have converted the column to the datetime format.)
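
As a possible alternative to an explicit loop, the same filter can usually be written as a boolean mask in pandas. The sketch below reuses the sample data from the question; the names `cutoff`, `after_cutoff`, `later_cutoff` and `after_later_cutoff` are made up for illustration, and the second cutoff (12/12/2018) is a hypothetical value chosen only because every sample row already falls after 2/15/2018.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Sheet": ["SOURCE", "SOURCE_RDH", "SOURCE", "SOURCE_RDH", "SOURCE"],
    "Updated date": ["12/11/2018", "12/11/2018", "12/13/2018",
                     "12/13/2018", "12/15/2018"],
    "name": ["test", "test1", "test", "test1", "test"],
})

# Convert first; comparing the raw strings would not order dates correctly
df["Updated date"] = pd.to_datetime(df["Updated date"], format="%m/%d/%Y")

# Cutoff mentioned in the post: 2/15/2018
cutoff = pd.Timestamp(year=2018, month=2, day=15)
after_cutoff = df[df["Updated date"] > cutoff]

# Hypothetical later cutoff, only to show rows actually being dropped
later_cutoff = pd.Timestamp(year=2018, month=12, day=12)
after_later_cutoff = df[df["Updated date"] > later_cutoff]

print(len(after_cutoff), len(after_later_cutoff))
```

The mask `df["Updated date"] > cutoff` compares the whole column at once, so no new dataframe has to be built row by row.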