Extract dates from text in Python (Text Mining)

text_mining

#1

Hello everyone,
I want to extract dates from a text file and sort them in ascending chronological order.
Assumptions:
Assume all dates in xx/xx/xx format are mm/dd/yy.
Assume all dates where the year is encoded in only two digits are years from the 1900s (e.g. 1/5/89 is January 5th, 1989).
If the day is missing (e.g. 9/2009), assume it is the first day of the month (e.g. September 1, 2009).
If the month is missing (e.g. 2010), assume it is the first of January of that year (e.g. January 1, 2010).

Here are the 10 lines of the text file.

0         03/25/93 Total time of visit (in minutes):\n
1                       6/18/85 Primary Care Doctor:\n
2    sshe plans to move as of 7/8/71 In-Home Servic...
3                7 on 9/27/75 Audit C Score Current:\n
4    2/6/96 sleep studyPain Treatment Pain Level (N...
5                    .Per 7/06/79 Movement D/O note:\n
6    4, 5/18/78 Patient's thoughts about current su...
7    10/24/89 CPT Code: 90801 - Psychiatric Diagnos...
8                          3/7/86 SOS-10 Total Score:\n
9             (4/10/71)Score-1Audit C Score Current:\n

Thank you for the response.


#2

Hi @rock_bt,

Use the regex pattern "\d+/\d+/\d+". It matches dates that have all three parts separated by slashes, such as mm/dd/yy.

For example,

import re
# Raw string (r"...") so that \d reaches the regex engine unchanged
re.findall(r"\d+/\d+/\d+", "03/25/93 Total time of visit (in minutes):\n")

Output: ['03/25/93']
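
To go from matching to sorting, here is a minimal sketch that applies the assumptions listed in the question to the 10 sample lines. The names `DATE_RE`, `to_date`, `found` and `sorted_dates` are my own, and the extended pattern (which also accepts mm/yyyy and a bare four-digit year, limited here to 1900-2099) is an assumption about what the rest of the file may contain, so please adjust it to your data.

```python
import re
from datetime import date

# Sample lines copied from the question (truncated text kept as posted).
lines = [
    "03/25/93 Total time of visit (in minutes):\n",
    "6/18/85 Primary Care Doctor:\n",
    "sshe plans to move as of 7/8/71 In-Home Servic...",
    "7 on 9/27/75 Audit C Score Current:\n",
    "2/6/96 sleep studyPain Treatment Pain Level (N...",
    ".Per 7/06/79 Movement D/O note:\n",
    "4, 5/18/78 Patient's thoughts about current su...",
    "10/24/89 CPT Code: 90801 - Psychiatric Diagnos...",
    "3/7/86 SOS-10 Total Score:\n",
    "(4/10/71)Score-1Audit C Score Current:\n",
]

# Assumed pattern, tried from most to least specific:
# mm/dd/yy or mm/dd/yyyy, then mm/yyyy, then a bare year (1900-2099 only).
DATE_RE = re.compile(
    r"\b(?:\d{1,2}/\d{1,2}/(?:\d{4}|\d{2})"
    r"|\d{1,2}/\d{4}"
    r"|(?:19|20)\d{2})\b"
)


def to_date(raw):
    """Turn a matched string into a date using the question's assumptions."""
    parts = [int(p) for p in raw.split("/")]
    if len(parts) == 3:        # mm/dd/yy or mm/dd/yyyy
        month, day, year = parts
    elif len(parts) == 2:      # mm/yyyy: day missing, use the 1st
        month, year = parts
        day = 1
    else:                      # yyyy: month and day missing, use January 1st
        month, day, year = 1, 1, parts[0]
    if year < 100:             # two-digit year: assume the 1900s
        year += 1900
    return date(year, month, day)


# No capturing groups in DATE_RE, so findall returns the full matched strings.
found = [m for line in lines for m in DATE_RE.findall(line)]
sorted_dates = sorted(to_date(m) for m in found)

for d in sorted_dates:
    print(d.isoformat())
```

On the sample lines this prints the dates from 1971-04-10 up to 1996-02-06. The word boundaries (\b) keep the pattern from picking up part of a longer number such as the CPT code 90801.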