
After Using Parsedatetime To Get A Time Structure From The Input String, How Does One Slice The Rest Of The String Out?

I'm wondering how to use parsedatetime for Python to return both the timestruct and the rest of the input string with just the date/time input removed. Example: import parsedatetim

Solution 1:

The only method of Calendar that returns that info is nlp() (which I suppose stands for Natural Language Processing). Here is a function returning all parts of the input:

import parsedatetime

calendar = parsedatetime.Calendar()

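def parse(string, source_time=None):
    """Split string into (datetime, status, text) parts; plain text gets (None, 0, text)."""
    ret = []
    parsed_parts = calendar.nlp(string, source_time)
    if parsed_parts:
        last_stop = 0
        for part in parsed_parts:
            dt, status, start, stop, segment = part
            # Keep any plain text between the previous match and this one.
            if start > last_stop:
                ret.append((None, 0, string[last_stop:start]))
            ret.append((dt, status, segment))
            last_stop = stop
        # Keep any plain text after the last match.
        if len(string) > last_stop:
            ret.append((None, 0, string[last_stop:]))
    return ret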

for s in ("Soccer with @homies at Payne Whitney tomorrow at 2 pm to 4 pm!",
          "Soccer with @homies at Payne Whitney tomorrow starting at 2 pm to 4 pm!",
          "Soccer with @homies at Payne Whitney tomorrow starting at 3 pm to 5 pm!"):
    print()
    print(s)
    result = parse(s)
    for part in result:
        print(part)

Output:

Soccer with @homies at Payne Whitney tomorrow at 2 pm to 4 pm!
(None, 0, 'Soccer with @homies at Payne Whitney ')
(datetime.datetime(2020, 1, 15, 16, 0), 3, 'tomorrow at 2 pm to 4 pm')
(None, 0, '!')

Soccer with @homies at Payne Whitney tomorrow starting at 2 pm to 4 pm!
(None, 0, 'Soccer with @homies at Payne Whitney ')
(datetime.datetime(2020, 1, 15, 9, 0), 1, 'tomorrow')
(None, 0, ' starting ')
(datetime.datetime(2020, 1, 14, 16, 0), 2, 'at 2 pm to 4 pm')
(None, 0, '!')

Soccer with @homies at Payne Whitney tomorrow starting at 3 pm to 5 pm!
(None, 0, 'Soccer with @homies at Payne Whitney ')
(datetime.datetime(2020, 1, 15, 9, 0), 1, 'tomorrow')
(None, 0, ' starting ')
(datetime.datetime(2020, 1, 14, 15, 0), 2, 'at 3 pm')
(None, 0, ' to ')
(datetime.datetime(2020, 1, 14, 17, 0), 2, '5 pm')
(None, 0, '!')

The status tells you whether the associated datetime is actually a date (1), a time (2), a datetime (3) or neither (0). In the first two cases, the missing fields are taken from the source_time, or from the current time if that is None.
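To get back to the original question (the datetimes plus the rest of the string with the date/time text removed), the parts can be post-processed. Here is a minimal sketch that works on tuples shaped like the output above; `strip_dates` and `STATUS_LABELS` are names I made up for illustration, not part of parsedatetime, and the sample data is copied from the first output block:

```python
import datetime

# Meaning of the status values as described above.
STATUS_LABELS = {0: "neither", 1: "date", 2: "time", 3: "datetime"}

def strip_dates(parts):
    """Return (found, remainder) from a list of (dt, status, segment) tuples.

    found     -- list of (datetime, label) for every recognised segment
    remainder -- the input text with the recognised segments removed
    """
    found = [(dt, STATUS_LABELS[status]) for dt, status, _ in parts if dt is not None]
    remainder = "".join(segment for dt, _, segment in parts if dt is None)
    return found, remainder

# Sample parts, copied from the first output above.
parts = [
    (None, 0, "Soccer with @homies at Payne Whitney "),
    (datetime.datetime(2020, 1, 15, 16, 0), 3, "tomorrow at 2 pm to 4 pm"),
    (None, 0, "!"),
]

found, remainder = strip_dates(parts)
print(found)
print(remainder)
```

Note that the remainder keeps the whitespace around the removed segment, so you may want to normalise spaces afterwards.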

But if you examine the output closely, you will see that there is a reliability problem here. Only the third parse can be used; in the other two cases, information has been lost (the 2 pm start time is missing from the result). Furthermore, I have no idea why the second and third strings are parsed differently.
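When the parse does come out in separate pieces, as in the third case, the status can be used to merge a date-only part with the time-only parts that follow it. This is only a sketch under my own assumptions: `combine_parts` is a hypothetical helper, it assumes a time-only part belongs to the most recent date-only part, and it silently drops a date that is never followed by a time. The sample data is copied from the third output block:

```python
import datetime

def combine_parts(parts):
    """Merge date-only (status 1) parts with the time-only (status 2) parts after them."""
    current_date = None
    combined = []
    for dt, status, _segment in parts:
        if status == 1:
            # Remember the date; a following time-only part will use it.
            current_date = dt.date()
        elif status == 2 and current_date is not None:
            combined.append(datetime.datetime.combine(current_date, dt.time()))
        elif status in (2, 3):
            # Full datetime, or a time with no preceding date: keep as parsed.
            combined.append(dt)
    return combined

# Sample parts, copied from the third output above.
parts = [
    (None, 0, "Soccer with @homies at Payne Whitney "),
    (datetime.datetime(2020, 1, 15, 9, 0), 1, "tomorrow"),
    (None, 0, " starting "),
    (datetime.datetime(2020, 1, 14, 15, 0), 2, "at 3 pm"),
    (None, 0, " to "),
    (datetime.datetime(2020, 1, 14, 17, 0), 2, "5 pm"),
    (None, 0, "!"),
]

combined = combine_parts(parts)
print(combined)
```

For this input the two results are 3 pm and 5 pm on 2020-01-15, i.e. the start and end of the event on the right day.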

An alternative library is dateparser. It looks more powerful, but has its own problems. The dateparser.search.search_dates() function comes close to what you are interested in, but I haven't been able to find out how to tell whether a parsed datetime conveys only date information, only time information, or both. Anyway, here is a function that uses search_dates() to yield an output similar to the above, but without the status of each part:

from dateparser.search import search_dates

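def parse(string: str):
    """Split string into (datetime, text) parts; plain text gets (None, text)."""
    ret = []
    parsed_parts = search_dates(string)
    if parsed_parts:
        last_stop = 0
        for part in parsed_parts:
            segment, dt = part
            # search_dates() returns no offsets, so locate the segment ourselves.
            start = string.find(segment, last_stop)
            stop = start + len(segment)
            # Keep any plain text between the previous match and this one.
            if start > last_stop:
                ret.append((None, string[last_stop:start]))
            ret.append((dt, segment))
            last_stop = stop
        # Keep any plain text after the last match.
        if len(string) > last_stop:
            ret.append((None, string[last_stop:]))
    return ret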


for s in ("Soccer with @homies at Payne Whitney tomorrow at 2 pm to 4 pm!",
          "Soccer with @homies at Payne Whitney tomorrow starting at 2 pm to 4 pm!",
          "Soccer with @homies at Payne Whitney tomorrow starting at 3 pm to 5 pm!"):
    print()
    print(s)
    result = parse(s)
    for part in result:
        print(part)

Output:

Soccer with @homies at Payne Whitney tomorrow at 2 pm to 4 pm!
(None, 'Soccer with @homies at Payne Whitney ')
(datetime.datetime(2020, 1, 15, 14, 0), 'tomorrow at 2 pm')
(None, ' to ')
(datetime.datetime(2020, 1, 13, 16, 0), '4 pm')
(None, '!')

Soccer with @homies at Payne Whitney tomorrow starting at 2 pm to 4 pm!
(None, 'Soccer with @homies at Payne Whitney ')
(datetime.datetime(2020, 1, 15, 0, 43, 0, 726130), 'tomorrow')
(None, ' starting ')
(datetime.datetime(2020, 1, 13, 14, 0), 'at 2 pm')
(None, ' to ')
(datetime.datetime(2020, 1, 13, 16, 0), '4 pm')
(None, '!')

Soccer with @homies at Payne Whitney tomorrow starting at 3 pm to 5 pm!
(None, 'Soccer with @homies at Payne Whitney ')
(datetime.datetime(2020, 1, 15, 0, 43, 0, 784468), 'tomorrow')
(None, ' starting ')
(datetime.datetime(2020, 1, 13, 15, 0), 'at 3 pm')
(None, ' to ')
(datetime.datetime(2020, 1, 13, 17, 0), '5 pm')
(None, '!')

I think that searching for the substring in the input is acceptable, and the parsing seems more predictable, but not knowing whether each datetime carries a date, a time, or both is a problem.
