
How To Calculate Number Of Dates Within A Year Of A Date In Pandas

I have a dataframe of phene visits and ER visits per subject (the data appears in the solution below). For each PheneDate of a given subject, I need to count the ER visit dates with a Score of 1 that fall within one year after that PheneDate.
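"Within one year after" leaves the boundaries open to interpretation. Here is a minimal sketch of one reading, assuming the window excludes the PheneDate itself and includes its one-year anniversary. This is the default `closed='right'` behavior of the `IntervalIndex` built in the solution. The dates come from subject phchp003 in the question's data; the boundary convention is the assumption.

```python
import pandas as pd

# phchp003v2's PheneDate from the question's data
phene_date = pd.Timestamp("2006-05-10")

# Assumed window: (PheneDate, PheneDate + 1 year], closed on the right only
window = pd.Interval(phene_date, phene_date + pd.DateOffset(years=1), closed="right")

print(pd.Timestamp("2006-08-05") in window)  # True: Score 1 ER visit about 3 months later
print(pd.Timestamp("2006-05-05") in window)  # False: before the PheneDate
print(phene_date in window)                  # False: the left edge is open
print(pd.Timestamp("2007-05-10") in window)  # True: the anniversary itself counts
```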

Solution 1:

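Try this. Give every phene visit an interval (PheneDate, PheneDate + 1 year], then, within each subject, count how many Score 1 ER dates fall inside each interval: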

import pandas as pd
import numpy as np
from io import StringIO

inputtxt = StringIO("""
PheneVisit  PheneDate   Score   ER Date    SubjectID
N/A             N/A     0       10/25/05   phchp003
phchp003v1  11/23/05    0       N/A         phchp003
N/A             N/A     1       5/5/06     phchp003
phchp003v2  5/10/06     0       N/A        phchp003
N/A             N/A     0       6/22/06    phchp003
N/A             N/A     1       8/5/06     phchp003
phchp003v4  2/7/14      0       N/A        phchp003
N/A             N/A     1       10/13/14   phchp003
N/A             N/A     0       2/15/15    phchp003
N/A             N/A     1       8/14/15    phchp003
phchp004v2  4/27/12     0       N/A        phchp004
phchp004v3  8/15/12     0       N/A        phchp004
N/A             N/A     1       5/18/13    phchp004
N/A             N/A     0       6/21/13    phchp004
phchp004v4  6/3/15      0       N/A        phchp004
N/A             N/A     0       8/27/15    phchp004
N/A             N/A     1       9/3/15     phchp004
N/A             N/A     1       8/22/16    phchp004
N/A             N/A     1       11/19/16   phchp004
phchp005v1  2/8/06      0       N/A        phchp005
N/A             N/A     1       3/24/06    phchp005
N/A             N/A     1       4/16/06    phchp005
N/A             N/A     1       4/25/06    phchp005
N/A             N/A     1       5/18/06    phchp005
N/A             N/A     0       5/25/06    phchp005
N/A             N/A     0       6/2/06     phchp005
""")

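# Columns are separated by two or more spaces; "N/A" is read as NaN by default
df = pd.read_csv(inputtxt, sep=r'\s\s+', engine='python')

# Parse the two-digit-year dates (missing values become NaT)
df['PheneDate'] = pd.to_datetime(df['PheneDate'], format='%m/%d/%y')

df['ER Date'] = pd.to_datetime(df['ER Date'], format='%m/%d/%y')

# One interval per row: (PheneDate, PheneDate + 1 year], closed on the right by default.
# Rows without a PheneDate get a missing interval, which contains no date.
df['pi'] = pd.IntervalIndex.from_arrays(df['PheneDate'], df['PheneDate'] + pd.DateOffset(years=1))

def f(x):
    # Index this subject's rows by their one-year interval
    x = x.set_index('pi')
    # For each Score == 1 ER date, get a boolean mask of the intervals containing it,
    # stack the masks row-wise and sum down the columns: one count per interval.
    # Note: np.vstack raises ValueError if a subject has no Score == 1 ER visits.
    x['Number of First Year'] = np.sum(np.vstack([x.index.contains(i) for i in x.loc[x['Score'] == 1, 'ER Date']]), 0)
    return x.reset_index(drop=True)

# Grouping by PheneVisit skips rows where it is NaN, so only phene visits keep a count;
# all other rows come out as NaN
df.groupby('SubjectID').apply(f).groupby('PheneVisit')['Number of First Year'].transform('sum')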

Output:

SubjectID   
phchp003   0    NaN
           1    2.0
           2    NaN
           3    1.0
           4    NaN
           5    NaN
           6    1.0
           7    NaN
           8    NaN
           9    NaN
phchp004   0    0.0
           1    1.0
           2    NaN
           3    NaN
           4    1.0
           5    NaN
           6    NaN
           7    NaN
           8    NaN
phchp005   0    4.0
           1    NaN
           2    NaN
           3    NaN
           4    NaN
           5    NaN
           6    NaN
Name: Number of First Year, dtype: float64
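A different technique, not the one used in the answer, avoids the per-subject `np.vstack` (which fails for a subject with no Score 1 ER visits): a self-merge. Pair every phene visit with every qualifying ER visit of the same subject, then filter on the window. This is a sketch, assuming the same (PheneDate, PheneDate + 1 year] window as the answer's intervals; the helper name `count_er_within_year` is hypothetical. The merge creates one row per (phene visit, ER visit) pair within a subject, which is fine at this scale but grows quadratically for subjects with many visits.

```python
import pandas as pd

COLUMNS = ["PheneVisit", "PheneDate", "Score", "ER Date", "SubjectID"]

# The question's data; None marks the "N/A" cells
ROWS = [
    (None, None, 0, "10/25/05", "phchp003"),
    ("phchp003v1", "11/23/05", 0, None, "phchp003"),
    (None, None, 1, "5/5/06", "phchp003"),
    ("phchp003v2", "5/10/06", 0, None, "phchp003"),
    (None, None, 0, "6/22/06", "phchp003"),
    (None, None, 1, "8/5/06", "phchp003"),
    ("phchp003v4", "2/7/14", 0, None, "phchp003"),
    (None, None, 1, "10/13/14", "phchp003"),
    (None, None, 0, "2/15/15", "phchp003"),
    (None, None, 1, "8/14/15", "phchp003"),
    ("phchp004v2", "4/27/12", 0, None, "phchp004"),
    ("phchp004v3", "8/15/12", 0, None, "phchp004"),
    (None, None, 1, "5/18/13", "phchp004"),
    (None, None, 0, "6/21/13", "phchp004"),
    ("phchp004v4", "6/3/15", 0, None, "phchp004"),
    (None, None, 0, "8/27/15", "phchp004"),
    (None, None, 1, "9/3/15", "phchp004"),
    (None, None, 1, "8/22/16", "phchp004"),
    (None, None, 1, "11/19/16", "phchp004"),
    ("phchp005v1", "2/8/06", 0, None, "phchp005"),
    (None, None, 1, "3/24/06", "phchp005"),
    (None, None, 1, "4/16/06", "phchp005"),
    (None, None, 1, "4/25/06", "phchp005"),
    (None, None, 1, "5/18/06", "phchp005"),
    (None, None, 0, "5/25/06", "phchp005"),
    (None, None, 0, "6/2/06", "phchp005"),
]


def count_er_within_year(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: per phene visit, count Score == 1 ER dates
    in the window (PheneDate, PheneDate + 1 year]."""
    phene = df.loc[df["PheneDate"].notna(), ["SubjectID", "PheneVisit", "PheneDate"]]
    er = df.loc[df["ER Date"].notna() & (df["Score"] == 1), ["SubjectID", "ER Date"]]
    # One row per (phene visit, qualifying ER visit) pair of the same subject
    pairs = phene.merge(er, on="SubjectID", how="inner")
    # Assumption: left-open, right-closed window, like the answer's default intervals
    in_window = (pairs["ER Date"] > pairs["PheneDate"]) & (
        pairs["ER Date"] <= pairs["PheneDate"] + pd.DateOffset(years=1)
    )
    counts = pairs[in_window].groupby("PheneVisit").size()
    # Phene visits with nothing in their window get 0 instead of disappearing
    return counts.reindex(phene["PheneVisit"], fill_value=0)


df = pd.DataFrame(ROWS, columns=COLUMNS)
for col in ("PheneDate", "ER Date"):
    df[col] = pd.to_datetime(df[col], format="%m/%d/%y")

print(count_er_within_year(df))
```

On the question's data this gives 2, 1 and 1 for phchp003v1, v2 and v4; 0, 1 and 1 for phchp004v2, v3 and v4; and 4 for phchp005v1, the same as the non-NaN values in the output above.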
