
Is scipy.stats Doing the Wrong Calculation for IQR?

I am working with the dataset [23, 25, 28, 28, 32, 33, 35]. According to Wikipedia and the SciPy docs, IQR = Q3 − Q1 = 33 − 25 = 8. But when I compute the IQR of this dataset with scipy.stats, the result (6) is not the expected value (8).

Solution 1:

scipy.stats.iqr doesn't seem to follow the algorithm documented on Wikipedia. Instead it simply computes

np.percentile(x, 75) - np.percentile(x, 25)

With the default linear interpolation this is inclusive of the median rather than exclusive, so you get (32 + 33)/2 - (25 + 28)/2 = 32.5 - 26.5 = 6.
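To see where the 6 comes from, here is a minimal pure-Python sketch of the linear-interpolation rule, assuming the fractional index is q/100 * (n - 1), which is how NumPy's default percentile method is usually described. The helper name percentile_linear is made up for this illustration and is not part of NumPy or SciPy.

```python
def percentile_linear(data, q):
    """Percentile by linear interpolation between the two nearest ranks."""
    xs = sorted(data)
    pos = (len(xs) - 1) * q / 100      # fractional index into the sorted data
    lo = int(pos)                      # rank just below (or at) pos
    hi = min(lo + 1, len(xs) - 1)      # rank just above, clamped to the last index
    frac = pos - lo                    # how far pos sits between lo and hi
    return xs[lo] + (xs[hi] - xs[lo]) * frac


data = [23, 25, 28, 28, 32, 33, 35]
q1 = percentile_linear(data, 25)   # index 1.5 -> halfway between 25 and 28
q3 = percentile_linear(data, 75)   # index 4.5 -> halfway between 32 and 33
print(q1, q3, q3 - q1)             # 26.5 32.5 6.0
```

Both quartiles land halfway between two data points, which is why the result is 6 rather than 8.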

If you want to use the algorithm in wikipedia you'd need to do something like:

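import numpy as np

def iqr_(m):
    m = np.asarray(m).ravel()
    n = m.size // 2
    # Partition at index n: the n smallest values end up in m_[:n] and the
    # n largest in m_[m.size - n:] (each group in arbitrary order).
    m_ = np.partition(m, n)
    # For odd sizes, skip the middle element (the median) in the upper half.
    return np.median(m_[n + m.size % 2:]) - np.median(m_[:n])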

iqr_([23,25,28,28,32,33,35])
8.0

EDIT: The talk page of the Wikipedia article points out that the algorithm presented there is not definitive, and that the method used by scipy.stats.iqr is also acceptable. See the three methods for determining quartiles described there.
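As a rough illustration of how two of those conventions differ, the sketch below computes the IQR as the difference between the medians of the upper and lower halves, with a flag for whether an odd-length dataset's median is kept in both halves or dropped from both. The function name iqr_by_halves and the include_median flag are hypothetical, chosen only for this example.

```python
from statistics import median


def iqr_by_halves(data, include_median=False):
    """IQR as median(upper half) - median(lower half)."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    if n % 2 == 1 and include_median:
        # Odd length: the middle value belongs to both halves.
        lower, upper = xs[:half + 1], xs[half:]
    else:
        # Even length, or odd length with the middle value dropped.
        lower, upper = xs[:half], xs[n - half:]
    return median(upper) - median(lower)


data = [23, 25, 28, 28, 32, 33, 35]
print(iqr_by_halves(data))                       # 8   (median excluded)
print(iqr_by_halves(data, include_median=True))  # 6.0 (median included)
```

For this dataset the median-inclusive variant happens to give the same 6 that scipy.stats.iqr returns, while the median-exclusive variant gives the 8 from the question.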


Solution 2:

Daniel's answer is amazing. For my part, if the length of the data is even, I use stats.iqr, like this:

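>>> import numpy as np
>>> from scipy import stats
>>> d = [21, 23, 25, 28, 28, 32, 33, 35]
>>> len(d)  # check the length of the dataset
8
>>> # NumPy 1.22+ names this keyword method= instead of interpolation=
>>> Q1 = np.percentile(d, 25, interpolation='midpoint')
>>> Q3 = np.percentile(d, 75, interpolation='midpoint')
>>> Q3 - Q1
8.5
>>> # The same result using stats.iqr
>>> stats.iqr(d, interpolation='midpoint')
8.5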

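So for this even-length dataset, stats.iqr with interpolation='midpoint' can be used directly. For an odd-length dataset, we might use Daniel's method, because stats.iqr includes the median rather than excluding it. Note that the match for even lengths is not guaranteed in general: for [1, 2, 3, 4, 5, 6] the midpoint percentiles give 4.5 - 2.5 = 2, while the medians of the two halves give 5 - 2 = 3.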
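A quick way to check how far this shortcut extends is to compare the two calculations on even-length datasets of different sizes. The sketch below is pure Python and only mimics the midpoint rule under the assumption that the fractional index is q/100 * (n - 1); the helper names are invented for this example.

```python
from statistics import median


def percentile_midpoint(data, q):
    """Average of the two ranks surrounding the fractional index."""
    xs = sorted(data)
    pos = (len(xs) - 1) * q / 100
    lo = int(pos)
    hi = lo if pos == lo else lo + 1   # exact hit: use that element alone
    return (xs[lo] + xs[hi]) / 2


def iqr_midpoint(data):
    return percentile_midpoint(data, 75) - percentile_midpoint(data, 25)


def iqr_halves(data):
    """Median of the upper half minus median of the lower half."""
    xs = sorted(data)
    half = len(xs) // 2
    return median(xs[len(xs) - half:]) - median(xs[:half])


eight = [21, 23, 25, 28, 28, 32, 33, 35]
six = [1, 2, 3, 4, 5, 6]
print(iqr_midpoint(eight), iqr_halves(eight))  # 8.5 8.5
print(iqr_midpoint(six), iqr_halves(six))      # 2.0 3
```

The two approaches agree for the eight-element dataset but not for the six-element one, so it seems safer to check the result than to rely on the length being even.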

