Is scipy.stats Doing the Wrong Calculation for IQR?
Solution 1:
scipy.stats.iqr doesn't seem to follow the recursive (median-of-halves) algorithm documented on Wikipedia. Instead, it simply computes np.percentile(x, 75) - np.percentile(x, 25).
With NumPy's default linear interpolation, this does not exclude the median; it includes it. For [23, 25, 28, 28, 32, 33, 35] you therefore get (32 + 33)/2 - (25 + 28)/2 = 32.5 - 26.5 = 6.
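A quick check of that arithmetic, assuming the seven-value dataset from this thread and NumPy's default linear method (which places percentile p at position p/100 * (n - 1) in the sorted data):

```python
import numpy as np
from scipy import stats

x = np.array([23, 25, 28, 28, 32, 33, 35])

# Default linear interpolation: position = p/100 * (n - 1) in the sorted data.
q1 = np.percentile(x, 25)  # position 1.5 -> halfway between 25 and 28
q3 = np.percentile(x, 75)  # position 4.5 -> halfway between 32 and 33

print(q1, q3, q3 - q1)  # 26.5 32.5 6.0
print(stats.iqr(x))     # 6.0, the same difference of percentiles
```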
If you want to use the Wikipedia algorithm, you'd need something like:
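import numpy as np

def iqr_(m):
    # A full sort guarantees each half holds the right values;
    # np.partition with a single kth only fixes one position.
    m = np.sort(np.asarray(m).ravel())
    n = m.size // 2
    # Median of the upper half minus median of the lower half,
    # leaving out the middle value when the length is odd.
    return np.median(m[n + m.size % 2:]) - np.median(m[:n])

print(iqr_([23, 25, 28, 28, 32, 33, 35]))  # 8.0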
EDIT: On the Wikipedia talk page, it is pointed out that the algorithm presented there is not definitive; in fact, the method used by scipy.stats.iqr is also acceptable. See the three methods for determining quartiles here.
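To make the difference between conventions concrete, here is a small sketch of two median-of-halves variants: excluding the median from both halves when the length is odd (as in the function above), and including it in both halves. The name quartile_iqr and its include_median flag are hypothetical, introduced only for this illustration. For odd-length data, the inclusive variant agrees with NumPy's default linear percentiles, which is why scipy.stats.iqr returns 6 for this dataset.

```python
import numpy as np

def quartile_iqr(data, include_median=False):
    """Hypothetical helper: IQR from the medians of the two halves of sorted data.

    include_median=False drops the middle value for odd lengths;
    include_median=True keeps it in both halves. Even lengths split the same
    way either way. Assumes at least two values when excluding the median.
    """
    s = np.sort(np.asarray(data, dtype=float).ravel())
    n = s.size
    half = n // 2
    if n % 2 and include_median:
        lower, upper = s[:half + 1], s[half:]
    else:
        lower, upper = s[:half], s[n - half:]
    return np.median(upper) - np.median(lower)

x = [23, 25, 28, 28, 32, 33, 35]
print(quartile_iqr(x))                               # 8.0 (median excluded)
print(quartile_iqr(x, include_median=True))          # 6.0 (median included)
print(np.percentile(x, 75) - np.percentile(x, 25))   # 6.0
```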
Solution 2:
Daniel's answer is great. For my part, if the length of the data is even, I use stats.iqr, like this:
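>>> import numpy as np
>>> from scipy import stats
>>> d = [21, 23, 25, 28, 28, 32, 33, 35]
>>> len(d)  # check the length of the dataset
8
>>> # NumPy >= 1.22 names this keyword `method` (formerly `interpolation`)
>>> Q1 = np.percentile(d, 25, method='midpoint')
>>> Q3 = np.percentile(d, 75, method='midpoint')
>>> Q3 - Q1
8.5
>>> # scipy.stats.iqr still takes `interpolation`
>>> stats.iqr(d, interpolation='midpoint')
8.5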
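Thus, for this even-length dataset, stats.iqr with interpolation='midpoint' can be used directly. Note that this shortcut is only guaranteed to match the median-of-halves method when the length is a multiple of 4; for other even lengths, such as 6, the two can differ. For odd-length data, we might use Daniel's method, because stats.iqr includes the median rather than excluding it.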
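Whether the midpoint shortcut matches the median-of-halves method depends on the exact length, not only on it being even. This sketch re-implements the exclusive median-of-halves rule with np.sort and compares it with np.percentile(..., method="midpoint"), which is the difference stats.iqr(d, interpolation='midpoint') computes. The names halves_iqr and midpoint_iqr are hypothetical, and the code assumes NumPy 1.22 or later for the method keyword.

```python
import numpy as np

def halves_iqr(data):
    """Median of the upper half minus median of the lower half (median excluded for odd n)."""
    s = np.sort(np.asarray(data, dtype=float))
    half = s.size // 2
    return np.median(s[s.size - half:]) - np.median(s[:half])

def midpoint_iqr(data):
    """Q3 - Q1 using NumPy's 'midpoint' percentile method (NumPy >= 1.22)."""
    q1, q3 = np.percentile(data, [25, 75], method="midpoint")
    return q3 - q1

d = [21, 23, 25, 28, 28, 32, 33, 35]   # length 8, a multiple of 4
print(halves_iqr(d), midpoint_iqr(d))  # 8.5 8.5

e = [1, 2, 3, 4, 5, 6]                 # length 6, even but not a multiple of 4
print(halves_iqr(e), midpoint_iqr(e))  # 3.0 2.0
```

For length 6 the halves are [1, 2, 3] and [4, 5, 6] (medians 2 and 5), while the midpoint method averages the two neighbors around positions 1.25 and 3.75, giving 2.5 and 4.5.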