Python, Statistics: The Anderson-Darling normality test.
We have already shown the Jarque-Bera test for normality. That test is quite simple to use, requiring only the ability to compute the skewness and kurtosis of the sample. Another popular test is the Anderson-Darling test, which experimenters have found to be among the most sensitive tests for departures from normality. It is normally used for sample sizes less than 25 but has been found to perform fairly well even for sample sizes as large as 200.
Wikipedia has a nice explanation; in fact, we implemented one of the formulas given there for computing the Anderson-Darling test statistic.
Here is the Python code, including a sample test routine for significance levels in [0.10, 0.05, 0.025, 0.01].
The function adstatistic(X) computes the Anderson-Darling statistic given the sample X. The sample is standardized (converted to z-scores) and sorted in a new vector Y.
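Written out as a formula, the loop in adstatistic appears to compute the quantity below. The notation is ours and is read off the code listing, not quoted from a reference: \Phi is the standard normal cdf, Y_{(i)} is the i-th smallest z-score, and n is the sample size. The second line is the small-sample adjustment applied just before the value is returned.

```latex
A^2 = -n - \frac{1}{n} \sum_{i=1}^{n}
      \Bigl[ (2i - 1)\,\ln \Phi\bigl(Y_{(i)}\bigr)
           + \bigl(2(n - i) + 1\bigr)\,\ln\bigl(1 - \Phi(Y_{(i)})\bigr) \Bigr]

A^{*2} = A^2 \left( 1 + \frac{4}{n} - \frac{25}{n^2} \right)
```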
The function AndersonDarlingTest performs the test, given the sample X and the significance level alpha. If alpha is NOT one of the standard values [0.10, 0.05, 0.025, 0.01], it will raise an error. Otherwise it will return the result of the comparison A-D test statistic < critical value. If it returns True, the null hypothesis that the sample is normal is NOT rejected; otherwise, the null hypothesis is rejected and the one-sided alternative that the sample is not normal is accepted.
The routines depend on the modules scipy and numpy. The cumulative distribution function (cdf) of the normal distribution is given by scipy.stats.norm.cdf. In the test portion of our Python module, we use the scipy.stats.norm.rvs function to generate normal random variates.
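As a quick illustration of the two scipy calls mentioned above, the following small sketch evaluates the standard normal cdf at two points and draws a few random variates. The variable names and the seed passed through random_state are our own choices for the example.

```python
from scipy.stats import norm

# cdf of the standard normal distribution (mean 0, standard deviation 1)
p_zero = norm.cdf(0.0)    # half of the probability mass lies below the mean
p_upper = norm.cdf(1.96)  # approximately 0.975

# ten standard normal variates; random_state makes the draw repeatable
draws = norm.rvs(size=10, random_state=42)

print(p_zero, p_upper)
print(draws)
```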
# -*- coding: utf-8 -*-
"""
File          andersondarling.py
author        Dr. Ernesto P. Adorio
              UPDEPP, UP Clarkfield
              ernesto.adorio @ gmail. com
revisions     oct. 26, 2009 Version 0.0.1 release
license       Citation requested when using this work in research.
"""
import scipy.stats as stat
from numpy import mean, var  # you may use the already posted version of mean and variance Python routines.
from math import sqrt, log

pnorm = stat.norm.cdf


def adstatistic(X):
    """
    Returns the Anderson-Darling test statistic.
    """
    n = len(X)
    Y = list(X)                        # work on a copy of the sample
    ybar = mean(Y)
    yvar = var(Y, ddof=1)              # sample variance (n - 1 denominator)
    ysd = sqrt(yvar)
    Y = [(y - ybar) / ysd for y in Y]  # convert to z-scores
    Y.sort()                           # don't forget this!!!
    S = 0.0
    for i, y in enumerate(Y):
        j = i + 1
        p = pnorm(y)
        q = 1 - p
        S += (j + j - 1) * log(p) + (2 * (n - j) + 1) * log(q)
    A2 = -n - S / n
    A2 *= (1.0 + 4.0 / n - 25.0 / n**2)  # small-sample adjustment
    return A2


def AndersonDarlingTest(X, alpha=0.05):
    alphas = [0.10, 0.05, 0.025, 0.01]
    critvalue = [0.632, 0.751, 0.870, 1.029]
    # look up the critical value that matches the requested alpha
    for a, crit in zip(alphas, critvalue):
        if abs(alpha - a) < 1.0e-4:
            break
    else:
        raise ValueError("Significance level not in %s" % alphas)
    teststat = adstatistic(X)
    print(teststat, crit)
    return teststat < crit


if __name__ == "__main__":
    print(pnorm(3.0))
    n = 10
    X = [stat.norm.rvs() for i in range(n)]
    print(AndersonDarlingTest(X, alpha=0.05))
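For larger samples the same statistic can be computed without an explicit loop. The sketch below is a NumPy variant and is not part of the original module. The function name ad_statistic_vectorized is hypothetical, the use of the sample standard deviation (ddof=1) is an assumption, and norm.logcdf and norm.logsf stand in for log(p) and log(1 - p). It is applied to one normal and one exponential sample to show how the statistic reacts to a clearly non-normal input.

```python
import numpy as np
from scipy.stats import norm


def ad_statistic_vectorized(x):
    """Adjusted Anderson-Darling statistic for normality (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # Standardize with the sample standard deviation (ddof=1 is an assumption).
    z = np.sort((x - x.mean()) / x.std(ddof=1))
    i = np.arange(1, n + 1)
    # logcdf/logsf give log(p) and log(1 - p) without forming 1 - p explicitly.
    s = np.sum((2 * i - 1) * norm.logcdf(z) + (2 * (n - i) + 1) * norm.logsf(z))
    a2 = -n - s / n
    # Same small-sample adjustment as in the loop version.
    return a2 * (1.0 + 4.0 / n - 25.0 / n**2)


rng = np.random.default_rng(2009)
normal_sample = rng.normal(size=200)
skewed_sample = rng.exponential(size=200)

print("normal sample:     ", ad_statistic_vectorized(normal_sample))
print("exponential sample:", ad_statistic_vectorized(skewed_sample))
```

The exponential sample should give a value far above the largest critical value in the table (1.029), so normality is rejected at every listed significance level.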
For further reference, visit the Wikipedia article cited above and the following:
NIST Statistics Handbook
It should be noted that scipy already comes with Anderson-Darling tests not only for the normal distribution, but also for the Gumbel and exponential distributions. You may find various statistical tests in the scipy.stats.morestats module.
For more details, visit Scipy.stats.morestats
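For comparison, here is a minimal sketch of calling the scipy version, which is exposed as scipy.stats.anderson. We assume a reasonably recent scipy in which the result carries the attributes statistic, critical_values and significance_level (the latter in percent). The critical values and the finite-sample correction scipy uses may differ slightly from the table in our own routine, so the two need not agree to the last digit.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(12345)
sample = rng.exponential(size=200)  # a clearly non-normal sample

result = stats.anderson(sample, dist="norm")

# Compare the statistic with each tabulated critical value.
for crit, sig in zip(result.critical_values, result.significance_level):
    verdict = "reject" if result.statistic > crit else "do not reject"
    print(f"{sig:4.1f}%: critical value {crit:.3f} -> {verdict} normality")
```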
The reason we are posting the above code is for teaching purposes only.
We will start incorporating more of Scipy in the future!







