Measures of central location

There are many measures of central location, and the best known is, of course, the arithmetic mean \overline {x}=\frac{\sum x_i}{n}. Because it depends on every data value, it is sensitive to extreme values.

Other means are shown with their formulas in the following table, together with the arithmetic mean. The lower and upper limits of the summation index (i = 1 to n) are not shown explicitly.

\begin{array}{ll}HM = \frac{n}{\sum \frac{1}{x_i}} & harmonic\, mean \\ GM = (\prod_{i = 1}^n x_i)^{1/n} & geometric\, mean \\ AM = \frac{\sum x_i}{n} & arithmetic\, mean \\ RMS = \sqrt{\frac{\sum x_i^2}{n}} & root\, mean\, square \end{array}

For positive data values, these means satisfy the order relation HM \le GM \le AM \le RMS, with equality throughout only when all the x_i are equal.
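The four formulas in the table translate directly into code. The sketch below uses hypothetical helper names and assumes a non-empty sample of positive values. It computes the geometric mean through logarithms (mathematically the same as the n-th root of the product, but less prone to overflow) and then checks the order relation on a small sample. Python's own statistics module also provides harmonic_mean and geometric_mean if you want to cross-check.

```python
import math

def harmonic_mean(xs):
    # HM = n / sum(1 / x_i); assumes all x_i > 0
    return len(xs) / sum(1.0 / x for x in xs)

def geometric_mean(xs):
    # GM = (prod x_i)^(1/n), evaluated as exp(mean(log x_i)) to avoid overflow
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

def arithmetic_mean(xs):
    # AM = sum(x_i) / n
    return sum(xs) / len(xs)

def root_mean_square(xs):
    # RMS = sqrt(sum(x_i^2) / n)
    return math.sqrt(sum(x * x for x in xs) / len(xs))

sample = [2.0, 4.0, 8.0]
hm = harmonic_mean(sample)
gm = geometric_mean(sample)
am = arithmetic_mean(sample)
rms = root_mean_square(sample)
print(hm, gm, am, rms)
print(hm <= gm <= am <= rms)  # True
```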

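Your statistics teacher can explain which problem domains call for each of these means (for example, the harmonic mean for averaging speeds over equal distances, and the geometric mean for averaging growth factors).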

The median and the mode are also used as measures of central location. To determine the median, first sort the sample in increasing order, X_1 \le X_2 \le \dots \le X_n, and let the starting index be one (an adjustment is necessary for zero-based indexing). The median is then computed as

odd sample size n:   Q_2 = X_{med} = X_{(n+1)/2}
even sample size n:   Q_2 = X_{med} = \frac{X_{n/2} + X_{n/2+1}}{2}
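Here is a minimal sketch of the median rule above, with the one-based indices shifted for Python's zero-based lists (the function name is just illustrative). For these inputs it should agree with statistics.median from the standard library.

```python
def median(xs):
    """Median via the one-based formulas above, mapped to zero-based indexing."""
    s = sorted(xs)
    n = len(s)
    if n == 0:
        raise ValueError("the median of an empty sample is undefined")
    if n % 2 == 1:
        # odd n: X_{(n+1)/2} (one-based) is s[(n + 1) // 2 - 1] (zero-based)
        return s[(n + 1) // 2 - 1]
    # even n: mean of X_{n/2} and X_{n/2+1}, i.e. s[n/2 - 1] and s[n/2]
    return (s[n // 2 - 1] + s[n // 2]) / 2

print(median([7, 1, 5]))     # 5
print(median([7, 1, 5, 3]))  # 4.0
```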

The mode is another simple measure of central location, and it can be applied to categorical data. It is the most “popular” (most frequent) value, and it may or may not exist for a given sample. Consider \{M, M, M, F, F, F\} as a sample of the genders of 6 students. What is the mode? One can say that both M and F are modes, or that no mode exists at all. Ask your teacher which rule to adopt for this not-so-uncommon case.

For unimodal data that are moderately skewed, the mean, mode, and median are approximately related by the empirical rule Mean - Mode \approx 3(Mean - Median).
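One way to handle ties is to report every value that reaches the highest frequency. The sketch below adopts that convention; it is an assumption, and the "no mode" convention is equally defensible. It uses collections.Counter and behaves like statistics.multimode from Python 3.8 onward.

```python
from collections import Counter

def modes(xs):
    """Return every value with the highest frequency (several values if tied)."""
    counts = Counter(xs)
    if not counts:
        return []
    top = max(counts.values())
    # Counter keeps first-seen order, so ties are listed in order of appearance
    return [value for value, c in counts.items() if c == top]

print(modes(["M", "M", "M", "F", "F", "F"]))  # ['M', 'F']
print(modes([1, 2, 2, 3]))                    # [2]
```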

Here are other means, designed to reduce the arithmetic mean's sensitivity to extreme values.
trimmed mean: a certain percentage of the sample values is removed from each end of the sorted sample (the smallest and the largest values), and the arithmetic mean of the remaining values is computed.
mid-interquartile value (also known as Tukey's trimean): this is defined as \frac{Q_1 + 2 Q_2 + Q_3}{4}, where Q_1, Q_2, Q_3 are the first, second (median), and third quartiles.
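The sketch below shows how a trimmed mean tames a single extreme value. It assumes that `proportion` is the fraction cut from each end and that the count removed is rounded down, floor(n * proportion). scipy.stats.trim_mean uses a similar conservative convention, but the helper here is hypothetical.

```python
import math

def trimmed_mean(xs, proportion):
    """Mean after dropping floor(n * proportion) values from EACH end of the sorted sample."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    s = sorted(xs)
    k = math.floor(len(s) * proportion)  # number of values removed at each end
    kept = s[k:len(s) - k]
    return sum(kept) / len(kept)

data = [1, 2, 3, 4, 100]
print(sum(data) / len(data))    # 22.0, pulled upward by the extreme value 100
print(trimmed_mean(data, 0.2))  # 3.0, after dropping 1 and 100
```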

Some software packages define Q_0 and Q_4 as the minimum and maximum sample values, respectively. Together with Q_1, Q_2, and Q_3, these five numbers form the five-number summary of the sample.

Arrange the numbers in increasing order. Then Q_0 = X_1 and Q_4 = X_n. The determination of the median Q_2 has been described above, so all that is left is to determine Q_1 and Q_3:
For Q_1, compute L_{0.25} = 0.25n.
Similarly, for Q_3, compute L_{0.75} = 0.75n.
If L is an integer, the quartile is the mean of X_{L} and X_{L+1}.
Otherwise, if L has a fractional part, round it up to the next integer (ceiling function); the quartile is X_{\lceil L \rceil}.
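The quartile steps above can be sketched in Python as follows, again mapping the one-based X_i to zero-based list positions. The helper names are hypothetical. Other tools use different quartile rules (NumPy's percentile, for example, interpolates by default), so their results can differ from these.

```python
import math

def quartile(sorted_xs, p):
    """Q_1 (p = 0.25) or Q_3 (p = 0.75) by the rule above; sorted_xs must be sorted."""
    n = len(sorted_xs)
    L = n * p  # exact in floating point for p = 0.25 or 0.75
    if L == int(L):
        L = int(L)
        # integer L: mean of X_L and X_{L+1} (one-based)
        return (sorted_xs[L - 1] + sorted_xs[L]) / 2
    # fractional L: X_{ceil(L)} (one-based)
    return sorted_xs[math.ceil(L) - 1]

def five_number_summary(xs):
    s = sorted(xs)
    n = len(s)
    q2 = s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2  # median
    return s[0], quartile(s, 0.25), q2, quartile(s, 0.75), s[-1]

def mid_interquartile(xs):
    # (Q_1 + 2 Q_2 + Q_3) / 4
    _, q1, q2, q3, _ = five_number_summary(xs)
    return (q1 + 2 * q2 + q3) / 4

data = [1, 3, 4, 6, 7, 9, 12, 15]
print(five_number_summary(data))  # (1, 3.5, 6.5, 10.5, 15)
print(mid_interquartile(data))    # 6.75
```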

Quartiles are special cases of quantiles (percentiles), and there are several more elaborate options for computing them: Computing quantiles (with Python program) /?p=125

The mean of a linear transform (scaling) of variables

Given a data array or vector X, the mean of the linear transform Y = a + bX is \overline{Y} = a + b \overline{X}. Likewise, the linear transform W = d(X - c) has mean \overline{W} = d(\overline{X} - c). Notice that this is zero if c = \overline{X}! Ask your teacher for the easy proof of this.
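The proof is one line: \overline{Y} = \frac{1}{n}\sum (a + b x_i) = a + b \frac{\sum x_i}{n} = a + b\overline{X}, and the W case follows with a = -dc and b = d. The small numerical check below uses arbitrary illustrative values for a, b, and d, and chooses c equal to the mean of X.

```python
def mean(xs):
    return sum(xs) / len(xs)

x = [2.0, 4.0, 9.0]
a, b = 10.0, 3.0   # illustrative constants for Y = a + bX
c, d = mean(x), 0.5  # c chosen equal to the mean of X

y = [a + b * xi for xi in x]    # Y = a + bX
w = [d * (xi - c) for xi in x]  # W = d(X - c)

print(mean(y), a + b * mean(x))  # 25.0 25.0
print(mean(w))                   # 0.0
```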

Digital explorations is Digg proof thanks to caching by WP Super Cache