Mathematics Variance and Standard Deviation

Limitations of mean deviation

`color{green} ✍️` In a series where the degree of variability is very high, the median is not a representative measure of central tendency.

`color{green} ✍️` Thus, the mean deviation about the median calculated for such a series cannot be fully relied upon.

`color{green} ✍️` The sum of the deviations from the mean (minus signs ignored) is greater than or equal to the sum of the deviations from the median, because the sum of absolute deviations is smallest when taken about the median.

`color{green} ✍️` Therefore, the mean deviation about the mean is not very scientific. Thus, in many cases, the mean deviation may give unsatisfactory results.
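The comparison of the two sums can be checked numerically. The sketch below uses a small hypothetical series with one extreme value (the data and the helper name `sum_abs_dev` are illustrative assumptions, not from the text) and computes the sum of absolute deviations about the mean and about the median.

```python
from statistics import mean, median

def sum_abs_dev(data, centre):
    """Sum of |x - centre| over all observations (minus signs ignored)."""
    return sum(abs(x - centre) for x in data)

# Hypothetical, highly variable series chosen only for illustration.
series = [2, 3, 4, 5, 100]

about_mean = sum_abs_dev(series, mean(series))      # mean is 22.8
about_median = sum_abs_dev(series, median(series))  # median is 4
print(about_mean, about_median)
```

Here the sum about the median is 100, against roughly 154.4 about the mean.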

`color{green} ✍️` Also, the mean deviation is calculated from the absolute values of the deviations and therefore cannot be subjected to further algebraic treatment.

`color{green} ✍️` This implies that we must have some other measure of dispersion. Standard deviation is such a measure of dispersion.

Variance and Standard Deviation

`color{green} ✍️` Recall that while calculating the mean deviation about the mean or median, we took the absolute values of the deviations. The absolute values were taken to give meaning to the mean deviation; otherwise, the positive and negative deviations may cancel one another.

`color{green} ✍️` Another way to overcome this difficulty, which arises from the signs of the deviations, is to take the squares of all the deviations.

`color{green} ✍️` Obviously, all these squares of deviations are non-negative. Let `color{navy}(x_1, x_2, x_3, ..., x_n)` be `n` observations and `color{navy}(barx)` be their mean.

Then the sum of the squares of the deviations is `color{navy}((x_1-barx)^2 + (x_2-barx)^2 + ... + (x_n-barx)^2 = sum_(i=1)^(n) (x_i-barx)^2)`

`color{green} ✍️` If this sum is zero, then each `color{navy}((x_i - barx))` has to be zero, since a sum of non-negative terms is zero only when every term is zero.

`color{green} ✍️` This implies that there is no dispersion at all, as all observations are equal to the mean `color{navy}(barx)`.

`color{green} ✍️` If `color{navy}(sum_(i=1)^(n) (x_i-barx)^2)` is small, the observations `color{navy}(x_1, x_2, x_3, ..., x_n)` are close to the mean `barx`, and therefore there is a lower degree of dispersion.

`color{green} ✍️` On the contrary, if this sum is large, there is a higher degree of dispersion of the observations from the mean `color{navy}(barx)`. Can we thus say that the sum `color{navy}(sum_(i=1)^(n) (x_i-barx)^2)` is a reasonable indicator of the degree of dispersion or scatter?

`color(red)(=>"Let us take the set A of six observations 5, 15, 25, 35, 45, 55." )`

The mean of the observations is `color{navy}(barx=30)`. The sum of squares of deviations from `barx` for this set is

`color{navy}(sum_(i=1)^(6) (x_i-barx)^2 = (5-30)^2 +(15-30)^2 + (25-30)^2 + (35-30)^2 + (45-30)^2 + (55-30)^2)`

`" " color{navy}(= 625 + 225 + 25 + 25 + 225 + 625 = 1750)`
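The arithmetic for set A can be reproduced directly. This minimal sketch computes the mean and the sum of squared deviations from it; the variable names are illustrative only.

```python
set_a = [5, 15, 25, 35, 45, 55]

mean_a = sum(set_a) / len(set_a)                 # arithmetic mean
ss_a = sum((x - mean_a) ** 2 for x in set_a)     # sum of squared deviations from the mean
print(mean_a, ss_a)  # 30.0 1750.0
```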

`color(red)(=>"Let us now take another set B of 31 observations")`

`color{navy}(15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45)`.

The mean of these observations is `color{navy}(bary = 30)`.

Note that both the sets `A` and `B` of observations have a mean of `30.`

Now, the sum of squares of deviations of observations for set B from the mean `color{navy}(bary)` is given by

`color{navy}(sum_(i=1)^(31) (y_i-bary)^2 = (15-30)^2 + (16-30)^2 + (17-30)^2 + ... + (44-30)^2 + (45-30)^2)`

`" " color{navy}(= (-15)^2 + (-14)^2 + ... + (-1)^2 + 0^2 + 1^2 + 2^2 + 3^2 + ... + 14^2 + 15^2)`

`" " color{navy}(= 2 [15^2 + 14^2 + ... + 1^2])`

`" " color{navy}(= 2 xx (15 xx (15+1)(2 xx 15+1))/6 = 5 xx 16 xx 31 = 2480)`

(because the sum of the squares of the first `n` natural numbers is `color{navy}((n(n+1)(2n+1))/6)`; here `color{navy}(n = 15)`)
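The same result for set B can be checked both by direct summation and through the closed-form sum of squares used above. The helper `sum_of_squares` is a hypothetical name introduced for this sketch.

```python
set_b = list(range(15, 46))  # 15, 16, ..., 45: 31 observations

mean_b = sum(set_b) / len(set_b)
ss_b = sum((y - mean_b) ** 2 for y in set_b)   # direct sum of squared deviations

def sum_of_squares(n):
    """1^2 + 2^2 + ... + n^2 = n(n+1)(2n+1)/6."""
    return n * (n + 1) * (2 * n + 1) // 6

# Deviations run from -15 to 15, so the sum is twice 1^2 + ... + 15^2.
print(mean_b, ss_b, 2 * sum_of_squares(15))  # 30.0 2480.0 2480
```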

If `color{navy}(sum_(i=1)^(n) (x_i-barx)^2)` were our measure of dispersion or scatter about the mean, we would tend to say that set `A` of six observations has less dispersion about the mean than set `B` of `31` observations, even though the observations in set `A` are more scattered from the mean (deviations range from `-25` to `25`) than those in set `B` (deviations range from `-15` to `15`). The sum for set `B` is larger simply because it adds up more terms.

This is also clear from the following diagrams.

Thus, we can say that the sum of squares of deviations from the mean is not a proper measure of dispersion. To overcome this difficulty, we take the mean of the squares of the deviations, i.e., `color{navy}(1/n sum_(i=1)^(n) (x_i-barx)^2)`. In the case of set A, we have

Mean `color{navy}(= 1/6 xx 1750 ~~ 291.67)`, and in the case of set B, it is `color{navy}(1/31 xx 2480 = 80)`.

This indicates that the scatter or dispersion is greater in set `A` than in set `B`, which agrees with the geometrical representation of the two sets.

`color{green} ✍️` Thus, we can take `color{navy}(1/n sum_(i=1)^(n) (x_i-barx)^2)` as a quantity which leads to a proper measure of dispersion.

`color{green} ✍️` This number, i.e., mean of the squares of the deviations from mean is called the variance and is denoted by `color{navy}(σ^2)` (read as sigma square).

Therefore, the variance of `n` observations `color{navy}(x_1, x_2, ..., x_n)` is given by

`" " color{red}(σ^2) = 1/n sum_(i=1)^(n) (x_i-barx)^2`
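The definition of variance translates directly into code. This sketch implements the `1/n` formula above (the function name `variance` is our own) and compares it with Python's `statistics.pvariance`, which also divides by `n`; note that `statistics.variance` instead divides by `n - 1` and would give different numbers.

```python
from statistics import pvariance

def variance(data):
    """sigma^2 = (1/n) * sum((x_i - mean)^2), the definition given above."""
    n = len(data)
    m = sum(data) / n
    return sum((x - m) ** 2 for x in data) / n

set_a = [5, 15, 25, 35, 45, 55]
set_b = list(range(15, 46))

for name, data in (("A", set_a), ("B", set_b)):
    # pvariance uses the same 1/n divisor (population variance).
    print(name, round(variance(data), 2), round(float(pvariance(data)), 2))
# A 291.67 291.67
# B 80.0 80.0
```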

Standard deviation of a continuous frequency distribution

The given continuous frequency distribution can be represented as a discrete frequency distribution by replacing each class by its mid-point.

Then, the standard deviation is calculated by the technique adopted in the case of a discrete frequency distribution.

If there is a frequency distribution of `n` classes, each class defined by its mid-point `color{navy}(x_i)` with frequency `color{navy}(f_i)`, the standard deviation is obtained by the formula

`" " color{navy}(σ = sqrt(1/N sum_(i=1)^(n)f_i (x_i-barx)^2))`

where `color{navy}(barx)` is the mean of the distribution and `color{navy}(N= sum_(i=1)^(n) f_i)`.
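A minimal sketch of this procedure, assuming a hypothetical table of five classes of width 10 with made-up frequencies (not data from the text): each class is replaced by its mid-point, and the formula above is applied.

```python
import math

# Hypothetical grouped data (illustration only): class intervals and frequencies.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [2, 3, 5, 4, 6]

def grouped_sd(classes, freqs):
    """Standard deviation of a continuous frequency distribution.

    Each class is replaced by its mid-point x_i, and then
    sigma = sqrt((1/N) * sum(f_i * (x_i - mean)^2)) with N = sum(f_i).
    """
    mids = [(lo + hi) / 2 for lo, hi in classes]
    n_total = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, mids)) / n_total
    var = sum(f * (x - mean) ** 2 for f, x in zip(freqs, mids)) / n_total
    return mean, math.sqrt(var)

mean, sd = grouped_sd(classes, freqs)
print(mean, round(sd, 4))  # 29.5 13.2193
```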

`color(green)(ul"Another formula for standard deviation")` We know that

Variance `color{green}(σ^2 = 1/N sum_(i=1)^(n) f_i (x_i-barx)^2 = 1/N sum_(i=1)^(n) f_i (x_(i)^(2) + barx^2 - 2barx x_i))`

`" " = 1/N [sum_(i=1)^(n) f_i x_(i)^(2) + sum_(i=1)^(n) barx^2 f_i - sum_(i=1)^(n) 2barx f_i x_i]`

`" " = 1/N [sum_(i=1)^(n) f_i x_(i)^(2) + barx^2 sum_(i=1)^(n) f_i - 2barx sum_(i=1)^(n) f_i x_i]`

`" " = 1/N [sum_(i=1)^(n) f_i x_(i)^(2) + barx^2 N - 2barx * N barx]`

`" " ["Here " 1/N sum_(i=1)^(n) x_i f_i = barx ", so " sum_(i=1)^(n) x_i f_i = N barx]`

`" " = 1/N sum_(i=1)^(n) f_i x_(i)^(2) + barx^2 - 2barx^2`

`" " = 1/N sum_(i=1)^(n) f_i x_(i)^(2) - barx^2`

or `color{navy}(σ^2 = 1/N sum_(i=1)^(n) f_i x_(i)^(2) - ((sum_(i=1)^(n) f_i x_i)/N)^2)`

`" " color{navy}(= 1/(N^2) [N sum_(i=1)^(n) f_i x_(i)^(2) - (sum_(i=1)^(n) f_i x_i)^2])`

Thus, the standard deviation is `color{red}(σ = 1/N sqrt(N sum_(i=1)^(n) f_i x_(i)^(2) - (sum_(i=1)^(n) f_i x_i)^2))`
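The alternative formula needs only `N`, `sum f_i x_i` and `sum f_i x_i^2`. This sketch, using the same kind of hypothetical mid-points and frequencies as an illustration, checks that it agrees with the defining formula.

```python
import math

# Hypothetical mid-points x_i and frequencies f_i (illustration only).
mids = [5, 15, 25, 35, 45]
freqs = [2, 3, 5, 4, 6]

n_total = sum(freqs)                                # N
s1 = sum(f * x for f, x in zip(freqs, mids))        # sum f_i x_i
s2 = sum(f * x * x for f, x in zip(freqs, mids))    # sum f_i x_i^2

# sigma = (1/N) * sqrt(N * sum f_i x_i^2 - (sum f_i x_i)^2)
sd_alt = math.sqrt(n_total * s2 - s1 ** 2) / n_total

# Defining formula: sigma = sqrt((1/N) * sum f_i (x_i - mean)^2)
mean = s1 / n_total
sd_definition = math.sqrt(sum(f * (x - mean) ** 2 for f, x in zip(freqs, mids)) / n_total)

print(round(sd_alt, 4), round(sd_definition, 4))  # 13.2193 13.2193
```

With integer data the sums stay exact until the final square root. With large or non-integer values in floating point, the two terms under the root can be large and nearly equal, so some precision may be lost in the subtraction.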

Shortcut method to find variance and standard deviation

Sometimes the values of `color{navy}(x_i)` in a discrete distribution, or the mid-points `color{navy}(x_i)` of the classes in a continuous distribution, are large, so the calculation of the mean and variance becomes tedious and time-consuming.

By using step-deviation method, it is possible to simplify the procedure.

Let the assumed mean be `A` and let the scale be reduced to `color{navy}(1//h)` times (`h` being the width of the class intervals).

Let the step-deviations, or the new values, be `color{navy}(y_i)`.

i.e. `" " y_i = (x_i-A)/h " or " x_i = A+hy_i " " (1)`

We know that `color{green}(barx = (sum_(i=1)^(n) f_i x_i)/N) " " (2)`

Replacing `color{navy}(x_i)` from `(1)` in `(2),`

`" " color{navy}(barx= ( sum_(i=1)^(n) f_i(A+hy_i))/N)`

` " " = 1/N (sum_(i=1)^(n) f_i A + sum_(i=1)^(n) h f_i y_i ) `

`" " = 1/N (A sum_(i=1)^(n) f_i + h sum_(i=1)^(n) f_i y_i)`

`" " = A * N/N + h (sum_(i=1)^(n)f_i y_i)/N " " ("because" sum_(i=1)^(n)f_i= N)`

Thus `" " color{navy}(barx = A + hbary) " " (3)`
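Result (3) can be verified with exact arithmetic. The sketch below assumes hypothetical mid-points and frequencies, an assumed mean `A = 25` and a class width `h = 10`, and uses `fractions.Fraction` so that the equality is tested exactly rather than up to rounding.

```python
from fractions import Fraction

# Hypothetical mid-points x_i and frequencies f_i (illustration only).
mids = [5, 15, 25, 35, 45]
freqs = [2, 3, 5, 4, 6]
A, h = 25, 10  # assumed mean A and class width h, chosen for this example

n_total = sum(freqs)                                     # N
ys = [Fraction(x - A, h) for x in mids]                  # y_i = (x_i - A)/h, exact
y_bar = sum(f * y for f, y in zip(freqs, ys)) / n_total  # mean of the step-deviations
x_bar = Fraction(sum(f * x for f, x in zip(freqs, mids)), n_total)  # mean computed directly

print(y_bar, x_bar, A + h * y_bar == x_bar)  # 9/20 59/2 True
```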

Now, the variance of the variable `x` is `color{navy}(sigma_(x)^(2) = 1/N sum_(i=1)^(n)f_i (x_i-barx)^2)`

`" " color{navy}(= 1/N sum_(i=1)^(n)f_i(A+hy_i -A-h bary)^2)` (Using (1) and (3))

`" " color{navy}(= 1/N sum_(i=1)^(n)f_i h^2 (y_i-bary)^2)`

`" " color{navy}(=(h^2)/(N) sum_(i=1)^(n)f_i (y_i-bary)^2 = h^2 × "variance of the variable" \ \ y_i)`

i.e. ` color{navy}(sigma_(x)^(2) = h^2sigma_(y)^(2))`

or `color{navy}(sigma_x = hsigma_y) " " (4)`
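Relation (4) says that shifting by `A` leaves the variance unchanged, while dividing by `h` divides it by `h^2`. This sketch expands a hypothetical frequency table into individual observations and uses `statistics.pvariance` (divisor `N`) on exact fractions to compare the two variances.

```python
from fractions import Fraction
from statistics import pvariance

# Hypothetical mid-points x_i and frequencies f_i (illustration only).
mids = [5, 15, 25, 35, 45]
freqs = [2, 3, 5, 4, 6]
A, h = 25, 10  # assumed mean and class width for this example

# Expand the frequency table into individual observations (as exact fractions).
xs = [Fraction(x) for x, f in zip(mids, freqs) for _ in range(f)]
ys = [(x - A) / h for x in xs]  # step-deviations y_i = (x_i - A)/h

var_x = pvariance(xs)  # sigma_x^2
var_y = pvariance(ys)  # sigma_y^2
print(float(var_x), float(var_y), var_x == h ** 2 * var_y)  # 174.75 1.7475 True
```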

Combining (4) with the alternative formula for the standard deviation derived above, applied to the variable `y`, we have

`color{navy}(sigma_x = h/N sqrt(N sum_(i=1)^(n)f_i y_(i)^(2) - ( sum_(i=1)^(n)f_i y_i)^2))`
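The complete shortcut formula can be sketched as one function. The name `sd_step_deviation` and the data are hypothetical. Running it with two different assumed means shows that the choice of `A` does not change the result, as the derivation implies.

```python
import math

def sd_step_deviation(mids, freqs, A, h):
    """sigma_x = (h/N) * sqrt(N * sum f_i y_i^2 - (sum f_i y_i)^2), y_i = (x_i - A)/h."""
    n_total = sum(freqs)
    ys = [(x - A) / h for x in mids]
    s1 = sum(f * y for f, y in zip(freqs, ys))      # sum f_i y_i
    s2 = sum(f * y * y for f, y in zip(freqs, ys))  # sum f_i y_i^2
    return h / n_total * math.sqrt(n_total * s2 - s1 ** 2)

# Hypothetical mid-points and frequencies (illustration only).
mids = [5, 15, 25, 35, 45]
freqs = [2, 3, 5, 4, 6]

print(round(sd_step_deviation(mids, freqs, A=25, h=10), 4))  # 13.2193
print(round(sd_step_deviation(mids, freqs, A=35, h=10), 4))  # 13.2193
```

Here the step-deviations are small integers (from -2 to 2 when `A = 25`), which is what makes the hand calculation easier than working with the mid-points themselves.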
