Mathematics Variance and Standard Deviation

### Topics covered

- Limitations of mean deviation
- Variance and Standard Deviation
- Standard Deviation
- Standard deviation of a discrete frequency distribution
- Standard deviation of a continuous frequency distribution
- Shortcut method to find variance and standard deviation

### Limitations of mean deviation

color{green} ✍️ In a series, where the degree of variability is very high, the median is not a representative central tendency.

color{green} ✍️ Thus, the mean deviation about the median calculated for such a series cannot be fully relied upon.

color{green} ✍️ The sum of the deviations from the mean (minus signs ignored) is more than the sum of the deviations from median.

color{green} ✍️ Therefore, the mean deviation about the mean is not very scientific. Thus, in many cases, mean deviation may give unsatisfactory results.

color{green} ✍️ Also, mean deviation is calculated on the basis of absolute values of the deviations and therefore cannot be subjected to further algebraic treatment.

color{green} ✍️ This implies that we must have some other measure of dispersion. Standard deviation is such a measure of dispersion.

### Variance and Standard Deviation

color{green} ✍️ Recall that while calculating mean deviation about mean or median, the absolute values of the deviations were taken. The absolute values were taken to give meaning to the mean deviation, otherwise the deviations may cancel among themselves.

color{green} ✍️ Another way to overcome this difficulty, which arises from the signs of the deviations, is to take the squares of all the deviations.

color{green} ✍️ Obviously all these squares of deviations are non-negative. Let color{navy}(x_1, x_2, x_3, ..., x_n) be n observations and color{navy}(barx) be their mean.

Then the sum of the squares of the deviations is color{navy}((x_1-barx)^2 +(x_2-barx)^2 + ... + (x_n-barx)^2 = sum_(i=1)^(n) (x_i-barx)^2)

color{green} ✍️ If this sum is zero, then each color{navy}((x_i - barx)) has to be zero.

color{green} ✍️ This implies that there is no dispersion at all, as all observations are equal to the mean color{navy}(barx).

color{green} ✍️ If color{navy}(sum_(i=1)^(n) (x_i-barx)^2) is small, this indicates that the observations color{navy}(x_1, x_2, x_3, ..., x_n) are close to the mean color{navy}(barx) and, therefore, there is a lower degree of dispersion.

color{green} ✍️ On the contrary, if this sum is large, there is a higher degree of dispersion of the observations from the mean color{navy}(barx). Can we thus say that the sum color{navy}(sum_(i=1)^(n) (x_i-barx)^2) is a reasonable indicator of the degree of dispersion or scatter?

color(red)(=>"Let us take the set A of six observations 5, 15, 25, 35, 45, 55." )

The mean of the observations is color{navy}(barx=30). The sum of squares of deviations from color{navy}(barx) for this set is

color{navy}(sum_(i=1)^(6) (x_i-barx)^2 = (5-30)^2 +(15-30)^2 + (25-30)^2 + (35-30)^2 + (45-30)^2 + (55-30)^2)

" " color{navy}(= 625 + 225 + 25 + 25 + 225 + 625 = 1750)

color(red)(=>"Let us now take another set B of 31 observations")

color{navy}(15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45).

The mean of these observations is color{navy}(bary = 30).

Note that both the sets A and B of observations have a mean of 30.

Now, the sum of squares of deviations of observations for set B from the mean color{navy}(bary) is given by

color{navy}(sum_(i=1)^(31) (y_i-bary)^2 = (15-30)^2 +(16-30)^2 + (17-30)^2 + ...+ (44-30)^2 +(45-30)^2)

" " color{navy}(= (-15)^2 +(-14)^2 + ...+ (-1)^2 + 0^2 + 1^2 + 2^2 + 3^2 + ...+ 14^2 + 15^2)

" " color{navy}(= 2 [15^2 + 14^2 + ... + 1^2])

" " color{navy}(= 2 xx (15 xx (15+1)(30+1))/6 = 5 xx 16 xx 31 = 2480)

(Because the sum of squares of the first n natural numbers is color{navy}((n(n+1)(2n+1))/6); here color{navy}(n = 15).)

If color{navy}(sum_(i=1)^(n) (x_i-barx)^2) is simply our measure of dispersion or scatter about mean, we will tend to say that the set A of six observations has a lesser dispersion about the mean than the set B of 31 observations, even though the observations in set A are more scattered from the mean (the range of deviations being from –25 to 25) than in the set B (where the range of deviations is from –15 to 15).

This is also clear from the following diagrams.

Thus, we can say that the sum of squares of deviations from the mean is not a proper measure of dispersion, because it grows with the number of observations. To overcome this difficulty we take the mean of the squares of the deviations, i.e., we take color{navy}(1/n sum_(i=1)^(n) (x_i-barx)^2).

In the case of set A, this mean is color{navy}(1/6 xx 1750= 291.67), and in the case of set B, it is color{navy}(1/31 xx 2480 = 80).

This indicates that the scatter or dispersion is more in set A than in set B, which agrees with the geometrical representation of the two sets.
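The comparison of sets A and B can be checked numerically. The following is a minimal Python sketch; the helper names `sum_sq_dev` and `mean_sq_dev` are illustrative choices, not part of the original text.

```python
# Set A and set B exactly as given in the text.
set_a = [5, 15, 25, 35, 45, 55]
set_b = list(range(15, 46))  # 15, 16, ..., 45 (31 observations)


def sum_sq_dev(values):
    """Sum of squared deviations from the arithmetic mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def mean_sq_dev(values):
    """Mean of the squared deviations from the arithmetic mean."""
    return sum_sq_dev(values) / len(values)


# The raw sum is larger for B only because B has more observations.
print(sum_sq_dev(set_a), sum_sq_dev(set_b))
# Dividing by the number of observations reverses the ordering.
print(round(mean_sq_dev(set_a), 2), mean_sq_dev(set_b))
```

The raw sums rank B above A, while the means of the squared deviations rank A above B, which matches the wider spread of set A.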

color{green} ✍️ Thus, we can take color{navy}(1/n sum_(i=1)^(n) (x_i-barx)^2) as a quantity which leads to a proper measure of dispersion.

color{green} ✍️ This number, i.e., mean of the squares of the deviations from mean is called the variance and is denoted by color{navy}(σ^2) (read as sigma square).

Therefore, the variance of n observations color{navy}(x_1, x_2, ..., x_n) is given by

" " color{red}(σ^2) = 1/n sum_(i=1)^(n) (x_i-barx)^2

Find the Variance of the following data:
6, 8, 10, 12, 14, 16, 18, 20, 22, 24

Solution:

From the given data we can form the following Table 15.7. The mean is calculated by the step-deviation method, taking 14 as the assumed mean and h = 2 as the step, so that d_i = (x_i - 14)/2. The number of observations is n = 10.

Therefore Mean bar x = "assumed mean" + (sum_(i=1)^n d_i)/n xx h = 14 + 5/10 xx 2 = 15

and Variance (σ^2) = 1/n sum_(i=1)^10 (x_i - bar x)^2 = 1/10 xx 330 = 33

Thus Standard deviation (σ) = sqrt(33) = 5.74
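The arithmetic of this example can be reproduced with a short Python sketch. The assumed mean 14 and step 2 are taken from the solution; the variable names are illustrative.

```python
import math

data = [6, 8, 10, 12, 14, 16, 18, 20, 22, 24]
n = len(data)
assumed_mean, h = 14, 2  # values used in the solution above

# Step deviations d_i = (x_i - assumed mean) / h
d = [(x - assumed_mean) / h for x in data]
mean = assumed_mean + sum(d) / n * h

# Variance as the mean of the squared deviations from the mean
variance = sum((x - mean) ** 2 for x in data) / n
std_dev = math.sqrt(variance)

print(mean, variance, round(std_dev, 2))
```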

### Standard Deviation

color{green} ✍️ In the calculation of variance, we find that the unit of the individual observations color{navy}(x_i) and of their mean color{navy}(barx) differs from the unit of the variance, since the variance involves the sum of squares of color{navy}((x_i - barx)).

For this reason, the proper measure of dispersion about the mean of a set of observations is expressed as positive square-root of the variance and is called standard deviation.

Therefore, the standard deviation, usually denoted by color{navy}(σ) , is given by

color{blue}(σ = sqrt(1/n sum_(i=1)^(n) (x_i-barx)^2))
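As a cross-check, Python's standard `statistics` module provides `pvariance` and `pstdev`, which divide by n and so correspond to the formulas used here (its `variance` and `stdev` functions divide by n - 1 and would give different values). A minimal sketch on set A from the earlier discussion:

```python
import math
import statistics

set_a = [5, 15, 25, 35, 45, 55]

variance = statistics.pvariance(set_a)  # population variance, divides by n
sigma = statistics.pstdev(set_a)        # positive square root of the variance

# The same standard deviation computed directly from the formula above
mean = sum(set_a) / len(set_a)
manual_sigma = math.sqrt(sum((x - mean) ** 2 for x in set_a) / len(set_a))

print(variance, sigma, manual_sigma)
```

The standard deviation is in the same unit as the observations, whereas the variance is in squared units.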

### Standard deviation of a discrete frequency distribution

color{green} ✍️ Let the given discrete frequency distribution be

color{navy}(x : " " x_1, " " x_2, " " x_3, ..., x_n)

color{navy}(f : " " f_1, " " f_2, " " f_3, ..., f_n)

In this case standard deviation color{blue}(σ = sqrt(1/N sum_(i=1)^(n) f_i (x_i-barx)^2))

where color{navy}(N= sum_(i=1)^(n) f_i)

Find the variance and standard deviation for the following data:

Solution:

Presenting the data in tabular form (Table 15.8), we get

N = 30 , sum_(i=1)^7 f_i x_i = 420 , sum_(i=1)^7 f_i (x_i - bar x )^2 =1374

Therefore  bar x = (sum_(i=1)^7 f_i x_i )/N=1/30 xx 420 =14

Hence variance  (σ^2 )=1/N sum_(i=1)^7 f_i (x_i - bar x )^2

=1/30 xx 1374 =45.8

and Standard deviation (σ )= sqrt (45.8) = 6.77
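The formula for a discrete frequency distribution can be sketched in Python as follows. The function name `freq_variance` and the small data set are hypothetical and are not the table of the example above; only the totals N = 30 and 1374 are taken from the solution.

```python
import math


def freq_variance(xs, fs):
    """Variance of a discrete frequency distribution:
    (1/N) * sum of f_i * (x_i - mean)^2, with N = sum of f_i."""
    n_total = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / n_total
    return sum(f * (x - mean) ** 2 for x, f in zip(xs, fs)) / n_total


# Hypothetical illustrative data (not the table from the example above).
xs = [1, 2, 3, 4]
fs = [1, 2, 2, 1]
variance = freq_variance(xs, fs)
std_dev = math.sqrt(variance)

# Totals quoted in the worked solution: N = 30, sum f_i (x_i - mean)^2 = 1374.
example_variance = 1374 / 30
example_std_dev = math.sqrt(example_variance)

print(variance, std_dev)
print(example_variance, round(example_std_dev, 2))
```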

### Standard deviation of a continuous frequency distribution

The given continuous frequency distribution can be represented as a discrete frequency distribution by replacing each class by its mid-point.

Then, the standard deviation is calculated by the technique adopted in the case of a discrete frequency distribution.

If there is a frequency distribution of n classes, each class defined by its mid-point color{navy}(x_i) with frequency color{navy}(f_i), the standard deviation will be obtained by the formula

" " color{navy}(σ = sqrt(1/N sum_(i=1)^(n)f_i (x_i-barx)^2))

where color{navy}(barx) is the mean of the distribution and color{navy}(N= sum_(i=1)^(n) f_i)

color(green)(ul"Another formula for standard deviation") We know that

Variance color{green}((σ^2) = 1/N sum_(i=1)^(n) f_i (x_i-barx)^2 = 1/N sum_(i=1)^(n) f_i (x_(i)^(2) + barx^2 - 2barx x_i))

" " color{navy}(= 1/N [sum_(i=1)^(n) f_ix_(i)^(2) + sum_(i=1)^(n) barx^2 f_i - sum_(i=1)^(n) 2barxf_ix_i])

" " color{navy}(= 1/N [sum_(i=1)^(n) f_ix_(i)^(2) + barx^2sum_(i=1)^(n) f_i - 2barxsum_(i=1)^(n)f_ix_i])

" " color{navy}(= 1/N [sum_(i=1)^(n)f_ix_(i)^(2) + barx^2N - 2barx * Nbarx])

" " [Here color{navy}(1/N sum_(i=1)^(n)x_i f_i= barx) or color{navy}(sum_(i=1)^(n)x_i f_i = Nbarx)]

" " color{navy}(= 1/N sum_(i=1)^(n)f_i x_(i)^(2) + barx^2 - 2barx^2)

" " color{navy}(= 1/N sum_(i=1)^(n)f_i x_(i)^(2) - barx^2)

or color{navy}(sigma^2 = 1/N sum_(i=1)^(n)f_ix_(i)^(2) - ((sum_(i=1)^(n)f_ix_(i))/N)^2)

" " color{navy}(= 1/(N^2) [ N sum_(i=1)^(n)f_ix_(i)^(2) - (sum_(i=1)^(n)f_ix_(i))^2])

Thus, standard deviation color{red}(sigma = 1/N sqrt(Nsum_(i=1)^(n)f_i x_(i)^(2) - (sum_(i=1)^(n)f_ix_i)^2))
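That the two forms of the standard deviation agree can be checked numerically. The sketch below uses hypothetical values and frequencies chosen only for illustration.

```python
import math

# Hypothetical values and frequencies, chosen only for illustration.
xs = [2, 4, 6, 8]
fs = [3, 5, 4, 2]

N = sum(fs)
sum_fx = sum(f * x for x, f in zip(xs, fs))
sum_fx2 = sum(f * x * x for x, f in zip(xs, fs))
mean = sum_fx / N

# Definition: sigma = sqrt((1/N) * sum f_i (x_i - mean)^2)
sigma_definition = math.sqrt(
    sum(f * (x - mean) ** 2 for x, f in zip(xs, fs)) / N
)

# Alternative form: sigma = (1/N) * sqrt(N * sum f_i x_i^2 - (sum f_i x_i)^2)
sigma_alternative = math.sqrt(N * sum_fx2 - sum_fx ** 2) / N

print(sigma_definition, sigma_alternative)
```

The alternative form needs only the totals of f_i x_i and f_i x_i^2, so the mean does not have to be computed first.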

Calculate the mean, variance and standard deviation for the following distribution:

| Class | 30-40 | 40-50 | 50-60 | 60-70 | 70-80 | 80-90 | 90-100 |
|---|---|---|---|---|---|---|---|
| Frequency | 3 | 7 | 12 | 15 | 8 | 3 | 2 |

Solution:

From the given data, we construct the following Table 15.9.

Thus Mean bar x = 1/N sum_(i=1)^7 f_i x_i = 3100/50 = 62

Variance (σ^2 )=1/N sum_(i=1)^7 f_i (x_i -bar x)^2

 =1/50 xx 10050 = 201

and Standard deviation (σ )  = sqrt (201) =14.18
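The computation for this continuous distribution can be sketched in Python by replacing each class with its mid-point. The class limits and frequencies are those stated in the question; the variable names are illustrative.

```python
import math

# Class intervals and frequencies from the question above.
classes = [(30, 40), (40, 50), (50, 60), (60, 70), (70, 80), (80, 90), (90, 100)]
freqs = [3, 7, 12, 15, 8, 3, 2]

# Each class is represented by its mid-point x_i.
midpoints = [(lo + hi) / 2 for lo, hi in classes]

N = sum(freqs)
mean = sum(f * x for x, f in zip(midpoints, freqs)) / N
variance = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs)) / N
std_dev = math.sqrt(variance)

print(N, mean, variance, round(std_dev, 2))
```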

Find the standard deviation for the following data :

Solution:

Let us form the following Table 15.10:

Now, using the alternative formula for standard deviation derived earlier, we have

 σ =1/N sqrt(N sum f_i x_(i)^2 - (sum f_i x_i)^2)

=1/48 sqrt (48 xx 9652 - (614)^2 )

= 1/48 sqrt (463296-376996)

=1/48 xx 293.77 = 6.12

Therefore, Standard deviation (σ ) = 6.12
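The substitution above can be verified with a few lines of Python. Only the totals quoted in the solution (N = 48, sum of f_i x_i = 614, sum of f_i x_i^2 = 9652) are used; the variable names are illustrative.

```python
import math

# Totals quoted in the solution above.
N = 48
sum_fx = 614
sum_fx2 = 9652

# sigma = (1/N) * sqrt(N * sum f_i x_i^2 - (sum f_i x_i)^2)
radicand = N * sum_fx2 - sum_fx ** 2
sigma = math.sqrt(radicand) / N

print(radicand, round(sigma, 2))
```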

### Shortcut method to find variance and standard deviation

Sometimes the values of color{navy}(x_i) in a discrete distribution, or the mid-points color{navy}(x_i) of the classes in a continuous distribution, are large, so the calculation of mean and variance becomes tedious and time-consuming.

By using step-deviation method, it is possible to simplify the procedure.

Let the assumed mean be ‘A’ and the scale be reduced to color{navy}(1//h) times (h being the width of class-intervals).

Let the step-deviations or the new values be color{navy}(y_i.)

i.e.  " " y_i = (x_i-A)/h" " or " " x_i = A+hy_i ......................(1)

We know that color{green}(barx =( sum_(i=1)^(n) f_i x_i)/N) ..........................................(2)

Replacing color{navy}(x_i) from (1) in (2),

" " color{navy}(barx= ( sum_(i=1)^(n) f_i(A+hy_i))/N)

 " " = 1/N (sum_(i=1)^(n) f_i A + sum_(i=1)^(n) h f_i y_i )

" " = 1/N (A sum_(i=1)^(n) f_i + h sum_(i=1)^(n) f_i y_i)

" " = A. N/N + h (sum_(i=1)^(n)f_i y_i)/N " " ("because" sum_(i=1)^(n)f_i= N)

Thus " " color{navy}(barx = A + hbary) ..............................(3)

Now Variance of the variable color{navy}(x, sigma_(x)^(2) = 1/N sum_(i=1)^(n)f_i (x_i-barx)^2)

" " color{navy}(= 1/N sum_(i=1)^(n)f_i(A+hy_i -A-h bary)^2) (Using (1) and (3))

" " color{navy}(= 1/N sum_(i=1)^(n)f_i h^2 (y_i-bary)^2)

" " color{navy}(=(h^2)/(N) sum_(i=1)^(n)f_i (y_i-bary)^2 = h^2 × "variance of the variable" \ \ y_i)

i.e.  color{navy}(sigma_(x)^(2) = h^2sigma_(y)^(2))

or color{navy}(sigma_x = hsigma_y)....................................(4)

Applying the alternative formula for standard deviation to the variable color{navy}(y_i) and using (4), we have

color{navy}(sigma_x = h/N sqrt(N sum_(i=1)^(n)f_i y_(i)^(2) - ( sum_(i=1)^(n)f_i y_i)^2))

Calculate mean, Variance and Standard Deviation for the following
distribution.

Solution:

Let the assumed mean A = 65. Here h = 10.
We obtain the following Table 15.11 from the given data :

Therefore bar x = A + (sum f_i y_i)/50 xx h = 65 -15/50 xx 10 = 62

Variance σ^2 = h^2/N^2 [N sum f_i y_(i)^2 - (sum f_i y_i )^2]

= ((10)^2)/((50)^2) [50 xx 105 - (-15)^2]

= 1/25 [5250 - 225] = 201

and standard deviation (σ ) = sqrt (201) = 14.18
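The step-deviation method can be sketched in Python as follows. The table for this example is not reproduced here, so the sketch assumes the distribution is the one with classes 30-40, ..., 90-100 and frequencies 3, 7, 12, 15, 8, 3, 2 used earlier in this section; that assumption reproduces the totals sum f_i y_i = -15 and sum f_i y_i^2 = 105 quoted in the solution.

```python
import math

# Assumption: mid-points and frequencies of the earlier 30-40, ..., 90-100 distribution.
midpoints = [35, 45, 55, 65, 75, 85, 95]
freqs = [3, 7, 12, 15, 8, 3, 2]
A, h = 65, 10  # assumed mean and class width from the solution

# Step deviations y_i = (x_i - A) / h; exact here since x_i - A is a multiple of h
ys = [(x - A) // h for x in midpoints]

N = sum(freqs)
sum_fy = sum(f * y for y, f in zip(ys, freqs))
sum_fy2 = sum(f * y * y for y, f in zip(ys, freqs))

mean = A + h * sum_fy / N                                   # mean of x = A + h * (mean of y)
variance = h ** 2 * (N * sum_fy2 - sum_fy ** 2) / N ** 2    # variance of x = h^2 * (variance of y)
std_dev = math.sqrt(variance)

print(sum_fy, sum_fy2, mean, variance, round(std_dev, 2))
```

The results match the mean 62 and variance 201 obtained earlier without the shortcut, while the arithmetic involves only small integers.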