`color{green} ✍️` Recall that while calculating mean deviation about mean or median, the absolute values of the deviations were taken. The absolute values were taken to give meaning to the mean deviation, otherwise the deviations may cancel among themselves.
`color{green} ✍️` Another way to overcome this difficulty, which arises from the signs of the deviations, is to take the squares of all the deviations.
`color{green} ✍️` Obviously all these squares of deviations are non-negative. Let `color{navy}(x_1, x_2, x_3, ..., x_n)` be `n` observations and `color{navy}(barx)` be their mean.
Then `color{navy}((x_1-barx)^2 +(x_2-barx)^2 + ... + (x_n-barx)^2 = sum_(i=1)^(n) (x_i-barx)^2)`
`color{green} ✍️` If this sum is zero, then each `color{navy}((x_i - barx))` has to be zero.
`color{green} ✍️` This implies that there is no dispersion at all, as all observations are equal to the mean `color{navy}(barx)`.
`color{green} ✍️` If `color{navy}(sum_(i=1)^(n) (x_i-barx)^2)` is small, this indicates that the observations `color{navy}(x_1, x_2, x_3, ..., x_n)` are close to the mean `color{navy}(barx)` and therefore there is a lower degree of dispersion.
`color{green} ✍️` On the contrary, if this sum is large, there is a higher degree of dispersion of the observations from the mean `color{navy}(barx)`. Can we thus say that the sum `color{navy}(sum_(i=1)^(n) (x_i-barx)^2)` is a reasonable indicator of the degree of dispersion or scatter?
`color(red)(=>"Let us take the set A of six observations 5, 15, 25, 35, 45, 55." )`
The mean of the observations is `color{navy}(barx=30)`. The sum of squares of deviations from `color{navy}(barx)` for this set is
`color{navy}(sum_(i=1)^(6) (x_i-barx)^2 = (5-30)^2 +(15-30)^2 + (25-30)^2 + (35-30)^2 + (45-30)^2 + (55-30)^2)`
`" " color{navy}(= 625 + 225 + 25 + 25 + 225 + 625 = 1750)`
`color(red)(=>"Let us now take another set B of 31 observations")`
`color{navy}(15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45)`.
The mean of these observations is `color{navy}(bary = 30)`
Note that both the sets `A` and `B` of observations have a mean of `30`.
Now, the sum of squares of deviations of observations for set B from the mean `color{navy}(bary)` is given by
`color{navy}(sum_(i=1)^(31) (y_i-bary)^2 = (15-30)^2 +(16-30)^2 + (17-30)^2 + ...+ (44-30)^2 +(45-30)^2)`
`" " color{navy}(= (-15)^2 +(-14)^2 + ...+ (-1)^2 + 0^2 + 1^2 + 2^2 + 3^2 + ...+ 14^2 + 15^2)`
`" " color{navy}(= 2 [15^2 + 14^2 + ... + 1^2])`
`" "color{navy}(= 2 xx (15xx(15+1)(30+1))/6 = 5 xx 16 xx 31 = 2480)`
(Because the sum of the squares of the first `n` natural numbers `color{navy}(= (n(n+1)(2n+1))/6)`; here `color{navy}(n = 15)`.)
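The two sums above can be checked numerically. The following is a minimal Python sketch, not part of the original text; the helper names `sum_sq_dev` and `sum_first_squares` are chosen here purely for illustration.

```python
# Set A and set B as listed in the text.
set_a = [5, 15, 25, 35, 45, 55]
set_b = list(range(15, 46))  # 15, 16, ..., 45 (31 observations)

def sum_sq_dev(data):
    """Sum of squared deviations of the observations from their mean."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data)

def sum_first_squares(n):
    """Closed form n(n+1)(2n+1)/6 for 1^2 + 2^2 + ... + n^2."""
    return n * (n + 1) * (2 * n + 1) // 6

ss_a = sum_sq_dev(set_a)
ss_b = sum_sq_dev(set_b)
print(ss_a)                       # 1750.0
print(ss_b)                       # 2480.0
print(2 * sum_first_squares(15))  # 2480
```

The direct summation for set `B` agrees with the closed-form shortcut used in the derivation.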
If we simply took `color{navy}(sum_(i=1)^(n) (x_i-barx)^2)` as our measure of dispersion or scatter about the mean, we would conclude that the set `A` of six observations (sum `1750`) has a lesser dispersion about the mean than the set `B` of `31` observations (sum `2480`), even though the observations in set `A` are more scattered from the mean (the deviations range from `-25` to `25`) than those in set `B` (where the deviations range from `-15` to `15`).
This is also clear from the following diagrams.
Thus, the sum of squares of deviations from the mean is not a proper measure of dispersion, because it grows with the number of observations. To overcome this difficulty we take the mean of the squares of the deviations, i.e., we take `color{navy}(1/n sum_(i=1)^(n) (x_i-barx)^2)`. In the case of set `A`, we have
Mean `color{navy}(= 1/6 xx 1750= 291.67)` (to two decimal places) and in the case of set `B`, it is `color{navy}(1/31 xx 2480 = 80)`.
This indicates that the scatter or dispersion is more in set `A` than in set `B`, which agrees with the geometrical representation of the two sets.
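As a quick check of this comparison, here is a small Python sketch (an illustration added here, with the hypothetical helper name `mean_sq_dev`) that divides each sum of squared deviations by the number of observations.

```python
set_a = [5, 15, 25, 35, 45, 55]
set_b = list(range(15, 46))  # 15, 16, ..., 45

def mean_sq_dev(data):
    """Mean of the squared deviations of the observations from their mean."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / n

msd_a = mean_sq_dev(set_a)
msd_b = mean_sq_dev(set_b)
print(round(msd_a, 2))  # 291.67
print(msd_b)            # 80.0
print(msd_a > msd_b)    # True: set A is the more scattered set
```

Dividing by `n` removes the effect of the differing number of observations, so the ordering of the two sets is reversed compared with the raw sums.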
`color{green} ✍️` Thus, we can take `color{navy}(1/n sum_(i=1)^(n) (x_i-barx)^2)` as a quantity which leads to a proper measure of dispersion.
`color{green} ✍️` This number, i.e., the mean of the squares of the deviations from the mean, is called the variance and is denoted by `color{navy}(σ^2)` (read as sigma squared).
Therefore, the variance of `n` observations `color{navy}(x_1, x_2, ..., x_n)` is given by
`" " color{red}(σ^2) = 1/n sum_(i=1)^(n) (x_i-barx)^2`
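The definition translates directly into code. The sketch below is an illustration, not part of the original text: the function name `variance` and the list `constant` are assumptions made here, and the result is cross-checked against `statistics.pvariance` from the Python standard library, which computes the same population variance.

```python
import math
import statistics

def variance(data):
    """Population variance: mean of squared deviations from the mean."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / n

set_a = [5, 15, 25, 35, 45, 55]
constant = [30, 30, 30, 30]  # hypothetical data: every observation equals the mean

# Cross-check against the standard library's population variance.
print(math.isclose(variance(set_a), statistics.pvariance(set_a)))  # True
# No dispersion at all: every deviation is zero, so the variance is zero.
print(variance(constant))  # 0.0
```

Note that this divides by `n`, as in the formula above; `statistics.variance` (without the `p`) divides by `n - 1` and would give a different value.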