1. Statistics: Measures of Dispersion

Chapter 15

Statistics

A measure of central tendency gives us a rough idea where data points are centred.

However, measures of central tendency alone are not sufficient to give complete information about a given data set. Variability is another characteristic that must be studied in statistics. As with measures of central tendency, we want a single number to describe variability. This single number is called a ‘measure of dispersion’.

In statistics, dispersion (also called variability, scatter, or spread) is the extent to which a distribution is stretched or squeezed. Common examples of measures of statistical dispersion are the variance, standard deviation, and interquartile range. For instance, when the variance of the data in a set is large, the data are widely scattered. On the other hand, when the variance is small, the data in the set are clustered.

A measure of statistical dispersion is a nonnegative real number that is zero if all the data are the same and increases as the data become more diverse.

https://upload.wikimedia.org/wikipedia/commons/thumb/f/f9/Comparison_standard_deviations.svg/400px-Comparison_standard_deviations.svg.png

Most measures of dispersion have the same units as the quantity being measured. In other words, if the measurements are in metres or seconds, so is the measure of dispersion.

Dispersion is contrasted with location or central tendency, and together they are the most used properties of distributions.

A measure of dispersion indicates the scattering of data. It describes how much the data values differ from one another, and so gives an idea of how individual items vary around the central value.

In other words, measures of dispersion help to interpret the variability of data, i.e. to know how homogeneous or heterogeneous the data are. In simple terms, they show how squeezed or scattered the values of the variable are.

Measures of dispersion are vital because they show the variation within a specific sample or group. For samples, dispersion is important because it determines the margin of error you will have when making inferences about measures of central tendency, such as averages.

The variation can be measured in different numerical measures, namely:

(i) Range: It is the simplest measure of dispersion and is defined as the difference between the largest and the smallest item in a given distribution. If Y max and Y min are the two extreme items, then

Range = Y max – Y min

(ii) Quartile deviation: Also known as the semi-interquartile range, it is half of the difference between the upper quartile and the lower quartile. The first quartile (Q1) is the middle value between the smallest observation and the median of the data. The median of the data set is the second quartile (Q2). Lastly, the middle value between the median and the largest observation is the third quartile (Q3). Quartile deviation can be calculated by

Q = ½ × (Q3 – Q1)

(iii) Mean deviation: Mean deviation (M.D.) is the arithmetic mean of the absolute deviations of the observations from a central value (mean or median).

For n observations x1, x2, ..., xn, the mean deviation about a central value ‘A’ can be evaluated by using the formula:

$$\text{M.D.}(A) = \frac{1}{n}\sum_{i=1}^{n} |x_i - A|$$

Thus the mean deviation about a central value ‘A’ is the mean of the absolute values of the deviations of the observations from ‘A’. It is denoted as M.D.(A).

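(iv) Standard deviation: Standard deviation is the positive square root of the arithmetic mean of the squares of the deviations measured from the mean. For n observations x1, x2, ..., xn with mean $\bar{x}$, the standard deviation is given as

$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}$$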
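As an illustrative sketch, the four measures listed above can be computed in Python for a small made-up data set. The data values are hypothetical. The quartiles use the default convention of `statistics.quantiles`, which is only one of several textbook conventions, so Q1 and Q3 may differ slightly under another rule.

```python
import statistics

# Hypothetical observations, chosen only for illustration
data = [2, 4, 6, 8, 10, 12, 14]
n = len(data)

# (i) Range: largest value minus smallest value
data_range = max(data) - min(data)

# (ii) Quartile deviation: half of (Q3 - Q1)
q1, q2, q3 = statistics.quantiles(data, n=4)
quartile_deviation = (q3 - q1) / 2

# (iii) Mean deviation about the mean
mean = sum(data) / n
mean_deviation = sum(abs(x - mean) for x in data) / n

# (iv) Standard deviation (population form, dividing by n)
std_dev = (sum((x - mean) ** 2 for x in data) / n) ** 0.5

print(data_range, quartile_deviation, mean_deviation, std_dev)
```

All four results are in the same units as the data, which is why they can be compared directly with the observations.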

Mean

The average of a given set of data is computed by dividing the sum of the given numbers by the total number of numbers.

Mean = (Sum of all observations/Total number of observations)

Mean for Ungrouped Data

Example:

There are 20 pupils in a class, and their grades are 88, 82, 88, 85, 84, 80, 81, 82, 83, 85, 84, 74, 75, 76, 89, 90, 89, 80, 82, and 83.

The mean is the sum of the percentages obtained, divided by the number of pupils:

= [88 + 82 + 88 + 85 + 84 + 80 + 81 + 82 + 83 + 85 + 84 + 74 + 75 + 76 + 89 + 90 + 89 + 80 + 82 + 83] /20 = 1660/20 = 83 %
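The same calculation can be checked with a few lines of Python. This is only a sketch that re-uses the grades from the example; the variable names are illustrative.

```python
# Grades of the 20 pupils from the example above
grades = [88, 82, 88, 85, 84, 80, 81, 82, 83, 85,
          84, 74, 75, 76, 89, 90, 89, 80, 82, 83]

total = sum(grades)          # sum of all observations
count = len(grades)          # total number of observations
mean = total / count         # Mean = sum / number of observations

print(total, count, mean)
```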

Mean for Grouped Data

In statistics, the mean, or arithmetic mean, of a group of numbers is the sum of the numbers divided by the number of numbers in the group. The mean is a measure of the central tendency of a group of numbers.

When dealing with grouped data, the mean is calculated by first finding the midpoint of each class, then multiplying each midpoint by the frequency of its class, adding these products, and finally dividing the sum by the total frequency (the total number of observations), not by the number of classes.
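A minimal sketch of the grouped-data mean follows. The class intervals and frequencies are hypothetical values chosen for illustration; each class is represented by its midpoint and weighted by its frequency.

```python
# Hypothetical class intervals (lower, upper) and their frequencies
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [5, 8, 15, 16, 6]

# Midpoint of each class
midpoints = [(low + high) / 2 for low, high in classes]

# Sum of (midpoint x frequency), divided by the total frequency
total_frequency = sum(freqs)
weighted_sum = sum(x * f for x, f in zip(midpoints, freqs))
grouped_mean = weighted_sum / total_frequency

print(grouped_mean)
```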

2. Mean deviation, range, variance and standard deviation of grouped and ungrouped data

Range of Ungrouped Data

We now know that the range is the difference between the maximum and minimum values. For ungrouped data, we arrange the series in ascending or descending order, which makes it easy to pick out the highest and lowest values in the distribution. We then simply subtract the minimum value from the maximum value.

Example :

The marks of a student (out of 20) in 5 tests on the chapter Statistics are 11, 14, 16, 13 and 18.

Arranging them in ascending order: 11, 13, 14, 16 and 18. The range of the data is 18 – 11 = 7.

Range= Maximum value – Minimum value
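The example can be reproduced with a short Python sketch. Sorting is not strictly needed to find the extremes, but it mirrors the steps described above.

```python
# Marks (out of 20) from the example above
marks = [11, 14, 16, 13, 18]

ordered = sorted(marks)              # ascending order
minimum, maximum = ordered[0], ordered[-1]
marks_range = maximum - minimum      # Range = maximum - minimum

print(ordered, marks_range)
```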

Mean Deviation for Ungrouped Data

Mean deviation measures the dispersion of data about a measure of central tendency. This measure of central tendency is generally the median or the mean. We first recall how to calculate the mean and the median for an individual series.

Calculating Mean and Median

For calculation of median, we first arrange the data in ascending or descending order(generally ascending order). Further, we count the number of observations which is denoted by n. Now depending on whether n is even or odd, the further calculation is bifurcated as:

  • If n is an odd number, then the value of the ((n+1)/2)th item is the median.
  • If n is an even number, then the median is given as: [value of the (n/2)th item + value of the (n/2 + 1)th item] ÷ 2

Mean is simply calculated as the ratio of the sum of the observations to the number of observations.

Mean= Sum of observations/number of observations
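The two rules can be sketched as small Python functions. For an even number of observations the sketch averages the (n/2)th and (n/2 + 1)th items of the sorted data, which is the standard rule. The function names are illustrative.

```python
def median(values):
    """Median of ungrouped data using the odd/even rule."""
    ordered = sorted(values)          # arrange in ascending order
    n = len(ordered)
    if n % 2 == 1:
        # odd n: the ((n + 1) / 2)th item (1-based position)
        return ordered[(n + 1) // 2 - 1]
    # even n: average of the (n/2)th and (n/2 + 1)th items
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


def mean(values):
    """Sum of observations divided by the number of observations."""
    return sum(values) / len(values)


print(median([11, 14, 16, 13, 18]), median([1, 3, 5, 7]), mean([1, 3, 5, 7]))
```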

Steps to Calculate the Mean Deviation for Ungrouped data

To calculate the mean deviation for ungrouped data, the following steps are followed:

Let the set of data consist of n observations x1, x2, ..., xn.

Step i) Calculate the measure of central tendency about which the mean deviation is to be found. Let this value be ‘a’.

Step ii) Calculate the absolute deviation of each observation from the measure of central tendency calculated in step (i), i.e.,

$$|x_1 - a|,\ |x_2 - a|,\ \ldots,\ |x_n - a|$$

Step iii) Evaluate the mean of all the absolute deviations. This gives the mean absolute deviation (M.D.) about ‘a’ for ungrouped data, i.e.,

$$\text{M.D.}(a) = \frac{1}{n}\sum_{i=1}^{n} |x_i - a|$$

In case the measure of central tendency is the mean $\bar{x}$, the above equation can be rewritten as:

$$\text{M.D.}(\bar{x}) = \frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|$$
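The three steps translate directly into code. The sketch below re-uses the test marks 11, 14, 16, 13 and 18 from the range example and computes the mean deviation about both the mean and the median. The function name `mean_deviation` is illustrative.

```python
import statistics


def mean_deviation(values, a):
    """Mean of the absolute deviations of the observations from a."""
    deviations = [abs(x - a) for x in values]   # Step ii
    return sum(deviations) / len(values)        # Step iii


marks = [11, 14, 16, 13, 18]

# Step i: choose the central value (mean or median)
md_about_mean = mean_deviation(marks, statistics.fmean(marks))
md_about_median = mean_deviation(marks, statistics.median(marks))

print(md_about_mean, md_about_median)
```

Here the mean is 14.4 and the median is 14, so the two mean deviations differ slightly.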

Mean Deviation Formulas for Grouped Data:

We know that data can be grouped in two ways:

(a) Discrete frequency distribution,

(b) Continuous frequency distribution.

Let us discuss the method of finding mean deviation for both types of the data.

(a) Discrete frequency distribution: Let the given data consist of n distinct values x1, x2, ..., xn occurring with frequencies f1, f2, ..., fn respectively. This data can be represented in the tabular form given below, and is called a discrete frequency distribution:

x : x1   x2   x3  ... xn

f : f1   f2   f3  ... fn

(i) Mean deviation about mean

First of all we find the mean $\bar{x}$ of the given data by using the formula

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i f_i}{\sum_{i=1}^{n} f_i} = \frac{1}{N}\sum_{i=1}^{n} x_i f_i$$

where $\sum x_i f_i$ denotes the sum of the products of the observations xi with their respective frequencies fi, and $N = \sum f_i$ is the sum of the frequencies.

Then, we find the deviations of the observations xi from the mean $\bar{x}$ and take their absolute values, i.e., $|x_i - \bar{x}|$ for all i = 1, 2, ..., n.

After this, find the mean of the absolute values of the deviations, which is the required mean deviation about the mean. Thus

$$\text{M.D.}(\bar{x}) = \frac{1}{N}\sum_{i=1}^{n} f_i\,|x_i - \bar{x}|$$

(ii) Mean deviation about median: To find the mean deviation about the median, we first find the median of the given discrete frequency distribution. For this, the observations are arranged in ascending order and the cumulative frequencies are obtained. Then, we identify the observation whose cumulative frequency is equal to or just greater than N/2, where N is the sum of the frequencies. This observation lies in the middle of the data and is therefore the required median. After finding the median, we obtain the mean of the absolute values of the deviations from the median. Thus,

$$\text{M.D.}(M) = \frac{1}{N}\sum_{i=1}^{n} f_i\,|x_i - M|$$

where M = Median.
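A minimal sketch of both calculations for a discrete frequency distribution follows. The values and frequencies are hypothetical and are not the data of the example below. The median is located with the cumulative-frequency rule described above (first observation whose cumulative frequency reaches N/2).

```python
from itertools import accumulate

# Hypothetical discrete frequency distribution (values in ascending order)
values = [2, 5, 6, 8, 10, 12]
freqs = [2, 8, 10, 7, 8, 5]

N = sum(freqs)                                    # sum of frequencies

# (i) Mean deviation about the mean
mean = sum(x * f for x, f in zip(values, freqs)) / N
md_mean = sum(f * abs(x - mean) for x, f in zip(values, freqs)) / N

# (ii) Mean deviation about the median
cumulative = list(accumulate(freqs))              # cumulative frequencies
median = next(x for x, cf in zip(values, cumulative) if cf >= N / 2)
md_median = sum(f * abs(x - median) for x, f in zip(values, freqs)) / N

print(mean, md_mean, median, md_median)
```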

Example:  Find mean deviation about the mean for the following data :

Solution:

Variance:

A variance of zero indicates that all the values are identical. Variance is always non-negative: a small variance indicates that the data points tend to be very close to the mean and hence to each other, while a high variance indicates that the data points are very spread out around the mean and from each other.

The mean of the squares of the deviations from the mean is called the variance and is denoted by σ² (read as sigma square). Therefore, the variance of n observations x1, x2, ..., xn is given by

$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

Standard Deviation: In the calculation of variance, we find that the units of the individual observations xi and of their mean $\bar{x}$ are different from that of the variance, since variance involves the sum of squares of $(x_i - \bar{x})$. For this reason, the proper measure of dispersion about the mean of a set of observations is expressed as the positive square root of the variance and is called the standard deviation. Therefore, the standard deviation, usually denoted by σ, is given by

$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
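The definitions can be sketched in Python as follows. The observations are hypothetical, and the variance uses the population form (dividing by n), which matches the definition of variance as the mean of the squared deviations.

```python
import math

# Hypothetical observations, for illustration only
data = [6, 7, 10, 12, 13, 4, 8, 12]
n = len(data)

mean = sum(data) / n

# Variance: mean of the squared deviations from the mean
variance = sum((x - mean) ** 2 for x in data) / n

# Standard deviation: positive square root of the variance
std_dev = math.sqrt(variance)

print(mean, variance, std_dev)
```

The variance here is in squared units of the data, while the standard deviation is back in the original units.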

Properties of Variance

  • It is always non-negative since each term in the variance sum is squared and therefore the result is either positive or zero.
  • Variance always has squared units. For example, the variance of a set of weights estimated in kilograms will be given in kg squared. Since the population variance is squared, we cannot compare it directly with the mean or the data themselves.

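Variance Formulas for Ungrouped Data

$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

Variance Formulas for Grouped Data

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{n} f_i\,(x_i - \bar{x})^2, \qquad N = \sum_{i=1}^{n} f_i$$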

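Standard deviation of a discrete frequency distribution: Let the given discrete frequency distribution be

x : x1   x2   x3  ... xn

f : f1   f2   f3  ... fn

In this case, the standard deviation is

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{n} f_i\,(x_i - \bar{x})^2}, \qquad N = \sum_{i=1}^{n} f_i$$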

Standard deviation of a continuous frequency distribution: The given continuous frequency distribution can be represented as a discrete frequency distribution by replacing each class by its mid-point. Then, the standard deviation is calculated by the technique adopted in the case of a discrete frequency distribution.
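As a sketch of this technique, the code below replaces each class of a continuous distribution by its mid-point and then applies the discrete-distribution formula. The class intervals and frequencies are hypothetical.

```python
import math

# Hypothetical continuous frequency distribution: (lower, upper) limits
classes = [(30, 40), (40, 50), (50, 60), (60, 70), (70, 80), (80, 90), (90, 100)]
freqs = [3, 7, 12, 15, 8, 3, 2]

# Replace each class by its mid-point
midpoints = [(low + high) / 2 for low, high in classes]

N = sum(freqs)
mean = sum(f * x for x, f in zip(midpoints, freqs)) / N

# Same formula as for a discrete frequency distribution
variance = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs)) / N
std_dev = math.sqrt(variance)

print(mean, variance, std_dev)
```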

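If there is a frequency distribution of n classes, each class defined by its mid-point xi with frequency fi, the standard deviation will be obtained by

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{n} f_i\,(x_i - \bar{x})^2}$$

where $\bar{x}$ is the mean of the distribution and $N = \sum_{i=1}^{n} f_i$.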

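Another formula for standard deviation:

We know that

$$\text{Variance } (\sigma^2) = \frac{1}{N}\sum_{i=1}^{n} f_i\,(x_i - \bar{x})^2 = \frac{1}{N}\sum_{i=1}^{n} f_i x_i^2 - \bar{x}^2 = \frac{1}{N^2}\left[N\sum_{i=1}^{n} f_i x_i^2 - \left(\sum_{i=1}^{n} f_i x_i\right)^2\right]$$

Thus, the standard deviation is

$$\sigma = \frac{1}{N}\sqrt{N\sum_{i=1}^{n} f_i x_i^2 - \left(\sum_{i=1}^{n} f_i x_i\right)^2}$$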
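A quick numerical check can confirm that the alternative form, σ = (1/N)·√(N·Σfᵢxᵢ² − (Σfᵢxᵢ)²), agrees with the definition. The sketch below uses a hypothetical frequency distribution; the alternative form avoids computing each deviation from the mean.

```python
import math

# Hypothetical discrete frequency distribution
values = [4, 8, 11, 17, 20, 24, 32]
freqs = [3, 5, 9, 5, 4, 3, 1]

N = sum(freqs)
sum_fx = sum(f * x for x, f in zip(values, freqs))        # sum of f_i * x_i
sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))   # sum of f_i * x_i^2
mean = sum_fx / N

# Definition: root of the weighted mean of squared deviations
sd_definition = math.sqrt(
    sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / N
)

# Alternative form: no deviations from the mean are needed
sd_alternative = math.sqrt(N * sum_fx2 - sum_fx ** 2) / N

print(sd_definition, sd_alternative)
```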

Properties of Standard Deviation

  • It is the square root of the mean of the squares of the deviations of all values in a data set from their mean, and is also called the root-mean-square deviation.
  • The smallest value of the standard deviation is 0 since it cannot be negative.
  • When the data values of a group are similar, the standard deviation will be very low or close to zero. But when the data values differ widely from each other, the standard deviation is high or far from zero.

Question: Find the variance and standard deviation for the following set of data representing tree heights in feet: 3, 21, 98, 203, 17, 9

Solution:

Step 1: Add up the numbers in your given data set.

3 + 21 + 98 + 203 + 17 + 9 = 351

Step 2: Square your answer:

351 × 351 = 123201

…and divide by the number of items. We have 6 items in our example so:

123201/6 = 20533.5

Step 3: Take your set of original numbers from Step 1, and square them individually this time:

3 × 3 + 21 × 21 + 98 × 98 + 203 × 203 + 17 × 17 + 9 × 9

Add the squares together:

9 + 441 + 9604 + 41209 + 289 + 81 = 51,633

Step 4: Subtract the amount in Step 2 from the amount in Step 3.

51633 – 20533.5 = 31,099.5

Set this number aside for a moment.

Step 5: Subtract 1 from the number of items in your data set. For our example:

6 – 1 = 5

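Step 6: Divide the number in Step 4 by the number in Step 5. This gives you the sample variance (note that it divides by n – 1, whereas the variance formula given earlier divides by n):

31099.5/5 = 6219.9

Step 7: Take the square root of your answer from Step 6. This gives you the sample standard deviation:

s = √6219.9 = 78.86634

The variance is 6219.9 square feet and the standard deviation is approximately 78.87 feet.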
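The steps can be reproduced in Python as a check. The sketch also computes the population variance (dividing by n), an addition not in the worked solution, to show how the two conventions differ. `statistics.variance` from the standard library uses the n – 1 divisor.

```python
import statistics

heights = [3, 21, 98, 203, 17, 9]   # tree heights in feet, from the question
n = len(heights)

sum_x = sum(heights)                        # Step 1
sum_x2 = sum(x * x for x in heights)        # Step 3
corrected = sum_x2 - sum_x ** 2 / n         # Steps 2 and 4

sample_variance = corrected / (n - 1)       # Steps 5 and 6
sample_sd = sample_variance ** 0.5          # Step 7

# For comparison only: population variance divides by n instead of n - 1
population_variance = corrected / n

print(sample_variance, sample_sd, population_variance)
print(statistics.variance(heights))         # library check of the sample variance
```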

Question :

Calculate the variance for the following data:

Solution:

What Are the merits and demerits of range?

Merits

  1. It is very easy to calculate and simple to understand.
  2. No special knowledge is needed while calculating range.
  3. It takes the least time for computation.
  4. It provides a broad picture of the data at a glance.

Demerits

  1. It is a crude measure because it is only based on two extreme values (highest and lowest).
  2. It cannot be calculated in the case of open-ended series.
  3. Range is significantly affected by fluctuations of sampling, i.e. it varies widely from sample to sample.

Merits and demerits of Quartile Deviation

Merits

  1. It is also quite easy to calculate and simple to understand.
  2. It can be used even in case of open-end distribution.
  3. It is less affected by extreme values, so it is superior to ‘Range’.
  4. It is more useful when the dispersion of the middle 50% is to be computed.

Demerits

  1. It is not based on all the observations.
  2. It is not capable of further algebraic treatment or statistical analysis.
  3. It is affected considerably by fluctuations of sampling.
  4. It is not regarded as a very reliable measure of dispersion because it ignores 50% of the observations.

What Are the merits and demerits of mean deviation?

Merits

  1. It is based on all the observations of the series and not only on the limits like Range and QD.
  2. It is simple to calculate and easy to understand.
  3. It is not much affected by extreme values.
  4. For calculating mean deviation, deviations can be taken from any average.

Demerits

  1. Ignoring + and – signs is bad from the mathematical viewpoint.
  2. It is not capable of further mathematical treatment.
  3. It is difficult to compute when the mean or median is a fraction.
  4. It may not be possible to use this method in case of open ended series.

3. Discrete and Continuous distributions

A discrete distribution is one in which the data can only take on certain values, for example integers. 

Thus, in a discrete frequency distribution, the values of the variable are determined individually. The number of times each value occurs denotes the frequencies of the particular value or observation.

A continuous frequency distribution is a series in which the data are classified into different class intervals without gaps and their respective frequencies are assigned as per the class intervals and class width.

Continuous data is data that can take any value. Height, weight, temperature and length are all examples of continuous data.

Question: Calculate the variance and standard deviation of the following continuous frequency distribution

Solution:

 

Example: Prepare a discrete frequency distribution table for the following data.

12, 21, 21, 3, 9, 3, 6, 12, 13, 21, 15, 22, 3, 6, 9, 9, 21, 22, 15, 13, 15, 9, 15, 6, 15, 13, 6, 9, 13, 22

Solution:

Given data:

12, 21, 21, 3, 9, 3, 6, 12, 13, 21, 15, 22, 3, 6, 9, 9, 21, 22, 15, 13, 15, 9, 15, 6, 15, 13, 6, 9, 13, 22

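The discrete frequency distribution table, obtained by counting how many times each value occurs in the data, is given as:

Value (x)     : 3   6   9   12   13   15   21   22

Frequency (f) : 3   4   5   2    4    5    4    3

Total frequency = 30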
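The counting can be sketched in Python with `collections.Counter`, using the data from the example. Each distinct value is listed with the number of times it occurs.

```python
from collections import Counter

# Data from the example above
data = [12, 21, 21, 3, 9, 3, 6, 12, 13, 21, 15, 22, 3, 6, 9,
        9, 21, 22, 15, 13, 15, 9, 15, 6, 15, 13, 6, 9, 13, 22]

# Count how many times each value occurs
frequency = Counter(data)

# Print the table in ascending order of the value
for value in sorted(frequency):
    print(value, frequency[value])

total_frequency = sum(frequency.values())
```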

4. Frequency distributions analysis

A frequency distribution, in statistics, is a graph or data set organized to show the frequency of occurrence of each possible outcome of a repeatable event observed many times. Simple examples are election returns and test scores listed by percentile. A frequency distribution can be graphed as a histogram or pie chart.

Frequency is the number of times an event occurs. Frequency analysis is an important area of statistics that deals with the number of occurrences (frequency) and analyzes measures of central tendency, dispersion, percentiles, etc.

Whenever we want to compare the variability of two series that differ widely in their means or are measured in different units, it is not enough to calculate the measures of dispersion alone, because the mean deviation and the standard deviation have the same units as the data. We need a measure that is independent of the units. Such a measure of variability is called the coefficient of variation (C.V.).

The coefficient of variation is defined as the standard deviation expressed as a percentage of the mean. This can be calculated as:

$$\text{C.V.} = \frac{\sigma}{\bar{x}} \times 100, \qquad \bar{x} \neq 0$$

Here, σ = Standard deviation of the data

$\bar{x}$ = Mean of the data

To compare the variability of two series, we calculate the coefficient of variation for each series. The series having the greater C.V. is said to be more variable than the other. The series having the lesser C.V. is said to be more consistent than the other.

Comparison of two frequency distributions with same mean:

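Let $\bar{x}_1$ and σ1 be the mean and standard deviation of the first distribution, and $\bar{x}_2$ and σ2 be the mean and standard deviation of the second distribution. Given that the means are equal, $\bar{x}_1 = \bar{x}_2 = \bar{x}$ (say), we have

$$\text{C.V. (1st distribution)} = \frac{\sigma_1}{\bar{x}} \times 100 \qquad (1)$$

$$\text{C.V. (2nd distribution)} = \frac{\sigma_2}{\bar{x}} \times 100 \qquad (2)$$

It is clear from (1) and (2) that the two C.V.s can be compared on the basis of the values of σ1 and σ2 only.

Thus, we say that for two series with equal means, the series with the greater standard deviation (or variance) is called more variable or dispersed than the other. Also, the series with the lesser value of standard deviation (or variance) is said to be more consistent than the other.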

Question:

The data below show the mean and variance of the heights and the corresponding weights of the students of Class X:

             Heights        Weights

Mean         162.6 cm       52.36 kg

Variance     127.69 cm²     23.1361 kg²

What can be said about the weights and the heights?

Solution:

For the given data, we consider the heights of the students as one series of data and the weights as the other series of data.

So,

Mean of height = 162.6 cm

Variance of height = 127.69 cm²

Therefore, standard deviation of height = √127.69 cm = 11.3 cm

Also,

Mean of weight = 52.36 kg

Variance of weight = 23.1361 kg²

Thus, the standard deviation of weight = √23.1361 kg = 4.81 kg

Now, we need to calculate the coefficient of variation for these two data sets in order to compare their variability.

Using the coefficient of variation formula,

i.e. CV = (standard deviation/mean) × 100

For heights, the coefficient of variation (C.V.) = (11.3/162.6) × 100 = 6.95

For weights, the coefficient of variation (C.V.) = (4.81/52.36) × 100 ≈ 9.19

Here, the C.V. of heights is lesser than the C.V. of weights.

Therefore, weights show more variability than heights.
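The comparison can be sketched in Python using the means and variances given in the question. The helper name `coefficient_of_variation` is illustrative.

```python
import math


def coefficient_of_variation(mean, variance):
    """C.V. = (standard deviation / mean) x 100."""
    return math.sqrt(variance) / mean * 100


# Values taken from the question
cv_height = coefficient_of_variation(mean=162.6, variance=127.69)
cv_weight = coefficient_of_variation(mean=52.36, variance=23.1361)

# The series with the greater C.V. is the more variable one
more_variable = "weights" if cv_weight > cv_height else "heights"

print(round(cv_height, 2), round(cv_weight, 2), more_variable)
```

Because C.V. is a pure number, the comparison is meaningful even though heights are in centimetres and weights in kilograms.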

5. Shortcut method to find variance and standard deviation

Sometimes the values of xi in a discrete distribution, or the mid-points xi of different classes in a continuous distribution, are large, and so the calculation of mean and variance becomes tedious and time-consuming. By using the step-deviation method, it is possible to simplify the procedure.

Let the assumed mean be ‘A’ and the scale be reduced to 1/h times (h being the width of the class-intervals). Let the step-deviations or the new values be yi, i.e.

$$y_i = \frac{x_i - A}{h} \quad \text{or} \quad x_i = A + h\,y_i$$
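A minimal sketch of the step-deviation idea follows, taking the step-deviations as yᵢ = (xᵢ − A)/h. It relies on the standard results that the mean transforms as x̄ = A + h·ȳ and the variance as σₓ² = h²·σᵧ²; these relations are stated here as assumptions, since the derivation is not shown in this section. The class intervals, frequencies and the choice A = 65 are hypothetical.

```python
import math

# Hypothetical continuous frequency distribution
classes = [(30, 40), (40, 50), (50, 60), (60, 70), (70, 80), (80, 90), (90, 100)]
freqs = [3, 7, 12, 15, 8, 3, 2]

A = 65    # assumed mean (a mid-point near the centre, chosen for convenience)
h = 10    # width of the class intervals

midpoints = [(low + high) / 2 for low, high in classes]
y = [(x - A) / h for x in midpoints]          # step-deviations: small numbers

N = sum(freqs)
mean_y = sum(f * yi for yi, f in zip(y, freqs)) / N
var_y = sum(f * yi ** 2 for yi, f in zip(y, freqs)) / N - mean_y ** 2

# Transform back to the original scale
mean_x = A + h * mean_y
var_x = h ** 2 * var_y
sd_x = math.sqrt(var_x)

print(mean_x, var_x, sd_x)
```

The arithmetic is done on the small values −3, −2, ..., 3 instead of the mid-points 35, 45, ..., 95, which is what makes the method convenient by hand.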