## Frequency Distributions and Graphs

Dec 07

Business Statistics

Link to the slides presented during class

Frequency distribution and Ogives

A table that lists the number of occurrences of each value or category of a variable is called a frequency distribution. For example, the survey conducted in class on November 11, 2011, was presented as a table with the first column listing mobile phone brands and the second column listing the number of students owning each brand. The variable in this example is the mobile phone brand, and the frequency is the count of students owning that particular brand, as shown in the second column. This is a univariate frequency distribution: a table that compresses data on a single variable into a logical and coherent tabular form.
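As a rough illustration of how such a table is built, the sketch below counts occurrences per category. The `responses` list and the `frequency_distribution` helper are hypothetical and invented for this example; they are not the actual class survey data.

```python
from collections import Counter

# Hypothetical responses (one brand per student), invented for illustration.
# These are NOT the figures from the class survey.
responses = [
    "Nokia", "Blackberry", "Nokia", "Samsung",
    "Nokia", "Blackberry", "Samsung", "Nokia",
]


def frequency_distribution(observations):
    """Return (category, count) rows, largest count first, ties by name."""
    counts = Counter(observations)
    return sorted(counts.items(), key=lambda row: (-row[1], row[0]))


table = frequency_distribution(responses)
for brand, count in table:
    print(f"{brand:<12}{count}")
```

The frequencies in the second column always add up to the number of observations, which is a quick check that nothing was counted twice or left out.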

Frequency distributions for large amounts of data can be categorized and reported across multiple variables. The main purpose of developing a frequency distribution is to summarize data by grouping it into mutually exclusive classes and reporting the number of occurrences in each class. Classes are mutually exclusive when no observation can fall into more than one of them, so every observation is counted exactly once. For example, if ages are grouped as 10 – 14 and 15 – 19, a 14-year-old falls only in the first class. This is different from independence, where the occurrence of one event is not influenced or caused by another event: students who own a Blackberry do not influence other students’ ownership of a Nokia, and a student purchases a Blackberry independently of another student who purchases a Nokia. Frequencies are then reported for each mobile phone brand.

Consider a frequency distribution in which the first column represents age categories and the second column represents the frequencies or counts of individuals owning any mobile phone (NOT specific brands). In this example, data on the number of individuals owning a mobile phone are grouped across different age intervals. The characteristics of this table are as follows:

- The age categories are intervals that begin with 10 – 14 and end with 40 – 44. This implies that the data were collected from individuals with a minimum age of 10 years and a maximum age of 44 years.
- The difference between the upper and lower limits of each age interval is 4 (for example, 14 minus 10), so each interval covers five whole ages. Wider intervals could have been used, but narrower intervals of equal size were chosen to show the distribution of the frequencies more clearly.
- Each age category has a lower limit and an upper limit. For example, in the age interval 10 – 14, 10 is the lower limit and 14 is the upper limit. Similarly, the age interval 15 – 19 has 15 as the lower limit and 19 as the upper limit, and so on.
- The class intervals are discrete. This implies that only a finite number of values is possible within each class interval. Counts are associated with discrete class intervals, and most of our examples in class will be related to discrete class intervals.
- The discrete class intervals are different from continuous class intervals, which require the data to be continuous, for example 16.1, 16.15, 16.17, etc. The class intervals in our example are NOT continuous because counts of students owning mobile phones cannot be 1.5 or 1.7. Each student owns a whole number of phones, i.e. either 1 or 2 phones but not 1.5 phones. Hence, frequency tables studied during class with discrete class intervals were not converted to continuous class intervals for analysis. The conversion from discrete to continuous will not be necessary unless instructed. Examples of continuous data include data reported in fractions. For example, weights of students in the class interval 40 – 50 could include weights like 40.1, 40.26, 40.38, 40.78, 40.87, 40.9, and so on up to 49.99. The next class interval will be 50 – 60.
- The third column represents the cumulative frequencies. A cumulative frequency is the sum of all frequencies up to and including that class. For example, the cumulative frequency for the age interval 25 – 29 is 18 (3 + 2 + 6 + 7). This implies that 18 individuals aged 29 years or younger own mobile phones. Similarly, the cumulative frequency for the age interval 15 – 19 is 5 (3 + 2), which implies that 5 individuals aged 19 years or younger own mobile phones.
- The fourth column represents the percentage cumulative frequency or cumulative percentage, which is simply another way of expressing the frequency distribution. It expresses the cumulative frequency of each interval as a percentage of the total frequency. The main advantage of the cumulative percentage is that it provides an easier way of comparing data sets. For example, the cumulative percentage for the age interval 30 – 34 is 76.67% or about 77% [(23/30)*100], wherein 23 is the cumulative frequency and 30 is the total frequency, or count of individuals owning mobile phones. This implies that about 77% of the 30 individuals owning mobile phones are aged 34 years or younger.
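The cumulative columns described above can be sketched in a few lines of Python. This is a minimal illustration rather than the method used in class. It covers only the first five age classes, because the text does not give the individual frequencies for 35 – 39 and 40 – 44. The frequency 5 for the 30 – 34 class is inferred from the stated cumulative values (23 minus 18), and the helper name `cumulative_table` is an assumption of this sketch.

```python
from itertools import accumulate

# First five age classes only. 3, 2, 6 and 7 are stated in the text;
# 5 is inferred from the cumulative frequencies 23 and 18 (23 - 18).
classes = ["10-14", "15-19", "20-24", "25-29", "30-34"]
frequencies = [3, 2, 6, 7, 5]
TOTAL = 30  # total number of individuals owning mobile phones, from the text


def cumulative_table(labels, freqs, total):
    """Return rows of (class, frequency, cumulative frequency, cumulative %)."""
    running = list(accumulate(freqs))  # running sum of the frequencies
    return [
        (label, f, c, round(100 * c / total, 2))
        for label, f, c in zip(labels, freqs, running)
    ]


rows = cumulative_table(classes, frequencies, TOTAL)
for label, f, c, pct in rows:
    print(f"{label:>7}  {f:>2}  {c:>2}  {pct:>6.2f}%")
```

The row for 25 – 29 shows a cumulative frequency of 18, and the row for 30 – 34 shows 76.67%, matching the worked figures in the list above.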

**Graphs**

Data can be presented in graphical forms. Graphs are two-dimensional visual depictions of data that reflect the relation between two or more variables, presented across the x-axis and the y-axis. The x-axis is the horizontal axis, which shows the values (or categories) of the variable being measured, whereas the y-axis is the vertical axis, which shows the frequencies observed in each category. For example, the age intervals are plotted on the x-axis, while the frequencies across the age intervals are plotted on the y-axis (please refer to the slides on “Frequency distribution and Ogives”). The frequencies in the same example can be plotted as a line or as a histogram. A histogram is a series of rectangles, each proportional in width to the range of values within a class and proportional in height to the frequency of that class. In our example, the difference between the limits of each age interval (10 – 14, 15 – 19, and so on), which is 4 (14 minus 10 or 19 minus 15), is the width of the rectangles, and the height of each rectangle is the frequency of mobile phone ownership in that age interval (3, 2, 6, and so on).
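To get a feel for the shape of a histogram without any plotting software, the frequencies can be drawn as rows of symbols. The sketch below is only an approximation of a real histogram: the bars run horizontally instead of vertically, and every bar has the same thickness because the classes are of equal size. It uses the first five age classes (the frequency 5 for 30 – 34 is inferred from the cumulative figures in the text), and `text_histogram` is a hypothetical helper name.

```python
# First five age classes; 3, 2, 6, 7 are stated in the text and
# 5 is inferred from the cumulative frequencies (23 - 18).
classes = ["10-14", "15-19", "20-24", "25-29", "30-34"]
frequencies = [3, 2, 6, 7, 5]


def text_histogram(labels, freqs, symbol="#"):
    """Return one text line per class; bar length equals the frequency."""
    return [
        f"{label:>7} | {symbol * f} ({f})"
        for label, f in zip(labels, freqs)
    ]


lines = text_histogram(classes, frequencies)
print("\n".join(lines))
```

The longest bar belongs to the 25 – 29 class, which has the highest frequency among the classes shown.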

The cumulative percentages determined for the same example can be plotted as a line known as an *ogive* (pronounced “oh-jive”). An ogive gives a visual depiction of how many observations lie above or below a certain value, rather than merely recording the number of items within each interval.

Other than lines and histograms, pie charts are also commonly used for the visual presentation of data.

**Problems for tabulations (Please provide proper titles, sub-titles and a source for your table)**

- Construct a blank table that could show, at two different dates (ending March 31, 2010 and ending March 31, 2011) and in five industries (any), the average wages of four groups: males, females, eighteen years and over, and under eighteen years. Suggest a suitable title (no source required).
- A supermarket divided into five main sections (grocery, vegetables, medicines, textiles and novelties) recorded the following sales in 1985, 1986 and 1987. In 1985 the sales in grocery, vegetables, medicines and novelties were Rs 6,25,000, Rs 2,20,000, Rs 1,88,000 and Rs 94,000 respectively. Textiles accounted for 30% of the total sales during the year. In 1986 the total sales showed a 10% increase over the previous year, while grocery and vegetables registered 8% and 10% increases over the previous year, medicines dropped by Rs 13,000 and textiles increased by Rs 53,000 over their corresponding figures for 1985. In 1987, though the total sales remained the same as in 1986, grocery fell by Rs 22,000, vegetables by Rs 32,000, medicines by Rs 10,000 and novelties by Rs 12,000. Tabulate the above data (in Rs and percentages).

**Problems for constructing frequency distributions (Please provide proper titles, sub-titles and a source for your frequency table)**

1. Construct a frequency distribution with discrete class intervals of the marks obtained by 50 students in economics, as given below (Bharadwaj, Chapter 6, question 13).

42, 53, 65, 63, 61, 47, 58, 60, 64, 45, 55, 57, 82, 42, 39, 51, 65, 55, 33, 70, 50, 52, 53, 45, 45, 25, 36, 59, 63, 39, 65, 30, 45, 35, 49, 15, 54, 48, 64, 26, 75, 20, 42, 40, 41, 55, 52, 46, 35, 18

2. The number of children among 50 families of a locality is given below. Construct a discrete frequency distribution.

2, 2, 2, 3, 2, 4, 5, 4, 6, 8, 3, 3, 1, 4, 3, 1, 3, 3, 2, 1, 3, 3, 2, 4, 3, 5, 4, 3, 3, 2, 2, 5, 2, 5, 3, 3, 3, 4, 3, 5, 4, 4, 2, 6, 3, 6, 3, 3, 7, 3

3. Following are the ages of 50 members of a social service program (Reference: Levin and Rubin, Exercise 2.3, SC 2-1).

83, 51, 66, 61, 82, 65, 54, 56, 92, 60, 65, 87, 68, 64, 51, 70, 75, 66, 74, 68, 44, 55, 78, 69, 98, 67, 82, 77, 79, 62, 38, 88, 76, 99, 84, 47, 60, 42, 66, 74, 91, 71, 83, 80, 68, 65, 51, 56, 73, 55

Construct a cumulative frequency and percentage cumulative frequency distribution using 7 equal intervals and 13 equal intervals. State policies on social service programs require that about 50% of the program participants should be older than 50. Which set of class intervals (7 or 13) helps you answer whether the program is compliant with the policy? (Hint: check the cumulative percentage before it approaches the end of the column. Another easy method of confirming the answer, without calculating the cumulative frequency, is to add the frequencies from 50 years onwards (starting at the 50 – 59 or 50 – 54 age interval) and divide the sum by the total frequency.)