Cumulative Frequency & Box Plots
Visually represent data distribution using cumulative frequency curves and box plots.
### Introduction to Cumulative Frequency
Cumulative frequency is a 'running total' of frequencies. It is used for grouped data to show how many data points fall below a certain value. Plotting this information gives a cumulative frequency diagram, also known as an ogive, which is typically an S-shaped curve. From the curve we can estimate key statistical measures such as the median, quartiles, and percentiles, which together describe the centre and spread of the data's distribution.
### Constructing a Cumulative Frequency Table and Diagram
1. Create the Cumulative Frequency Table:
Start with a standard grouped frequency table. To prepare for plotting, you need to add two new columns:
* Upper Class Boundary: This is the upper limit of each class interval, which will be used for the x-axis. For continuous data, this is straightforward. For discrete data (like scores), you might need to find the midpoint between the upper limit of one class and the lower limit of the next (e.g., if a class is 11-20 and the next is 21-30, the upper boundary for the first class is 20.5).
* Cumulative Frequency: Calculate this by adding the frequency of each class to the total of the frequencies of the classes before it. The final cumulative frequency must equal the total number of data points (N).
Process:
* Start with a cumulative frequency of 0 at the lower boundary of the first class interval.
* For each subsequent class, plot the calculated cumulative frequency against the upper class boundary.
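The table-building steps above can be sketched in a few lines of Python. This is only an illustration: the classes and frequencies are made-up values, and the helper name `cumulative_table` is hypothetical. It assumes discrete classes with equal gaps between them, so each upper boundary sits halfway across the gap (as in the 20.5 example above).

```python
from itertools import accumulate

# Hypothetical discrete classes (lower limit, upper limit) with made-up frequencies.
classes = [(1, 10), (11, 20), (21, 30), (31, 40)]
frequencies = [4, 9, 5, 2]

def cumulative_table(classes, frequencies):
    """Return (upper_boundaries, cumulative_frequencies) for discrete classes."""
    # Gap between the upper limit of one class and the lower limit of the next.
    gap = classes[1][0] - classes[0][1]
    # The upper boundary is the midpoint of that gap, e.g. 20 + 0.5 = 20.5.
    upper_boundaries = [upper + gap / 2 for _, upper in classes]
    # Running total of the frequencies.
    cumulative = list(accumulate(frequencies))
    return upper_boundaries, cumulative

upper_boundaries, cumulative = cumulative_table(classes, frequencies)
for (low, high), f, ub, cf in zip(classes, frequencies, upper_boundaries, cumulative):
    print(f"{low}-{high}: frequency={f}, upper boundary={ub}, cumulative frequency={cf}")

# The final cumulative frequency should equal the total number of data points N.
assert cumulative[-1] == sum(frequencies)
```

For continuous data the gap is zero, so the upper boundary is simply the upper limit of each class.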
2. Draw the Cumulative Frequency Diagram:
* Axes: The horizontal axis (x-axis) represents the variable (e.g., height, marks, time), using the upper class boundaries as plot points. The vertical axis (y-axis) represents the cumulative frequency, scaled from 0 to the total frequency (N).
* Plotting: Plot the points (Upper Class Boundary, Cumulative Frequency) from your table. Remember to include the starting point (Lower Boundary of first class, 0).
* The Curve: Join the plotted points with a smooth, continuous curve. The resulting S-shape is the ogive.
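As a quick check before drawing, the points to plot can be listed programmatically. The sketch below uses made-up boundaries and cumulative frequencies, and the function name `ogive_points` is hypothetical. It only prepares the coordinates; drawing the smooth curve is still done by hand or with a plotting tool of your choice.

```python
def ogive_points(lower_boundary, upper_boundaries, cumulative):
    """Return the (x, y) points of the ogive, including the starting point."""
    # The curve starts at (lower boundary of the first class, 0).
    points = [(lower_boundary, 0)]
    # Every other point is (upper class boundary, cumulative frequency).
    points += list(zip(upper_boundaries, cumulative))
    return points

# Made-up example: classes 1-10, 11-20, 21-30, 31-40 with N = 20.
points = ogive_points(0.5, [10.5, 20.5, 30.5, 40.5], [4, 13, 18, 20])
print(points)

# Sanity check: cumulative frequency can never decrease from left to right.
heights = [y for _, y in points]
assert all(a <= b for a, b in zip(heights, heights[1:]))
```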
### Interpreting the Cumulative Frequency Diagram
The diagram is used to find estimates for measures of location (such as the median) and spread (such as the interquartile range):
* Median (Q2): The middle value of the data (the 50th percentile). To find it, calculate N/2. Locate this value on the y-axis, draw a horizontal line across to the curve, and then a vertical line down to the x-axis. The value you read on the x-axis is the estimated median.
* Lower Quartile (Q1): The value below which 25% of the data lies (the 25th percentile). Calculate N/4. Follow the same read-off process as for the median to find the estimated Q1.
* Upper Quartile (Q3): The value below which 75% of the data lies (the 75th percentile). Calculate 3N/4. Follow the read-off process to find the estimated Q3.
* Interquartile Range (IQR): This measures the spread of the middle 50% of the data and is less affected by outliers than the range. The formula is: IQR = Q3 – Q1. You calculate this using the values you estimated from the curve.
* Percentiles: You can find any percentile. For the *k*th percentile, calculate (k/100) × N, locate this value on the y-axis, and read the corresponding value from the x-axis using the same read-off process.
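The read-off process can be imitated numerically. The sketch below is an approximation, not the graphical method itself: it assumes the plotted points are joined by straight lines (linear interpolation) rather than a smooth curve, so its answers may differ slightly from values read off a hand-drawn ogive. The points and the function name `read_off` are made up for illustration.

```python
def read_off(points, target_cf):
    """Estimate the x-value at a given cumulative frequency (straight-line segments)."""
    for (x0, cf0), (x1, cf1) in zip(points, points[1:]):
        if cf0 <= target_cf <= cf1 and cf1 > cf0:
            # Go across to the line segment, then down to the x-axis.
            return x0 + (target_cf - cf0) / (cf1 - cf0) * (x1 - x0)
    raise ValueError("target cumulative frequency is outside the curve")

# Made-up ogive points: (boundary, cumulative frequency), N = 20.
points = [(0.5, 0), (10.5, 4), (20.5, 13), (30.5, 18), (40.5, 20)]
n = points[-1][1]

q1 = read_off(points, n / 4)           # lower quartile at N/4
median = read_off(points, n / 2)       # median at N/2
q3 = read_off(points, 3 * n / 4)       # upper quartile at 3N/4
p90 = read_off(points, 90 / 100 * n)   # 90th percentile at (k/100) x N
iqr = q3 - q1

print(f"Q1={q1:.2f}, median={median:.2f}, Q3={q3:.2f}, IQR={iqr:.2f}, P90={p90:.2f}")
```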
### Box-and-Whisker Plots
A box-and-whisker plot (or box plot) is a concise graphical representation of the data's distribution based on five key values.
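The Five-Number Summary:
* Minimum: the smallest value in the data.
* Lower Quartile (Q1): the 25th percentile.
* Median (Q2): the 50th percentile.
* Upper Quartile (Q3): the 75th percentile.
* Maximum: the largest value in the data.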
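Constructing a Box Plot:
* Draw a horizontal scale that covers the data from the minimum to the maximum.
* Draw a box from Q1 to Q3.
* Draw a vertical line inside the box at the median.
* Draw whiskers from the box out to the minimum and the maximum.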
Interpreting a Box Plot:
* The box represents the middle 50% of the data (the IQR).
* The whiskers represent the lower 25% and the upper 25% of the data.
* The total length of the plot shows the range of the data.
* A wider box indicates a greater spread (less consistency) in the central half of the data.
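* The position of the median line indicates skewness. If it is in the centre of the box, the middle half of the data is roughly symmetric. If it is closer to Q1, the distribution is likely positively skewed; if it is closer to Q3, it is likely negatively skewed.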
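The interpretation points above can be checked with a small sketch. The five values are invented for illustration, and the class `FiveNumberSummary` and function `describe_skew` are hypothetical names. The skew check looks only at where the median sits inside the box, which is a rough indicator rather than a formal test.

```python
from dataclasses import dataclass

@dataclass
class FiveNumberSummary:
    minimum: float
    q1: float
    median: float
    q3: float
    maximum: float

    @property
    def iqr(self) -> float:
        # Width of the box: spread of the middle 50% of the data.
        return self.q3 - self.q1

    @property
    def data_range(self) -> float:
        # Total length of the plot, whisker tip to whisker tip.
        return self.maximum - self.minimum

def describe_skew(s: FiveNumberSummary) -> str:
    """Rough skew indicator from the position of the median inside the box."""
    lower_part = s.median - s.q1
    upper_part = s.q3 - s.median
    if lower_part < upper_part:
        return "positive skew suggested"
    if lower_part > upper_part:
        return "negative skew suggested"
    return "roughly symmetric"

# Made-up values for illustration only.
summary = FiveNumberSummary(minimum=12, q1=20, median=23, q3=32, maximum=45)
print(f"IQR={summary.iqr}, range={summary.data_range}, {describe_skew(summary)}")
```

Running the same check on two datasets gives a quick numerical companion to comparing their box plots on one scale.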
Key Points to Remember
- Cumulative frequency is the running total of frequencies, plotted against upper class boundaries.
- The cumulative frequency curve (ogive) must start at the lower boundary of the first class with a frequency of 0.
- The Median (Q2) is estimated at the 50th percentile (N/2) mark on the cumulative frequency axis.
- The Interquartile Range (IQR = Q3 - Q1) measures the spread of the middle 50% of the data.
- A Box Plot is a visual representation of the five-number summary: Minimum, Q1, Median, Q3, and Maximum.
- Each of the four sections of a box plot (two whiskers, two parts of the box) contains 25% of the data.
- Values read from a cumulative frequency diagram for grouped data are always estimates.
- Box plots are excellent for comparing the distribution and spread of two or more datasets on the same scale.
Pakistan Example
Analysis of Daily Temperatures in Lahore
Suppose the daily maximum temperatures (°C) in Lahore were recorded for 60 days in summer, and the data is grouped into a frequency table (e.g., 30-34°C, 35-39°C, etc.). Students can first create a cumulative frequency table and then draw the curve. Using this curve, they can estimate the median daily temperature, find the interquartile range to understand the consistency of the heat, and determine the number of days the temperature exceeded a critical value, like 42°C. Finally, they can construct a box-and-whisker plot to visually summarise the temperature distribution, showing the coolest day, the hottest day, and the range of typical summer temperatures in the city. This provides a practical, climate-related application of the statistical tools.
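To make the Lahore example concrete, here is a sketch with invented numbers. The frequencies below are not real temperature records; they are hypothetical values chosen to total 60 days. The sketch also assumes straight-line interpolation between plotted points, and uses the outermost class boundaries as stand-ins for the coolest and hottest days, since grouped data does not give the exact extremes.

```python
from itertools import accumulate

# Classes 30-34, 35-39, 40-44, 45-49 (°C) with HYPOTHETICAL frequencies (60 days).
boundaries = [29.5, 34.5, 39.5, 44.5, 49.5]   # lower boundary, then upper boundaries
frequencies = [6, 18, 24, 12]
cumulative = [0] + list(accumulate(frequencies))
points = list(zip(boundaries, cumulative))
n = cumulative[-1]

def read_off(target_cf):
    """Temperature at a given cumulative frequency (straight-line segments)."""
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= target_cf <= c1 and c1 > c0:
            return x0 + (target_cf - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("cumulative frequency outside the curve")

def cf_at(temperature):
    """Cumulative frequency at a given temperature (reading the curve the other way)."""
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if x0 <= temperature <= x1:
            return c0 + (temperature - x0) / (x1 - x0) * (c1 - c0)
    raise ValueError("temperature outside the recorded classes")

q1, median, q3 = read_off(n / 4), read_off(n / 2), read_off(3 * n / 4)
days_above_42 = n - cf_at(42)   # total days minus days at or below 42 °C

print(f"Median = {median} °C, Q1 = {q1} °C, Q3 = {q3} °C, IQR = {q3 - q1} °C")
print(f"Estimated days above 42 °C: {days_above_42:.0f}")
print(f"Box plot values: {boundaries[0]}, {q1}, {median}, {q3}, {boundaries[-1]}")
```

With these invented frequencies the estimates come out as a median of 40.75 °C, an IQR of 6.875 °C, and about 24 days above 42 °C; different frequencies would give different results.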