Mathematics (4024)
Topic 18 of 18 · Cambridge O Levels

Cumulative Frequency & Box Plots

Visually represent data distribution using cumulative frequency curves and box plots.

### Introduction to Cumulative Frequency


Cumulative frequency is a 'running total' of frequencies. For grouped data, it shows how many data values lie at or below a given value. Plotting cumulative frequency against the upper class boundaries gives a cumulative frequency diagram, also known as an ogive, which is usually an S-shaped curve. From this curve we can estimate the median, quartiles and percentiles, and so describe how the data are distributed.


### Constructing a Cumulative Frequency Table and Diagram


1. Create the Cumulative Frequency Table:

Start with a standard grouped frequency table. To prepare for plotting, you need to add two new columns:

* Upper Class Boundary: This is the upper boundary of each class interval, and these values are plotted on the x-axis. For continuous data (e.g. 10 < x ≤ 20), the upper boundary is simply the upper end of the interval (here 20). For discrete data (like scores), you need the midpoint between the upper limit of one class and the lower limit of the next (e.g. if a class is 11-20 and the next is 21-30, the upper boundary of the first class is 20.5).

* Cumulative Frequency: Calculate this by adding the frequency of each class to the total of the frequencies of the classes before it. The final cumulative frequency must equal the total number of data points (N).
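
As a minimal sketch of the two new columns, the Python below builds a cumulative frequency table for discrete scores. The class intervals, the frequencies and the function names (`upper_boundaries`, `cumulative_frequency_table`) are made up for illustration, and the code assumes the classes are consecutive whole-number intervals with a gap of 1 between them, as in the 11-20, 21-30 example.

```python
from itertools import accumulate

# Hypothetical grouped data: discrete test scores. Frequencies are invented.
classes = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50)]
frequencies = [4, 9, 15, 8, 4]

def upper_boundaries(classes):
    """Upper class boundary for discrete classes: halfway between the upper
    limit of one class and the lower limit of the next."""
    bounds = []
    for i, (low, high) in enumerate(classes):
        if i + 1 < len(classes):
            next_low = classes[i + 1][0]
        else:
            # Assumption: the same gap of 1 continues after the last class.
            next_low = high + 1
        bounds.append((high + next_low) / 2)
    return bounds

def cumulative_frequency_table(classes, frequencies):
    """Rows of (class, frequency, upper class boundary, cumulative frequency)."""
    running_totals = list(accumulate(frequencies))  # running total of frequencies
    return list(zip(classes, frequencies, upper_boundaries(classes), running_totals))

table = cumulative_frequency_table(classes, frequencies)
for (low, high), f, ub, cf in table:
    print(f"{low}-{high}\t{f}\t{ub}\t{cf}")

# Check: the final cumulative frequency must equal N.
assert table[-1][3] == sum(frequencies)
```

Here the cumulative frequencies are 4, 13, 28, 36 and 40, so N = 40, and the upper class boundaries are 10.5, 20.5, 30.5, 40.5 and 50.5.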


Plotting points:

* Start with a cumulative frequency of 0 at the lower boundary of the first class interval.

* For every class, plot the cumulative frequency against that class's upper class boundary.


2. Draw the Cumulative Frequency Diagram:

* Axes: The horizontal axis (x-axis) represents the variable (e.g., height, marks, time), using the upper class boundaries as plot points. The vertical axis (y-axis) represents the cumulative frequency, scaled from 0 to the total frequency (N).

* Plotting: Plot the points (Upper Class Boundary, Cumulative Frequency) from your table. Remember to include the starting point (Lower Boundary of first class, 0).

* The Curve: Join the plotted points with a smooth, continuous curve. The resulting S-shape is the ogive.
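
The points to plot can be listed with a short Python sketch. The height classes (140 < h ≤ 150, 150 < h ≤ 160, and so on) and their frequencies are hypothetical, and `ogive_points` is an illustrative name; the code only produces the coordinates, and joining them with a smooth curve is still done by hand.

```python
from itertools import accumulate

# Hypothetical continuous data: heights (cm). Frequencies are invented.
lower_boundaries = [140, 150, 160, 170, 180]
upper_boundaries = [150, 160, 170, 180, 190]
frequencies = [5, 12, 18, 10, 5]

def ogive_points(lower_boundaries, upper_boundaries, frequencies):
    """Return the (x, y) points to plot for a cumulative frequency curve."""
    # Starting point: lower boundary of the first class, cumulative frequency 0.
    points = [(lower_boundaries[0], 0)]
    # One point per class: (upper class boundary, cumulative frequency).
    points += list(zip(upper_boundaries, accumulate(frequencies)))
    return points

points = ogive_points(lower_boundaries, upper_boundaries, frequencies)
print(points)
```

This prints `[(140, 0), (150, 5), (160, 17), (170, 35), (180, 45), (190, 50)]`, so the y-axis would be scaled from 0 to N = 50.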


### Interpreting the Cumulative Frequency Diagram


The diagram is used to estimate the median (a measure of average) and measures of spread:


* Median (Q2): The middle value of the data (the 50th percentile). To find it, calculate N/2. Locate this value on the y-axis, draw a horizontal line across to the curve, and then a vertical line down to the x-axis. The value you read on the x-axis is the estimated median.


* Lower Quartile (Q1): The value below which 25% of the data lies (the 25th percentile). Calculate N/4. Follow the same read-off process as for the median to find the estimated Q1.


* Upper Quartile (Q3): The value below which 75% of the data lies (the 75th percentile). Calculate 3N/4. Follow the read-off process to find the estimated Q3.


* Interquartile Range (IQR): This measures the spread of the middle 50% of the data and is less affected by outliers than the range. The formula is: IQR = Q3 – Q1. You calculate this using the values you estimated from the curve.


* Percentiles: You can estimate any percentile. For the *k*th percentile, calculate (k/100) × N, locate this value on the y-axis and read off the corresponding value on the x-axis.
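
On paper the read-off is done with a ruler. As a rough sketch, the Python below imitates it by assuming straight-line segments between the plotted points (a cumulative frequency polygon) rather than a smooth curve, so its answers may differ slightly from a hand-drawn ogive. The points are hypothetical height data with N = 50, and `read_off` and `percentile` are illustrative helpers.

```python
from bisect import bisect_left

# Hypothetical ogive points for N = 50 heights (cm):
# (lower boundary of first class, 0), then (upper class boundary, cumulative frequency).
points = [(140, 0), (150, 5), (160, 17), (170, 35), (180, 45), (190, 50)]

def read_off(points, target_cf):
    """Estimate the x-value at which the cumulative frequency reaches target_cf.

    Uses straight lines between the plotted points as a stand-in for the
    hand-drawn smooth curve."""
    cfs = [y for _, y in points]
    if not cfs[0] <= target_cf <= cfs[-1]:
        raise ValueError("target is outside the cumulative frequency range")
    i = bisect_left(cfs, target_cf)  # first point with cumulative frequency >= target
    if i == 0:
        return points[0][0]
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return x0 + (target_cf - y0) / (y1 - y0) * (x1 - x0)

def percentile(points, k):
    """kth percentile: read off at (k/100) * N on the cumulative frequency axis."""
    n = points[-1][1]
    return read_off(points, k / 100 * n)

q1 = percentile(points, 25)      # N/4
median = percentile(points, 50)  # N/2
q3 = percentile(points, 75)      # 3N/4
iqr = q3 - q1
print(q1, round(median, 2), q3, iqr)
```

With these points the sketch gives Q1 = 156.25, a median of about 164.44 and Q3 = 172.5, so the IQR is 16.25.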


### Box-and-Whisker Plots


A box-and-whisker plot (or box plot) is a concise graphical representation of the data's distribution based on five key values.


The Five-Number Summary:

  • Minimum Value: The smallest value in the dataset.
  • Lower Quartile (Q1): Estimated from the cumulative frequency curve.
  • Median (Q2): Estimated from the cumulative frequency curve.
  • Upper Quartile (Q3): Estimated from the cumulative frequency curve.
  • Maximum Value: The largest value in the dataset.

Constructing a Box Plot:

  • Draw a suitable number line (scale) that covers the entire range of your data from the minimum to the maximum value.
  • Draw a box with its left edge at Q1 and its right edge at Q3.
  • Draw a vertical line inside the box to mark the Median (Q2).
  • Draw a horizontal line (a 'whisker') from the left edge of the box (Q1) to the Minimum Value.
  • Draw another whisker from the right edge of the box (Q3) to the Maximum Value.
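
A box plot can also be sketched in plain text to check that the five values land in the right places. This is a hypothetical helper, `text_box_plot`, not a standard library function: it maps each value onto a line of `width` characters between `lo` and `hi`, drawing the whiskers with `-`, the box with `=` between `[` (Q1) and `]` (Q3), and the median and whisker ends with `|`.

```python
def text_box_plot(minimum, q1, median, q3, maximum, lo, hi, width=61):
    """Return a one-line text sketch of a box plot on the scale lo..hi."""
    if not lo <= minimum <= q1 <= median <= q3 <= maximum <= hi:
        raise ValueError("need lo <= min <= Q1 <= median <= Q3 <= max <= hi")

    def col(value):
        # Position of a value on the number line, as a character column.
        return round((value - lo) * (width - 1) / (hi - lo))

    row = [" "] * width
    # Whiskers: a horizontal line from the minimum to the maximum.
    for c in range(col(minimum), col(maximum) + 1):
        row[c] = "-"
    # Box: from Q1 to Q3, drawn over the middle part of that line.
    for c in range(col(q1), col(q3) + 1):
        row[c] = "="
    row[col(minimum)] = "|"  # end of left whisker
    row[col(maximum)] = "|"  # end of right whisker
    row[col(q1)] = "["       # left edge of box
    row[col(q3)] = "]"       # right edge of box
    row[col(median)] = "|"   # median line inside the box
    return "".join(row)

print(text_box_plot(2, 10, 16, 20, 30, lo=0, hi=30, width=31))
```

With minimum 2, Q1 = 10, median 16, Q3 = 20 and maximum 30 on a 0 to 30 scale, this prints `  |-------[=====|===]---------|`.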

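Interpreting a Box Plot:

  • The box represents the middle 50% of the data; its length is the IQR.

  • Each whisker represents 25% of the data: the lowest quarter on the left and the highest quarter on the right.

  • The total length of the plot, from the minimum to the maximum, shows the range of the data.

  • A longer box indicates a greater spread (less consistency) in the central half of the data.

  • The position of the median line indicates skewness. If the median is closer to Q1, the distribution is positively skewed; if it is closer to Q3, it is negatively skewed; if it is near the centre of the box, the distribution is roughly symmetric.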
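
The interpretation rules above can be written as a small Python check. It is a sketch with an illustrative function name, `describe_box_plot`; it compares the lengths of the two parts of the box to judge skewness, whereas in an exam you would judge the median's position by eye.

```python
def describe_box_plot(minimum, q1, median, q3, maximum):
    """Summarise a box plot: IQR, range and direction of skew."""
    iqr = q3 - q1                  # length of the box
    data_range = maximum - minimum  # total length of the plot
    left_part = median - q1        # Q1 to median
    right_part = q3 - median       # median to Q3
    if right_part > left_part:
        skew = "positive (median nearer Q1)"
    elif right_part < left_part:
        skew = "negative (median nearer Q3)"
    else:
        # An exact tie is rare with estimated values.
        skew = "roughly symmetric (median in the centre of the box)"
    return {"IQR": iqr, "range": data_range, "skew": skew}

print(describe_box_plot(2, 10, 16, 20, 30))
```

For minimum 2, Q1 = 10, median 16, Q3 = 20 and maximum 30, the IQR is 10, the range is 28, and the median lies nearer Q3, so the sketch reports negative skew.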

### Key Points to Remember

1. Cumulative frequency is the running total of frequencies, plotted against upper class boundaries.
2. The cumulative frequency curve (ogive) must start at the lower boundary of the first class with a cumulative frequency of 0.
3. The Median (Q2) is estimated at the 50th percentile, the N/2 mark on the cumulative frequency axis.
4. The Interquartile Range (IQR = Q3 - Q1) measures the spread of the middle 50% of the data.
5. A Box Plot is a visual representation of the five-number summary: Minimum, Q1, Median, Q3 and Maximum.
6. Each of the four sections of a box plot (two whiskers, two parts of the box) contains about 25% of the data.
7. Values read from a cumulative frequency diagram for grouped data are always estimates.
8. Box plots are excellent for comparing the distribution and spread of two or more datasets drawn on the same scale.

### Pakistan Example

Analysis of Daily Temperatures in Lahore

Suppose the daily maximum temperatures (°C) in Lahore were recorded for 60 summer days and grouped into a frequency table (e.g. 30-34°C, 35-39°C, and so on). Students can first create a cumulative frequency table and then draw the curve. Using this curve, they can estimate the median daily temperature and find the interquartile range to judge how consistent the heat is. To find the number of days on which the temperature exceeded a critical value such as 42°C, they read the cumulative frequency at 42°C from the curve and subtract it from 60. Finally, they can construct a box-and-whisker plot to summarise the temperature distribution visually, showing the coolest day, the hottest day and the range of typical summer temperatures in the city. This gives a practical, climate-related application of these statistical tools.
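
To make the 42°C step concrete, here is a sketch with invented frequencies; they are not real Lahore records. It assumes temperatures are rounded to the nearest whole degree, so the class 30-34°C has boundaries 29.5°C and 34.5°C, and it joins the plotted points with straight lines instead of a smooth curve. `cf_at` is an illustrative helper that reads upwards from a value on the x-axis to the cumulative frequency.

```python
from bisect import bisect_left
from itertools import accumulate

# Invented example data (NOT real Lahore records): 60 summer days,
# classes 30-34, 35-39, 40-44, 45-49 °C, temperatures rounded to whole degrees.
first_lower_boundary = 29.5
upper_boundaries = [34.5, 39.5, 44.5, 49.5]
frequencies = [6, 20, 26, 8]

# Ogive points: (first lower boundary, 0), then (upper boundary, cumulative frequency).
points = [(first_lower_boundary, 0)] + list(zip(upper_boundaries, accumulate(frequencies)))

def cf_at(points, x):
    """Estimate the cumulative frequency at x, joining the points with straight lines."""
    xs = [px for px, _ in points]
    if x <= xs[0]:
        return 0
    if x >= xs[-1]:
        return points[-1][1]
    i = bisect_left(xs, x)
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (x - x0) / (x1 - x0) * (y1 - y0)

n = points[-1][1]                       # total number of days (60)
days_at_or_below_42 = cf_at(points, 42)
days_above_42 = n - days_at_or_below_42
print(f"Estimated days above 42 °C: {days_above_42:.0f}")
```

With these made-up figures, the cumulative frequency at 42°C is 39, so about 60 - 39 = 21 days are estimated to be hotter than 42°C.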

SeekhoAsaan.com: Free Revision, Cumulative Frequency & Box Plots
