Statistics

KS3

MA-KS3-D009

Collecting, representing, interpreting and comparing data using appropriate graphical representations and measures

National Curriculum context

Statistics at KS3 develops pupils' ability to work with data — collecting, organising, representing, analysing and interpreting it to answer questions and make inferences. Pupils extend their primary experience of simple charts and averages to a full range of graphical representations including scatter graphs, histograms and cumulative frequency, and to formal measures of average (mean, median, mode) and spread (range, quartiles). The statutory curriculum requires pupils to describe, interpret and compare distributions, and to begin reasoning about the relationship between two variables using correlation. Statistical reasoning is foundational to science, geography, economics and social sciences, and the curriculum prepares pupils for quantitative work across all these disciplines.

6 concepts · 3 clusters · 3 prerequisites · 6 with difficulty levels · AI Direct: 6

Lesson Clusters

1

Calculate and interpret measures of central tendency and spread

introduction Curated

Mean, mode, median and range (with outliers) are tightly co-taught (C084 lists C083). These foundational summary statistics must be established before data representation work.

2 concepts Patterns
2

Represent and compare data distributions using appropriate charts

practice Curated

Data distribution, data representation (bar charts, pie charts, pictograms) and grouped data are co-taught (C085 lists C082, C083, C086; C084 lists C082, C083, C086). This is the data representation and analysis cluster.

3 concepts Patterns
3

Describe relationships between two variables using scatter graphs

practice Curated

Bivariate data and scatter graphs represent a distinct statistical idea (correlation between two variables) that is meaningfully different from univariate distribution work.

1 concept Patterns

Prerequisites

Concepts from other domains that pupils should know before this domain.

Domain Vocabulary

65 terms across 6 concepts (65 domain-specific, 11 shared)

Domain-specific (65)
Concept
T3

average(noun)

A single value that represents a typical or central value of a data set; usually refers to the mean.

T3

axis(noun)

A reference line on a graph or chart used for plotting data; the horizontal is the x-axis, vertical is the y-axis.

T3

bar chart(noun)

A graph that uses rectangular bars of different heights to compare quantities across categories.

Shared by 2 concepts

T3

bivariate data(noun)

Data that involves two variables measured on the same set of items, often displayed on a scatter graph.

T3

boundary(noun)

The value at the lower or upper limit of a class interval.

T3

categorical(adjective)

Data that can be sorted into named groups or categories rather than measured numerically.

Shared by 2 concepts

T3

central tendency(noun)

A statistical measure representing the centre or typical value of a data set (mean, median, or mode).

Shared by 2 concepts

T3

class interval(noun)

A range of values used to group continuous data in a frequency table or histogram.

T3

class width(noun)

The range covered by a single class interval, found by subtracting the lower boundary from the upper boundary.

T3

compare(verb)

To look at two or more numbers or objects to find which is bigger, smaller, longer, shorter, etc.

T3

consistency(noun)

Producing the same or similar results repeatedly; low spread in data.

T3

continuous(adjective)

Data that can take any value within a range, not just whole numbers; measured rather than counted.

Shared by 2 concepts

T3

correlation(noun)

A statistical relationship between two variables shown on a scatter graph; can be positive, negative, or none.

T3

data(noun)

Information collected and recorded, often as numbers, that can be sorted, compared, and displayed.

T3

discrete(adjective)

Data that can only take specific distinct values, usually whole numbers; counted rather than measured.

T3

distribution(noun)

How data values are spread out or arranged across a range, visible in graphs or frequency tables.

Shared by 2 concepts

T3

estimated mean(noun)

An approximation of the mean calculated from grouped data using midpoints of class intervals.

T3

extrapolate(verb)

To estimate a value beyond the range of known data by extending an observed trend.

T3

frequency(noun)

The number of times a particular value or event occurs in a set of data.

Shared by 4 concepts

T3

frequency density(noun)

Frequency divided by class width; used as the y-axis in histograms with unequal class widths.

T3

frequency table(noun)

A table showing how often each value or range of values occurs in a data set.

T3

grouped data(noun)

Data organised into class intervals rather than listed as individual values.

T3

highest value(noun)

The maximum or largest value in a data set.

T3

histogram(noun)

A graph for continuous data where bars have no gaps and the area of each bar represents the frequency.

Shared by 2 concepts

T3

interpolate(verb)

To estimate a value between two known data points on a graph by reading from the line.

T3

interquartile range(noun)

The difference between the upper quartile (Q3) and lower quartile (Q1); measures the spread of the middle 50% of data.

T3

IQR(noun)

Abbreviation for interquartile range: Q3 minus Q1.

T3

key(noun)

A legend on a pictogram or chart explaining what each symbol represents.

T3

label(noun)

Words or symbols added to a graph, diagram, or shape to identify parts and make it easier to read.

T3

line graph(noun)

A graph that uses points connected by lines to show how data changes over time or another continuous variable.

T3

line of best fit(noun)

A straight line drawn through the middle of data points on a scatter graph, showing the general trend.

T3

lowest value(noun)

The minimum or smallest value in a data set.

T3

mean(noun)

A type of average found by adding all values in a data set and dividing by the number of values.

T3

median(noun)

The middle value when all data values are arranged in order from smallest to largest.

T3

middle value(noun)

The value at the centre of an ordered data set; another way of describing the median.

T3

midpoint(noun)

The exact middle point between two positions, values, or coordinates.

T3

mode(noun)

The value that appears most frequently in a data set.

T3

most common(phrase)

The value that occurs most frequently; another way of describing the mode.

T3

negative correlation(noun)

A relationship where one variable increases as the other decreases, shown by a downward trend on a scatter graph.

T3

no correlation(noun)

No apparent relationship between two variables; scattered points on a scatter graph with no trend.

T3

ordered(adjective)

Arranged in a specific sequence, usually from smallest to largest or vice versa.

T3

outlier(noun)

A data value that is significantly different from the rest of the data set.

Shared by 2 concepts

T3

pictogram(noun)

A chart that uses pictures or symbols to represent data, where each symbol may represent one or more items.

T3

pie chart(noun)

A circular chart divided into sectors where each sector represents a proportion of the whole data set.

T3

positive correlation(noun)

A relationship where both variables increase together, shown by an upward trend on a scatter graph.

T3

quartile(noun)

A value that divides ordered data into four equal parts: Q1 (25%), Q2/median (50%), Q3 (75%).

T3

range(noun)

The difference between the largest and smallest values in a data set, showing how spread out the data is.

T3

relationship(noun)

A connection between numbers, operations, or mathematical ideas.

T3

represent(verb)

To show or stand for a number, quantity, or idea using symbols, pictures, or objects.

T3

representative(adjective)

Typical of the whole group; a sample that fairly reflects the characteristics of the full data set.

T3

scale(noun)

The numbered markings on a measuring instrument or the axis of a graph, showing regular intervals.

T3

scatter graph(noun)

A graph plotting paired data as individual points to show the relationship between two variables.

T3

sector(noun)

A slice-shaped region of a circle, bounded by two radii and an arc.

T3

skew(noun)

When data distribution is not symmetrical; values are concentrated more on one side.

T3

spread(noun)

How widely data values are distributed; a data set with a large range has a wide spread.

Shared by 2 concepts

T3

stem-and-leaf(noun)

A data display where each value is split into a stem (leading digits) and leaf (final digit), preserving all data values.

T3

sum(noun)

The total when two or more numbers are added together.

T3

symmetrical(adjective)

Having one or more lines of symmetry; one half is a mirror image of the other.

T3

tally(noun)

A mark made to record counting, using groups of five (four vertical lines crossed by a diagonal).

Shared by 2 concepts

T3

title(noun)

A heading or label on a graph, table, or chart that describes what the data shows.

T3

total(noun)

The amount you get when everything is added together.

T3

typical(adjective)

Representative of the centre or majority of a data set; described by measures of central tendency.

Shared by 2 concepts

T3

variable(noun)

A quantity or characteristic that can change or take different values, such as height or test score; in algebra, represented by a letter or symbol.

T3

variation(noun)

How much data values differ from each other; measured by range, IQR, or standard deviation.

T3

vertical line chart(noun)

A chart using vertical lines (not bars) to show the frequency of discrete data values.

Concepts (6)

Data distribution

skill AI Direct

MA-KS3-C082

Describing and comparing distributions using appropriate graphs and measures

Teaching guidance

Teach pupils to select appropriate representations for different types of data: bar charts for categorical data, histograms for continuous data, line graphs for time series. Compare distributions using back-to-back stem-and-leaf diagrams or comparative bar charts. Focus on interpretation: what does the distribution tell us about the context? Practise describing distributions using measures of central tendency and spread. Include real-world datasets where pupils must choose how to represent and describe the data. Emphasise that different representations can tell different stories about the same data.

Vocabulary (15 terms)
bar chart T3 — A graph that uses rectangular bars of different heights to compare quantities across categories.
categorical T3 new — Data that can be sorted into named groups or categories rather than measured numerically.
central tendency T3 new — A statistical measure representing the centre or typical value of a data set (mean, median, or mode).
compare T3 — To look at two or more numbers or objects to find which is bigger, smaller, longer, shorter, etc.
continuous T3 — Data that can take any value within a range, not just whole numbers; measured rather than counted.
data T3 — Information collected and recorded, often as numbers, that can be sorted, compared, and displayed.
distribution T3 — How data values are spread out or arranged across a range, visible in graphs or frequency tables.
frequency T3 — The number of times a particular value or event occurs in a set of data.
histogram T3 new — A graph for continuous data where bars have no gaps and the area of each bar represents the frequency.
line graph T3 — A graph that uses points connected by lines to show how data changes over time or another continuous variable.
represent T3 — To show or stand for a number, quantity, or idea using symbols, pictures, or objects.
skew T3 new — When data distribution is not symmetrical; values are concentrated more on one side.
spread T3 — How widely data values are distributed; a data set with a large range has a wide spread.
stem-and-leaf T3 new — A data display where each value is split into a stem (leading digits) and leaf (final digit), preserving all data values.
symmetrical T3 new — Having one or more lines of symmetry; one half is a mirror image of the other.
Common misconceptions

Pupils often use bar charts for continuous data when a histogram would be more appropriate. The distinction between discrete and continuous data affects the choice of representation, but pupils commonly ignore this. When comparing distributions, pupils may only compare averages without considering spread. Some pupils think taller bars always mean 'better' without considering what the data represents.

Difficulty levels

Emerging

Can read information from a simple chart or table and describe a data set in general terms (e.g. 'most people chose blue').

Example task

This bar chart shows favourite colours in a class. Which colour was most popular? How many chose red?

Model response: Blue was most popular (12 students). 7 students chose red.

Developing

Compares two data sets using basic measures (mean, range) and appropriate graphical representations.

Example task

Class A's test scores have mean 65 and range 40. Class B has mean 62 and range 15. Compare the two classes.

Model response: Class A has a slightly higher average (65 vs 62) but much greater spread (range 40 vs 15). Class B is more consistent — most students scored near the average — while Class A has a wider spread of abilities.

Secure

Selects and constructs appropriate graphical representations for different data types and uses them to compare distributions.

Example task

You want to compare the heights of Year 7 boys and girls. Which type of graph would you use and why?

Model response: Dual bar charts or back-to-back stem and leaf diagrams would allow direct comparison. Box plots would be best for comparing the medians, quartiles and ranges of both distributions side by side. Histograms would show the shape of each distribution but are harder to compare directly.

Mastery

Describes, interprets and compares complex distributions using shape (skewness), outliers and multiple summary statistics.

Example task

Dataset A has median 45, mean 52, IQR 12. Dataset B has median 50, mean 50, IQR 20. Compare and interpret.

Model response: Dataset A: mean > median suggests positive skew (a few high values pull the mean up). The IQR of 12 indicates relatively consistent data. Dataset B: mean ≈ median suggests a roughly symmetric distribution. The IQR of 20 shows greater variability. Despite B having a higher median, A has less spread. The skew in A suggests a few exceptional high values — the median (45) is more representative of the typical value than the mean (52).

Delivery rationale

Secondary maths concept — abstract, procedural, and objectively assessable.

Measures of central tendency

skill AI Direct

MA-KS3-C083

Understanding and calculating mean, mode and median

Teaching guidance

Teach all three averages (mean, median, mode) and when each is most appropriate. Use datasets where the three averages differ significantly to motivate the discussion. Calculate the mean from raw data and from frequency tables. Find the median by ordering data and locating the middle value (or the mean of the two middle values when the dataset has an even number of values). Identify the mode as the most frequent value. Discuss which average best represents a dataset: the mean is affected by outliers, the median is resistant to outliers, the mode is useful for categorical data.

Vocabulary (13 terms)
average T3 — A single value that represents a typical or central value of a data set; usually refers to the mean.
central tendency T3 — A statistical measure representing the centre or typical value of a data set (mean, median, or mode).
frequency T3 — The number of times a particular value or event occurs in a set of data.
mean T3 — A type of average found by adding all values in a data set and dividing by the number of values.
median T3 — The middle value when all data values are arranged in order from smallest to largest.
middle value T3 — The value at the centre of an ordered data set; another way of describing the median.
mode T3 — The value that appears most frequently in a data set.
most common T3 new — The value that occurs most frequently; another way of describing the mode.
ordered T3 — Arranged in a specific sequence, usually from smallest to largest or vice versa.
representative T3 — Typical of the whole group; a sample that fairly reflects the characteristics of the full data set.
sum T3 — The total when two or more numbers are added together.
total T3 — The amount you get when everything is added together.
typical T3 new — Representative of the centre or majority of a data set; described by measures of central tendency.
Common misconceptions

Pupils commonly forget to order data before finding the median. When finding the mean from a frequency table, pupils may divide by the number of categories rather than the total frequency. Some pupils think there must always be exactly one mode, not recognising bimodal distributions or datasets with no mode. The belief that the mean is always the 'best' average persists despite counterexamples with skewed data.

Difficulty levels

Emerging

Can calculate the mean of a small data set by adding values and dividing by the number of values.

Example task

Find the mean of: 4, 7, 3, 8, 3.

Model response: Sum = 4 + 7 + 3 + 8 + 3 = 25. Number of values = 5. Mean = 25/5 = 5.

Developing

Calculates mean, median and mode from ungrouped data and knows when each is appropriate.

Example task

Find the mean, median and mode of: 3, 5, 7, 7, 8, 10, 15.

Model response: Mean = (3+5+7+7+8+10+15)/7 = 55/7 = 7.86 (2 d.p.). Median = 7 (middle value of 7 ordered values). Mode = 7 (appears twice). The median and mode both suggest 7 is typical, while the mean is pulled up by the high value 15.
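The three averages in the task above can be checked with a short sketch. It assumes Python's standard `statistics` module; the variable names are illustrative only. `multimode` is used rather than `mode` because it returns every most-frequent value, which also covers bimodal data sets.

```python
from statistics import mean, median, multimode

scores = [3, 5, 7, 7, 8, 10, 15]  # data from the task above

mean_value = round(mean(scores), 2)  # 55 / 7, rounded to 2 d.p.
median_value = median(scores)        # orders the data, then takes the middle value
modes = multimode(scores)            # list of all values with the highest frequency

print(mean_value, median_value, modes)
```

For a data set with two equally common values, `multimode` returns both, which is one way to show pupils that a data set need not have exactly one mode.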

Secure

Calculates the mean from a frequency table and understands the effect of outliers on different averages.

Example task

Calculate the mean from this frequency table: Score 1 (freq 3), Score 2 (freq 5), Score 3 (freq 8), Score 4 (freq 4).

Model response: Total frequency = 3+5+8+4 = 20. Σfx = 1(3)+2(5)+3(8)+4(4) = 3+10+24+16 = 53. Mean = 53/20 = 2.65.
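A minimal sketch of the same calculation, assuming the frequency table is held as a dictionary mapping each score to its frequency (a representation chosen here for illustration). It divides by the total frequency, not by the number of rows, which is the error described in the misconceptions above.

```python
# Frequency table from the task above: score -> frequency
freq_table = {1: 3, 2: 5, 3: 8, 4: 4}

total_frequency = sum(freq_table.values())                   # sum of f
sum_fx = sum(score * f for score, f in freq_table.items())   # sum of f * x
mean_score = sum_fx / total_frequency

print(total_frequency, sum_fx, mean_score)
```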

Mastery

Uses the mean to solve problems algebraically, including finding missing values and understanding the mean as a balance point.

Example task

The mean of five numbers is 8. Four of the numbers are 5, 7, 9, 11. Find the fifth number.

Model response: Sum of all five = 5 × 8 = 40. Sum of known four = 5+7+9+11 = 32. Fifth number = 40 - 32 = 8. The mean acts as a balance point — the total deviation above the mean equals the total deviation below.
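The reverse-mean reasoning can be sketched as below. The function name `missing_value` is hypothetical; the second step checks the balance-point idea by confirming that the deviations from the mean sum to zero.

```python
def missing_value(known_values, target_mean, count):
    """Return the one missing value, given the mean of all `count` values."""
    required_total = target_mean * count
    return required_total - sum(known_values)

known = [5, 7, 9, 11]
fifth = missing_value(known, target_mean=8, count=5)

# Balance point: deviations above and below the mean cancel out.
deviations = [x - 8 for x in known + [fifth]]

print(fifth, deviations, sum(deviations))
```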

Delivery rationale

Secondary maths concept — abstract, procedural, and objectively assessable.

Measures of spread

knowledge AI Direct

MA-KS3-C084

Understanding range and consideration of outliers

Teaching guidance

Introduce range as the simplest measure of spread: range = highest value − lowest value. Discuss its limitations: it depends on only two values and is heavily affected by outliers. Identify outliers visually and using rules of thumb (e.g., values more than 1.5 × IQR beyond the quartiles, though this may be simplified for KS3). Use comparative datasets where two groups have the same mean but different spreads to show why a measure of spread is needed alongside an average. Introduce the concept of consistency — lower spread means more consistent data.

Vocabulary (12 terms)
consistency T3 new — Producing the same or similar results repeatedly; low spread in data.
distribution T3 — How data values are spread out or arranged across a range, visible in graphs or frequency tables.
highest value T3 new — The maximum or largest value in a data set.
interquartile range T3 new — The difference between the upper quartile (Q3) and lower quartile (Q1); measures the spread of the middle 50% of data.
IQR T3 new — Abbreviation for interquartile range: Q3 minus Q1.
lowest value T3 new — The minimum or smallest value in a data set.
outlier T3 — A data value that is significantly different from the rest of the data set.
quartile T3 new — A value that divides ordered data into four equal parts: Q1 (25%), Q2/median (50%), Q3 (75%).
range T3 — The difference between the largest and smallest values in a data set, showing how spread out the data is.
spread T3 — How widely data values are distributed; a data set with a large range has a wide spread.
typical T3 — Representative of the centre or majority of a data set; described by measures of central tendency.
variation T3 new — How much data values differ from each other; measured by range, IQR, or standard deviation.
Common misconceptions

Pupils often give the highest value as the range instead of the difference between the highest and lowest values. Some pupils include outliers in range calculations without recognising their disproportionate effect. The concept of an outlier is often applied inconsistently: pupils may exclude values that are merely unusual rather than genuinely anomalous. Some think a larger range always means 'worse' data.

Difficulty levels

Emerging

Can calculate the range of a data set (largest minus smallest) and understands it measures how spread out the data is.

Example task

Find the range of: 3, 7, 2, 11, 5.

Model response: Range = 11 - 2 = 9.

Developing

Understands that range is sensitive to outliers and can compare data sets using both an average and the range.

Example task

Data set A: 5, 6, 6, 7, 7, 8. Data set B: 2, 6, 6, 7, 7, 50. Compare using mean and range.

Model response: A: mean = 6.5, range = 3. B: mean = 13, range = 48. The outlier (50) in B dramatically increases both the mean and range. The median would be a better comparison: both have median 6.5, showing the typical values are similar.
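The comparison above can be reproduced with a small helper. This is a sketch using the standard `statistics` module; `summarise` is an illustrative name, not part of any library.

```python
from statistics import mean, median

def summarise(data):
    """Return the mean, median and range of a data set."""
    return {
        "mean": mean(data),
        "median": median(data),
        "range": max(data) - min(data),
    }

set_a = [5, 6, 6, 7, 7, 8]
set_b = [2, 6, 6, 7, 7, 50]

summary_a = summarise(set_a)
summary_b = summarise(set_b)
print(summary_a)
print(summary_b)
```

The two medians agree even though the means and ranges differ sharply, which is the point of the model response.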

Secure

Calculates and interprets the interquartile range (IQR) as a measure of spread that is resistant to outliers.

Example task

Find the IQR of: 2, 3, 5, 7, 8, 9, 11, 13, 15.

Model response: Q1 = 4 (median of lower half: 2,3,5,7). Q3 = 12 (median of upper half: 9,11,13,15). IQR = Q3 - Q1 = 12 - 4 = 8. The IQR tells us the range of the middle 50% of the data.

Mastery

Uses IQR to identify outliers (values more than 1.5 × IQR from the quartiles) and evaluates which measure of spread is most appropriate for a given data set.

Example task

Data: 2, 5, 7, 8, 9, 10, 11, 12, 35. Identify any outliers using the IQR method.

Model response: Q1 = 6, Q3 = 11.5. IQR = 5.5. Lower fence: Q1 - 1.5(IQR) = 6 - 8.25 = -2.25. Upper fence: Q3 + 1.5(IQR) = 11.5 + 8.25 = 19.75. Value 35 > 19.75, so 35 is an outlier. Range = 33 (heavily influenced by the outlier). IQR = 5.5 (not affected). For this data, the IQR is a more representative measure of spread.
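The quartile method used in the IQR tasks above (take the median of each half, leaving out the overall median when the count is odd) and the 1.5 × IQR fences can be sketched as follows. The function names are illustrative. Other quartile conventions exist, including the default in Python's `statistics.quantiles`, and they can give slightly different values, so this sketch deliberately implements the halves method by hand.

```python
from statistics import median

def quartiles(data):
    """Q1 and Q3 as the medians of the lower and upper halves."""
    ordered = sorted(data)
    half = len(ordered) // 2       # the middle value is excluded when the count is odd
    lower_half = ordered[:half]
    upper_half = ordered[-half:]
    return median(lower_half), median(upper_half)

def iqr_outliers(data):
    """Return (outliers, lower fence, upper fence) using the 1.5 x IQR rule."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return outliers, lower_fence, upper_fence

secure_data = [2, 3, 5, 7, 8, 9, 11, 13, 15]
mastery_data = [2, 5, 7, 8, 9, 10, 11, 12, 35]

print(quartiles(secure_data))      # quartiles for the Secure task
print(iqr_outliers(mastery_data))  # outliers and fences for the Mastery task
```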

Delivery rationale

Secondary maths concept — abstract, procedural, and objectively assessable.

Data representation

skill AI Direct

MA-KS3-C085

Constructing and interpreting frequency tables, bar charts, pie charts, pictograms

Teaching guidance

Build on primary experience with bar charts, pictograms and line graphs. Teach pupils to select appropriate representations for different data types: pictograms for small datasets, bar charts for categorical data, vertical line charts for discrete data, pie charts for proportional comparison. Emphasise the mechanics of construction: equal bar widths, appropriate scales, clear labels and titles. Include frequency table construction as a data-organising step before graphical representation. Critique real-world charts for misleading features (truncated axes, 3D effects).

Vocabulary (14 terms)
axis T3 — A reference line on a graph or chart used for plotting data; the horizontal is the x-axis, vertical is the y-axis.
bar chart T3 — A graph that uses rectangular bars of different heights to compare quantities across categories.
categorical T3 — Data that can be sorted into named groups or categories rather than measured numerically.
frequency T3 — The number of times a particular value or event occurs in a set of data.
frequency table T3 new — A table showing how often each value or range of values occurs in a data set.
key T3 — A legend on a pictogram or chart explaining what each symbol represents.
label T3 — Words or symbols added to a graph, diagram, or shape to identify parts and make it easier to read.
pictogram T3 — A chart that uses pictures or symbols to represent data, where each symbol may represent one or more items.
pie chart T3 — A circular chart divided into sectors where each sector represents a proportion of the whole data set.
scale T3 — The numbered markings on a measuring instrument or the axis of a graph, showing regular intervals.
sector T3 — A slice-shaped region of a circle, bounded by two radii and an arc.
tally T3 — A mark made to record counting, using groups of five (four vertical lines crossed by a diagonal).
title T3 — A heading or label on a graph, table, or chart that describes what the data shows.
vertical line chart T3 new — A chart using vertical lines (not bars) to show the frequency of discrete data values.
Common misconceptions

Pupils frequently use unequal bar widths or leave gaps between bars for continuous data. Pie chart construction errors include not measuring sectors accurately or not making sectors proportional to the data. Scale selection causes problems — choosing scales that either compress or stretch the data. Some pupils think pie charts show absolute values rather than proportions.

Difficulty levels

Emerging

Can read information from bar charts, pictograms and simple tables.

Example task

How many pupils chose chocolate in this pictogram? (Each symbol represents 4 pupils, chocolate has 3.5 symbols.)

Model response: 3.5 × 4 = 14 pupils chose chocolate.

Developing

Constructs bar charts, pie charts and frequency tables from raw data, choosing appropriate scales and labels.

Example task

Draw a bar chart for: Red 8, Blue 12, Green 5, Yellow 3.

Model response: I draw a bar chart with the y-axis from 0 to 14 (suitable scale), bars of equal width, gaps between bars, and labels on both axes. The bars have heights 8, 12, 5, 3.

Secure

Constructs and interprets pie charts (calculating sector angles), frequency polygons and dual bar charts.

Example task

In a survey, 90 people were asked their favourite sport: Football 35, Tennis 20, Swimming 25, Other 10. Draw a pie chart.

Model response: Total = 90. Football: (35/90) × 360° = 140°. Tennis: (20/90) × 360° = 80°. Swimming: (25/90) × 360° = 100°. Other: (10/90) × 360° = 40°. Check: 140+80+100+40 = 360° ✓.
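The sector-angle calculation can be sketched as below, assuming the survey results are stored in a dictionary (an illustrative choice). Each angle is the category's share of the total multiplied by 360°.

```python
# Survey results from the task above
survey = {"Football": 35, "Tennis": 20, "Swimming": 25, "Other": 10}

total = sum(survey.values())
angles = {sport: count * 360 / total for sport, count in survey.items()}

for sport, angle in angles.items():
    print(f"{sport}: {angle:.0f} degrees")

# The angles should always add up to a full turn.
print(sum(angles.values()))
```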

Mastery

Selects the most appropriate representation for a given data type and purpose, and critically evaluates misleading graphs.

Example task

A newspaper shows a bar chart where the y-axis starts at 90 instead of 0, making a small difference between two political parties look dramatic. Explain the problem and draw a corrected version.

Model response: Starting the y-axis at 90 exaggerates the visual difference. If Party A has 95% and Party B has 92%, the truncated bars are 5 and 2 units tall, so A appears to have about two and a half times B's support. The corrected chart with y-axis from 0 to 100 shows the bars are nearly the same height. This is a common way to mislead with data: the visual impression contradicts the numerical reality. Always check the axis scales before interpreting a graph.
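The size of the distortion can be quantified with a short sketch. It assumes that the drawn height of a bar is its value minus the value where the axis starts; `visual_ratio` is a hypothetical helper name.

```python
def visual_ratio(value_a, value_b, axis_start):
    """Ratio of the drawn bar heights when the y-axis starts at `axis_start`."""
    return (value_a - axis_start) / (value_b - axis_start)

truncated = visual_ratio(95, 92, axis_start=90)  # bars drawn 5 and 2 units tall
honest = visual_ratio(95, 92, axis_start=0)      # bars drawn 95 and 92 units tall

print(truncated, round(honest, 3))
```

Comparing the two ratios shows how far the visual impression drifts from the true ratio of the values.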

Delivery rationale

Secondary maths concept — abstract, procedural, and objectively assessable.

Grouped data

skill AI Direct

MA-KS3-C086

Working with discrete, continuous and grouped data

Teaching guidance

Introduce the distinction between discrete data (counted, specific values) and continuous data (measured, any value within a range). Show that continuous data must be grouped because individual values are unlikely to repeat. Teach frequency table construction with class intervals, emphasising that intervals must not overlap and must cover all values. Discuss the effect of different class widths on the appearance of the data. Calculate estimated means from grouped frequency tables using midpoints. Introduce histograms where frequency density is used for unequal class widths.
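Frequency density for unequal class widths can be illustrated with a short sketch. The class intervals and frequencies below are hypothetical, made up for illustration; they are not taken from the curriculum.

```python
# Hypothetical grouped data: ((lower boundary, upper boundary), frequency)
classes = [((0, 10), 5), ((10, 15), 10), ((15, 30), 15)]

for (lower, upper), frequency in classes:
    class_width = upper - lower
    frequency_density = frequency / class_width
    # In a histogram the bar's area (density x width) gives back the frequency.
    area = frequency_density * class_width
    print(f"{lower} to {upper}: width {class_width}, density {frequency_density}, area {area}")

densities = [f / (hi - lo) for (lo, hi), f in classes]
```

The class with the highest frequency (15 to 30) does not have the tallest bar, because its frequency is spread over a wider interval.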

Vocabulary (12 terms)
boundary T3 — The value at the lower or upper limit of a class interval.
class interval T3 new — A range of values used to group continuous data in a frequency table or histogram.
class width T3 new — The range covered by a single class interval, found by subtracting the lower boundary from the upper boundary.
continuous T3 — Data that can take any value within a range, not just whole numbers; measured rather than counted.
discrete T3 — Data that can only take specific distinct values, usually whole numbers; counted rather than measured.
estimated mean T3 new — An approximation of the mean calculated from grouped data using midpoints of class intervals.
frequency T3 — The number of times a particular value or event occurs in a set of data.
frequency density T3 new — Frequency divided by class width; used as the y-axis in histograms with unequal class widths.
grouped data T3 new — Data organised into class intervals rather than listed as individual values.
histogram T3 — A graph for continuous data where bars have no gaps and the area of each bar represents the frequency.
midpoint T3 — The exact middle point between two positions, values, or coordinates.
tally T3 — A mark made to record counting, using groups of five (four vertical lines crossed by a diagonal).
Common misconceptions

Pupils commonly create overlapping class intervals (10-20, 20-30) rather than non-overlapping ones (10 ≤ x < 20, 20 ≤ x < 30). When calculating the estimated mean, pupils may use the lower boundary instead of the midpoint. Pupils frequently confuse bar charts with histograms: a bar chart has gaps between bars and a histogram does not, and a histogram uses frequency density when class widths are unequal.

Difficulty levels

Emerging

Understands the difference between discrete data (counted) and continuous data (measured) and can organise data into simple groups.

Example task

Classify each as discrete or continuous: (a) number of siblings (b) height in cm (c) shoe size (d) time to run 100m.

Model response: (a) Discrete (counted, whole numbers). (b) Continuous (measured, any value). (c) Discrete (set sizes: 3, 3.5, 4, ...). (d) Continuous (measured, any positive value).

Developing

Constructs grouped frequency tables with appropriate class intervals, ensuring no gaps or overlaps.

Example task

Group these test scores into a frequency table: 45, 52, 67, 71, 38, 59, 82, 63, 55, 74. Use groups 30-39, 40-49, etc.

Model response: 30-39: 1, 40-49: 1, 50-59: 3, 60-69: 2, 70-79: 2, 80-89: 1. Total: 10 ✓.
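The grouping step can be sketched as below. It assumes whole-number scores and classes of width 10 labelled as in the task (30-39, 40-49 and so on); `class_label` is an illustrative helper.

```python
from collections import Counter

scores = [45, 52, 67, 71, 38, 59, 82, 63, 55, 74]  # data from the task above

def class_label(score, width=10):
    """Return the label of the class interval that contains a whole-number score."""
    lower = (score // width) * width
    return f"{lower}-{lower + width - 1}"

table = Counter(class_label(s) for s in scores)

for label in sorted(table):
    print(label, table[label])

# Check that every score has been counted exactly once.
print(sum(table.values()))
```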

Secure

Estimates the mean from grouped frequency tables using midpoints and understands why it is an estimate.

Example task

Estimate the mean from: 0 ≤ x < 10 (freq 4), 10 ≤ x < 20 (freq 7), 20 ≤ x < 30 (freq 5), 30 ≤ x < 40 (freq 2).

Model response: Midpoints: 5, 15, 25, 35. Σfm = 4(5)+7(15)+5(25)+2(35) = 20+105+125+70 = 320. Σf = 18. Estimated mean = 320/18 = 17.8 (1 d.p.). This is an estimate because we assume all values in each group equal the midpoint, which may not be true.
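A minimal sketch of the midpoint method, assuming each class is stored as a pair of boundaries with its frequency (an illustrative representation):

```python
# Grouped data from the task above: ((lower boundary, upper boundary), frequency)
grouped = [((0, 10), 4), ((10, 20), 7), ((20, 30), 5), ((30, 40), 2)]

# Each class is represented by its midpoint, which is why the result is an estimate.
sum_fm = sum(((lower + upper) / 2) * f for (lower, upper), f in grouped)
total_f = sum(f for _, f in grouped)
estimated_mean = sum_fm / total_f

print(sum_fm, total_f, round(estimated_mean, 1))
```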

Mastery

Works with grouped continuous data to estimate median, draw and interpret cumulative frequency curves and histograms.

Example task

From a cumulative frequency graph with total 80, estimate the median, Q1 and Q3.

Model response: Median at 80/2 = 40th value: read across from 40 on the y-axis to the curve, then down to the x-axis. Q1 at 80/4 = 20th value. Q3 at 3(80)/4 = 60th value. The positions are read from the smooth cumulative frequency curve, giving estimates not exact values.
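Reading the quartiles from a cumulative frequency graph can be approximated in code. The sketch below is an assumption-heavy illustration: the plotted points are hypothetical (the task gives only the total of 80), and straight-line segments between points stand in for the smooth curve pupils would draw, so the values are estimates in the same sense as the graph readings.

```python
# Hypothetical plotted points: (upper class boundary, cumulative frequency)
points = [(0, 0), (10, 8), (20, 28), (30, 56), (40, 72), (50, 80)]

def read_value(points, target):
    """Read across from a cumulative frequency, then down to the x-axis."""
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= target <= c1 and c1 > c0:
            # Linear interpolation within the segment that contains the target.
            return x0 + (target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("target is outside the cumulative frequency range")

total = points[-1][1]                             # 80
q1 = read_value(points, total / 4)                # 20th value
median_estimate = read_value(points, total / 2)   # 40th value
q3 = read_value(points, 3 * total / 4)            # 60th value

print(q1, round(median_estimate, 1), q3)
```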

Delivery rationale

Secondary maths concept — abstract, procedural, and objectively assessable.

Bivariate data

skill AI Direct

MA-KS3-C087

Describing relationships between two variables using scatter graphs

Teaching guidance

Introduce scatter graphs using real data: height versus arm span, temperature versus ice cream sales, hours of study versus test score. Plot ordered pairs with the independent variable on the x-axis and the dependent variable on the y-axis. Discuss correlation: positive (both increase together), negative (as one increases the other decreases), and no correlation. Draw lines of best fit by eye, balancing points above and below the line. Use the line of best fit to make predictions and discuss the reliability of interpolation versus extrapolation.
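The guidance above has pupils draw the line of best fit by eye. As a cross-check for teachers, the sketch below swaps in a different technique, a least-squares line and the correlation coefficient, neither of which is required at KS3. The hours-of-study data is hypothetical and the function names are illustrative.

```python
from math import sqrt

# Hypothetical paired data: hours of study and test score
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

def least_squares(xs, ys):
    """Return (slope, intercept, correlation coefficient) for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r

slope, intercept, r = least_squares(hours, scores)

def predict(x):
    """Predict a score and say whether x lies inside the observed range."""
    kind = "interpolation" if min(hours) <= x <= max(hours) else "extrapolation"
    return slope * x + intercept, kind

print(slope, intercept, r)
print(predict(3.5))
print(predict(9))
```

A prediction flagged as extrapolation lies outside the observed data, so it should be treated as less reliable than one flagged as interpolation.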

Vocabulary (12 terms)
bivariate data T3 new — Data that involves two variables measured on the same set of items, often displayed on a scatter graph.
correlation T3 new — A statistical relationship between two variables shown on a scatter graph; can be positive, negative, or none.
extrapolate T3 — To estimate a value beyond the range of known data by extending an observed trend.
interpolate T3 — To estimate a value between two known data points on a graph by reading from the line.
line of best fit T3 new — A straight line drawn through the middle of data points on a scatter graph, showing the general trend.
negative correlation T3 new — A relationship where one variable increases as the other decreases, shown by a downward trend on a scatter graph.
no correlation T3 new — No apparent relationship between two variables; scattered points on a scatter graph with no trend.
outlier T3 — A data value that is significantly different from the rest of the data set.
positive correlation T3 new — A relationship where both variables increase together, shown by an upward trend on a scatter graph.
relationship T3 — A connection between numbers, operations, or mathematical ideas.
scatter graph T3 new — A graph plotting paired data as individual points to show the relationship between two variables.
variable T3 — A letter or symbol that represents a quantity which can change or take different values.
Common misconceptions

Pupils often confuse correlation with causation — just because two variables correlate does not mean one causes the other. When drawing lines of best fit, pupils may try to connect all the points rather than drawing a straight line that represents the overall trend. Some pupils think negative correlation means there is no relationship. Others place the line of best fit through the origin regardless of the data.

Difficulty levels

Emerging

Can plot points on a scatter graph when given paired data and describe the overall trend informally.

Example task

Plot these pairs on a scatter graph: (2,3), (4,5), (6,8), (8,9), (10,11). What do you notice?

Model response: The points roughly form an upward line. As x increases, y tends to increase too.
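
For teachers who want a numerical check of the trend, the sketch below computes a correlation coefficient and a least-squares line for the five points in the task. Both go beyond what pupils do at this level, where the line of best fit is drawn by eye; the least-squares line is swapped in here only as a reference, and the function name summarise is illustrative.

```python
from math import sqrt

def summarise(pairs):
    """Return (r, slope, intercept) for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    r = sxy / sqrt(sxx * syy)            # Pearson correlation coefficient
    slope = sxy / sxx                    # least-squares gradient
    intercept = mean_y - slope * mean_x  # line passes through the mean point
    return r, slope, intercept

pairs = [(2, 3), (4, 5), (6, 8), (8, 9), (10, 11)]
r, slope, intercept = summarise(pairs)
print(round(r, 2), round(slope, 2), round(intercept, 2))
```

A value of r close to 1 matches the informal description: the points lie near an upward straight line.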

Developing

Describes correlation (positive, negative, none) and draws a line of best fit by eye.

Example task

A scatter graph shows that as temperature increases, ice cream sales increase. Describe the correlation.

Model response: There is a positive correlation — as temperature increases, ice cream sales tend to increase.

Secure

Uses a line of best fit to estimate values (interpolation) and understands the limitations of extrapolation.

Example task

Using a scatter graph and line of best fit, estimate the ice cream sales when the temperature is 25°C (within the data range) and 45°C (outside the data range). Comment on reliability.

Model response: At 25°C: reading from the line gives approximately 180 sales (interpolation — reliable, within data range). At 45°C: the line suggests about 320 sales, but this is extrapolation (outside the data range) and unreliable — there may be a ceiling effect or other factors that change the relationship at extreme temperatures.
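
The distinction between interpolation and extrapolation can be made explicit in code. In the sketch below the gradient 7 and intercept 5 are chosen only so that the line reproduces the two readings quoted in the model response (180 at 25°C and 320 at 45°C), and the data range of 10°C to 35°C is an assumption, since the task does not state it.

```python
def predict_sales(temp_c, slope=7.0, intercept=5.0, data_range=(10.0, 35.0)):
    """Estimate sales from a line of best fit and label the kind of estimate.

    slope, intercept and data_range are illustrative assumptions.
    """
    estimate = slope * temp_c + intercept
    low, high = data_range
    # Inside the observed temperatures the estimate is an interpolation.
    kind = "interpolation" if low <= temp_c <= high else "extrapolation"
    return estimate, kind

print(predict_sales(25))  # (180.0, 'interpolation')
print(predict_sales(45))  # (320.0, 'extrapolation')
```

The arithmetic is identical in both cases; only the label changes, which is why extrapolated values need a comment on reliability.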

Mastery

Critically evaluates bivariate data analysis, distinguishes correlation from causation, and identifies lurking variables.

Example task

A study finds a strong positive correlation between shoe size and reading ability in children aged 5-16. Does having big feet help you read better?

Model response: No — this is a classic spurious correlation. The lurking variable is age: as children get older, both their feet grow (shoe size increases) and their reading improves (more years of practice). Age causes both variables to increase, creating a correlation between them, but there is no causal link between shoe size and reading ability. To test for a genuine relationship, you would need to control for age (compare children of the same age).
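
The effect of controlling for age can be demonstrated with a small constructed data set. Everything in the sketch below is artificial: shoe size and reading score are each set to a value that rises with age plus an offset, and the offsets are arranged so that the two measures are unrelated within any single age group. The scales are arbitrary and not real measurements.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Four artificial children per age; offsets are balanced within each age.
offsets = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
children = [(age, age + ds, 10 * age + dr)
            for age in range(5, 17) for ds, dr in offsets]

# All ages together: age drives both measures, so they correlate strongly.
overall = pearson_r([c[1] for c in children], [c[2] for c in children])

# One age only: the lurking variable is held fixed.
same_age = [c for c in children if c[0] == 10]
within = pearson_r([c[1] for c in same_age], [c[2] for c in same_age])

print(round(overall, 2), within)
```

Across all ages the correlation is strongly positive, while within the single age group it is zero, which mirrors the argument in the model response.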

Delivery rationale

Secondary maths concept — abstract, procedural, and objectively assessable.