A quick guide to descriptive statistics (with examples)
The Indeed Editorial Team comprises a diverse and talented team of writers, researchers and subject matter experts equipped with Indeed's data and insights to deliver useful tips to help guide your career journey.
Descriptive statistics are headline or summary statistics that illustrate the key themes and findings of a data set. Together with inferential statistics, they are one of the two main ways of presenting statistical findings to a reader. Understanding what these statistics are and knowing how to use them appropriately is key to handling and applying statistical data correctly. In this article, we outline the essentials of these summary statistics, including types and the differences between descriptive and inferential statistics.
What are descriptive statistics?
As their name suggests, descriptive statistics describe data, making it easy to understand, interpret and use. They are most effective where there are obvious patterns in the data set, and researchers typically present several types to provide a complete picture of the data. These summary statistics are important because they condense extensive sets of data into a form that the reader can quickly understand and use.
Using descriptive and other summary statistics
These statistics are most effective as a simple and meaningful summary of a sample that communicates the investigator's observations. The observations may be presented quantitatively, as numbers, or visually, as graphs and charts. They usually form the prelude to a more in-depth statistical analysis, but the trends an investigator identifies may be so clear that they suffice as an analysis of the data.
Are these statistics the same as inferential statistics?
With inferential statistics, researchers take a sample of data from the larger population and use it to make inferences. Summary statistics and inferential statistics differ significantly in their purposes. The key differences between descriptive and inferential statistics are:
You can only use a descriptive statistic with the data set it refers to.
Such summary statistics limit you to summarising and presenting information that helps you find patterns among the data.
You use this type of statistic mainly for quantitative analysis.
Inferential statistics facilitate further analysis of data such as predictions and inferences.
You can apply inferential statistics to a general population.
Inferential statistics are reliant on probability theory.
4 key types of descriptive statistics
There are four primary types of summary statistics. You can use them to summarise trends in data such as averages and variability. Here are the four classes:
1. Measures of frequency
Frequency is the number of times a value occurs within a data set. You can use measures of frequency to simplify and summarise data by counting how often each individual value within a data set occurs. Investigators can take a disorganised set of data, group all similar values together and display these frequencies in a table, graph or chart.
2. Measures of central tendency
Central tendency refers to the clustering of data around a central value within the set. Measures of central tendency use a single value to represent the middle of a set of data. The three measures are mean, median and mode, as explained below:
The mean is a measure of central tendency that you calculate by adding all the numbers in a data set and dividing the sum by the number of values. The primary advantage is that it considers all figures in a data set, but it is important to know that the mean is susceptible to outliers.
The median is the middle value within a data set. You can identify it by organising all the values in a data set in numerical order from least to greatest and then finding the number in the middle of the set. The primary advantage is that outliers affect it less than the mean. If a data set has an odd number of values, the median is the number in the middle, but if there is an even number of values, you average the two values in the middle to calculate the median.
The mode is the value that has the highest frequency in a data set. You can find the mode by grouping all matching values together and selecting the group that has the most values. The mode allows you to compare values that are both numerical and nominal, such as colours or shapes, whereas the mean and median can only compare numerical values.
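The three measures above can be sketched with Python's standard statistics module. The values and colours below are hypothetical and chosen only to illustrate the odd and even median cases, the effect of an outlier and the mode of nominal data.

```python
from statistics import mean, median, mode

# Hypothetical values, for illustration only.
odd_values = [3, 1, 7, 5, 9]
even_values = [3, 1, 7, 5, 9, 11]

# Odd count: the median is the single middle value of 1, 3, 5, 7, 9.
median_odd = median(odd_values)
# Even count: the median is the average of the two middle values, 5 and 7.
median_even = median(even_values)

# Adding one extreme value shifts the mean far more than the median.
with_outlier = odd_values + [100]
mean_with_outlier = mean(with_outlier)
median_with_outlier = median(with_outlier)

# The mode also works on nominal data such as colours.
colours = ["red", "blue", "red", "green", "red", "blue"]
most_common_colour = mode(colours)

print(median_odd, median_even)
print(mean_with_outlier, median_with_outlier)
print(most_common_colour)
```

In this sketch, the single value of 100 pulls the mean above 20 while the median stays at 6, which is why the median is often preferred for skewed data.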
3. Measures of dispersion or variation
Measures of dispersion or variation determine the spread of values across a data set, known as variability. Statisticians use measures of spread when they want to show how spread out a data set is. There are three measures of dispersion or variation: range, variance and standard deviation. Here are details of these measures of dispersion:
The range measures the difference between the highest and lowest values in a data set. You can calculate the range by subtracting the lowest value in a data set from the highest one. The range provides an indication of the variability of a data set, especially if there aren't significant outlier values.
Standard deviation measures the typical distance between each individual value in a data set and the mean of that data set. It is the square root of the variance, so it is expressed in the same units as the data. A low standard deviation means that the values in the data set are close to the mean, while a high standard deviation shows that the values vary across a wide range.
Variance is the average of the squared distances between each value and the mean, showing how spread out the values are. It uses every individual value within the data set rather than grouping numbers into quartiles, and because the distances are squared, deviations above and below the mean count equally. You can calculate the variance of a data set by finding the square of the standard deviation.
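As a minimal sketch of how range, variance and standard deviation relate, the following uses Python's statistics module on a hypothetical set of eight values. It assumes the values are a whole population, so it uses pvariance and pstdev, which divide by the number of values. For a sample drawn from a larger population, the variance and stdev functions, which divide by one fewer than the number of values, are the usual choice.

```python
import math
from statistics import pstdev, pvariance

# Hypothetical values, for illustration only. Their mean is 5.
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: highest value minus lowest value.
value_range = max(values) - min(values)

# Population variance: the squared deviations from the mean sum to 32,
# and 32 / 8 = 4.
variance = pvariance(values)

# Population standard deviation: the square root of the variance.
std_dev = pstdev(values)

# Squaring the standard deviation recovers the variance.
assert math.isclose(std_dev ** 2, variance)

print(value_range, variance, std_dev)
```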
4. Measures of position
These measures determine the position of individual values in relation to the other values within a data set. Outlier values affect them less than the mean and standard deviation. You can group values within your data into a variety of quantiles to see where certain percentages of your data fall. The most common measures of position are quantiles, explained here:
Quantiles divide the values within a data set into contiguous intervals. You can divide data in two around its median, into four equal parts creating quartiles or into hundredths to create percentiles. Within quantiles, the median is the 50th percentile. Data scientists may also summarise data by grouping it into 10 equal parts called deciles or five equal parts called quintiles.
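The quantile groupings described above can be sketched with statistics.quantiles from Python's standard library. The data are hypothetical, and the exact cut points depend on the interpolation method. This sketch relies on the function's default method, and other tools may return slightly different values for the same data.

```python
import math
from statistics import median, quantiles

# Hypothetical values, for illustration only.
data = [12, 15, 17, 20, 22, 25, 28, 30, 33, 37, 41]

# quantiles() returns the n - 1 cut points that split the data into n groups.
quartiles = quantiles(data, n=4)   # 3 cut points, 4 groups
quintiles = quantiles(data, n=5)   # 4 cut points, 5 groups
deciles = quantiles(data, n=10)    # 9 cut points, 10 groups

# The middle quartile cut point is the median (the 50th percentile).
assert math.isclose(quartiles[1], median(data))

print(quartiles)
print(quintiles)
print(deciles)
```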
Examples of descriptive statistics
To help you understand how you can use different classes of these summary statistics, here are some examples:
An example of measures of frequency
A class of 25 students had the following grades: A, F, B, B, D, B, C, A, F, A, D, A, A, C, D, C, A, A, A, A, A, A, B, F and B. To show the number of students who earned each different letter grade score on a test, you can use a table to represent the frequency of these scores from A to F.
The frequency table for these scores would look like this:
Grade | Number of students
A | 11
B | 5
C | 3
D | 3
F | 3
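One way to build the same frequency table programmatically is with collections.Counter from Python's standard library. This is a sketch rather than a prescribed method, and the variable names are illustrative.

```python
from collections import Counter

# The 25 grades from the example, in the order given.
grades = [
    "A", "F", "B", "B", "D", "B", "C", "A", "F", "A",
    "D", "A", "A", "C", "D", "C", "A", "A", "A", "A",
    "A", "A", "B", "F", "B",
]

# Count how many times each grade occurs.
frequency = Counter(grades)

# Print the table in grade order from A to F.
for grade in ["A", "B", "C", "D", "F"]:
    print(grade, frequency[grade])
```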
An example of using mean, median and mode
Here is the salary information for nine professionals with an identical job title and description: £38k, £41k, £45k, £43k, £47k, £50k, £55k, £15k and £75k.
You can calculate the mean of this set by adding the salaries together (£409k) and dividing this sum by the total number of values (9), which gives approximately £45.4k.
You can find the median salary by organising the values in your data set from the least to greatest (£15k, £38k, £41k, £43k, £45k, £47k, £50k, £55k and £75k). The median value is the salary in the middle of the data set, £45k.
The mode salary would be the value that appears the most frequently. In this data set, all the salaries are unique, so there is no mode.
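The salary calculations can be checked with a short Python sketch. Salaries are stored in thousands of pounds, and the variable names are illustrative. Because statistics.mode returns the first value when every value is unique, the sketch uses multimode and treats a result that contains every salary as meaning there is no mode.

```python
from statistics import mean, median, multimode

# Salaries from the example, in thousands of pounds.
salaries_k = [38, 41, 45, 43, 47, 50, 55, 15, 75]

total = sum(salaries_k)             # 409
mean_salary = mean(salaries_k)      # 409 / 9, approximately 45.4
median_salary = median(salaries_k)  # middle of the nine sorted values

# multimode returns every value tied for the highest frequency.
# If that is every value in the set, no salary repeats and there is no mode.
modes = multimode(salaries_k)
no_mode = len(modes) == len(salaries_k)

print(total, round(mean_salary, 1), median_salary, no_mode)
```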
An example of range
Here is a set of test scores from a cohort of 15 job candidates: 64, 76, 42, 55, 87, 99, 92, 100, 73, 56, 99, 98, 100, 88 and 93. You can calculate the range of these test scores by subtracting the lowest test score from the highest one (100−42). The range of these test scores is 58.
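The same subtraction can be sketched in Python using the built-in max and min functions. The variable names are illustrative.

```python
# Test scores for the 15 job candidates in the example.
scores = [64, 76, 42, 55, 87, 99, 92, 100, 73, 56, 99, 98, 100, 88, 93]

highest = max(scores)  # 100
lowest = min(scores)   # 42

# Range: highest score minus lowest score.
score_range = highest - lowest

print(score_range)  # 58
```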