A guide to probability sampling (with types and FAQs)

By Indeed Editorial Team

Published 5 July 2022

The Indeed Editorial Team comprises a diverse and talented team of writers, researchers and subject matter experts equipped with Indeed's data and insights to deliver useful tips to help guide your career journey.

In statistical work and related fields, using random population samples is often an integral part of the process. This is to ensure that the results aren't skewed and are properly representative of the population in question. If you're interested in statistics or analysis, understanding sampling techniques is useful for generating quality data sets. In this article, we explain what probability sampling is, explore the different ways of doing it, discuss its various pros and cons and answer some frequently asked questions.

What is probability sampling?

Probability sampling is a sampling method in which you select members of a population at random. In its simplest form, every member of the population has an equal probability of selection. More generally, every member has a known, non-zero chance of being chosen. Random selection makes the sample more representative of the population and lets you draw more accurate conclusions about the population as a whole.

Quantitative research often uses this type of sampling to make sound statistical analyses. There are also different variants of this type of sampling, which can be more appropriate for situations where you have sub-groups within the population and want to represent them in the analysis.

Types of probability sampling

There are multiple ways of getting a random sample from a population. Some common methods are as follows:

Simple random sample

The simple random sample method is the most straightforward as it's entirely random. Every member of the population has an equal chance of selection, regardless of the existence of any sub-groups or identifying characteristics. For instance, if you have a population of 10,000 and want to select a random sample of 500 individuals for the sake of statistical analysis, every member of the population has a one in 20 chance of selection.

There are two main methods of doing this. The first is a lottery system, which is typically better suited to smaller populations. In this method, you'd write the name of every individual on something like a piece of paper and put them all into a container. You then pull out the desired number at random. For larger populations, using software is typically preferable. With spreadsheet or similar software, you'd assign a number to each member of the population and use a random number generator (RNG) to select a random sample by number.
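As a rough illustration of the software approach, here's a minimal Python sketch that numbers the population from 1 upwards and draws a sample without repeats. The function name `simple_random_sample` and the `seed` parameter are assumptions made for this example, and the figures match the 10,000 and 500 used above.

```python
import random


def simple_random_sample(population_size, sample_size, seed=None):
    """Return member numbers drawn at random, without repeats."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    # Members are numbered 1..population_size, as in the spreadsheet approach.
    return sorted(rng.sample(range(1, population_size + 1), sample_size))


sample = simple_random_sample(10_000, 500, seed=42)
print(len(sample), len(set(sample)))  # 500 500
```

Because `random.sample` draws without replacement, no member number can appear twice in the result.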

Random cluster sample

Random cluster sampling involves splitting the population in question into smaller groups based on a criterion, typically geographical area. Each of these groups is a 'cluster'. You'd then use simple random sampling to select individuals from each of these clusters. For instance, you might divide the population into clusters by the city in which people live and then draw a simple random sample of a set size from each city. This can help to ensure greater geographical diversity in your sample, as larger population centres might dominate a simple random sample of the whole population. Note that in stricter statistical usage, cluster sampling means randomly selecting a subset of the clusters and then surveying everyone, or a random sample of individuals, within the chosen clusters only. Sampling from every group is closer to the stratified approach described below.

If you want to take population into account, you can develop a ratio so that you select more individuals from larger population centres while also getting a diverse geographical selection. For instance, you might have three cities with populations of 500,000, 2 million and 10 million. This is a ratio of 1:4:20. For every individual from the smallest city, you'd select 20 from the largest and four from the middle city.
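The ratio calculation can be sketched in a few lines of Python. This is only an illustration: the function name `proportional_allocation`, the city labels and the total sample size of 250 are assumptions chosen for the example, while the three city populations are the ones given above.

```python
def proportional_allocation(populations, total_sample):
    """Split total_sample across groups in proportion to their size."""
    total = sum(populations.values())
    quotas, remainders = {}, {}
    for name, size in populations.items():
        # Integer division keeps the arithmetic exact.
        quotas[name], remainders[name] = divmod(total_sample * size, total)
    # Hand any leftover places to the groups with the largest remainders.
    leftover = total_sample - sum(quotas.values())
    for name in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        quotas[name] += 1
    return quotas


# Hypothetical city labels with the populations from the example above.
cities = {"small": 500_000, "middle": 2_000_000, "large": 10_000_000}
print(proportional_allocation(cities, 250))
# {'small': 10, 'middle': 40, 'large': 200}
```

The result of 10, 40 and 200 keeps the 1:4:20 ratio. You'd then draw that many individuals at random from each city.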

Stratified random sample

The stratified random sampling approach is similar to making clusters, but instead separates the population into groups, or 'strata', based on certain characteristics. Example characteristics include gender, ethnicity, income bracket, age or occupation. A key requirement is that the groups don't overlap, so each individual belongs to exactly one group and can't be selected twice. This method is useful when some groups make up a much larger share of the population than others. If individuals within the larger groups are likely to return similar results, a simple random sample can produce an analysis that's less diverse and less representative.

Just as with cluster sampling, you can size each group's sample in proportion to the group's share of the population. This ensures fair representation for larger groups while also ensuring that under-represented groups enter the analysis. Dividing the population into groups and then working out a ratio between them works well for this. Another name for stratified random sampling is 'random quota sampling'.
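Here's a minimal Python sketch of proportional stratified sampling, where the same fraction is drawn from every group. The function name `stratified_sample`, the age-bracket labels, the group sizes and the 5% fraction are all assumptions made up for this example.

```python
import random


def stratified_sample(strata, fraction, seed=None):
    """Draw the same fraction at random from each non-overlapping group."""
    rng = random.Random(seed)
    picked = {}
    for name, members in strata.items():
        # Take at least one member so that small groups still appear,
        # but never more than the group contains.
        k = min(len(members), max(1, round(len(members) * fraction)))
        picked[name] = rng.sample(members, k)
    return picked


# Hypothetical strata: three age brackets of different sizes.
strata = {
    "18-34": [f"a{i}" for i in range(600)],
    "35-54": [f"b{i}" for i in range(300)],
    "55+": [f"c{i}" for i in range(100)],
}
picked = stratified_sample(strata, 0.05, seed=1)
print({name: len(members) for name, members in picked.items()})
# {'18-34': 30, '35-54': 15, '55+': 5}
```

Each group contributes in proportion to its size, and the smallest group is still represented in the sample.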

Systematic sample

The systematic sampling approach is the most similar to a simple random sample. It also gives every individual an equal chance of selection, regardless of any groups or clusters. For a systematic sample, you select every 'nth' individual in the population. You determine the value of 'n' by dividing the size of the population by the sample size you want. For instance, you might have a population of 5,000 and want 250 individuals for your sample. By dividing 5,000 by 250, you get 20, which means you select every 20th individual in the population.

It can also be a good idea to randomly sort the population prior to applying systematic sampling. This ensures that any existing ordering based on characteristics doesn't affect the sample you get. This is the only sampling technique on this list where the selection step itself doesn't need a random number generator, although one can still help with shuffling the list beforehand. The other techniques use an RNG either for the entire population or within clusters and groups.
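The 'every nth individual' rule can be sketched in Python as follows. The function name `systematic_sample` is an assumption for this example, as is the random starting point within the first interval, which is a common refinement that the description above doesn't mention.

```python
import random


def systematic_sample(population, sample_size, seed=None):
    """Select every nth member, starting at a random offset."""
    rng = random.Random(seed)
    step = len(population) // sample_size  # the interval 'n'
    start = rng.randrange(step)  # assumed refinement: random start in 0..step-1
    return population[start::step][:sample_size]


population = list(range(1, 5001))  # members numbered 1 to 5,000
sample = systematic_sample(population, 250, seed=7)
print(len(sample), sample[1] - sample[0])  # 250 20
```

With 5,000 members and a sample of 250, the interval is 20, so consecutive selections are always 20 places apart.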

Pros and cons of probability sampling

The different techniques for generating random samples have some benefits and limitations, some of which are as follows:

Pros of random samples

Here are some advantages of using a random sampling method:

  • Simplicity: A major benefit of random samples is that they're quite easy to generate. Anyone with basic computer skills can generate a random sample in spreadsheet software without learning complex new skills.

  • Randomness: This type of sampling is among the best for getting a truly random selection. Randomness helps to ensure that any biases from the researcher don't affect the results of the analysis.

  • Representativeness: Since the sample is random, this typically means it's more representative of the population in question. Methods like clustered and stratified sampling are particularly effective for getting representative and diverse data.

  • Cost effectiveness: Due to their simplicity, random sampling techniques are quite cost-effective. A normal computer and an individual with basic computer skills can get a random sample easily, meaning there's no need for advanced hardware or highly skilled professionals.

Cons of random samples

Here are some of the disadvantages of using random sampling techniques:

  • Monotonous: For a statistician or analyst who regularly requires random samples for their work, the process can become quite repetitive. If this leads to boredom, they might become less diligent in their work and inadvertently skew the results.

  • Over-representation: If the researcher doesn't account for the sizes of groups within the population, they might inadvertently over-represent or under-represent certain groups. This can have a negative effect on the results of the analysis, and weighted sampling using ratios can become tedious when you have a large number of groups or clusters.

  • Random number generators: Simple random number generators in applications like spreadsheet software can return the same number more than once. This is more likely when the sample is large relative to the population, as is often the case with smaller populations, and it can affect the results if a researcher doesn't notice it.

  • Group and cluster selection: Although the samples taken from each cluster or group are random, the selection of these subsets of the population isn't. The researcher might therefore skew the results by selecting these groups or clusters based on their own biases.

Frequently asked questions

Here are some frequently asked questions about random sampling, together with their respective answers:

What's the difference between probability and non-probability sampling?

Whereas probability methods rely on randomness for quantitative analysis, non-probability techniques involve deliberately choosing certain individuals for the sake of the research. This is more common in qualitative research and researchers typically don't apply the results to the broader population. This makes it more appropriate for analysing smaller groups of individuals.

How can I generate random numbers in spreadsheet software?

Both Microsoft Excel and Google Sheets have functions for randomly generating numbers. For Excel, the ideal function to use is as follows:

=RANDBETWEEN(x,y)

The 'x' represents the lowest number in the range and the 'y' is the highest number in your population range. Each use of the function returns a single random whole number between 'x' and 'y', inclusive. To get ten random numbers, for instance, you'd enter or copy the formula into ten cells. For Sheets, you can use the same function as Excel, and it works in the same way. Because each cell generates its number independently, the same number can appear more than once, so it's worth checking for repeats. Both functions are 'volatile', which means that their results change whenever the sheet recalculates, such as when you edit a cell. Copying the results and pasting them as values is a good way of retaining them.
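If you prefer to work outside a spreadsheet, the same idea can be sketched in Python. This is only an illustration: the range of 1 to 20 and the count of ten are assumptions chosen for the example. It contrasts independent draws, which behave like filling RANDBETWEEN into several cells and can repeat, with a draw that guarantees unique numbers.

```python
import random

rng = random.Random(0)  # a fixed seed makes the example repeatable
low, high, count = 1, 20, 10  # assumed range and sample size

# Independent draws, like RANDBETWEEN in ten cells: repeats are possible.
with_repeats = [rng.randint(low, high) for _ in range(count)]

# Drawing without replacement: every number is unique.
without_repeats = rng.sample(range(low, high + 1), count)

print(len(set(with_repeats)) <= count)  # True
print(len(set(without_repeats)) == count)  # True
```

Checking the number of unique values, as the last two lines do, is a quick way of spotting repeat entries before you use a sample.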
