How can one draw a conclusion from statistical data that holds true across all members of a given population? This is where sampling comes in! It entails collecting information from a smaller group that represents a larger population, so that we can draw insights about the population as a whole. In this post, we will take a look at Simple Random Sampling (SRS), one of the most common sampling techniques, and discuss its strengths and weaknesses.
Imagine that you are a researcher trying to study how effective a new drug is at combating COVID-19. You can't possibly administer the drug to every single person suffering from the disease. You would give the drug to a few people and measure its effectiveness. By studying the results of this sample, you can draw inferences about how effective the drug will be on the population as a whole.
There are many ways to select a sample from a population, such as simple random sampling, stratified sampling, cluster sampling, systematic sampling and convenience sampling. In this post, we will focus on Simple Random Sampling (SRS).
What is Simple Random Sampling?
Simple Random Sampling (SRS) is a method that selects a subset of a population in such a way that each member of the population has an equal chance, or probability, of being chosen.
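To make "equal chance" concrete, here is a minimal simulation sketch. It assumes we draw a sample of size n from a population of size N without replacement, in which case each member should end up in the sample with probability n/N. The function name, the population size of 10 and the sample size of 3 are illustrative choices, not part of the definition.

```python
import random
from collections import Counter

def inclusion_frequencies(population_size, sample_size, trials, seed=0):
    """Estimate how often each member lands in a simple random sample."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = Counter()
    for _ in range(trials):
        # rng.sample draws without replacement; every subset is equally likely
        counts.update(rng.sample(range(population_size), sample_size))
    return {member: counts[member] / trials for member in range(population_size)}

# Hypothetical setup: population of 10 members, samples of size 3
freqs = inclusion_frequencies(population_size=10, sample_size=3, trials=20000)
for member, freq in freqs.items():
    print(member, round(freq, 3))  # each value should be close to 3/10
```

Every member's frequency should hover around 0.3, which is what an equal chance of selection looks like in practice.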
How to conduct Simple Random Sampling?
Simple random sampling is a straightforward process which includes the following steps:
1. Define the target population:
The target population, or population of interest, is the entire group a researcher or statistician wants to draw conclusions about. For example, if epidemiologists want to study the prevalence of COVID-19 in the US, the target population would be all people residing in the United States. However, if the researchers are interested in studying the prevalence of COVID-19 in vaccinated individuals in the US, then the population of interest would be all individuals residing in the US who have received the COVID-19 vaccine.
2. Determine the population size:
If the population is finite and well-defined, the population size can be determined either by counting the number of members in the population or by obtaining data from a reliable source that tracks the population. For example, if the population of interest is the employees at Google, we can just count the number of employees or get the information from HR. For very large target populations, such as all people residing in the US, we can get the information from the US Census Bureau, which estimates the US population to be around 334 million as of January 2023.
If the population is not finite or is ill-defined, it becomes tricky to estimate the population size, as we cannot directly count the number of members. For instance, if the population is defined by subjective characteristics, such as all people who like chocolate, or by behavior that is hard to track, such as all people who use a particular app, we can use surveys or polls to estimate its size. We can also take a smaller sample and then extrapolate to estimate the larger population size using a statistical formula.
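As a rough sketch of what such an extrapolation could look like, the snippet below scales a survey proportion up to a known larger population and attaches a normal-approximation 95% interval. This is only one common choice of formula, and the survey numbers (620 chocolate lovers out of 1,000 respondents) are hypothetical; only the 334 million figure comes from the text above.

```python
import math

def extrapolate_count(sample_hits, sample_size, population_size, z=1.96):
    """Scale a sample proportion up to a population count.

    Returns (estimate, low, high), where low/high come from a
    normal-approximation interval for the proportion (an assumption;
    other interval formulas exist).
    """
    p = sample_hits / sample_size
    standard_error = math.sqrt(p * (1 - p) / sample_size)
    low = max(0.0, p - z * standard_error)
    high = min(1.0, p + z * standard_error)
    return p * population_size, low * population_size, high * population_size

# Hypothetical survey: 620 of 1,000 respondents say they like chocolate
estimate, low, high = extrapolate_count(620, 1000, 334_000_000)
print(f"{estimate:,.0f} (95% interval: {low:,.0f} to {high:,.0f})")
```

The estimate is only as good as the survey behind it: if the respondents were not themselves chosen at random, the extrapolated figure inherits that bias.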
3. Assign a number to each member of the population:
The next step is to assign a unique identifier, typically a number, to each member of the population. This can become challenging for very large populations such as all people residing in the US.
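When the population list is available electronically, numbering it takes only a few lines. The sketch below assumes the members are held in a Python list; the names and the helper `assign_ids` are made up for illustration.

```python
def assign_ids(members):
    """Give each member a unique number, starting at 1."""
    return {number: member for number, member in enumerate(members, start=1)}

# Hypothetical population of four employees
employees = ["Asha", "Ben", "Chen", "Dara"]
sampling_frame = assign_ids(employees)

for number, member in sampling_frame.items():
    print(number, member)
```

The resulting numbered list is often called a sampling frame, and it is what the random selection in the next step draws from.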
4. Generate a random sample:
There are many different ways to generate a random sample:
Physical Methods
If the population size is small, physical methods such as rolling a die, tossing a coin, spinning a wheel or pulling chits out of a container such as a hat, box or bag can be used to generate a random sample.
Random Number Table
A random number table is a list or table of numbers (usually 5 digits each) that are not arranged in any defined order. It is designed to ensure that the digits 0 to 9 have an equal chance of appearing in any position in the table. An example of a random number table is given below.
Suppose we want to draw a random sample of 5 members from a population of 50. We start at a random position in the table, for example by pointing a finger at it with our eyes closed. Let's say that we point at 17614 in row 4 and column 5. Since the population size is 50, we need to look at two-digit numbers within the range 01 to 50. We can move in any predetermined direction, left to right or top to bottom. For our example, let's move from left to right. The first number is 17. Since it is in the range, we keep it. The next two-digit number to the right is 61. Since it is greater than 50, we discard it. The next numbers are 45 and 07, which we keep. We discard 93, keep 05, discard 96 and keep 29. So, in our case, the sample consists of the numbers 17, 45, 07, 05 and 29. These numbers are matched with the numbers assigned to the members of the population, and the corresponding members are included in the study.
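The walk-through above can be sketched in a few lines of Python. The digit string is simply the two-digit numbers from the example written end to end (17, 61, 45, 07, 93, 05, 96, 29). The function name is hypothetical, and skipping repeated numbers is an added assumption so that no member is picked twice.

```python
def sample_from_digits(digits, population_size, sample_size):
    """Read fixed-width numbers from a digit stream, keeping those in range."""
    width = len(str(population_size))  # 50 -> read two digits at a time
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        number = int(digits[start:start + width])
        # Keep numbers from 1 to population_size; skip repeats (assumption)
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
        if len(chosen) == sample_size:
            break
    return chosen

# Digits read left to right from the starting entry 17614 in the example
digit_stream = "1761450793059629"
picked = sample_from_digits(digit_stream, population_size=50, sample_size=5)
print(picked)  # [17, 45, 7, 5, 29]
```

The result matches the numbers kept by hand, with 07 and 05 shown as 7 and 5.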
Computer Programs
Many programming languages, such as R, Python and MATLAB, and software suites, such as SAS, SPSS and Stata, can be used to generate a random sample. Although computers generate "pseudo-random" numbers because they use mathematical formulas in their algorithms, these numbers are sufficiently random for most practical purposes.
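As one possible illustration in Python, the standard library's `random` module can draw a simple random sample without replacement. The sketch below assumes the population members have already been numbered 1 to 50; the helper name and the seed value are arbitrary choices.

```python
import random

def simple_random_sample(population_ids, sample_size, seed=None):
    """Draw sample_size distinct IDs, each subset being equally likely."""
    rng = random.Random(seed)  # pass a seed to make the draw reproducible
    return rng.sample(population_ids, sample_size)

population_ids = list(range(1, 51))  # members numbered 1 to 50
sample = simple_random_sample(population_ids, 5, seed=42)
print(sample)
```

Recording the seed lets someone else reproduce exactly the same sample later, which is useful when a study has to be audited.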
Advantages of Simple Random Sampling
1. It is easy to understand and implement SRS.
2. As each member of the population has an equal chance of being selected, SRS avoids selection bias.
3. With a sufficiently large sample size, the sample tends to be representative of the population.
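The claim about bias can be checked with a small simulation. The sketch below assumes a toy population of the numbers 1 to 100 (true mean 50.5), draws many simple random samples of size 10, and averages their means. All names and sizes here are illustrative.

```python
import random
from statistics import mean

def average_of_sample_means(population, sample_size, trials, seed=0):
    """Average the sample mean over many simple random samples."""
    rng = random.Random(seed)
    return mean(mean(rng.sample(population, sample_size)) for _ in range(trials))

population = list(range(1, 101))  # toy population with true mean 50.5
estimate = average_of_sample_means(population, sample_size=10, trials=5000)
print(round(estimate, 2), "vs true mean", mean(population))
```

Any single sample mean can miss the true value by quite a bit, but the average across many samples should land very close to 50.5, which is what it means for the estimate to be unbiased.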
Limitations of Simple Random Sampling
1. In some cases, it may be impossible to get the list of the entire population of interest.
2. SRS is impractical if the population size is very large or dispersed over a wide geographical area.
3. It is time-consuming to assign numbers to each member of the population, especially if it is large.
4. SRS is not the best choice if the population is not homogeneous. For heterogeneous populations, where the subsets need to be represented proportionally, other methods of sampling such as stratified random sampling or cluster sampling might be more effective.
Key Takeaways
1. Simple Random Sampling is a simple yet powerful tool that researchers can use to generate unbiased, representative samples of a population.
2. Each member of the population has an equal probability of being selected.
3. SRS involves defining the population, determining its size, assigning each member of the population a unique identifier, and randomly selecting members for the sample.
4. Several methods, such as physical methods, random number tables and computer programs, can be used to generate a random sample.
5. Although there are many advantages of using SRS, it is not appropriate for very large, geographically dispersed or heterogeneous populations.