Randomising Test and Control Groups with R

Giulia Panozzo
Jun 30

If you've followed any of my SEO testing masterclasses (you can watch the one I did in March for Sitebulb for free), you will know that I am big on keeping things neat. One essential step of this is making sure that test and control groups are comparable, so that we can have more confidence in the results of the test.

One of the easiest methods to define test and control groups is randomisation, which tends to give each group a very similar baseline, especially when the groups are large.

For small tests and websites, you can do this manually. Get the past 12 months of performance data for the page type that you want to test (the right window depends on how much your industry is affected by seasonality), then split the entries into Test and Control yourself, making sure that the final groups have similar average traffic (or whichever metric you expect your treatment to affect):

Test and Control group randomisation done manually

However, this is a lengthy process, especially when we want to test larger groups. That's when R can come to the rescue and allow us to efficiently split Test and Control groups without having to manually review each page.

Here's a step-by-step guide on how to do so:

1. Download R and RStudio

If it's your first time with it, download R for free and then download RStudio to be able to run the script we need. This will also come in handy if you ever want to get into Causal Impact Analysis, for which I wrote a full guide on the WTS Knowledge Hub.

Once you're done, open RStudio.

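The script relies on one package, readxl, to read Excel files. It isn't part of base R, so if you don't have it yet, install it once by running install.packages("readxl") in the console. Everything else in the script is base R, so after that you should be ready to go.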

2. Prepare your raw file

This is the document that the script will analyse to select and randomise pages into test and control groups.

For example, if you want to match by average position in the last 12 months, you will need to get that data from Google Search Console (or any other tool you want to use as a reference) and export it to Excel. The script below expects the metric to sit in a column named position.
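
Before running the script, it can help to confirm that the export has the shape the script expects. The snippet below is a minimal sketch and not part of the original script: check_raw_data() is a hypothetical helper, and the url column and sample values are made up for illustration. Only the position column name comes from this guide.

```r
# Hypothetical helper: checks that a data frame is usable by the randomisation script.
# The 'metric' argument defaults to "position", the column used in this guide.
check_raw_data <- function(df, metric = "position") {
  if (!metric %in% names(df)) {
    stop("Column '", metric, "' not found. Available columns: ",
         paste(names(df), collapse = ", "))
  }
  if (!is.numeric(df[[metric]])) {
    stop("Column '", metric, "' is not numeric; check for text such as 'N/A' in the export.")
  }
  # Return a small summary so you can eyeball the file before splitting it
  list(
    rows = nrow(df),
    missing = sum(is.na(df[[metric]])),
    mean = mean(df[[metric]], na.rm = TRUE)
  )
}

# Made-up example data standing in for a Search Console export
example_data <- data.frame(
  url = c("/a", "/b", "/c", "/d"),
  position = c(3.2, 11.5, NA, 7.3)
)

summary_info <- check_raw_data(example_data)
print(summary_info)
```

If the helper stops with an error, fix the column name or the cell contents in the Excel file before moving on to the randomisation.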

3. Input the randomisation script

This script includes a series of commands that define:

The two groups
Their size
The average position of each group (in my example), so you can check that they're comparable for testing:

#Load the readxl package to read Excel files
library(readxl)

#Read the Excel file "test.xlsx" from the "Documents" folder in your home directory into a dataframe called 'data'
data <- read_excel("~/Documents/test.xlsx")

#Optional: view the dataset in RStudio's data viewer
View(data)

#Set a seed to make the random sampling reproducible (you can change the number if needed)
set.seed(123)

#Define the number of pages per group (floor is used to ensure it's an integer)
group_size <- floor(nrow(data) / 2)

#Shuffle all row indices randomly
shuffled_indices <- sample(nrow(data))

#Assign the first half of the shuffled data to group1
group1 <- data[shuffled_indices[1:group_size], ]

#Assign the remaining data to group2
group2 <- data[shuffled_indices[(group_size + 1):nrow(data)], ]

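#Calculate the average value of the 'position' column for each group
avg_position_group1 <- mean(group1$position, na.rm = TRUE)
avg_position_group2 <- mean(group2$position, na.rm = TRUE)

#Print both averages to check that the groups are comparable
print(c(group1 = avg_position_group1, group2 = avg_position_group2))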

#Save the two groups as separate CSV files in the "Documents/test" folder
write.csv(group1, "~/Documents/test/group1.csv", row.names = FALSE)
write.csv(group2, "~/Documents/test/group2.csv", row.names = FALSE)

The final command (write.csv) saves the two groups as separate CSV files in the "test" subfolder of your Documents folder.
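
To see how close the two groups ended up, you can compare their averages directly. The snippet below is a minimal sketch of that check, using made-up data in place of the Excel file so it runs on its own. The url column, the sample values and the difference variable are assumptions for illustration.

```r
# Made-up data standing in for the contents of test.xlsx
set.seed(123)
data <- data.frame(
  url = paste0("/page-", 1:20),
  position = round(runif(20, min = 1, max = 50), 1)
)

# Same split logic as in the randomisation script
group_size <- floor(nrow(data) / 2)
shuffled_indices <- sample(nrow(data))
group1 <- data[shuffled_indices[1:group_size], ]
group2 <- data[shuffled_indices[(group_size + 1):nrow(data)], ]

# Compare the two averages side by side
avg_group1 <- mean(group1$position, na.rm = TRUE)
avg_group2 <- mean(group2$position, na.rm = TRUE)
difference <- abs(avg_group1 - avg_group2)

cat("Group 1 average position:", round(avg_group1, 2), "\n")
cat("Group 2 average position:", round(avg_group2, 2), "\n")
cat("Absolute difference:", round(difference, 2), "\n")
```

If the difference looks too large for your test, changing the number in set.seed() and re-running gives you a different split.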

And now you're ready to apply the changes to your test group and keep track of performance.

Some notes on the script:

The script assumes that the position column is numeric and may contain some missing (NA) entries; na.rm = TRUE leaves those out when calculating the averages.
If test.xlsx is not in your Documents folder, adjust the path in read_excel() so it points to the right location.
The folder "~/Documents/test/" must exist before writing the CSV files. You can create it in R with this command:

dir.create("~/Documents/test", recursive = TRUE)
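
If a single random split leaves the groups further apart than you are comfortable with, one option is to repeat the shuffle until the averages are close enough. This is an extension rather than part of the script above, sketched under assumptions: split_until_balanced(), tolerance and max_tries are hypothetical names, and the sample data is made up.

```r
# Hypothetical helper: reshuffles until the group means differ by at most 'tolerance',
# or stops with an error after 'max_tries' attempts.
split_until_balanced <- function(df, metric = "position", tolerance = 1, max_tries = 1000) {
  group_size <- floor(nrow(df) / 2)
  for (attempt in seq_len(max_tries)) {
    idx <- sample(nrow(df))
    g1 <- df[idx[1:group_size], ]
    g2 <- df[idx[(group_size + 1):nrow(df)], ]
    gap <- abs(mean(g1[[metric]], na.rm = TRUE) - mean(g2[[metric]], na.rm = TRUE))
    if (gap <= tolerance) {
      return(list(group1 = g1, group2 = g2, gap = gap, attempts = attempt))
    }
  }
  stop("No split within tolerance after ", max_tries, " tries; consider a larger tolerance.")
}

# Made-up data standing in for a real export
set.seed(123)
pages <- data.frame(
  url = paste0("/page-", 1:40),
  position = round(runif(40, min = 1, max = 50), 1)
)

result <- split_until_balanced(pages, tolerance = 1)
cat("Gap in average position:", round(result$gap, 2),
    "after", result$attempts, "attempt(s)\n")
```

The tolerance is in the same unit as the metric, so a value of 1 here means the groups' average positions are at most one position apart. Pick a threshold that makes sense for the metric you are matching on.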

If you decide to give it a go, give me a tag on LinkedIn!

Happy testing!