Nutrition profile
A/B testing, also called split testing, is a research method that tests two or more versions of a design with users to understand which version performs best.
Cooking time
Varies with the scope of the project and how much data needs to be collected. Some tests need several weeks to gather enough data to understand user behavior, while tests with a narrower, more focused scope can generate enough data within a few hours or days.
Perfect for
Choosing between different versions of a design element based on quantitative results rather than assumptions. A/B testing can be conducted throughout the design and prototype phases to improve and validate the usability of specific product features without needing to make significant changes to the overall design.
For example, the UX team at the University of Arizona Libraries used A/B testing to evaluate the navigation of a third-party website by comparing versions of the design with and without the top navigation menu. We will reference this test throughout the recipe.
Prep work
Decide what to test
Identify the problem to target in an existing design based on analytics and user feedback. Review the key elements that could be contributing to the issue, such as a page’s layout, navigation structure, or color choices, then prioritize which features to test.
Develop a measurable hypothesis about the changes you’re testing. For example, when altering the third-party website’s top navigation menu, we might hypothesize, “Removing the navigation menu will decrease time spent on tasks by 15% because users won’t experience as much choice paralysis.”
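To keep the hypothesis measurable, it helps to note up front exactly what you’ll compute once the data is in: the relative decrease in average time on task between the two versions. A minimal sketch, assuming Python and made-up numbers:

```python
# Illustrative numbers only: check whether Version B's average time on task
# is at least 15% lower than Version A's.
mean_time_a = 74.0  # seconds, Version A (with the top navigation menu)
mean_time_b = 58.0  # seconds, Version B (without the top navigation menu)

relative_decrease = (mean_time_a - mean_time_b) / mean_time_a
print(f"Relative decrease: {relative_decrease:.0%}")  # about 22%
print("Hypothesis supported" if relative_decrease >= 0.15 else "Hypothesis not supported")
```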
Choose testing method
A/B testing can be completed in two ways:
- Moderated testing: similar to usability testing, where you ask the participants to perform tasks in a prototype and observe their behavior. However, in A/B testing, you’ll test the same tasks with two or more designs.
- Metric-based evaluation: launching two or more designs in your product, then monitoring outcomes such as number of visitors, time on task, or revenue, and deciding which design performs better.
This recipe will focus on the moderated testing approach. For metric-based evaluation, see our Web analytics recipe.
Prepare multiple designs
After deciding which feature to test, prepare two (or more) versions of a design that are identical except for this specific feature in order to isolate the variable you’re testing and clearly understand how each version affects user behavior.
In the University of Arizona Libraries example, the control variant, Version A, keeps the top navigation menu that appears consistently across all University of Arizona Libraries websites, while the test variant, Version B, removes the top navigation menu to test the hypothesis that removing it will decrease time spent on tasks.
Write tasks and define success
Develop a set of tasks for participants to complete that are based on your goals and address your hypothesis, but aren’t biased towards a specific action.
When comparing two versions of the third-party website with and without a top navigation menu, we focused on tasks driven by our goal of understanding the usability of the site navigation without leading the user.
Try this:
✅ Find the list of research databases available through the library.
✅ Find the physical address of the Main Library.
Instead of:
❌ Find the list of research databases using the top navigation.
❌ Go to the footer to find the physical address of the library.
The poor examples encourage the participant to either use the top navigation bar or avoid it, leading the user to a specific result. The tasks we used let participants complete the tasks however they see fit, which addresses our goal of understanding whether the top navigation bar is useful.
Identify a set of success metrics to record as participants complete each task, including quantitative measures (task completion time or rate) and/or qualitative ones (user frustration or reactions).
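Writing these metrics down before testing starts keeps sessions comparable. A minimal sketch, assuming Python and hypothetical task names drawn from the navigation example:

```python
# Hypothetical success metrics for the navigation test; adapt to your own tasks.
success_metrics = {
    "find_research_databases": {
        "success": "participant reaches the list of research databases",
        "quantitative": ["completed (yes/no)", "time_on_task_seconds"],
        "qualitative": ["signs of frustration", "notable quotes"],
    },
    "find_library_address": {
        "success": "participant finds the Main Library's physical address",
        "quantitative": ["completed (yes/no)", "time_on_task_seconds"],
        "qualitative": ["signs of frustration", "notable quotes"],
    },
}
```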
Ingredients
- A moderator and a note taker
- Two versions of an interactive prototype design to test
- Incentives for participants (e.g. gift cards, snacks)
Directions
Prepare testing environment
For in-person moderated sessions, set up your test in a space that is easy for participants to find with minimal distractions. For remote sessions, use a video conferencing tool such as Zoom that allows participants to share their screen.
Ensure both versions of the design are set up and working properly, whether they’re displayed on a live website or a prototyping tool. Use a tool to track user interactions and behaviors, such as a note-taking sheet, screen recordings, or Google Forms.
Recruit and assign participants
For in-person sessions in high-traffic areas, recruit passersby for short testing sessions. For remote sessions, attract participants through social media posts or recruitment emails.
Randomly assign each participant to either Version A or Version B of your design, ensuring even distribution if possible.
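If you have a rough idea of how many participants to expect, shuffling a pre-balanced list of version labels is a simple way to keep the split even. A minimal sketch, assuming Python; the participant count is illustrative:

```python
import random

def balanced_assignments(n_participants: int, versions=("A", "B")):
    """Return a shuffled list of version labels with counts as even as possible."""
    labels = [versions[i % len(versions)] for i in range(n_participants)]
    random.shuffle(labels)
    return labels

# Example: ten expected participants yields five "A" and five "B" in random order.
print(balanced_assignments(10))
```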
Conduct the testing
At the beginning of each session, gather background information from the participant and introduce the study. Outline any guidelines and, if you’re recording the session (for example over Zoom), ask for consent to record. Inform the participant of the context of the study, but avoid mentioning that there are two versions of the design to prevent bias.
Take notes while each participant completes the tasks. The notes will depend on your specific scenario and research goals (a simple logging format is sketched after this list), but they may include:
- Navigation path(s) (e.g. visiting → parking → how to pay)
- Number of attempts (e.g. how many times they click the back button)
- Success or failure on task completion
- Time on task (i.e. how long it takes for them to reach success or give up)
- Useful quotes
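A flat, one-row-per-task log makes these notes much easier to compare later. A minimal sketch, assuming Python’s csv module; the column names and example row are illustrative, not a required format:

```python
import csv

# Illustrative column names; add or remove fields to match your own note-taking sheet.
FIELDS = ["participant_id", "version", "task", "navigation_path",
          "attempts", "completed", "time_on_task_seconds", "quotes"]

with open("ab_test_notes.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerow({
        "participant_id": "P01", "version": "B", "task": "find_library_address",
        "navigation_path": "home > visit > hours & locations",
        "attempts": 2, "completed": True,
        "time_on_task_seconds": 95, "quotes": "I expected this in the footer.",
    })
```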
Analyze and compare results
Set aside time to debrief immediately following the A/B test so the results are fresh in your mind. Identify patterns in task success rates and qualitative feedback and compare these findings between the two versions. Use these key takeaways and insights to determine which version of your design performed better according to your goals.
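If your notes follow a structured log like the sketch above, the quantitative side of the comparison takes only a few lines. This sketch assumes the illustrative CSV columns from the note-taking example; it reports completion rate and median time on task per version, which you would weigh alongside your qualitative observations:

```python
import csv
from statistics import median

# Summarize each version from the illustrative log used in the note-taking sketch.
by_version = {}
with open("ab_test_notes.csv", newline="") as f:
    for row in csv.DictReader(f):
        by_version.setdefault(row["version"], []).append(row)

for version, rows in sorted(by_version.items()):
    completed = [r for r in rows if r["completed"] == "True"]
    times = [float(r["time_on_task_seconds"]) for r in completed]
    rate = len(completed) / len(rows)
    med = f"{median(times):.0f}s" if times else "n/a"
    print(f"Version {version}: {rate:.0%} completion, median time on task {med} "
          f"({len(rows)} task attempts)")
```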
Plating
Depending on your audience, you might present your findings through a presentation, a report, or an informal dialogue. Explain the goal of the test and compare the key findings between each version. Offer a recommendation for the winning version with support from key metrics or behavioral insights to back your decision.
Credits
A/B Testing 101 by Tim Neusesser