Experiment Health Checks
What are Experiment Health Checks?
When running experiments, ensuring their reliability and validity is critical for making data-driven decisions. To achieve this, ABsmartly performs a series of automated health checks in the background. These checks are designed to identify potential issues that could compromise the integrity of the results. Below, we outline the health checks we currently perform and their importance in maintaining experiment quality.
Where can I see them?
You can view your experiment health checks directly at the top of your experiment’s dashboard.
If any issue is detected with your experiment, an alert will be triggered and a notification shown on the dashboard.
Which health checks does ABsmartly perform?
Assignment Conflict
What is it?
Detects whether a visitor has been exposed to more than one variant of the same experiment.
Why is it important?
Exposing the same visitor to multiple variants can invalidate results by mixing treatment effects, making it impossible to measure the true impact of each variant. It can also lead to a poor user experience.
Example
A user first sees Variant A of an experiment and later encounters Variant B due to a targeting or implementation error.
What action should be taken?
Investigate the cause of the conflict, ensure consistent assignment logic, and prevent participants from being reassigned to different variants of the same experiment. Whether to abort or continue the experiment depends on how many participants were affected.
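A common cause of assignment conflicts is non-deterministic bucketing, for example re-assigning visitors on every page load instead of deriving the variant from a stable unit. The sketch below is a minimal illustration (not ABsmartly’s actual assignment algorithm) of deterministic assignment, where a hash of a hypothetical userId always maps the same visitor to the same variant:

```typescript
import { createHash } from "crypto";

// Deterministically map a stable unit (e.g. a user ID) to a variant index.
// The same userId always lands in the same bucket, so re-evaluating the
// assignment never moves a visitor to a different variant.
function assignVariant(experimentName: string, userId: string, numVariants: number): number {
  const digest = createHash("sha256")
    .update(`${experimentName}:${userId}`)
    .digest();
  // Use the first 4 bytes of the digest as an unsigned integer bucket.
  const bucket = digest.readUInt32BE(0);
  return bucket % numVariants;
}

// The same inputs always produce the same variant:
console.log(assignVariant("new_checkout", "user-42", 2));
console.log(assignVariant("new_checkout", "user-42", 2)); // same value again
```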
Sample Ratio Mismatch
What is it?
Verifies whether the allocation of participants among experiment groups matches the intended sample ratios. See the Sample Ratio Mismatch (SRM) Checker for more information on SRM.
Why is it important?
Imbalanced group sizes can introduce bias into metrics and compromise the validity of statistical analyses.
Example
An expected 50/50 split in a two-variant experiment shows a significant deviation: for example, 1,000 participants are in variant A and 1,200 in variant B.
What action should be taken?
The experiment should be aborted and the reasons for SRM should be investigated. No reliable decision can be made when an SRM error is detected.
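To illustrate how a deviation like the one above can be quantified, here is a minimal sketch of a one-degree-of-freedom chi-square test for an expected 50/50 split. The alpha of 0.05 is an assumption for illustration, not necessarily the significance level ABsmartly uses internally. The 1,000 vs 1,200 example gives a statistic of roughly 18.2, well above the 3.84 critical value:

```typescript
// Chi-square goodness-of-fit test for an expected 50/50 split.
function srmChiSquare(countA: number, countB: number): number {
  const total = countA + countB;
  const expected = total / 2;
  return (
    (countA - expected) ** 2 / expected +
    (countB - expected) ** 2 / expected
  );
}

const statistic = srmChiSquare(1000, 1200);
// Critical value for 1 degree of freedom at alpha = 0.05 is ~3.84.
console.log(statistic);          // ~18.18
console.log(statistic > 3.841);  // true -> sample ratio mismatch is likely
```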
Audience Mismatch
What is it?
Ensures that the actual audience of the experiment matches the intended target audience. This check is only performed when Audience Enforced is turned off, and serves as a reminder to add the required checks in your code so that only the targeted audience is exposed.
Why is it important?
An audience mismatch means that an unintended audience is being exposed to your changes. Including the wrong audience can dilute the experiment’s impact and obscure meaningful insights.
Example
An experiment targeted at mobile users unintentionally includes desktop users.
What action should be taken?
Abort the experiment, refine the audience targeting criteria in the code, and ensure that only the intended participants are included.
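When Audience Enforced is turned off, the targeting check lives in your own code. The sketch below shows one way to guard the treatment call so that only the intended audience is enrolled; getTreatment, renderDefaultCheckout, and renderNewCheckout are hypothetical placeholders for your SDK call and rendering logic:

```typescript
// Hypothetical placeholder for your experimentation SDK's treatment call.
function getTreatment(experimentName: string): number {
  // In real code the variant would come from the SDK; 0 = control here.
  console.log(`requesting treatment for ${experimentName}`);
  return 0;
}

function isMobileDevice(userAgent: string): boolean {
  return /Android|iPhone|iPad|Mobile/i.test(userAgent);
}

function renderCheckout(userAgent: string): void {
  // Guard the treatment call so desktop visitors are never enrolled in an
  // experiment that targets mobile users only.
  if (!isMobileDevice(userAgent)) {
    renderDefaultCheckout();
    return;
  }
  const variant = getTreatment("mobile_checkout_redesign");
  if (variant === 1) {
    renderNewCheckout();
  } else {
    renderDefaultCheckout();
  }
}

function renderDefaultCheckout(): void {
  console.log("default checkout");
}

function renderNewCheckout(): void {
  console.log("new mobile checkout");
}

renderCheckout("Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X)");
```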
Experiment Interactions
What is it?
Identifies if two or more experiments running simultaneously are interacting in ways that could influence results.
Why is it important?
Interactions between experiments can confound results, making it difficult to isolate the impact of each experiment.
Example
Two experiments affect the same page element, creating overlapping effects.
What action should be taken?
Assess the risk of the interaction and adjust or abort one or both of the experiments to eliminate it.
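As a rough illustration (not ABsmartly’s detection logic), the sketch below flags concurrently running experiments that declare changes to the same page element, which is one simple way to spot potential interactions before launch; the experiment names and selectors are made up for the example:

```typescript
interface RunningExperiment {
  name: string;
  // CSS selectors (or any identifiers) for the elements the experiment changes.
  touchedElements: string[];
}

// Return pairs of experiments that modify at least one common element.
function findOverlaps(experiments: RunningExperiment[]): [string, string][] {
  const overlaps: [string, string][] = [];
  for (let i = 0; i < experiments.length; i++) {
    for (let j = i + 1; j < experiments.length; j++) {
      const shared = experiments[i].touchedElements.filter((el) =>
        experiments[j].touchedElements.includes(el)
      );
      if (shared.length > 0) {
        overlaps.push([experiments[i].name, experiments[j].name]);
      }
    }
  }
  return overlaps;
}

console.log(
  findOverlaps([
    { name: "hero_banner_copy", touchedElements: ["#hero"] },
    { name: "hero_banner_color", touchedElements: ["#hero", "#cta"] },
  ])
); // [["hero_banner_copy", "hero_banner_color"]]
```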
Variables Conflicting
What is it?
Checks for conflicting variables used across two or more experiments. This check is performed when Experiment Variables are used in the experiment setup.
Why is it important?
Variable conflicts can impact the reliability of the experiments and can have a negative impact on the user experience.
Example
Two experiments modify the same variable, such as a pricing display, but in incompatible ways.
What action should be taken?
Abort one or more of the experiments to resolve the conflict, then create separate variables or run the experiments sequentially to avoid it in the future.
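The sketch below illustrates what such a conflict can look like when Experiment Variables are used: two hypothetical experiments both try to set the same pricing.display key, and a small helper flags the overlapping key. This is an illustration only, not how ABsmartly stores or checks variables internally:

```typescript
// Experiment Variables modeled as key/value overrides per experiment.
type ExperimentVariables = Record<string, Record<string, unknown>>;

const running: ExperimentVariables = {
  pricing_layout_test: { "pricing.display": "monthly" },
  annual_discount_test: { "pricing.display": "annual_first" },
};

// Flag variable keys that more than one running experiment tries to set.
function conflictingVariableKeys(experiments: ExperimentVariables): string[] {
  const seen = new Map<string, number>();
  for (const variables of Object.values(experiments)) {
    for (const key of Object.keys(variables)) {
      seen.set(key, (seen.get(key) ?? 0) + 1);
    }
  }
  return Array.from(seen.entries())
    .filter(([, count]) => count > 1)
    .map(([key]) => key);
}

console.log(conflictingVariableKeys(running)); // ["pricing.display"]
```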
Metric Thresholds
What is it?
Evaluates whether key metrics remain within acceptable thresholds during the experiment. The check is performed when one of the experiment’s metrics is defined with a threshold value.
Why is it important?
Monitoring ensures that any unintended negative impact on metrics is caught early.
Example
A sudden spike in bounce rates during an experiment may indicate a poor variation experience.
What action should be taken?
Metric thresholds are usually defined to safeguard against harmful experiments. It is recommended to contact the metric’s owner to understand the impact and the risk of continuing the experiment.
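As an illustration of the idea (not ABsmartly’s internal implementation), the sketch below compares observed metric values against configured thresholds and reports any breaches; the metric names and threshold values are made up for the example:

```typescript
interface MetricThreshold {
  metric: string;
  // Maximum acceptable value for this metric during the experiment.
  maxValue: number;
}

// Check observed metric values against their configured thresholds and
// return the names of any metrics that breached them.
function breachedThresholds(
  observed: Record<string, number>,
  thresholds: MetricThreshold[]
): string[] {
  return thresholds
    .filter((t) => (observed[t.metric] ?? 0) > t.maxValue)
    .map((t) => t.metric);
}

const alerts = breachedThresholds(
  { bounce_rate: 0.62, checkout_errors: 0.01 },
  [
    { metric: "bounce_rate", maxValue: 0.55 },
    { metric: "checkout_errors", maxValue: 0.02 },
  ]
);
console.log(alerts); // ["bounce_rate"]
```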
How these checks improve experimentation
By running these health checks, we provide experimenters with confidence in their results. Automated detection of issues allows for:
- Early intervention: Spot problems early to address them before they impact outcomes significantly.
- Reliable insights: Ensure that data accurately reflects the effects of the tested variations.
- Improved decisions: Make data-driven decisions with a higher degree of certainty.
With these safeguards in place, our platform empowers users to conduct robust, trustworthy experiments that drive impactful business outcomes.