Think about the largest and smallest [online purchase values] you see today, in [dollars]. Write those minimum and maximum amounts below.
_____________
Title of Card
What is the average [online purchase value] you see today, in [dollars]? Drag the dot there.
Now, think about the middle 70% of the [online purchase values] you see today, in [dollars]: the ones that aren't extreme in either direction. Drag the barbells so they shade over that 70%.
Finally, think about how you are hoping to shift those [online purchase values] with this experiment. What is the smallest shift you'd need to see in the experiment to feel confident rolling out the idea you're experimenting with more broadly? Drag the shaded area (the average and the 70%) to represent that minimum shift.
Because an experiment is sampling from a broader population, there is always a chance (even with the best random sampling methods) that we accidentally draw from some extreme set of people and see results that are different from the truth of that population. We could see a false positive (an effect in the sample that really isn't there in the population) or a false negative (no effect in the sample, even though there really is one in the population).

We won't know a result is false when we see it, of course. Which means with a false positive, we may move forward with an idea that really doesn't work. Or with a false negative, we may withhold an idea that really does work.

Think about this in the context of your hypothesis: [hypothesis]. Which would be worse for you: financially, PR-wise, politically, operationally...? A false positive, i.e., moving forward with an idea that really doesn't work? Or a false negative, i.e., withholding one that really does work? Drag the ball to indicate their relative risk.
You run the experiment, and you see an effect! You go tell your boss. But you need to remind your boss there is some % chance that the effect is just a false positive... What is the largest % chance of a false positive that you (and your boss) are comfortable with? Drag the bar to indicate it.
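To see what that % chance means in practice, here is a minimal simulation sketch (Python with numpy and scipy; all numbers are hypothetical and nothing here reflects this tool's internals): when the treatment truly does nothing, roughly alpha of all experiments still come out looking significant.

```python
# Hypothetical illustration: with no true effect, roughly `alpha` of
# experiments still produce a "significant" result (false positives).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.10            # the largest false-positive chance you said you'd accept
n_experiments = 10_000
false_positives = 0

for _ in range(n_experiments):
    control = rng.normal(loc=30, scale=10, size=200)    # same distribution...
    treatment = rng.normal(loc=30, scale=10, size=200)  # ...so no real effect
    _, p_value = stats.ttest_ind(control, treatment)
    if p_value < alpha:
        false_positives += 1

print(false_positives / n_experiments)  # ~0.10, i.e., about alpha
```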
What is the absolute maximum number of [units] you could use for your sample in this experiment? Consider that more [units] can often mean greater financial cost, greater time to implement, and/or greater risk.

Wizard: Power calculation for comparing proportions / averages

Is the number of [units] you can experiment with extremely large or unlimited? Or is it limited by budget, access, or other constraints?
Sensitivity power calculation for comparing proportions
Categorical Sensitivity Analysis: Current Scenario
alpha (risk of false positives, or Type I error rate): The likelihood of measuring a result at least as extreme as the one observed when, in fact, there is no effect at all
Assigning an alpha level asks you to live in a world where you've run the experiment and found a seemingly statistically significant effect size. If you select an alpha of .1, you've decided to live in a world where 1 out of every 10 experiments of this design would produce an effect size at least as large as the one you've measured even though there is truly no effect at all.
Power (1 - risk of false negatives, or 1 - Type II error rate): The likelihood that, when a treatment has an effect, you will be able to distinguish the effect from zero
Determining power asks you to live in a world where you've run the experiment and found a null result. If you select a power of .95, you've decided to live in a world where 1 out of every 20 experiments of this design would fail to detect the effect even though there truly is one.
Total # of [units] available for experiment
Expected control group proportion that [meets your success criterion] (% success in control group)
If we collected 100 samples of the DV from your control group, how many would meet the criteria of success as you define it? (E.g., if your DV is the click-through rate on an email, how many emails out of 100 do you expect to receive a click in your control group?)
Expected treatment group proportion that [meets your success criterion] (% success in treatment group)
This value establishes the minimum proportion that your treatment group must achieve, in either direction, in order to produce a statistically significant result. E.g., treatment group expected proportions of 13% and 15% mean your intervention must produce a proportion of more than 15%, or less than 13%, to produce a statistically significant result.
-
Minimum Detectable Effect (MDE)
This value establishes the minimum effect size between your control and treatment comparison group(s) that your test must measure in order to achieve statistical significance. If you're not very confident in your rationale for why your treatment would produce an incremental improvement of this size over the control, that is a sign you may want to tighten your expected treatment group estimate.
-
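For reference, a sensitivity calculation like this one can be sketched with statsmodels. The inputs below (500 units per group, alpha = .05, power = .8, a 13% control success rate) are illustrative assumptions, not this tool's defaults.

```python
# Sketch of a sensitivity (MDE) calculation for proportions: solve for
# the smallest detectable effect given a fixed sample size, alpha, and power.
import math
from statsmodels.stats.power import NormalIndPower

alpha, power = 0.05, 0.80
n_per_group = 500    # assumed: total # of [units] split evenly across two groups
p_control = 0.13     # assumed: expected control group success proportion

# Leaving effect_size unset makes solve_power return the minimum
# detectable effect size (Cohen's h).
h = NormalIndPower().solve_power(nobs1=n_per_group, alpha=alpha,
                                 power=power, alternative='two-sided')

# Convert Cohen's h back into treatment proportions:
# h = 2*asin(sqrt(p_treatment)) - 2*asin(sqrt(p_control))
phi_control = 2 * math.asin(math.sqrt(p_control))
p_up = math.sin((phi_control + h) / 2) ** 2
p_down = math.sin((phi_control - h) / 2) ** 2
print(f"Detectable treatment proportions: below {p_down:.3f} or above {p_up:.3f}")
```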
Sensitivity power calculation for comparing averages
Continuous Sensitivity Analysis: Current Scenario
alpha (risk of false positives, or Type I error rate): The likelihood of measuring a result at least as extreme as the one observed when, in fact, there is no effect at all
Assigning an alpha level asks you to live in a world where you've run the experiment and found a seemingly statistically significant effect size. If you select an alpha of .1, you've decided to live in a world where 1 out of every 10 experiments of this design would produce an effect size at least as large as the one you've measured even though there is truly no effect at all.
Power (1 - risk of false negatives, or 1 - Type II error rate): The likelihood that, when a treatment has an effect, you will be able to distinguish the effect from zero
Determining power asks you to live in a world where you've run the experiment and found a null result. If you select a power of .95, you've decided to live in a world where 1 out of every 20 experiments of this design would fail to detect the effect even though there truly is one.
Total # of [units] available for experiment
Pooled standard deviation of DV: How variable or 'spread out' is the outcome (dependent) variable across the test & control group(s)?
If you have no sample data to establish the SD, you can estimate the standard deviation with the following thought experiment: excluding outliers, subtract the minimum DV value you'd expect from the maximum DV value you'd expect, and divide by 6.
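A sketch of that thought experiment in code (the minimum and maximum below are made-up examples):

```python
# Range-rule-of-thumb estimate of the standard deviation when no
# sample data is available; min/max values here are hypothetical.
expected_min = 30    # smallest DV value you'd expect, excluding outliers
expected_max = 150   # largest DV value you'd expect, excluding outliers
sd_estimate = (expected_max - expected_min) / 6
print(sd_estimate)   # 20.0
```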
Expected control group [outcome] average
If we collected 100 samples of the DV from your control group and averaged them, what number do you estimate?
Expected treatment group [outcome] average
This value establishes the minimum average that your treatment group must achieve, in either direction, in order to produce a statistically significant result. E.g., treatment group expected averages of 27 and 33 mean your intervention must produce an average of more than 33, or less than 27, to produce a statistically significant result.
-
Minimum Detectable Effect (MDE)
This value establishes the minimum effect size between your control and treatment comparison group(s) that your test must measure in order to achieve statistical significance. If you're not very confident in your rationale for why your treatment would produce an incremental improvement of this size over the control, that is a sign you may want to tighten your expected treatment group estimate.
-
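A minimal statsmodels sketch of this continuous sensitivity calculation, assuming illustrative inputs (200 units per group, pooled SD of 20, alpha = .05, power = .8), not this tool's defaults:

```python
# Sketch of a sensitivity (MDE) calculation for averages: solve for the
# smallest detectable shift in the DV given sample size, SD, alpha, and power.
from statsmodels.stats.power import TTestIndPower

alpha, power = 0.05, 0.80
n_per_group = 200   # assumed: total # of [units] split evenly across two groups
pooled_sd = 20.0    # assumed; e.g., from the (max - min) / 6 rule above

# solve_power returns the detectable effect in standardized units (Cohen's d).
d = TTestIndPower().solve_power(nobs1=n_per_group, alpha=alpha,
                                power=power, alternative='two-sided')
mde = d * pooled_sd  # convert back to the DV's own units
print(f"Minimum detectable shift in the average: +/- {mde:.2f}")
```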

Please note:

Test group expected proportion / average & MDE: Our power calculators assume you will run a two-sided test, meaning you're willing to consider the chance that the treatment may increase OR decrease the value of your primary outcome. The two values indicate a statistically significant decrease and increase, respectively, in the primary outcome relative to the expected control outcome.
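Concretely, the two reported values are just the expected control outcome shifted down and up by the MDE. A sketch using the 27 / 33 example from the help text above (control average 30, MDE 3):

```python
# The two-sided significance bounds shown next to the treatment
# estimate: expected control outcome minus and plus the MDE.
expected_control_avg = 30  # from the 27 / 33 example above
mde = 3
lower_bound = expected_control_avg - mde  # 27: a significant decrease
upper_bound = expected_control_avg + mde  # 33: a significant increase
print(lower_bound, upper_bound)
```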

A priori power calculation for comparing proportions
Categorical A-Priori Analysis: Current Scenario
alpha (risk of false positives, or Type I error rate): The likelihood of measuring a result at least as extreme as the one observed when, in fact, there is no effect at all
Assigning an alpha level asks you to live in a world where you've run the experiment and found a seemingly statistically significant effect size. If you select an alpha of .1, you've decided to live in a world where 1 out of every 10 experiments of this design would produce an effect size at least as large as the one you've measured even though there is truly no effect at all.
Power (1 - risk of false negatives, or 1 - Type II error rate): The likelihood that, when a treatment has an effect, you will be able to distinguish the effect from zero
Determining power asks you to live in a world where you've run the experiment and found a null result. If you select a power of .95, you've decided to live in a world where 1 out of every 20 experiments of this design would fail to detect the effect even though there truly is one.
Expected control group proportion that [meets your success criterion] (% success in control group)
If we collected 100 samples of the DV from your control group, how many would meet the criteria of success as you define it? (E.g., if your DV is the click-through rate on an email, how many emails out of 100 do you expect to receive a click in your control group?)
Expected treatment group proportion that [meets your success criterion] (% success in treatment group)
This value establishes the minimum proportion that your treatment group must achieve, in either direction, in order to produce a statistically significant result. E.g., treatment group expected proportions of 13% and 15% mean your intervention must produce a proportion of more than 15%, or less than 13%, to produce a statistically significant result.
Minimum Detectable Effect (MDE)
This value establishes the minimum effect size between your control and treatment comparison group(s) that your test must measure in order to achieve statistical significance. If you're not very confident in your rationale for why your treatment would produce an incremental improvement of this size over the control, that is a sign you may want to tighten your expected treatment group estimate.
-
Sample size per comparison group -
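A statsmodels sketch of this a priori calculation, reusing the 13% vs. 15% example from the help text above (the alpha and power values are illustrative):

```python
# Sketch of an a priori sample-size calculation for proportions:
# solve for the required n per group given both expected proportions.
import math
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

alpha, power = 0.05, 0.80
p_control, p_treatment = 0.13, 0.15  # from the example in the help text

h = proportion_effectsize(p_treatment, p_control)  # Cohen's h
n = NormalIndPower().solve_power(effect_size=h, alpha=alpha,
                                 power=power, alternative='two-sided')
print(math.ceil(n))  # required sample size per comparison group
```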
A priori power calculation for comparing averages
Continuous A-Priori Analysis: Current Scenario
alpha (risk of false positives, or Type I error rate): The likelihood of measuring a result at least as extreme as the one observed when, in fact, there is no effect at all
Assigning an alpha level asks you to live in a world where you've run the experiment and found a seemingly statistically significant effect size. If you select an alpha of .1, you've decided to live in a world where 1 out of every 10 experiments of this design would produce an effect size at least as large as the one you've measured even though there is truly no effect at all.
Power (1 - risk of false negatives, or 1 - Type II error rate): The likelihood that, when a treatment has an effect, you will be able to distinguish the effect from zero
Determining power asks you to live in a world where you've run the experiment and found a null result. If you select a power of .95, you've decided to live in a world where 1 out of every 20 experiments of this design would fail to detect the effect even though there truly is one.
Pooled standard deviation of DV: How variable or 'spread out' is the outcome (dependent) variable across the test & control group(s)?
If you have no sample data to establish the SD, you can estimate the standard deviation with the following thought experiment: excluding outliers, subtract the minimum DV value you'd expect from the maximum DV value you'd expect, and divide by 6.
Expected control group [outcome] average
If we collected 100 samples of the DV from your control group and averaged them, what number do you estimate?
Expected treatment group [outcome] average
This value establishes the minimum average that your treatment group must achieve, in either direction, in order to produce a statistically significant result. E.g., treatment group expected averages of 27 and 33 mean your intervention must produce an average of more than 33, or less than 27, to produce a statistically significant result.
Minimum Detectable Effect (MDE)
This value establishes the minimum effect size between your control and treatment comparison group(s) that your test must measure in order to achieve statistical significance. If you're not very confident in your rationale for why your treatment would produce an incremental improvement of this size over the control, that is a sign you may want to tighten your expected treatment group estimate.
-
Sample size per comparison group -
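And the continuous counterpart, reusing the 27 / 33 example (control average 30, MDE 3) with a hypothetical pooled SD of 20:

```python
# Sketch of an a priori sample-size calculation for averages: solve
# for the required n per group given the MDE and pooled SD.
import math
from statsmodels.stats.power import TTestIndPower

alpha, power = 0.05, 0.80
mde = 3.0          # minimum detectable shift, in the DV's own units
pooled_sd = 20.0   # hypothetical; see the (max - min) / 6 rule above

d = mde / pooled_sd  # standardized effect size (Cohen's d)
n = TTestIndPower().solve_power(effect_size=d, alpha=alpha,
                                power=power, alternative='two-sided')
print(math.ceil(n))  # required sample size per comparison group
```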
Business Experiment Launchpad