Small Telescopes for Higher-Power Replications
TL;DR. In Simonsohn’s Small‑Telescopes framework you test whether the true effect is smaller than the effect that would have given the original study 33% power (call this $d_{33\%}$). Simonsohn (2015) shows that this requires about 2.5× the original sample size if you want 80% power to show that the true effect is smaller than $d_{33\%}$ when the true effect is 0. In this post, I provide alternative multiples for 90% and 95% power (3.5× and 4.5× respectively).
Small-Telescopes in Short: What is being tested (and why “one‑sided”)?
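Let $d$ be the true effect size and $d_{33\%}$ the effect that would have given the original study 33% power. The replication tests $H_0: d = d_{33\%}$ against the one‑sided alternative $H_1: d < d_{33\%}$ at $\alpha = 0.05$. The test is one‑sided because the replication has a directional hypothesis: it asks only whether the effect is smaller than $d_{33\%}$.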
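To make the test concrete, here is a minimal base-R sketch of the one-sided comparison against $d_{33\%}$ (the effect that would have given the original study 33% power). All inputs (`n0`, `n_rep`, `d_hat`) are hypothetical, and $d_{33\%}$ comes from a normal approximation rather than the exact t-based calculation, so this is an illustration under those assumptions rather than the procedure used for the results in this post.

```r
# Hypothetical inputs, chosen only for illustration (not taken from the post)
n0    <- 20    # original per-cell sample size
n_rep <- 50    # replication per-cell sample size
d_hat <- 0.05  # standardized effect observed in the replication

# d33%: effect giving the original two-sided test 1/3 power.
# Assumption: normal approximation instead of the exact t-based value.
d33 <- (qnorm(0.975) + qnorm(1/3)) * sqrt(2 / n0)

# One-sided test of H0: d = d33 against H1: d < d33.
# Assumption: sqrt(2 / n) as the standard error of d with equal cell sizes.
df     <- 2 * n_rep - 2
t_stat <- (d_hat - d33) / sqrt(2 / n_rep)
t_crit <- qt(0.05, df = df)  # lower-tail critical value
reject <- t_stat < t_crit    # TRUE means "significantly smaller than d33%"

c(d33 = d33, t_stat = t_stat, t_crit = t_crit)
reject
```

Only the lower tail is used: an estimate far above $d_{33\%}$ is no evidence that the effect is too small for the original study to have detected.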
When might higher power be worth it?
The original small telescopes rule-of-thumb — about 2.5× the original sample size for 80% power — works well for many replications. But there are situations where aiming higher is justified. For example:
- Policy- or practice-relevant findings where acting on a false positive would have high costs.
- Expensive or rare data where you won’t get a second chance to replicate, so making the most of the opportunity matters.
- Adversarial collaborations where proponents and critics may want to aim for a definitive answer.
In these cases, the (substantial) cost of increasing power from 80% to 90% or 95% may be outweighed by the clarity it brings. A higher-powered replication yields a more precise effect size estimate and is more likely to show whether the effect is significantly smaller than the small-telescopes threshold.
Analytic multipliers
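The multipliers for the original per‑cell $n_0$ follow from a normal approximation. If $\pi$ is the target power of the replication, $\alpha$ its one‑sided significance level, $\alpha_0$ the two‑sided significance level of the original study, and $p_0$ the reference power that defines $d_{33\%}$, then

$$m = \frac{n_{\text{rep}}}{n_0} = \left(\frac{z_{1-\alpha} + z_{\pi}}{z_{1-\alpha_0/2} + z_{p_0}}\right)^2$$

with $\alpha = \alpha_0 = 0.05$ and $p_0 = 1/3$, where $z_q$ denotes the standard normal quantile at $q$.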
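As a quick check, the normal-approximation multipliers can be evaluated in a few lines of base R. The helper name `normal_multiplier` is hypothetical; it uses the same formula as `small_telescopes_multiplier_normal()` in the code further down, with $\alpha = \alpha_0 = 0.05$ assumed as defaults.

```r
# Normal-approximation multiplier (hypothetical helper name, base R only)
normal_multiplier <- function(target_power, alpha = 0.05, p0 = 1/3, alpha0 = 0.05) {
  delta_orig <- qnorm(1 - alpha0 / 2) + qnorm(p0)       # original study at power p0
  delta_rep  <- qnorm(1 - alpha) + qnorm(target_power)  # one-sided replication test
  (delta_rep / delta_orig)^2
}

m <- sapply(c(0.80, 0.90, 0.95), normal_multiplier)
round(m, 2)                                   # 2.64 3.66 4.63
round(normal_multiplier(0.80, p0 = 0.33), 2)  # 2.68
```

The rounded heuristics (2.5×, 3.5×, 4.5×) sit slightly below these values, which is why they fall a little short of the nominal power in large samples.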
Exact calculation and validation of the heuristic multipliers
Using the t-distribution for specific sample sizes, we can calculate more precise multipliers for the small telescopes test. The code below computes this “exact” multiplier for a given original per‑cell $n_0$.
library(dplyr)
library(purrr)
library(pwr)
# d_33% for the original two-sample t (per-cell n), p0 = 1/3 per supplement
d33_two_sample_t <- function(n_per_cell, p0 = 1/3, alpha0 = 0.05) {
  pwr.t.test(n = n_per_cell,
             power = p0,
             sig.level = alpha0,
             type = "two.sample",
             alternative = "two.sided")$d
}
# Power to reject H0: d = d33 with a one-sided (less) t-test at alpha, when true d = 0
# Test is one-sided because the replication has a directional hypothesis
power_reject_d33_under_null0 <- function(n_per_cell_rep, d33, alpha = 0.05) {
  df <- 2 * n_per_cell_rep - 2
  tcrit <- qt(alpha, df = df, lower.tail = TRUE)
  ncp <- -d33 * sqrt(n_per_cell_rep / 2)
  pt(tcrit, df = df, ncp = ncp, lower.tail = TRUE)
}
# Exact multiplier via t; returns integer per-cell n for replication and achieved power
small_telescopes_multiplier_exact <- function(n_original_per_cell,
                                              target_power = 0.80,
                                              alpha = 0.05,
                                              p0 = 1/3,
                                              alpha0 = 0.05,
                                              max_mult = 100,
                                              tol = 1e-7) {
  stopifnot(n_original_per_cell >= 3)
  d33 <- d33_two_sample_t(n_original_per_cell, p0 = p0, alpha0 = alpha0)

  # Root of f is the (continuous) replication n that hits the target power
  f <- function(n_rep) power_reject_d33_under_null0(n_rep, d33, alpha) - target_power

  lower <- max(3, n_original_per_cell) * 1.0
  upper <- min(max_mult * n_original_per_cell, 1e7)
  f_lower <- f(lower)
  f_upper <- f(upper)

  # Widen the bracket until it contains a sign change (or hits the cap)
  while (f_lower * f_upper > 0 && upper < 1e7) {
    upper <- min(upper * 2, 1e7)
    f_upper <- f(upper)
  }

  n_rep_cont <- if (f_lower >= 0) {
    lower
  } else if (f_upper <= 0) {
    upper
  } else {
    uniroot(f, interval = c(lower, upper), tol = tol)$root
  }
  n_rep_int <- ceiling(n_rep_cont)

  tibble(
    n_original_per_cell = n_original_per_cell,
    d33 = d33,
    n_replication_per_cell_exact = n_rep_int,
    exact_multiplier = n_rep_int / n_original_per_cell,
    achieved_power_exact = power_reject_d33_under_null0(n_rep_int, d33, alpha)
  )
}
# Normal-approx multiplier (independent of n0) for p0 = 1/3
small_telescopes_multiplier_normal <- function(target_power = 0.80,
                                               alpha = 0.05,
                                               p0 = 1/3,
                                               alpha0 = 0.05) {
  z_alpha0_2 <- qnorm(1 - alpha0/2)
  z_p0 <- qnorm(p0)
  z_1_minus_alpha <- qnorm(1 - alpha)
  z_target_power <- qnorm(target_power)
  mu_orig <- z_alpha0_2 + z_p0
  mu_rep <- z_1_minus_alpha + z_target_power
  (mu_rep / mu_orig)^2
}
Target power = 80% (heuristic multiplier: 2.5×)

| n0 per cell | d33% (orig) | exact m | achieved power (exact) | achieved power (heuristic) |
|---|---|---|---|---|

Target power = 90% (heuristic multiplier: 3.5×)

| n0 per cell | d33% (orig) | exact m | achieved power (exact) | achieved power (heuristic) |
|---|---|---|---|---|

Target power = 95% (heuristic multiplier: 4.5×)

| n0 per cell | d33% (orig) | exact m | achieved power (exact) | achieved power (heuristic) |
|---|---|---|---|---|
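The tables include a column for the power achieved by the heuristic multiplier, which the functions above do not compute directly. One way it could be obtained is sketched below in base R, so that it runs without `pwr`. The helper names `d33_t` and `heuristic_power` are hypothetical, and solving for $d_{33\%}$ with `uniroot()` on the noncentral t power is a stand-in for `pwr.t.test()`, not necessarily how the original tables were produced.

```r
# d33% from the noncentral t distribution (base-R stand-in for pwr.t.test).
# Assumes n0 >= 3 so that the search interval brackets the solution.
d33_t <- function(n0, p0 = 1/3, alpha0 = 0.05) {
  df <- 2 * n0 - 2
  tcrit <- qt(1 - alpha0 / 2, df)
  # Two-sided power of the original test as a function of the noncentrality
  power_gap <- function(ncp) {
    pt(tcrit, df, ncp = ncp, lower.tail = FALSE) + pt(-tcrit, df, ncp = ncp) - p0
  }
  ncp33 <- uniroot(power_gap, interval = c(0, 10), tol = 1e-9)$root
  ncp33 / sqrt(n0 / 2)  # convert noncentrality back to Cohen's d
}

# Power of the one-sided small telescopes test when the true effect is 0
# and the replication uses ceiling(multiplier * n0) participants per cell
heuristic_power <- function(n0, multiplier, alpha = 0.05) {
  n_rep <- ceiling(multiplier * n0)
  df <- 2 * n_rep - 2
  d33 <- d33_t(n0)
  pt(qt(alpha, df), df, ncp = -d33 * sqrt(n_rep / 2))
}

heuristic_power(n0 = 20, multiplier = 2.5)
sapply(c(20, 50, 100), heuristic_power, multiplier = 3.5)
```

Working with the noncentrality parameter inside `uniroot()` keeps the search interval the same for every `n0`.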
Practical takeaway
- While the original Simonsohn (2015) heuristic of 2.5× the original per‑cell $n_0$ for 80% power slightly underestimates the required sample size when $n_0 > 20$, it achieves at least 78% power even in large samples.
- If you need 90% or 95% power, 3.5× and 4.5× can serve as comparable heuristics. They also slightly underestimate the required sample size, but still achieve 89% and 94% power respectively in large samples.
- If you want to power your replication study more precisely, you can use the `small_telescopes_multiplier_exact()` function in the code above.
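The large-sample figures in the first two bullets can be approximated with the normal limit, in which the t-distribution is replaced by the standard normal. This is a rough check under that assumption, with $\alpha = \alpha_0 = 0.05$ and $p_0 = 1/3$, rather than the exact t-based calculation; the variable names are illustrative.

```r
# Heuristic multipliers, labelled by the power they are meant to deliver
heuristics <- c("80%" = 2.5, "90%" = 3.5, "95%" = 4.5)

# Standardized effect of d33% in the original study (normal approximation)
delta_orig <- qnorm(0.975) + qnorm(1/3)

# Limiting power of the one-sided test at alpha = 0.05 when the true effect is 0
limit_power <- pnorm(sqrt(heuristics) * delta_orig - qnorm(0.95))
round(limit_power, 3)
```

This gives roughly 0.78, 0.89 and 0.945. In finite samples the t-based power differs slightly from these limits.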
Reference
Simonsohn, U. (2015). Small telescopes: Detectability and the evaluation of replication results. Psychological Science, 26(5), 559-569.
In prose we say “33% power,” but numerically we follow the Simonsohn (2015) supplement and compute with $p_0 = 1/3$. Using $p_0 = 0.33$ instead shifts the normal multipliers slightly (e.g., 2.68 vs 2.64).
All $n$ here are per‑cell for a two‑sample t-test with equal variances. Simonsohn (2015) shows that $\chi^2$ tests behave similarly, and various other tests directly depend on the t-distribution (e.g. correlations, regression coefficients). Nevertheless, the stability of the rules of thumb across designs could be investigated further.