Small Telescopes for Higher-Power Replications

Last updated on 9 August 2025

TL;DR. Simonsohn’s Small‑Telescopes framework tests whether the true effect is smaller than the effect that would have given the original study 33% power (call this $d_{33\%}$). Simonsohn (2015) shows that a replication needs about 2.5× the original sample size to have 80% power to show that the effect is smaller than $d_{33\%}$ when the true effect is 0. In this post, I provide the corresponding multiples for 90% and 95% power (3.5× and 4.5× respectively).

Small-Telescopes in Short: What is being tested (and why “one‑sided”)?

Let $d_{33\%}$ be the effect size that would have given the original two‑sample t test 33% power at two‑sided $\alpha_0 = .05$. The replication runs a one‑sided test

$H_0 : d = d_{33\%} \quad \text{vs} \quad H_1 : d < d_{33\%}$

at $\alpha = .05$, and we plan the replication to have (e.g.) 80% power to reject $H_0$ if the true effect is 0. Intuition: Rather than asking whether the effect is exactly zero, the small telescopes test asks whether it is smaller than the smallest effect the original study had a reasonable chance to detect. If the replication can rule out effects of that size, the onus shifts: the original proponents bear the “burden of proof” again, because even the smallest effect they could plausibly have detected is inconsistent with the replication data.
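The test described above can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the procedure from Simonsohn (2015): the function names `d33_original()` and `small_telescopes_p()` are hypothetical, base R's `power.t.test()` stands in for a dedicated power package, and the standard error of Cohen's d is approximated by $\sqrt{2/n}$, which matches the shifted‑t logic used for the power calculations later in this post. A noncentral‑t formulation of the same test would give slightly different p‑values.

```r
# Sketch of the small-telescopes test for a two-sample design with equal
# per-cell n. All names are illustrative assumptions.

# Effect size that gives the original study 1/3 power (two-sided alpha0)
d33_original <- function(n0_per_cell, p0 = 1/3, alpha0 = 0.05) {
  power.t.test(n = n0_per_cell, power = p0, sig.level = alpha0,
               type = "two.sample", alternative = "two.sided")$delta
}

# One-sided p-value for H0: d = d33 vs H1: d < d33,
# using the approximation t = (d_hat - d33) / sqrt(2 / n)
small_telescopes_p <- function(d_hat_rep, n_rep_per_cell, d33) {
  se     <- sqrt(2 / n_rep_per_cell)        # approximate SE of Cohen's d
  t_stat <- (d_hat_rep - d33) / se
  pt(t_stat, df = 2 * n_rep_per_cell - 2, lower.tail = TRUE)
}

# Hypothetical example: original study with 20 per cell,
# replication with 50 per cell observing d = 0.02
d33 <- d33_original(20)
p   <- small_telescopes_p(d_hat_rep = 0.02, n_rep_per_cell = 50, d33 = d33)
print(c(d33 = d33, p = p))
```

A small p‑value here means the replication estimate is significantly below $d_{33\%}$, i.e. the replication rules out effects the original study could plausibly have detected.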

When might higher power be worth it?

The original small telescopes rule-of-thumb — about 2.5× the original sample size for 80% power — works well for many replications. But there are situations where aiming higher is justified. For example:

Policy- or practice-relevant findings where acting on a false positive would have high costs.
Expensive or rare data where you won’t get a second chance to replicate, so making the most of the opportunity matters.
Adversarial collaborations where proponents and critics may want to aim for a definitive answer.

In these cases, the (substantial) cost of increasing power from 80% to 90% or 95% may be outweighed by the clarity it brings. A higher-powered replication yields a more precise effect size estimate, and is more likely to show that the effect is significantly smaller than the small telescopes threshold when the true effect is in fact close to zero.

Analytic multipliers

The multipliers applied to the original per‑cell $n$ to achieve higher power in the small telescopes test can be derived from the normal approximation to the two‑sample t test. Under this approximation, the ratio of replication to original per‑cell $n$ is

$m = \left(\frac{z_{1-\alpha} + z_{\text{power}}}{z_{1-\alpha_0/2} + z_{p_0}}\right)^{2},$

with $p_0 = 1/3$ (see the supplementary materials of Simonsohn, 2015).¹ This gives: 80% → 2.64×, 90% → 3.66×, 95% → 4.63×. With finite‑sample t tests, the exact multipliers are slightly lower for small samples and approach these limits as $n_0$ grows. In line with the original paper, which proposed 2.5× for 80% power, I propose using 3.5× and 4.5× for 90% and 95% power respectively.²
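For readers who want to see where the ratio comes from, here is a short sketch of the reasoning, followed by the arithmetic for 80% power. The symbol $n_r$ (replication per‑cell $n$) is introduced here for illustration only, and the negligible far tail of the two‑sided original test is ignored.

```latex
\begin{aligned}
% Original study: power p_0 at d_{33\%} with n_0 per cell
d_{33\%}\sqrt{n_0/2} &\approx z_{1-\alpha_0/2} + z_{p_0} \\
% Replication: target power to reject d = d_{33\%} when the true d is 0
d_{33\%}\sqrt{n_r/2} &\approx z_{1-\alpha} + z_{\text{power}} \\
% Dividing the second line by the first and squaring
m = \frac{n_r}{n_0} &\approx
  \left(\frac{z_{1-\alpha} + z_{\text{power}}}{z_{1-\alpha_0/2} + z_{p_0}}\right)^{2} \\
% Worked example: alpha = alpha_0 = .05, p_0 = 1/3, power = .80
m_{80\%} &\approx \left(\frac{1.6449 + 0.8416}{1.9600 - 0.4307}\right)^{2}
  = \left(\frac{2.4865}{1.5293}\right)^{2} \approx 2.64
\end{aligned}
```

Note that $z_{p_0}$ is negative for $p_0 = 1/3$, which is why the denominator is smaller than $z_{1-\alpha_0/2}$ alone.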

Exact calculation and validation of the heuristic multipliers

Using the t distribution for specific sample sizes, we can calculate more precise multipliers for the small telescopes test. The code below computes this “exact” multiplier for a given original per‑cell $n$ and target power, and compares it to the normal approximation. The results show that the heuristic multipliers yield power within about 2 percentage points of the target power for $n_0$ up to 1,000 participants per cell.


```r
library(dplyr)
library(purrr)
library(pwr)

# d_33% for the original two-sample t (per-cell n), p0 = 1/3 per supplement
d33_two_sample_t <- function(n_per_cell, p0 = 1/3, alpha0 = 0.05) {
  pwr.t.test(n = n_per_cell,
             power = p0,
             sig.level = alpha0,
             type = "two.sample",
             alternative = "two.sided")$d
}

# Power to reject H0: d = d33 with a one-sided (less) t-test at alpha, when true d = 0
# Test is one-sided because the replication has a directional hypothesis
power_reject_d33_under_null0 <- function(n_per_cell_rep, d33, alpha = 0.05) {
  df    <- 2 * n_per_cell_rep - 2
  tcrit <- qt(alpha, df = df, lower.tail = TRUE)
  ncp   <- -d33 * sqrt(n_per_cell_rep / 2)
  pt(tcrit, df = df, ncp = ncp, lower.tail = TRUE)
}

# Exact multiplier via t; returns integer per-cell n for replication and achieved power
small_telescopes_multiplier_exact <- function(n_original_per_cell,
                                              target_power = 0.80,
                                              alpha = 0.05,
                                              p0 = 1/3,
                                              alpha0 = 0.05,
                                              max_mult = 100,
                                              tol = 1e-7) {
  stopifnot(n_original_per_cell >= 3)
  d33 <- d33_two_sample_t(n_original_per_cell, p0 = p0, alpha0 = alpha0)
  f <- function(n_rep) power_reject_d33_under_null0(n_rep, d33, alpha) - target_power

  # Bracket the root: start at the original n and widen the upper bound if needed
  lower <- max(3, n_original_per_cell) * 1.0
  upper <- min(max_mult * n_original_per_cell, 1e7)
  f_lower <- f(lower); f_upper <- f(upper)
  while (f_lower * f_upper > 0 && upper < 1e7) {
    upper <- min(upper * 2, 1e7); f_upper <- f(upper)
  }

  n_rep_cont <- if (f_lower >= 0) lower else if (f_upper <= 0) upper else
    uniroot(f, interval = c(lower, upper), tol = tol)$root

  # Round up to a whole participant per cell
  n_rep_int <- ceiling(n_rep_cont)
  tibble(
    n_original_per_cell = n_original_per_cell,
    d33 = d33,
    n_replication_per_cell_exact = n_rep_int,
    exact_multiplier = n_rep_int / n_original_per_cell,
    achieved_power_exact = power_reject_d33_under_null0(n_rep_int, d33, alpha)
  )
}

# Normal-approx multiplier (independent of n0) for p0 = 1/3
small_telescopes_multiplier_normal <- function(target_power = 0.80,
                                               alpha = 0.05,
                                               p0 = 1/3,
                                               alpha0 = 0.05) {
  z_alpha0_2      <- qnorm(1 - alpha0/2)
  z_p0            <- qnorm(p0)
  z_1_minus_alpha <- qnorm(1 - alpha)
  z_target_power  <- qnorm(target_power)
  mu_orig <- z_alpha0_2 + z_p0
  mu_rep  <- z_1_minus_alpha + z_target_power
  (mu_rep / mu_orig)^2
}
```

Target power = 80% (heuristic multiplier: 2.5×)

| $n_0$ per cell | $d_{33\%}$ (original) | Exact $m$ | Achieved power (exact $m$) | Achieved power (heuristic) |
|---:|---:|---:|---:|---:|
| 10 | 0.72 | 2.50 | 0.81 | 0.81 |
| 20 | 0.50 | 2.55 | 0.80 | 0.79 |
| 50 | 0.31 | 2.62 | 0.80 | 0.79 |
| 100 | 0.22 | 2.63 | 0.80 | 0.78 |
| 500 | 0.10 | 2.64 | 0.80 | 0.78 |
| 1,000 | 0.07 | 2.64 | 0.80 | 0.78 |
| 100,000 | 0.01 | 2.65 | 0.80 | 0.78 |

Target power = 90% (heuristic multiplier: 3.5×)

| $n_0$ per cell | $d_{33\%}$ (original) | Exact $m$ | Achieved power (exact $m$) | Achieved power (heuristic) |
|---:|---:|---:|---:|---:|
| 10 | 0.72 | 3.40 | 0.90 | 0.91 |
| 20 | 0.50 | 3.55 | 0.90 | 0.90 |
| 50 | 0.31 | 3.62 | 0.90 | 0.89 |
| 100 | 0.22 | 3.64 | 0.90 | 0.89 |
| 500 | 0.10 | 3.66 | 0.90 | 0.89 |
| 1,000 | 0.07 | 3.66 | 0.90 | 0.89 |
| 100,000 | 0.01 | 3.67 | 0.90 | 0.89 |

Target power = 95% (heuristic multiplier: 4.5×)

| $n_0$ per cell | $d_{33\%}$ (original) | Exact $m$ | Achieved power (exact $m$) | Achieved power (heuristic) |
|---:|---:|---:|---:|---:|
| 10 | 0.72 | 4.30 | 0.95 | 0.96 |
| 20 | 0.50 | 4.45 | 0.95 | 0.95 |
| 50 | 0.31 | 4.56 | 0.95 | 0.95 |
| 100 | 0.22 | 4.60 | 0.95 | 0.95 |
| 500 | 0.10 | 4.62 | 0.95 | 0.94 |
| 1,000 | 0.07 | 4.63 | 0.95 | 0.94 |
| 100,000 | 0.01 | 4.63 | 0.95 | 0.94 |
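The “achieved power (heuristic)” column can be approximately reproduced without any extra packages. The sketch below is an illustration under assumptions rather than the code used to build the tables: `heuristic_power()` is a hypothetical helper, and base R's `power.t.test()` replaces `pwr.t.test()` for computing $d_{33\%}$. The power calculation itself follows the same noncentral‑t logic as `power_reject_d33_under_null0()` above.

```r
# Power of the small-telescopes test when the replication per-cell n is a
# fixed multiple of the original per-cell n (true effect assumed to be 0).
# Base R only; the function name is an illustrative assumption.
heuristic_power <- function(n0_per_cell, multiplier, alpha = 0.05,
                            p0 = 1/3, alpha0 = 0.05) {
  # Effect size giving the original study power p0
  d33 <- power.t.test(n = n0_per_cell, power = p0, sig.level = alpha0,
                      type = "two.sample", alternative = "two.sided")$delta
  n_rep <- ceiling(multiplier * n0_per_cell)  # whole participants per cell
  df    <- 2 * n_rep - 2
  tcrit <- qt(alpha, df = df)                 # one-sided (lower-tail) critical value
  pt(tcrit, df = df, ncp = -d33 * sqrt(n_rep / 2))
}

# Heuristic multipliers applied to an original study with 100 per cell
powers <- sapply(c(`80%` = 2.5, `90%` = 3.5, `95%` = 4.5),
                 function(m) heuristic_power(100, m))
print(round(powers, 2))
```

The results should land close to the heuristic column in the $n_0 = 100$ rows; small differences in the last digit are possible because of rounding.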

Practical takeaway

The original Simonsohn (2015) heuristic of 2.5× the original per‑cell $n$ slightly underestimates the sample size required for 80% power once $n_0$ reaches about 20 per cell, but it still achieves at least 78% power even in very large samples.
If you need 90% or 95% power, 3.5× and 4.5× can serve as comparable heuristics. They also slightly underestimate the sample size, but still achieve 89% and 94% power respectively in large samples.
If you want to power your replication study more precisely, you can use the small_telescopes_multiplier_exact() function in the code above.
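As a quick planning aid, the heuristics translate into per‑cell replication sample sizes with one line of arithmetic. The helper below is a hypothetical convenience function, not part of the code above; it simply rounds up the heuristic multiples.

```r
# Per-cell replication n implied by the heuristic multipliers
# (2.5x, 3.5x, 4.5x for 80%, 90%, 95% power). Name is illustrative.
replication_n_heuristic <- function(n0_per_cell) {
  multipliers <- c(`80%` = 2.5, `90%` = 3.5, `95%` = 4.5)
  ceiling(multipliers * n0_per_cell)
}

# Hypothetical original study with 40 participants per cell
n_rep <- replication_n_heuristic(40)
print(n_rep)
# per-cell n: 100 (80% power), 140 (90% power), 180 (95% power)
```

Because these are per‑cell numbers, the total replication sample for a two‑group design is twice each value.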

Reference

Simonsohn, U. (2015). Small telescopes: Detectability and the evaluation of replication results. Psychological Science, 26(5), 559-569.

¹ In prose we say “33% power,” but numerically we follow the Simonsohn (2015) supplement and compute with $p_0 = 1/3$. Using $p_0 = .33$ instead shifts the normal multipliers slightly (e.g., 2.68 instead of 2.64 for 80% power).
² All $n$ here are per‑cell for two‑sample t tests with equal variances. Simonsohn (2015) shows that χ² tests behave similarly, and various other tests depend directly on the t distribution (e.g., correlations, regression coefficients). Nevertheless, the stability of the rules of thumb across designs could be investigated further.

Metascience

Dr Lukas Wallrich

Senior Lecturer

Researcher and educator with a focus on Open Science and intergroup relations.