
A/B test analysis framework

A reusable framework for analyzing controlled experiments using frequentist, Bayesian, and sequential methods, with multiple comparison correction and power analysis.

Bayesian inference, Sequential testing, Hypothesis testing, Power analysis, Experimentation, Product analytics

Interactive dashboard

Five-page Streamlit application for end-to-end experiment analysis

Overview
  • Three experiments with conversion and revenue metrics
  • Summary table with significance decisions
  • Forest plot of effect sizes
Frequentist
  • Z-test for proportions and Welch's t-test
  • Confidence intervals and effect sizes
  • Multiple comparison correction (Bonferroni, Holm, BH)
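The frequentist steps listed above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: the function names (`two_proportion_ztest`, `holm_adjust`, `bh_adjust`) are hypothetical, and the z-test shown is the common pooled-variance, two-sided form, which is an assumption about how the framework computes it.

```python
import math
from statistics import NormalDist


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided pooled z-test for the difference of two proportions."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Multiply the k-th smallest p-value by (m - k + 1), keep monotone.
        running_max = max(running_max, (m - rank) * p_values[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


def bh_adjust(p_values):
    """Benjamini-Hochberg (FDR) adjusted p-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k  # 1-based rank among ascending p-values
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Bonferroni is the simplest of the three corrections: multiply each p-value by the number of tests and cap the result at 1. Holm is never less powerful than Bonferroni, while Benjamini-Hochberg controls the false discovery rate rather than the family-wise error rate.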
Bayesian
  • Beta-Binomial posterior distributions
  • P(B > A) and expected lift with credible intervals
  • Posterior sampling with 100K draws
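A rough sketch of the Beta-Binomial calculation follows. It assumes a flat Beta(1, 1) prior as the "uninformative" choice and defines lift as the relative difference in conversion rate; both are assumptions, and the function name `prob_b_beats_a` is hypothetical. It uses only the standard library, so a NumPy-based version would be considerably faster.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=100_000, seed=0,
                   prior_alpha=1.0, prior_beta=1.0):
    """Monte Carlo estimate of P(B > A) and relative lift.

    Assumes a Beta(prior_alpha, prior_beta) prior on each conversion rate;
    the default Beta(1, 1) is flat.
    """
    rng = random.Random(seed)
    # Conjugate update: Beta(alpha + successes, beta + failures).
    post_a = (prior_alpha + conv_a, prior_beta + n_a - conv_a)
    post_b = (prior_alpha + conv_b, prior_beta + n_b - conv_b)

    wins = 0
    lifts = []
    for _ in range(draws):
        rate_a = rng.betavariate(*post_a)
        rate_b = rng.betavariate(*post_b)
        wins += rate_b > rate_a
        lifts.append(rate_b / rate_a - 1)  # relative lift of B over A

    lifts.sort()
    lower = lifts[int(0.025 * draws)]
    upper = lifts[int(0.975 * draws) - 1]
    return {
        "p_b_gt_a": wins / draws,
        "expected_lift": sum(lifts) / draws,
        "ci95": (lower, upper),  # equal-tailed 95% credible interval
    }
```

Called as `prob_b_beats_a(100, 1000, 130, 1000)`, this returns the probability that variant B has the higher conversion rate together with the posterior mean lift and its credible interval. The counts here are made up for illustration and are not the experiments' data.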
Power calculator
  • Sample size estimation given baseline, MDE, alpha, power
  • Interactive sensitivity curves
  • MDE vs. sample size trade-off chart
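The sample size estimate can be illustrated with the usual normal-approximation formula for a two-sided, two-proportion test with equal arm sizes. This is a sketch under stated assumptions: the MDE is treated as an absolute difference in conversion rate, and `sample_size_per_arm` is a hypothetical name rather than the app's API.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline, mde, alpha=0.05, power=0.80):
    """Per-arm sample size to detect an absolute lift `mde` over `baseline`.

    Normal approximation, two-sided test, equal allocation between arms.
    """
    p1, p2 = baseline, baseline + mde
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / mde ** 2)


# MDE vs. sample size trade-off for an assumed 10% baseline conversion rate.
for mde in (0.01, 0.02, 0.03):
    print(f"MDE {mde:.2f}: {sample_size_per_arm(0.10, mde)} users per arm")
```

Because the required sample size scales roughly with 1 / MDE², halving the detectable effect roughly quadruples the traffic needed, which is the trade-off a sensitivity curve makes visible.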
Report
  • Structured summary for each experiment with frequentist and Bayesian verdicts
  • Sequential monitoring with O'Brien-Fleming spending function boundaries
  • Exportable results table with all metrics, p-values, and credible intervals
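For the sequential monitoring item, the following sketch shows the Lan-DeMets alpha-spending function of O'Brien-Fleming type, α(t) = 2 − 2Φ(z₁₋α/₂ / √t), where t is the fraction of the planned sample observed so far. It assumes this is the spending function meant here, and the function names are hypothetical. Converting spent alpha into exact z-boundaries requires numerical integration over the joint distribution of the interim statistics, which is not shown.

```python
import math
from statistics import NormalDist


def obf_alpha_spent(info_fraction, alpha=0.05):
    """Cumulative two-sided alpha spent at information fraction t in (0, 1]."""
    if not 0 < info_fraction <= 1:
        raise ValueError("info_fraction must be in (0, 1]")
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * (1 - NormalDist().cdf(z / math.sqrt(info_fraction)))


def incremental_alpha(looks, alpha=0.05):
    """Alpha newly available at each look, given increasing info fractions."""
    spent, previous = [], 0.0
    for t in looks:
        cumulative = obf_alpha_spent(t, alpha)
        spent.append(cumulative - previous)
        previous = cumulative
    return spent


# Four equally spaced looks: very little alpha is spent early on.
for t, a in zip((0.25, 0.5, 0.75, 1.0), incremental_alpha((0.25, 0.5, 0.75, 1.0))):
    print(f"t = {t:.2f}: incremental alpha = {a:.5f}")
```

The function spends almost no alpha at early looks and reaches the full alpha at t = 1, so an early stop requires very strong evidence while the final analysis stays close to a fixed-sample test.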
$ pip install -r requirements.txt && streamlit run app.py

Key results

Three controlled experiments analyzed with dual frequentist and Bayesian frameworks

  • Experiments analyzed: 3
  • Email subject line test: p < .001
  • P(B > A), website redesign: 0.996
  • Significant experiments: 2 / 3

Methodology

The framework analyzes three experiments (website redesign, pricing change, email subject line) using conversion and revenue metrics. The frequentist path uses z-tests for proportions and Welch's t-tests for continuous outcomes, reports Cohen's h and Cohen's d as effect sizes, and corrects for multiple comparisons with Bonferroni, Holm, and Benjamini-Hochberg (FDR) procedures. The Bayesian path uses a Beta-Binomial conjugate model with uninformative priors and 100K posterior samples. Sequential monitoring applies O'Brien-Fleming spending functions so that interim looks allow early stopping without inflating the type I error rate.
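The effect sizes and Welch's statistic named above have short closed forms, sketched below with hypothetical function names. Cohen's d is computed with the pooled standard deviation, which is one common convention and an assumption here. The p-value for Welch's test needs the Student t distribution, which the standard library lacks, so the sketch returns only the statistic and the Welch-Satterthwaite degrees of freedom.

```python
import math
from statistics import mean, variance


def cohens_h(p_a, p_b):
    """Cohen's h: difference of arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p_b)) - 2 * math.asin(math.sqrt(p_a))


def cohens_d(sample_a, sample_b):
    """Cohen's d using the pooled sample standard deviation."""
    n_a, n_b = len(sample_a), len(sample_b)
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / (n_a + n_b - 2)
    return (mean(sample_b) - mean(sample_a)) / math.sqrt(pooled_var)


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    se2_a = variance(sample_a) / n_a  # squared standard error, arm A
    se2_b = variance(sample_b) / n_b  # squared standard error, arm B
    t = (mean(sample_b) - mean(sample_a)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df
```

Unlike Student's t-test, Welch's version does not assume equal variances in the two arms, which suits skewed revenue data where control and treatment can differ in spread.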

Power analysis: Sample size, MDE, alpha, power
Frequentist tests: Z-test, t-test, correction
Bayesian inference: Beta-Binomial, posterior, P(B>A)
Decision report: Sequential monitoring, verdicts

Links