Publication-ready statistical testing with 23 tests, effect sizes, power analysis, and APA formatting
Full Documentation · pip install scitex-stats
Statistical testing in Python is fragmented across scipy, statsmodels, and pingouin — each with different interfaces and output conventions. Getting publication-ready results requires substantial manual work: computing effect sizes, running power analysis, formatting to APA or journal standards. AI agents face a further barrier: they cannot call Python libraries directly and need structured, tool-based access.
scitex-stats provides a unified interface that covers the full statistical workflow:
- 23 statistical tests with automatic recommendation based on data characteristics
- Built-in effect sizes (Cohen's d, Cliff's delta, eta squared), power analysis, and APA-formatted output
- Three interfaces — Python API, CLI, and MCP server — so human researchers and AI agents use the same engine
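Two of the effect sizes listed above can be worked out by hand, which helps when sanity-checking reported values. The following is a minimal standard-library sketch using the textbook definitions (pooled-SD Cohen's d and pairwise Cliff's delta). The function names and sample data are illustrative and are not taken from scitex-stats, whose own implementation may differ in details such as bias correction.

```python
from itertools import product
from math import sqrt
from statistics import mean, variance


def cohens_d(x, y):
    """Standardized mean difference using the pooled sample variance."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var)


def cliffs_delta(x, y):
    """P(x > y) - P(x < y), estimated over all cross-group pairs."""
    greater = sum(a > b for a, b in product(x, y))
    less = sum(a < b for a, b in product(x, y))
    return (greater - less) / (len(x) * len(y))


# Made-up illustrative samples, not real measurements
control = [4.1, 5.0, 5.2, 5.9, 6.3]
treatment = [5.8, 6.4, 7.1, 7.5, 8.0]

print(round(cohens_d(control, treatment), 3))
print(cliffs_delta(control, treatment))  # -0.84 (2 of 25 pairs favor control)
```

Cliff's delta depends only on the ordering of values, so it is the natural companion to rank-based tests, whereas Cohen's d assumes roughly comparable variances in both groups.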
```mermaid
flowchart LR
    A[Raw Data] --> B{Recommend Test}
    B --> C[Run Test]
    C --> D[Effect Size]
    C --> E[Power Analysis]
    D --> F[APA Format]
    E --> F
    F --> G[Publication-Ready Result]
    style A fill:#4a90d9,stroke:#2c3e50,color:#fff
    style B fill:#f5a623,stroke:#2c3e50,color:#fff
    style C fill:#27ae60,stroke:#2c3e50,color:#fff
    style D fill:#8e44ad,stroke:#2c3e50,color:#fff
    style E fill:#8e44ad,stroke:#2c3e50,color:#fff
    style F fill:#e74c3c,stroke:#2c3e50,color:#fff
    style G fill:#2c3e50,stroke:#1a252f,color:#fff
```
Figure 1. Statistical testing workflow. scitex-stats automates the full pipeline from raw data to publication-ready results: test recommendation based on data characteristics, test execution with effect size and power analysis, and APA-formatted output.
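To make the power analysis step of this workflow concrete, here is a rough sketch of an a priori sample-size calculation for a two-sided, two-group comparison of means. It uses the normal approximation n = 2((z_{1-α/2} + z_{power}) / d)² and only the standard library. The function name is hypothetical, and this is not necessarily how scitex-stats computes sample sizes.

```python
from math import ceil
from statistics import NormalDist


def sample_size_two_groups(effect_size, alpha=0.05, power=0.8):
    """Approximate per-group n for a two-sided, two-sample comparison of means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the target power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


n_per_group = sample_size_two_groups(effect_size=0.5)
print(n_per_group)  # 63
```

Calculations based on the noncentral t distribution usually return a slightly larger n (about 64 per group for these inputs), so the approximation is best treated as a lower bound.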
Every test returns a unified result dictionary with consistent keys:
```json
{
  "test_method": "Student's t-test (independent)",
  "statistic": -3.210,
  "stat_symbol": "t",
  "alternative": "two-sided",
  "n_x": 30,
  "n_y": 30,
  "pvalue": 0.0022,
  "stars": "**",
  "alpha": 0.05,
  "significant": true,
  "effect_size": -0.829,
  "effect_size_metric": "Cohen's d",
  "effect_size_interpretation": "large",
  "power": 0.884,
  "H0": "μ(x) = μ(y)",
  "formatted": "t = -3.210, p = 0.0022, Cohen's d = -0.829, **"
}
```

Table 3. Unified result format. All 23 tests return the same dictionary structure with test statistics, p-value, effect size with interpretation, statistical power, and APA-formatted string.
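The fields in this dictionary are mutually consistent, which a short check can confirm. The sketch below recovers Cohen's d from the t statistic and group sizes using d = t·sqrt(1/n_x + 1/n_y), and maps the p-value to stars. The helper names are illustrative, and the star thresholds (0.05, 0.01, 0.001) and the "n.s." label are conventional values assumed here; the source does not state the package's exact cutoffs.

```python
from math import sqrt

# Values copied from the example result dictionary
result = {"statistic": -3.210, "n_x": 30, "n_y": 30, "pvalue": 0.0022}


def d_from_t(t, n_x, n_y):
    """Cohen's d implied by an independent-samples t statistic (pooled variance)."""
    return t * sqrt(1 / n_x + 1 / n_y)


def p_to_stars(p):
    """Conventional star notation; these thresholds are an assumption."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "n.s."


print(round(d_from_t(result["statistic"], result["n_x"], result["n_y"]), 3))  # -0.829
print(p_to_stars(result["pvalue"]))  # **
```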
Requires Python >= 3.10.
```bash
pip install scitex-stats
# With MCP server for AI agents
pip install scitex-stats[mcp]
# Everything
pip install scitex-stats[all]
```

SciTeX users: `pip install scitex` already includes Stats. Use `import scitex`, then `scitex.stats`.
```python
import scitex_stats as ss

# group1 and group2 hold the two samples to compare (defined elsewhere)

# Get test recommendation
ctx = ss.StatContext(
    n_groups=2,
    sample_sizes=[30, 30],
    outcome_type="continuous",
    design="between",
    paired=False,
)
recs = ss.recommend_tests(ctx)

# Run a test
result = ss.run_test("ttest_ind", data=group1, data2=group2)

# APA-formatted output
print(result["formatted"])
```

Python API
```python
import scitex_stats as ss

# group1 and group2 hold the two samples to compare (defined elsewhere)

# Automatic test recommendation
ctx = ss.StatContext(
    n_groups=2,
    sample_sizes=[30, 30],
    outcome_type="continuous",
    design="between",
    paired=False,
)
recs = ss.recommend_tests(ctx)

# Run a test
result = ss.run_test("ttest_ind", data=group1, data2=group2)

# Effect sizes
from scitex_stats import effect_sizes
d = effect_sizes.cohens_d(group1, group2)

# Power analysis
from scitex_stats import power
n = power.sample_size_ttest(effect_size=0.5, alpha=0.05, power=0.8)

# Multiple comparison correction
# (`results` is a collection of earlier test results, not defined in this snippet)
from scitex_stats import correct
corrected = correct.correct_fdr(results)

# Post-hoc tests
# (`groups` holds the per-group samples, not defined in this snippet)
from scitex_stats import posthoc
results = posthoc.posthoc_tukey(groups)
```

CLI Commands
```bash
scitex-stats --help-recursive      # Show all commands
scitex-stats list-python-apis      # List Python API tree
scitex-stats list-python-apis -v   # With docstrings
scitex-stats mcp list-tools        # List MCP tools
scitex-stats mcp doctor            # Check server health
scitex-stats mcp start             # Start MCP server
```

MCP Server for AI Agents
AI agents can run statistical tests and format publication-ready results autonomously.
| Tool | Description |
|---|---|
| `recommend_tests` | Recommend appropriate tests based on data characteristics |
| `run_test` | Execute a statistical test on provided data |
| `format_results` | Format results in journal style (APA, Nature, etc.) |
| `power_analysis` | Calculate statistical power or required sample size |
| `correct_pvalues` | Apply multiple comparison correction |
| `describe` | Calculate descriptive statistics |
| `effect_size` | Calculate effect size between groups |
| `normality_test` | Test whether data follows a normal distribution |
| `posthoc_test` | Run post-hoc pairwise comparisons |
| `p_to_stars` | Convert p-value to significance stars |
Table 1. MCP tools available for AI agent integration via scitex-stats mcp start.
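As an illustration of what multiple comparison correction involves, the sketch below implements the Benjamini-Hochberg false discovery rate procedure in plain Python. The choice of Benjamini-Hochberg is an assumption made for this example, since the source does not say which correction methods `correct_pvalues` supports. The function name and p-values are made up.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


raw = [0.001, 0.008, 0.039, 0.041, 0.20]  # illustrative p-values
adj = benjamini_hochberg(raw)
for p, q in zip(raw, adj):
    print(f"p = {p:.3f} -> adjusted = {q:.5f}")
```

With these inputs, the raw values 0.039 and 0.041 fall below 0.05, but both adjust to 0.05125 and are no longer significant at that level.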
```bash
scitex-stats mcp start
```

Figure 2. Decision flowchart for choosing a statistical test. Start with your data type, then follow the branches based on number of groups and study design. Brunner-Munzel is recommended as the default for two-group comparisons due to its robustness to unequal variances and non-normality.
| Category | Tests |
|---|---|
| Parametric | t-test (ind, paired, 1-sample), ANOVA (1-way, RM, 2-way) |
| Nonparametric | Mann-Whitney U, Wilcoxon, Kruskal-Wallis, Friedman, Brunner-Munzel |
| Correlation | Pearson, Spearman, Kendall, Theil-Sen |
| Categorical | Chi-squared, Fisher exact, McNemar, Cochran's Q |
| Normality | Shapiro-Wilk, Kolmogorov-Smirnov (1-sample, 2-sample) |
Table 2. All 23 statistical tests organized by category.
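The branching order described for Figure 2 (data type first, then number of groups, then study design) can be sketched as a small lookup. This is a toy illustration that uses only test names from Table 2. The function name, its parameters, and the `assume_normal` switch are hypothetical, and the actual logic behind `recommend_tests` may differ.

```python
def suggest_test(outcome_type, n_groups, paired, assume_normal=False):
    """Toy lookup following the order: data type, number of groups, design."""
    if n_groups < 2:
        raise ValueError("group comparisons need at least two groups")
    if outcome_type == "categorical":
        if paired:
            return "McNemar" if n_groups == 2 else "Cochran's Q"
        return "Chi-squared"
    if n_groups == 2:
        if paired:
            return "t-test (paired)" if assume_normal else "Wilcoxon"
        # Brunner-Munzel is the stated default for two independent groups
        return "t-test (ind)" if assume_normal else "Brunner-Munzel"
    if paired:
        return "ANOVA (RM)" if assume_normal else "Friedman"
    return "ANOVA (1-way)" if assume_normal else "Kruskal-Wallis"


print(suggest_test("continuous", n_groups=2, paired=False))  # Brunner-Munzel
print(suggest_test("continuous", n_groups=3, paired=True))   # Friedman
print(suggest_test("categorical", n_groups=2, paired=True))  # McNemar
```

Defaulting to the nonparametric branch unless normality is explicitly assumed mirrors the stated preference for robustness to unequal variances and non-normality.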
Detected by scitex-linter when this package is installed.
| Rule | Severity | Message |
|---|---|---|
| `STX-ST001` | warning | `scipy.stats.ttest_ind()`: use `stx.stats.ttest_ind()` for auto effect size + CI |
| `STX-ST002` | warning | `scipy.stats.mannwhitneyu()`: use `stx.stats.mannwhitneyu()` for auto effect size |
| `STX-ST003` | warning | `scipy.stats.pearsonr()`: use `stx.stats.pearsonr()` for auto CI + power |
| `STX-ST004` | warning | `scipy.stats.f_oneway()`: use `stx.stats.anova_oneway()` for post-hoc + effect sizes |
| `STX-ST005` | warning | `scipy.stats.wilcoxon()`: use `stx.stats.wilcoxon()` for auto effect size |
| `STX-ST006` | warning | `scipy.stats.kruskal()`: use `stx.stats.kruskal()` for post-hoc + effect sizes |
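For readers curious how rules of this kind can be detected, the sketch below uses Python's standard `ast` module to find fully qualified `scipy.stats.<name>(...)` calls and map them to the rule IDs above. It only illustrates the general technique and is not scitex-linter's implementation. The function name is hypothetical, and the sketch deliberately ignores aliased imports such as `from scipy import stats`.

```python
import ast

# Rule IDs copied from the table above, keyed by the scipy.stats function name
RULES = {
    "ttest_ind": "STX-ST001",
    "mannwhitneyu": "STX-ST002",
    "pearsonr": "STX-ST003",
    "f_oneway": "STX-ST004",
    "wilcoxon": "STX-ST005",
    "kruskal": "STX-ST006",
}


def find_scipy_stats_calls(source):
    """Return sorted (line, rule_id, call) tuples for matching scipy.stats calls."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        dotted = ast.unparse(node.func)  # e.g. "scipy.stats.ttest_ind"
        prefix, _, name = dotted.rpartition(".")
        if prefix == "scipy.stats" and name in RULES:
            findings.append((node.lineno, RULES[name], dotted))
    return sorted(findings)


sample = """import scipy.stats
t = scipy.stats.ttest_ind(a, b)
u = scipy.stats.mannwhitneyu(a, b)
m = scipy.stats.describe(a)
"""

for line, rule, call in find_scipy_stats_calls(sample):
    print(f"line {line}: {rule} warning on {call}()")
```

The source is parsed without being executed, so undefined names such as `a` and `b` in the sample are harmless.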
SciTeX Stats is part of SciTeX. When used inside the SciTeX framework, statistical testing integrates with the full pipeline — from data loading through analysis to publication-ready figures:
```python
import scitex


@scitex.session
def main(CONFIG=scitex.INJECTED, plt=scitex.INJECTED):
    # Load data
    data = scitex.io.load("measurements.csv")

    # group1 and group2 are the two samples to compare, taken from `data`
    # (how to split them depends on the file layout, so it is left out here)

    # Run statistical test
    result = scitex.stats.run_test("ttest_ind", data=group1, data2=group2)
    scitex.io.save(result, "stats_result.csv")

    # Visualize with figrecipe (scitex.plt)
    fig, ax = scitex.plt.subplots()
    ax.plot_box([group1, group2], labels=["Control", "Treatment"])
    ax.set_xyt("Group", "Value", f"p = {result['pvalue']:.4f} {result['stars']}")
    scitex.io.save(fig, "comparison.png")  # Saves plot + CSV data

    return 0
```

Figure 3. Example output combining scitex.stats (statistical test) with scitex.plt (publication-ready figure). The box plot shows the group comparison with individual data points, significance bracket, p-value, and effect size, all generated from the unified result dictionary.
The ecosystem modules work together:
| Module | Package | Role |
|---|---|---|
| `scitex.stats` | scitex-stats | Statistical testing, effect sizes, power analysis |
| `scitex.plt` | figrecipe | Publication-ready figures with auto CSV export |
| `scitex.io` | scitex-io | Universal file I/O (30+ formats) |
| `scitex.clew` | scitex-clew | Reproducibility verification via hash DAGs |
The SciTeX system follows the Four Freedoms for Research below, inspired by the Free Software Definition:
Four Freedoms for Research
- The freedom to run your research anywhere — your machine, your terms.
- The freedom to study how every step works — from raw data to final manuscript.
- The freedom to redistribute your workflows, not just your papers.
- The freedom to modify any module and share improvements with the community.
AGPL-3.0 — because we believe research infrastructure deserves the same freedoms as the software it runs on.

