# Why HypoTestX?

## The Problem

Statistical hypothesis testing is a core skill in data science, research, and engineering. Yet every existing option forces a painful trade-off:

### Option A — scipy (and friends)

scipy is excellent, but to use it you must:

1. **Already know which test to run.** Is this a t-test or a Mann-Whitney? Should I use Welch's correction? Is this paired or independent?
2. **Manually prepare your data.** Slice the DataFrame yourself, convert to arrays, handle NaN values, split groups by label.
3. **Interpret raw numbers.** `scipy.stats.ttest_ind()` returns a tuple `(statistic, p_value)`. No effect size, no confidence interval, no interpretation.

For a seasoned statistician this is fine. For everyone else — the analyst, the ML engineer, the researcher who is not a statistician by training — it creates a significant barrier.

### Option B — Ask a Chat LLM

Modern LLMs can answer "do males earn more than females in my data?" surprisingly well — in natural language. But the answer:

- **Cannot be embedded in code.** It's a paragraph of text, not a structured object.
- **Is not reproducible.** Run the same question again and you may get a different answer.
- **Has no audit trail.** You cannot verify which test it ran or with which parameters.
- **Cannot be composed.** You can't call `.p_value`, `.effect_size`, or `.plot()` on a chat reply.
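To make the manual steps under Option A concrete, here is a minimal sketch that uses only Python's standard library. It cleans two small made-up samples, then computes Welch's t statistic and Cohen's d by hand. The helper names `clean`, `welch_t`, and `cohens_d` and the sample values are illustrative assumptions; none of them come from scipy or HypoTestX.

```python
import math
import statistics


def clean(values):
    """Drop missing entries (None or NaN) before any arithmetic."""
    return [v for v in values if v is not None and not math.isnan(v)]


def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def cohens_d(a, b):
    """Return Cohen's d using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)


# Made-up salaries (in thousands); NaN and None mark missing records.
group_a = clean([52.0, 61.0, float("nan"), 58.0, 64.0, 55.0])
group_b = clean([48.0, 50.0, 53.0, None, 47.0, 52.0])

t_stat, df = welch_t(group_a, group_b)
d = cohens_d(group_a, group_b)
print(f"t = {t_stat:.3f}, df = {df:.1f}, Cohen's d = {d:.3f}")
```

Even this sketch stops short of a p-value, which additionally needs the t distribution's survival function (available as `scipy.stats.t.sf`), and it says nothing about confidence intervals or how to interpret the numbers.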
### The Gap HypoTestX Fills

| Feature | scipy | Ask an LLM | HypoTestX |
|---|---|---|---|
| Natural language input | ❌ | ✅ | ✅ |
| Structured result object | ❌ | ❌ | ✅ |
| Effect size + CI included | Manual | ❌ | ✅ |
| Reproducible / embeddable | ✅ | ❌ | ✅ |
| Works offline | ✅ | ❌ | ✅ (fallback) |
| Auto test selection | ❌ | ❌ | ✅ |

HypoTestX gives you:

- A **plain-English interface** so you don't need to know which test to pick
- A **structured `HypoResult`** with every number you need, every time
- **Effect sizes and confidence intervals** included automatically
- **Reproducible, embeddable results** you can version-control and audit
- A **regex fallback** that works completely offline — no API key, no network
- An **LLM backend system** where you swap in any model with a single keyword

---

## When to Use the Direct API Instead

HypoTestX also exposes all 12 test functions directly. Use these when:

- You already know exactly which test you need
- You want the most explicit, readable code possible
- You're writing a library or module that others will read

```python
# Very explicit — no routing, no ambiguity
result = hx.ttest_2samp(
    group1, group2,
    alternative="greater",
    equal_var=False,   # Welch's correction
    alpha=0.01,
)
```

The natural-language interface and the direct API return identical `HypoResult` objects — the same `.p_value`, `.effect_size`, `.summary()`, and `.plot()` work on both.
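As a rough illustration of why a structured result composes better than a bare tuple or a paragraph of chat text, the sketch below defines a hypothetical `ResultSketch` dataclass. It is not HypoTestX's actual `HypoResult`: the class name, the `test_name`, `statistic`, `alpha`, and `significant` members, and the numbers are all assumptions made for this example. Only `p_value`, `effect_size`, and `summary()` echo names mentioned in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResultSketch:
    """Hypothetical stand-in for a structured test result (not the real HypoResult)."""

    test_name: str
    statistic: float
    p_value: float
    effect_size: float
    alpha: float = 0.05

    @property
    def significant(self) -> bool:
        # Significant when the p-value falls below the chosen alpha level.
        return self.p_value < self.alpha

    def summary(self) -> str:
        verdict = "reject H0" if self.significant else "fail to reject H0"
        return (
            f"{self.test_name}: statistic={self.statistic:.3f}, "
            f"p={self.p_value:.4f}, effect size={self.effect_size:.2f} -> {verdict}"
        )


# Illustrative numbers only, not computed from real data.
res = ResultSketch(
    "Welch's t-test", statistic=2.41, p_value=0.012, effect_size=0.63, alpha=0.01
)
print(res.summary())

# Downstream code can branch on named fields instead of parsing text.
if not res.significant:
    print("No evidence against H0 at alpha =", res.alpha)
```

Because every field has a name and a type, a result like this can be logged, asserted on in tests, or passed to other functions, which is the property the comparison table calls "Reproducible / embeddable".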