HypoResult Reference

Every statistical test — whether called via analyze() or a direct function — returns a HypoResult object. All fields are public attributes; none are hidden behind properties except is_significant and effect_magnitude.


Fields

Core statistical output

Field

Type

Description

test_name

str

Human-readable name, e.g. "Welch's t-test (unequal variances)"

statistic

float

Test statistic value (t, F, χ², U, W, r, …)

p_value

float

Two-sided or directional p-value

effect_size

float | None

Effect size (Cohen’s d, r, η², Cramér’s V, …)

effect_size_name

str | None

Name of the effect size measure used

confidence_interval

tuple[float, float] | None

(lower, upper) confidence interval

degrees_of_freedom

int | float | tuple | None

Degrees of freedom

sample_sizes

int | tuple | None

Per-group or total sample size(s)

assumptions_met

dict[str, bool]

Assumption check results (may be empty)

interpretation

str | None

Plain-English interpretation

data_summary

dict[str, Any]

Descriptive stats (may be empty)

alpha

float

Significance level used

alternative

str

"two-sided", "greater", or "less"

Routing metadata

These fields are populated when the result was produced by analyze(). For direct test calls they retain their defaults.

Field

Type

Default

Description

routing_confidence

float

1.0

Routing confidence: 1.0 for an LLM, 0.6 for the regex fallback

routing_source

str

"llm"

Source of the routing decision: "llm" or "fallback"

Computed properties

Property

Type

Description

is_significant

bool

True if p_value < alpha

effect_magnitude

str

Cohen’s convention: "negligible", "small", "medium", "large" — scale chosen automatically based on effect_size_name


summary() Output Format

result.summary()                # default
result.summary(verbose=True)    # includes sample sizes, assumption checks, data summary

Default output:

[ Welch's t-test (unequal variances) ]
=======================================
Result: SIGNIFICANT (alpha = 0.05)
Test statistic: 3.2456
p-value: 0.0012
Degrees of freedom: 248.0
Cohen's d: 0.6834 (medium)
95% Confidence Interval: [1.2300, 4.5600]

Interpretation:
There is a statistically significant difference between the two groups
(t = 3.25, df = 248, p = 0.0012, Cohen's d = 0.68).

When routing_source == "fallback", the summary appends:

⚠  Routed via regex fallback (confidence=60%). Verify the correct test was selected.

Verbose output additionally prints:

Sample sizes: (125, 125)

Assumption Checks:
  Normality (group1): Met
  Normality (group2): Met

Data Summary:
  group1_mean: 52500.1234
  group2_mean: 48200.5678
  pooled_std: 6200.4321

to_dict()

Returns all fields as a plain Python dictionary — convenient for serialisation, logging, or building DataFrames of results:

d = result.to_dict()
# {
#   "test_name": "Welch's t-test (unequal variances)",
#   "statistic": 3.2456,
#   "p_value": 0.0012,
#   "is_significant": True,
#   "alpha": 0.05,
#   "alternative": "two-sided",
#   "effect_size": 0.6834,
#   "effect_size_name": "Cohen's d",
#   "effect_magnitude": "medium",
#   "confidence_interval": (1.23, 4.56),
#   "degrees_of_freedom": 248.0,
#   "sample_sizes": (125, 125),
#   "assumptions_met": {},
#   "data_summary": {},
# }

plot()

Produce a matplotlib figure for the result. Requires matplotlib (install with pip install hypotestx[visualization]).

fig = result.plot()                  # auto-selects chart type
fig = result.plot(kind="bar")        # grouped bar chart
fig = result.plot(kind="p_value")    # p-value on null distribution
fig = result.plot(kind="box")        # box plot of groups
fig.savefig("result.png")

kind value

Description

"auto"

Best chart for the test type (default)

"bar"

Mean ± CI bar chart for two-group comparisons

"box"

Box plot of group distributions

"p_value"

p-value highlighted on the null distribution curve

See Visualization for the full plotting guide.


Accessing Individual Fields

result = hx.analyze(df, "Do males earn more than females?")

print(result.test_name)            # "Welch's t-test (unequal variances)"
print(result.statistic)            # 3.2456
print(result.p_value)              # 0.0012
print(result.is_significant)       # True
print(result.effect_size)          # 0.6834
print(result.effect_size_name)     # "Cohen's d"
print(result.effect_magnitude)     # "medium"
print(result.confidence_interval)  # (1.23, 4.56)
print(result.alpha)                # 0.05
print(result.alternative)          # "two-sided"
print(result.routing_confidence)   # 0.6 (fallback) or 1.0 (LLM)
print(result.routing_source)       # "fallback" or "llm"