`analyze()` — Natural Language Entry Point¶

hypotestx.core.engine.analyze(df: Any, question: str, backend: Any = None, alpha: float = 0.05, verbose: bool = False, warn_fallback: bool = True, **kwargs) → HypoResult[source]¶

Natural-language hypothesis testing.

Parses question in the context of df’s schema and automatically selects and executes the most appropriate statistical test.

Parameters:

df (pandas.DataFrame | polars.DataFrame) – The dataset to analyse.
question (str) – A plain-English hypothesis question, e.g. "Do males earn more than females?" or "Is age correlated with salary?".

backend (str | LLMBackend | callable | None) –

LLM to use for intent parsing. - None (default) — fast regex-based FallbackBackend (no API key) - "ollama" — local Ollama (llama3.2 by default) - "gemini" — Google Gemini free tier - "groq" — Groq free tier (OpenAI-compatible) - "openai" — OpenAI API - Any LLMBackend subclass instance. - Any callable(messages) -> str.

Pass any backend constructor kwargs directly to analyze():

kwarg	backends	notes
`api_key`	gemini, openai, groq, together, mistral, perplexity	required for cloud providers
`model`	all	override the default model name/ID
`timeout`	all (default: 60 s)	HTTP / inference timeout in seconds
`temperature`	gemini, openai-compat, huggingface	sampling temperature (0 = deterministic)
`max_tokens`	gemini, openai-compat, huggingface	max tokens in the LLM response
`host`	ollama	server URL (default `http://localhost:11434`)
`options`	ollama	dict forwarded to Ollama model options
`token`	huggingface	HF access token for Inference API
`use_local`	huggingface	load model locally via `transformers`
`device`	huggingface local	`"cpu"` or `"cuda"`
`base_url`	openai-compat	override API base URL (e.g. Azure endpoint)
`provider`	openai-compat	shorthand: `"groq"`, `"together"`, `"mistral"`, etc.
`extra_headers`	openai-compat	additional HTTP headers dict

alpha (float) – Significance level (default 0.05).
verbose (bool) – Print routing info and LLM reasoning to stdout.
warn_fallback (bool) – Emit a UserWarning when the built-in regex fallback is used (default True). Set to False to suppress the warning.

Returns:

Full result object with statistic, p-value, effect size, decision, and human-readable summary.

Return type:

HypoResult

Examples

>>> # Regex fallback — no API key, works offline
>>> result = hx.analyze(df, "Do males earn more than females?")
>>> print(result.summary())

>>> # Gemini — free tier; pick any gemini-2.x model
>>> result = hx.analyze(
...     df, "Is there a salary difference between genders?",
...     backend="gemini", api_key="AIza...",
...     model="gemini-2.0-flash",  # or "gemini-2.0-flash-lite"
...     temperature=0.0, max_tokens=512, timeout=30,
... )

>>> # Groq — free tier, very fast
>>> result = hx.analyze(
...     df, "Do departments differ in performance?",
...     backend="groq", api_key="gsk_...",
...     model="llama-3.3-70b-versatile",  # default; override freely
...     temperature=0.0, max_tokens=512,
... )

>>> # OpenAI
>>> result = hx.analyze(
...     df, "Is salary correlated with tenure?",
...     backend="openai", api_key="sk-...",
...     model="gpt-4o-mini",  # or "gpt-4o"
...     temperature=0.0, max_tokens=256,
... )

>>> # Together AI / Mistral / Perplexity (OpenAI-compatible)
>>> result = hx.analyze(
...     df, "Compare groups A and B",
...     backend="together", api_key="...",
...     model="meta-llama/Llama-3-70b-chat-hf",
... )

>>> # Custom OpenAI-compatible endpoint (Azure, vLLM, LiteLLM, …)
>>> result = hx.analyze(
...     df, "Compare groups",
...     backend="openai", api_key="...",
...     base_url="https://my-az-endpoint.openai.azure.com/openai/v1",
...     model="gpt-4o",
... )

>>> # Ollama — local, no API key
>>> result = hx.analyze(
...     df, "Compare groups A and B",
...     backend="ollama",
...     model="mistral",       # default: llama3.2
...     host="http://localhost:11434",
...     timeout=120,
... )

>>> # HuggingFace Inference API (cloud, free tier)
>>> result = hx.analyze(
...     df, "Are gender and department related?",
...     backend="huggingface", token="hf_...",
...     model="HuggingFaceH4/zephyr-7b-beta",
... )

>>> # HuggingFace local (requires: pip install transformers torch)
>>> result = hx.analyze(
...     df, "Is income different across regions?",
...     backend="huggingface",
...     model="microsoft/Phi-3.5-mini-instruct",
...     use_local=True, device="cuda",  # or "cpu"
... )

>>> # Bring your own callable
>>> result = hx.analyze(
...     df, "Is age correlated with salary?",
...     backend=lambda msgs: my_llm_fn(msgs[-1]["content"]),
... )

Parameters¶

Parameter	Type	Description
`df`	`DataFrame`	pandas or polars DataFrame containing the data to analyse.
`question`	`str`	Plain-English hypothesis question, e.g. `"Do males earn more than females?"`.
`backend`	`str \| LLMBackend \| callable \| None`	LLM to use for intent parsing. `None` (default) = built-in regex fallback. Accepts `"gemini"`, `"groq"`, `"openai"`, `"ollama"`, `"azure"`, `"together"`, `"mistral"`, `"perplexity"`, `"huggingface"`, an `LLMBackend` instance, or any `callable(messages) -> str`.
`alpha`	`float`	Significance level. Default `0.05`.
`verbose`	`bool`	If `True`, prints routing info and LLM reasoning to stdout.
`warn_fallback`	`bool`	Emit a `UserWarning` when the regex fallback is used. Default `True`.
`api_key`	`str`	API key forwarded to the backend constructor.
`model`	`str`	Model name/ID forwarded to the backend constructor.
`temperature`	`float`	Sampling temperature (gemini / openai-compat / huggingface).
`max_tokens`	`int`	Max tokens for the LLM response.
`timeout`	`int`	HTTP timeout in seconds (default: 60).
`host`	`str`	Ollama server URL (default: `http://localhost:11434`).
`base_url`	`str`	Override API base URL (openai-compat / azure).
`api_version`	`str`	Azure API version (default: `"2024-02-01"`).

Returns¶

hypotestx.core.result.HypoResult: Full result object with statistic, p-value, effect size, confidence interval, routing metadata, and human-readable summary.

Examples¶

import hypotestx as hx
import pandas as pd

df = pd.read_csv("survey.csv")

# Regex fallback — no API key
result = hx.analyze(df, "Do males earn more than females?")
print(result.summary())

# Gemini free tier
result = hx.analyze(
    df,
    "Is there a salary difference between engineering and sales?",
    backend="gemini",
    api_key="AIza...",
    model="gemini-2.0-flash",
    temperature=0.0,
)

# Suppress fallback warning
result = hx.analyze(df, "Is age correlated with salary?", warn_fallback=False)

analyze() — Natural Language Entry Point¶

Parameters¶

Returns¶

Examples¶

`analyze()` — Natural Language Entry Point¶