analyze() — Natural Language Entry Point¶
- hypotestx.core.engine.analyze(df: Any, question: str, backend: Any = None, alpha: float = 0.05, verbose: bool = False, warn_fallback: bool = True, **kwargs) HypoResult[source]¶
Natural-language hypothesis testing.
Parses question in the context of df’s schema and automatically selects and executes the most appropriate statistical test.
- Parameters:
df (pandas.DataFrame | polars.DataFrame) – The dataset to analyse.
question (str) – A plain-English hypothesis question, e.g.
"Do males earn more than females?"or"Is age correlated with salary?".backend (str | LLMBackend | callable | None) –
LLM to use for intent parsing. -
None(default) — fast regex-based FallbackBackend (no API key) -"ollama"— local Ollama (llama3.2 by default) -"gemini"— Google Gemini free tier -"groq"— Groq free tier (OpenAI-compatible) -"openai"— OpenAI API - AnyLLMBackendsubclass instance. - Anycallable(messages) -> str.Pass any backend constructor kwargs directly to
analyze():kwarg
backends
notes
api_keygemini, openai, groq, together, mistral, perplexity
required for cloud providers
modelall
override the default model name/ID
timeoutall (default: 60 s)
HTTP / inference timeout in seconds
temperaturegemini, openai-compat, huggingface
sampling temperature (0 = deterministic)
max_tokensgemini, openai-compat, huggingface
max tokens in the LLM response
hostollama
server URL (default
http://localhost:11434)optionsollama
dict forwarded to Ollama model options
tokenhuggingface
HF access token for Inference API
use_localhuggingface
load model locally via
transformersdevicehuggingface local
"cpu"or"cuda"base_urlopenai-compat
override API base URL (e.g. Azure endpoint)
provideropenai-compat
shorthand:
"groq","together","mistral", etc.extra_headersopenai-compat
additional HTTP headers dict
alpha (float) – Significance level (default 0.05).
verbose (bool) – Print routing info and LLM reasoning to stdout.
warn_fallback (bool) – Emit a
UserWarningwhen the built-in regex fallback is used (defaultTrue). Set toFalseto suppress the warning.
- Returns:
Full result object with statistic, p-value, effect size, decision, and human-readable summary.
- Return type:
Examples
>>> # Regex fallback — no API key, works offline >>> result = hx.analyze(df, "Do males earn more than females?") >>> print(result.summary())
>>> # Gemini — free tier; pick any gemini-2.x model >>> result = hx.analyze( ... df, "Is there a salary difference between genders?", ... backend="gemini", api_key="AIza...", ... model="gemini-2.0-flash", # or "gemini-2.0-flash-lite" ... temperature=0.0, max_tokens=512, timeout=30, ... )
>>> # Groq — free tier, very fast >>> result = hx.analyze( ... df, "Do departments differ in performance?", ... backend="groq", api_key="gsk_...", ... model="llama-3.3-70b-versatile", # default; override freely ... temperature=0.0, max_tokens=512, ... )
>>> # OpenAI >>> result = hx.analyze( ... df, "Is salary correlated with tenure?", ... backend="openai", api_key="sk-...", ... model="gpt-4o-mini", # or "gpt-4o" ... temperature=0.0, max_tokens=256, ... )
>>> # Together AI / Mistral / Perplexity (OpenAI-compatible) >>> result = hx.analyze( ... df, "Compare groups A and B", ... backend="together", api_key="...", ... model="meta-llama/Llama-3-70b-chat-hf", ... )
>>> # Custom OpenAI-compatible endpoint (Azure, vLLM, LiteLLM, …) >>> result = hx.analyze( ... df, "Compare groups", ... backend="openai", api_key="...", ... base_url="https://my-az-endpoint.openai.azure.com/openai/v1", ... model="gpt-4o", ... )
>>> # Ollama — local, no API key >>> result = hx.analyze( ... df, "Compare groups A and B", ... backend="ollama", ... model="mistral", # default: llama3.2 ... host="http://localhost:11434", ... timeout=120, ... )
>>> # HuggingFace Inference API (cloud, free tier) >>> result = hx.analyze( ... df, "Are gender and department related?", ... backend="huggingface", token="hf_...", ... model="HuggingFaceH4/zephyr-7b-beta", ... )
>>> # HuggingFace local (requires: pip install transformers torch) >>> result = hx.analyze( ... df, "Is income different across regions?", ... backend="huggingface", ... model="microsoft/Phi-3.5-mini-instruct", ... use_local=True, device="cuda", # or "cpu" ... )
>>> # Bring your own callable >>> result = hx.analyze( ... df, "Is age correlated with salary?", ... backend=lambda msgs: my_llm_fn(msgs[-1]["content"]), ... )
Parameters¶
Parameter |
Type |
Description |
|---|---|---|
|
|
pandas or polars DataFrame containing the data to analyse. |
|
|
Plain-English hypothesis question, e.g.
|
|
|
LLM to use for intent parsing. |
|
|
Significance level. Default |
|
|
If |
|
|
Emit a |
|
|
API key forwarded to the backend constructor. |
|
|
Model name/ID forwarded to the backend constructor. |
|
|
Sampling temperature (gemini / openai-compat / huggingface). |
|
|
Max tokens for the LLM response. |
|
|
HTTP timeout in seconds (default: 60). |
|
|
Ollama server URL (default: |
|
|
Override API base URL (openai-compat / azure). |
|
|
Azure API version (default: |
Returns¶
hypotestx.core.result.HypoResultFull result object with statistic, p-value, effect size, confidence interval, routing metadata, and human-readable summary.
Examples¶
import hypotestx as hx
import pandas as pd
df = pd.read_csv("survey.csv")
# Regex fallback — no API key
result = hx.analyze(df, "Do males earn more than females?")
print(result.summary())
# Gemini free tier
result = hx.analyze(
df,
"Is there a salary difference between engineering and sales?",
backend="gemini",
api_key="AIza...",
model="gemini-2.0-flash",
temperature=0.0,
)
# Suppress fallback warning
result = hx.analyze(df, "Is age correlated with salary?", warn_fallback=False)