analyze() — Natural Language Entry Point

hypotestx.core.engine.analyze(df: Any, question: str, backend: Any = None, alpha: float = 0.05, verbose: bool = False, warn_fallback: bool = True, **kwargs) HypoResult[source]

Natural-language hypothesis testing.

Parses question in the context of df’s schema and automatically selects and executes the most appropriate statistical test.

Parameters:
  • df (pandas.DataFrame | polars.DataFrame) – The dataset to analyse.

  • question (str) – A plain-English hypothesis question, e.g. "Do males earn more than females?" or "Is age correlated with salary?".

  • backend (str | LLMBackend | callable | None) –

    LLM to use for intent parsing. - None (default) — fast regex-based FallbackBackend (no API key) - "ollama" — local Ollama (llama3.2 by default) - "gemini" — Google Gemini free tier - "groq" — Groq free tier (OpenAI-compatible) - "openai" — OpenAI API - Any LLMBackend subclass instance. - Any callable(messages) -> str.

    Pass any backend constructor kwargs directly to analyze():

    kwarg

    backends

    notes

    api_key

    gemini, openai, groq, together, mistral, perplexity

    required for cloud providers

    model

    all

    override the default model name/ID

    timeout

    all (default: 60 s)

    HTTP / inference timeout in seconds

    temperature

    gemini, openai-compat, huggingface

    sampling temperature (0 = deterministic)

    max_tokens

    gemini, openai-compat, huggingface

    max tokens in the LLM response

    host

    ollama

    server URL (default http://localhost:11434)

    options

    ollama

    dict forwarded to Ollama model options

    token

    huggingface

    HF access token for Inference API

    use_local

    huggingface

    load model locally via transformers

    device

    huggingface local

    "cpu" or "cuda"

    base_url

    openai-compat

    override API base URL (e.g. Azure endpoint)

    provider

    openai-compat

    shorthand: "groq", "together", "mistral", etc.

    extra_headers

    openai-compat

    additional HTTP headers dict

  • alpha (float) – Significance level (default 0.05).

  • verbose (bool) – Print routing info and LLM reasoning to stdout.

  • warn_fallback (bool) – Emit a UserWarning when the built-in regex fallback is used (default True). Set to False to suppress the warning.

Returns:

Full result object with statistic, p-value, effect size, decision, and human-readable summary.

Return type:

HypoResult

Examples

>>> # Regex fallback — no API key, works offline
>>> result = hx.analyze(df, "Do males earn more than females?")
>>> print(result.summary())
>>> # Gemini — free tier; pick any gemini-2.x model
>>> result = hx.analyze(
...     df, "Is there a salary difference between genders?",
...     backend="gemini", api_key="AIza...",
...     model="gemini-2.0-flash",  # or "gemini-2.0-flash-lite"
...     temperature=0.0, max_tokens=512, timeout=30,
... )
>>> # Groq — free tier, very fast
>>> result = hx.analyze(
...     df, "Do departments differ in performance?",
...     backend="groq", api_key="gsk_...",
...     model="llama-3.3-70b-versatile",  # default; override freely
...     temperature=0.0, max_tokens=512,
... )
>>> # OpenAI
>>> result = hx.analyze(
...     df, "Is salary correlated with tenure?",
...     backend="openai", api_key="sk-...",
...     model="gpt-4o-mini",  # or "gpt-4o"
...     temperature=0.0, max_tokens=256,
... )
>>> # Together AI / Mistral / Perplexity (OpenAI-compatible)
>>> result = hx.analyze(
...     df, "Compare groups A and B",
...     backend="together", api_key="...",
...     model="meta-llama/Llama-3-70b-chat-hf",
... )
>>> # Custom OpenAI-compatible endpoint (Azure, vLLM, LiteLLM, …)
>>> result = hx.analyze(
...     df, "Compare groups",
...     backend="openai", api_key="...",
...     base_url="https://my-az-endpoint.openai.azure.com/openai/v1",
...     model="gpt-4o",
... )
>>> # Ollama — local, no API key
>>> result = hx.analyze(
...     df, "Compare groups A and B",
...     backend="ollama",
...     model="mistral",       # default: llama3.2
...     host="http://localhost:11434",
...     timeout=120,
... )
>>> # HuggingFace Inference API (cloud, free tier)
>>> result = hx.analyze(
...     df, "Are gender and department related?",
...     backend="huggingface", token="hf_...",
...     model="HuggingFaceH4/zephyr-7b-beta",
... )
>>> # HuggingFace local (requires: pip install transformers torch)
>>> result = hx.analyze(
...     df, "Is income different across regions?",
...     backend="huggingface",
...     model="microsoft/Phi-3.5-mini-instruct",
...     use_local=True, device="cuda",  # or "cpu"
... )
>>> # Bring your own callable
>>> result = hx.analyze(
...     df, "Is age correlated with salary?",
...     backend=lambda msgs: my_llm_fn(msgs[-1]["content"]),
... )

Parameters

Parameter

Type

Description

df

DataFrame

pandas or polars DataFrame containing the data to analyse.

question

str

Plain-English hypothesis question, e.g. "Do males earn more than females?".

backend

str | LLMBackend | callable | None

LLM to use for intent parsing. None (default) = built-in regex fallback. Accepts "gemini", "groq", "openai", "ollama", "azure", "together", "mistral", "perplexity", "huggingface", an LLMBackend instance, or any callable(messages) -> str.

alpha

float

Significance level. Default 0.05.

verbose

bool

If True, prints routing info and LLM reasoning to stdout.

warn_fallback

bool

Emit a UserWarning when the regex fallback is used. Default True.

api_key

str

API key forwarded to the backend constructor.

model

str

Model name/ID forwarded to the backend constructor.

temperature

float

Sampling temperature (gemini / openai-compat / huggingface).

max_tokens

int

Max tokens for the LLM response.

timeout

int

HTTP timeout in seconds (default: 60).

host

str

Ollama server URL (default: http://localhost:11434).

base_url

str

Override API base URL (openai-compat / azure).

api_version

str

Azure API version (default: "2024-02-01").

Returns

hypotestx.core.result.HypoResult

Full result object with statistic, p-value, effect size, confidence interval, routing metadata, and human-readable summary.

Examples

import hypotestx as hx
import pandas as pd

df = pd.read_csv("survey.csv")

# Regex fallback — no API key
result = hx.analyze(df, "Do males earn more than females?")
print(result.summary())

# Gemini free tier
result = hx.analyze(
    df,
    "Is there a salary difference between engineering and sales?",
    backend="gemini",
    api_key="AIza...",
    model="gemini-2.0-flash",
    temperature=0.0,
)

# Suppress fallback warning
result = hx.analyze(df, "Is age correlated with salary?", warn_fallback=False)