LLM Backends

Abstract Base Class

class hypotestx.core.llm.base.LLMBackend[source]

Bases: ABC

Abstract LLM backend.

Subclass this and implement chat() to integrate any LLM. The default route() method handles prompt building, JSON extraction, validation, and returning a RoutingResult — you only need to supply the raw API call.

name: str = 'custom'
abstractmethod chat(messages: List[Dict[str, str]]) str[source]

Send a list of OpenAI-style messages and return the assistant reply.

Parameters:

messages – List of dicts with ‘role’ (‘system’ | ‘user’ | ‘assistant’) and ‘content’ (str) keys.

Returns:

The model’s text response.

route(question: str, schema: SchemaInfo, extra_context: str = '', warn_fallback: bool = True) RoutingResult[source]

Build prompts, call the LLM, parse the JSON response.

This method is final — override chat() instead.

class hypotestx.core.llm.base.RoutingResult(test: str = '', value_column: str | None = None, group_column: str | None = None, x_column: str | None = None, y_column: str | None = None, group_values: List[str] | None = None, alternative: str = 'two-sided', alpha: float | None = None, mu: float | None = None, equal_var: bool = False, correction: bool = True, method: str = 'parametric', reasoning: str = '', confidence: float = 1.0, routing_source: str = 'llm', raw_response: str = '')[source]

Bases: object

Structured intent extracted from a user question by an LLM or fallback parser. The engine uses this to fetch the correct columns from the DataFrame and call the right test function.

test: str = ''
value_column: str | None = None
group_column: str | None = None
x_column: str | None = None
y_column: str | None = None
group_values: List[str] | None = None
alternative: str = 'two-sided'
alpha: float | None = None
mu: float | None = None
equal_var: bool = False
correction: bool = True
method: str = 'parametric'
reasoning: str = ''
confidence: float = 1.0
routing_source: str = 'llm'
raw_response: str = ''
class hypotestx.core.llm.base.SchemaInfo(columns: List[str] = <factory>, dtypes: Dict[str, str]=<factory>, n_rows: int = 0, categoricals: Dict[str, ~typing.List[~typing.Any]]=<factory>, numerics: Dict[str, ~typing.Dict[str, float]]=<factory>)[source]

Bases: object

Summary of a DataFrame passed to the LLM as context. Built by build_schema() in prompts.py.

columns: List[str]
dtypes: Dict[str, str]
n_rows: int = 0
categoricals: Dict[str, List[Any]]
numerics: Dict[str, Dict[str, float]]

Callable Wrapper

class hypotestx.core.llm.base.CallableBackend(fn)[source]

Bases: LLMBackend

Wraps any callable(messages) -> str as an LLMBackend.

name: str = 'callable'
chat(messages: List[Dict[str, str]]) str[source]

Send a list of OpenAI-style messages and return the assistant reply.

Parameters:

messages – List of dicts with ‘role’ (‘system’ | ‘user’ | ‘assistant’) and ‘content’ (str) keys.

Returns:

The model’s text response.

Built-in Regex Fallback

class hypotestx.core.llm.backends.fallback.FallbackBackend[source]

Bases: LLMBackend

Pure regex routing — no LLM, no internet, no dependencies.

Accuracy is lower than an LLM but it always works offline and is extremely fast. Use it for quick experiments or when no LLM is available.

name: str = 'fallback'
chat(messages: List[Dict[str, str]]) str[source]

The fallback backend does not call an LLM. route() is overridden directly instead.

route(question: str, schema: SchemaInfo, extra_context: str = '', warn_fallback: bool = True) RoutingResult[source]

Bypass LLM and route via regex rules.

Google Gemini

class hypotestx.core.llm.backends.gemini.GeminiBackend(api_key: str, model: str = 'gemini-2.0-flash', timeout: int = 60, temperature: float = 0.0, max_tokens: int = 512)[source]

Bases: LLMBackend

Google Gemini backend via the Generative Language REST API.

No SDK required — uses only the Python standard library.

Parameters:
  • api_key – Google AI Studio API key.

  • model – Model name (default: gemini-2.0-flash).

  • timeout – HTTP timeout seconds (default: 60).

  • temperature – Sampling temperature (default: 0).

  • max_tokens – Maximum output tokens (default: 512).

name: str = 'gemini'
chat(messages: List[Dict[str, str]]) str[source]

Call the Gemini generateContent endpoint.

The OpenAI message list is converted to Gemini’s contents format: system roles are prepended to the first user message text.

OpenAI-Compatible (OpenAI / Groq / Together / Mistral / Perplexity / Azure)

class hypotestx.core.llm.backends.openai_compat.OpenAICompatBackend(api_key: str, base_url: str = '', model: str = '', provider: str = 'openai', timeout: int = 60, temperature: float = 0.0, max_tokens: int = 512, extra_headers: Dict[str, str] | None = None, api_version: str = '')[source]

Bases: LLMBackend

Backend for any OpenAI-compatible chat-completion API.

Parameters:
  • api_key – API key / bearer token. For Azure this is the api-key header value.

  • base_url – Base URL ending in /v1 (e.g. https://api.groq.com/openai/v1). For Azure: https://<resource>.openai.azure.com (no trailing path).

  • model – Model name. For Azure this is the deployment name.

  • provider – Shorthand: "openai", "groq", "together", "perplexity", "mistral", "azure". Sets base_url + model automatically if not specified.

  • timeout – HTTP timeout in seconds (default: 60).

  • temperature – Sampling temperature (default: 0 for deterministic routing).

  • max_tokens – Maximum tokens in the response (default: 512).

  • extra_headers – Additional HTTP headers dict.

  • api_version – Azure API version string (default: "2024-02-01"). Only used when provider is "azure" or base_url is an Azure endpoint.

name: str = 'openai_compat'
chat(messages: List[Dict[str, str]]) str[source]

Call the OpenAI-compatible /chat/completions endpoint.

Local Ollama

class hypotestx.core.llm.backends.ollama.OllamaBackend(model: str = 'llama3.2', host: str = 'http://localhost:11434', timeout: int = 120, options: Dict | None = None)[source]

Bases: LLMBackend

Ollama backend — fully local, zero API cost.

Parameters:
  • model – Ollama model name (default: llama3.2).

  • host – Base URL of the Ollama server (default: http://localhost:11434).

  • timeout – Request timeout in seconds (default: 120).

  • options – Extra Ollama model options dict, e.g. {"temperature": 0}.

name: str = 'ollama'
chat(messages: List[Dict[str, str]]) str[source]

Send a chat request to the local Ollama server.

available_models() List[str][source]

Return list of locally available model names.

auto_select_model() str[source]

Pick the best locally available model. Preference order: phi4, gemma2, mistral, llama3.2, (anything else).

HuggingFace

class hypotestx.core.llm.backends.huggingface.HuggingFaceBackend(token: str = '', model: str = '', use_local: bool = False, timeout: int = 60, max_tokens: int = 512, device: str = 'cpu', load_kwargs: Dict[str, Any] | None = None)[source]

Bases: LLMBackend

HuggingFace backend (Inference API or local transformers).

Parameters:
  • token – HF access token (required for Inference API; optional locally).

  • model – Model repo ID.

  • use_local – If True, load the model locally via transformers.

  • timeout – HTTP timeout for Inference API (default: 60).

  • max_tokens – Maximum new tokens (default: 512).

  • device – PyTorch device for local inference ("cpu" or "cuda").

  • load_kwargs – Extra kwargs forwarded to AutoModelForCausalLM.from_pretrained().

name: str = 'huggingface'
chat(messages: List[Dict[str, str]]) str[source]

Send a list of OpenAI-style messages and return the assistant reply.

Parameters:

messages – List of dicts with ‘role’ (‘system’ | ‘user’ | ‘assistant’) and ‘content’ (str) keys.

Returns:

The model’s text response.

Backend Factory

hypotestx.core.llm.get_backend(spec: Any = None, **kwargs) LLMBackend[source]

Resolve spec to a concrete LLMBackend instance.

Parameters:
  • spec (str | LLMBackend | callable | None) –

    • None / "fallback" → FallbackBackend (regex, offline)

    • "gemini" → GeminiBackend

    • "ollama" → OllamaBackend

    • "openai" → OpenAICompatBackend(provider=”openai”)

    • "groq" → OpenAICompatBackend(provider=”groq”)

    • "together" → OpenAICompatBackend(provider=”together”)

    • "mistral" → OpenAICompatBackend(provider=”mistral”)

    • "perplexity" → OpenAICompatBackend(provider=”perplexity”)

    • "huggingface" → HuggingFaceBackend

    • An LLMBackend instance → returned as-is

    • A callable → wrapped in CallableBackend

  • **kwargs

    Forwarded verbatim to the backend constructor. Supported kwargs:

    kwarg

    backends

    default

    api_key

    gemini, openai, groq, together, …

    (required)

    model

    all

    provider default

    timeout

    all

    60 s

    temperature

    gemini, openai-compat, huggingface

    0.0

    max_tokens

    gemini, openai-compat, huggingface

    512

    host

    ollama

    localhost:11434

    options

    ollama

    {“temperature”: 0}

    token

    huggingface

    (required)

    use_local

    huggingface

    False

    device

    huggingface (local)

    ”cpu”

    base_url

    openai-compat

    provider default

    provider

    openai-compat

    ”openai”

    extra_headers

    openai-compat

    None

Examples

>>> from hypotestx.core.llm import get_backend
>>> b = get_backend("gemini", api_key="AIza...", model="gemini-2.0-flash-lite")
>>> b = get_backend("groq",   api_key="gsk_...", model="llama-3.3-70b-versatile")
>>> b = get_backend("openai", api_key="sk-...",  model="gpt-4o", temperature=0.2)
>>> b = get_backend("ollama", model="mistral", host="http://localhost:11434")
>>> b = get_backend("huggingface", token="hf_...", model="HuggingFaceH4/zephyr-7b-beta")
>>> b = get_backend("huggingface", model="microsoft/Phi-3.5-mini-instruct",
...                  use_local=True, device="cuda")
>>> b = get_backend("together", api_key="...", model="meta-llama/Llama-3-70b-chat-hf")
>>> b = get_backend("mistral",  api_key="...", model="mistral-large-latest")
hypotestx.core.llm.build_schema(df) SchemaInfo[source]

Build a SchemaInfo snapshot from a DataFrame (pandas or polars). Works without importing pandas/polars at module level.