LLM Backends¶

Abstract Base Class¶

class hypotestx.core.llm.base.LLMBackend[source]¶

Bases: ABC

Abstract LLM backend.

Subclass this and implement chat() to integrate any LLM. The default route() method handles prompt building, JSON extraction, validation, and returning a RoutingResult — you only need to supply the raw API call.

name: str = 'custom'¶

abstractmethod chat(messages: List[Dict[str, str]]) → str[source]¶

Send a list of OpenAI-style messages and return the assistant reply.

Parameters:: messages – List of dicts with ‘role’ (‘system’ | ‘user’ | ‘assistant’) and ‘content’ (str) keys.
Returns:: The model’s text response.

route(question: str, schema: SchemaInfo, extra_context: str = '', warn_fallback: bool = True) → RoutingResult[source]¶

Build prompts, call the LLM, parse the JSON response.

This method is final — override chat() instead.

class hypotestx.core.llm.base.RoutingResult(test: str = '', value_column: str | None = None, group_column: str | None = None, x_column: str | None = None, y_column: str | None = None, group_values: List[str] | None = None, alternative: str = 'two-sided', alpha: float | None = None, mu: float | None = None, equal_var: bool = False, correction: bool = True, method: str = 'parametric', reasoning: str = '', confidence: float = 1.0, routing_source: str = 'llm', raw_response: str = '')[source]¶

Bases: object

Structured intent extracted from a user question by an LLM or fallback parser. The engine uses this to fetch the correct columns from the DataFrame and call the right test function.

test: str = ''¶

value_column: str | None = None¶

group_column: str | None = None¶

x_column: str | None = None¶

y_column: str | None = None¶

group_values: List[str] | None = None¶

alternative: str = 'two-sided'¶

alpha: float | None = None¶

mu: float | None = None¶

equal_var: bool = False¶

correction: bool = True¶

method: str = 'parametric'¶

reasoning: str = ''¶

confidence: float = 1.0¶

routing_source: str = 'llm'¶

raw_response: str = ''¶

class hypotestx.core.llm.base.SchemaInfo(columns: List[str] = <factory>, dtypes: Dict[str, str]=<factory>, n_rows: int = 0, categoricals: Dict[str, ~typing.List[~typing.Any]]=<factory>, numerics: Dict[str, ~typing.Dict[str, float]]=<factory>)[source]¶

Bases: object

Summary of a DataFrame passed to the LLM as context. Built by build_schema() in prompts.py.

columns: List[str]¶

dtypes: Dict[str, str]¶

n_rows: int = 0¶

categoricals: Dict[str, List[Any]]¶

numerics: Dict[str, Dict[str, float]]¶

Callable Wrapper¶

class hypotestx.core.llm.base.CallableBackend(fn)[source]¶

Bases: LLMBackend

Wraps any callable(messages) -> str as an LLMBackend.

name: str = 'callable'¶

chat(messages: List[Dict[str, str]]) → str[source]¶

Send a list of OpenAI-style messages and return the assistant reply.

Parameters:: messages – List of dicts with ‘role’ (‘system’ | ‘user’ | ‘assistant’) and ‘content’ (str) keys.
Returns:: The model’s text response.

Built-in Regex Fallback¶

class hypotestx.core.llm.backends.fallback.FallbackBackend[source]¶

Bases: LLMBackend

Pure regex routing — no LLM, no internet, no dependencies.

Accuracy is lower than an LLM but it always works offline and is extremely fast. Use it for quick experiments or when no LLM is available.

name: str = 'fallback'¶

chat(messages: List[Dict[str, str]]) → str[source]¶: The fallback backend does not call an LLM. route() is overridden directly instead.

route(question: str, schema: SchemaInfo, extra_context: str = '', warn_fallback: bool = True) → RoutingResult[source]¶: Bypass LLM and route via regex rules.

Google Gemini¶

class hypotestx.core.llm.backends.gemini.GeminiBackend(api_key: str, model: str = 'gemini-2.0-flash', timeout: int = 60, temperature: float = 0.0, max_tokens: int = 512)[source]¶

Bases: LLMBackend

Google Gemini backend via the Generative Language REST API.

No SDK required — uses only the Python standard library.

Parameters:

api_key – Google AI Studio API key.
model – Model name (default: gemini-2.0-flash).
timeout – HTTP timeout seconds (default: 60).
temperature – Sampling temperature (default: 0).
max_tokens – Maximum output tokens (default: 512).

name: str = 'gemini'¶

chat(messages: List[Dict[str, str]]) → str[source]¶

Call the Gemini generateContent endpoint.

The OpenAI message list is converted to Gemini’s contents format: system roles are prepended to the first user message text.

OpenAI-Compatible (OpenAI / Groq / Together / Mistral / Perplexity / Azure)¶

class hypotestx.core.llm.backends.openai_compat.OpenAICompatBackend(api_key: str, base_url: str = '', model: str = '', provider: str = 'openai', timeout: int = 60, temperature: float = 0.0, max_tokens: int = 512, extra_headers: Dict[str, str] | None = None, api_version: str = '')[source]¶

Bases: LLMBackend

Backend for any OpenAI-compatible chat-completion API.

Parameters:

api_key – API key / bearer token. For Azure this is the api-key header value.
base_url – Base URL ending in /v1 (e.g. https://api.groq.com/openai/v1). For Azure: https://<resource>.openai.azure.com (no trailing path).
model – Model name. For Azure this is the deployment name.
provider – Shorthand: "openai", "groq", "together", "perplexity", "mistral", "azure". Sets base_url + model automatically if not specified.
timeout – HTTP timeout in seconds (default: 60).
temperature – Sampling temperature (default: 0 for deterministic routing).
max_tokens – Maximum tokens in the response (default: 512).
extra_headers – Additional HTTP headers dict.
api_version – Azure API version string (default: "2024-02-01"). Only used when provider is "azure" or base_url is an Azure endpoint.

name: str = 'openai_compat'¶

chat(messages: List[Dict[str, str]]) → str[source]¶: Call the OpenAI-compatible /chat/completions endpoint.

Local Ollama¶

class hypotestx.core.llm.backends.ollama.OllamaBackend(model: str = 'llama3.2', host: str = 'http://localhost:11434', timeout: int = 120, options: Dict | None = None)[source]¶

Bases: LLMBackend

Ollama backend — fully local, zero API cost.

Parameters:

model – Ollama model name (default: llama3.2).
host – Base URL of the Ollama server (default: http://localhost:11434).
timeout – Request timeout in seconds (default: 120).
options – Extra Ollama model options dict, e.g. {"temperature": 0}.

name: str = 'ollama'¶

chat(messages: List[Dict[str, str]]) → str[source]¶: Send a chat request to the local Ollama server.

available_models() → List[str][source]¶: Return list of locally available model names.

auto_select_model() → str[source]¶: Pick the best locally available model. Preference order: phi4, gemma2, mistral, llama3.2, (anything else).

HuggingFace¶

class hypotestx.core.llm.backends.huggingface.HuggingFaceBackend(token: str = '', model: str = '', use_local: bool = False, timeout: int = 60, max_tokens: int = 512, device: str = 'cpu', load_kwargs: Dict[str, Any] | None = None)[source]¶

Bases: LLMBackend

HuggingFace backend (Inference API or local transformers).

Parameters:

token – HF access token (required for Inference API; optional locally).
model – Model repo ID.
use_local – If True, load the model locally via transformers.
timeout – HTTP timeout for Inference API (default: 60).
max_tokens – Maximum new tokens (default: 512).
device – PyTorch device for local inference ("cpu" or "cuda").
load_kwargs – Extra kwargs forwarded to AutoModelForCausalLM.from_pretrained().

name: str = 'huggingface'¶

chat(messages: List[Dict[str, str]]) → str[source]¶

Send a list of OpenAI-style messages and return the assistant reply.

Parameters:: messages – List of dicts with ‘role’ (‘system’ | ‘user’ | ‘assistant’) and ‘content’ (str) keys.
Returns:: The model’s text response.

Backend Factory¶

hypotestx.core.llm.get_backend(spec: Any = None, **kwargs) → LLMBackend[source]¶

Resolve spec to a concrete LLMBackend instance.

Parameters:

spec (str | LLMBackend | callable | None) –
- None / "fallback" → FallbackBackend (regex, offline)
- "gemini" → GeminiBackend
- "ollama" → OllamaBackend
- "openai" → OpenAICompatBackend(provider=”openai”)
- "groq" → OpenAICompatBackend(provider=”groq”)
- "together" → OpenAICompatBackend(provider=”together”)
- "mistral" → OpenAICompatBackend(provider=”mistral”)
- "perplexity" → OpenAICompatBackend(provider=”perplexity”)
- "huggingface" → HuggingFaceBackend
- An LLMBackend instance → returned as-is
- A callable → wrapped in CallableBackend

**kwargs –

Forwarded verbatim to the backend constructor. Supported kwargs:

kwarg	backends	default
`api_key`	gemini, openai, groq, together, …	(required)
`model`	all	provider default
`timeout`	all	60 s
`temperature`	gemini, openai-compat, huggingface	0.0
`max_tokens`	gemini, openai-compat, huggingface	512
`host`	ollama	localhost:11434
`options`	ollama	{“temperature”: 0}
`token`	huggingface	(required)
`use_local`	huggingface	False
`device`	huggingface (local)	”cpu”
`base_url`	openai-compat	provider default
`provider`	openai-compat	”openai”
`extra_headers`	openai-compat	None

Examples

>>> from hypotestx.core.llm import get_backend
>>> b = get_backend("gemini", api_key="AIza...", model="gemini-2.0-flash-lite")
>>> b = get_backend("groq",   api_key="gsk_...", model="llama-3.3-70b-versatile")
>>> b = get_backend("openai", api_key="sk-...",  model="gpt-4o", temperature=0.2)
>>> b = get_backend("ollama", model="mistral", host="http://localhost:11434")
>>> b = get_backend("huggingface", token="hf_...", model="HuggingFaceH4/zephyr-7b-beta")
>>> b = get_backend("huggingface", model="microsoft/Phi-3.5-mini-instruct",
...                  use_local=True, device="cuda")
>>> b = get_backend("together", api_key="...", model="meta-llama/Llama-3-70b-chat-hf")
>>> b = get_backend("mistral",  api_key="...", model="mistral-large-latest")

hypotestx.core.llm.build_schema(df) → SchemaInfo[source]¶: Build a SchemaInfo snapshot from a DataFrame (pandas or polars). Works without importing pandas/polars at module level.