LLM Backends¶
Abstract Base Class¶
- class hypotestx.core.llm.base.LLMBackend[source]¶
Bases:
ABCAbstract LLM backend.
Subclass this and implement
chat()to integrate any LLM. The defaultroute()method handles prompt building, JSON extraction, validation, and returning aRoutingResult— you only need to supply the raw API call.- name: str = 'custom'¶
- abstractmethod chat(messages: List[Dict[str, str]]) str[source]¶
Send a list of OpenAI-style messages and return the assistant reply.
- Parameters:
messages – List of dicts with ‘role’ (‘system’ | ‘user’ | ‘assistant’) and ‘content’ (str) keys.
- Returns:
The model’s text response.
- route(question: str, schema: SchemaInfo, extra_context: str = '', warn_fallback: bool = True) RoutingResult[source]¶
Build prompts, call the LLM, parse the JSON response.
This method is final — override
chat()instead.
- class hypotestx.core.llm.base.RoutingResult(test: str = '', value_column: str | None = None, group_column: str | None = None, x_column: str | None = None, y_column: str | None = None, group_values: List[str] | None = None, alternative: str = 'two-sided', alpha: float | None = None, mu: float | None = None, equal_var: bool = False, correction: bool = True, method: str = 'parametric', reasoning: str = '', confidence: float = 1.0, routing_source: str = 'llm', raw_response: str = '')[source]¶
Bases:
objectStructured intent extracted from a user question by an LLM or fallback parser. The engine uses this to fetch the correct columns from the DataFrame and call the right test function.
- test: str = ''¶
- value_column: str | None = None¶
- group_column: str | None = None¶
- x_column: str | None = None¶
- y_column: str | None = None¶
- group_values: List[str] | None = None¶
- alternative: str = 'two-sided'¶
- alpha: float | None = None¶
- mu: float | None = None¶
- equal_var: bool = False¶
- correction: bool = True¶
- method: str = 'parametric'¶
- reasoning: str = ''¶
- confidence: float = 1.0¶
- routing_source: str = 'llm'¶
- raw_response: str = ''¶
- class hypotestx.core.llm.base.SchemaInfo(columns: List[str] = <factory>, dtypes: Dict[str, str]=<factory>, n_rows: int = 0, categoricals: Dict[str, ~typing.List[~typing.Any]]=<factory>, numerics: Dict[str, ~typing.Dict[str, float]]=<factory>)[source]¶
Bases:
objectSummary of a DataFrame passed to the LLM as context. Built by
build_schema()in prompts.py.- columns: List[str]¶
- dtypes: Dict[str, str]¶
- n_rows: int = 0¶
- categoricals: Dict[str, List[Any]]¶
- numerics: Dict[str, Dict[str, float]]¶
Callable Wrapper¶
- class hypotestx.core.llm.base.CallableBackend(fn)[source]¶
Bases:
LLMBackendWraps any
callable(messages) -> stras anLLMBackend.- name: str = 'callable'¶
Built-in Regex Fallback¶
- class hypotestx.core.llm.backends.fallback.FallbackBackend[source]¶
Bases:
LLMBackendPure regex routing — no LLM, no internet, no dependencies.
Accuracy is lower than an LLM but it always works offline and is extremely fast. Use it for quick experiments or when no LLM is available.
- name: str = 'fallback'¶
- chat(messages: List[Dict[str, str]]) str[source]¶
The fallback backend does not call an LLM.
route()is overridden directly instead.
- route(question: str, schema: SchemaInfo, extra_context: str = '', warn_fallback: bool = True) RoutingResult[source]¶
Bypass LLM and route via regex rules.
Google Gemini¶
- class hypotestx.core.llm.backends.gemini.GeminiBackend(api_key: str, model: str = 'gemini-2.0-flash', timeout: int = 60, temperature: float = 0.0, max_tokens: int = 512)[source]¶
Bases:
LLMBackendGoogle Gemini backend via the Generative Language REST API.
No SDK required — uses only the Python standard library.
- Parameters:
api_key – Google AI Studio API key.
model – Model name (default:
gemini-2.0-flash).timeout – HTTP timeout seconds (default: 60).
temperature – Sampling temperature (default: 0).
max_tokens – Maximum output tokens (default: 512).
- name: str = 'gemini'¶
OpenAI-Compatible (OpenAI / Groq / Together / Mistral / Perplexity / Azure)¶
- class hypotestx.core.llm.backends.openai_compat.OpenAICompatBackend(api_key: str, base_url: str = '', model: str = '', provider: str = 'openai', timeout: int = 60, temperature: float = 0.0, max_tokens: int = 512, extra_headers: Dict[str, str] | None = None, api_version: str = '')[source]¶
Bases:
LLMBackendBackend for any OpenAI-compatible chat-completion API.
- Parameters:
api_key – API key / bearer token. For Azure this is the
api-keyheader value.base_url – Base URL ending in
/v1(e.g.https://api.groq.com/openai/v1). For Azure:https://<resource>.openai.azure.com(no trailing path).model – Model name. For Azure this is the deployment name.
provider – Shorthand:
"openai","groq","together","perplexity","mistral","azure". Sets base_url + model automatically if not specified.timeout – HTTP timeout in seconds (default: 60).
temperature – Sampling temperature (default: 0 for deterministic routing).
max_tokens – Maximum tokens in the response (default: 512).
extra_headers – Additional HTTP headers dict.
api_version – Azure API version string (default:
"2024-02-01"). Only used when provider is"azure"or base_url is an Azure endpoint.
- name: str = 'openai_compat'¶
Local Ollama¶
- class hypotestx.core.llm.backends.ollama.OllamaBackend(model: str = 'llama3.2', host: str = 'http://localhost:11434', timeout: int = 120, options: Dict | None = None)[source]¶
Bases:
LLMBackendOllama backend — fully local, zero API cost.
- Parameters:
model – Ollama model name (default:
llama3.2).host – Base URL of the Ollama server (default:
http://localhost:11434).timeout – Request timeout in seconds (default: 120).
options – Extra Ollama model options dict, e.g.
{"temperature": 0}.
- name: str = 'ollama'¶
HuggingFace¶
- class hypotestx.core.llm.backends.huggingface.HuggingFaceBackend(token: str = '', model: str = '', use_local: bool = False, timeout: int = 60, max_tokens: int = 512, device: str = 'cpu', load_kwargs: Dict[str, Any] | None = None)[source]¶
Bases:
LLMBackendHuggingFace backend (Inference API or local transformers).
- Parameters:
token – HF access token (required for Inference API; optional locally).
model – Model repo ID.
use_local – If True, load the model locally via
transformers.timeout – HTTP timeout for Inference API (default: 60).
max_tokens – Maximum new tokens (default: 512).
device – PyTorch device for local inference (
"cpu"or"cuda").load_kwargs – Extra kwargs forwarded to
AutoModelForCausalLM.from_pretrained().
- name: str = 'huggingface'¶
Backend Factory¶
- hypotestx.core.llm.get_backend(spec: Any = None, **kwargs) LLMBackend[source]¶
Resolve spec to a concrete
LLMBackendinstance.- Parameters:
spec (str | LLMBackend | callable | None) –
None/"fallback"→ FallbackBackend (regex, offline)"gemini"→ GeminiBackend"ollama"→ OllamaBackend"openai"→ OpenAICompatBackend(provider=”openai”)"groq"→ OpenAICompatBackend(provider=”groq”)"together"→ OpenAICompatBackend(provider=”together”)"mistral"→ OpenAICompatBackend(provider=”mistral”)"perplexity"→ OpenAICompatBackend(provider=”perplexity”)"huggingface"→ HuggingFaceBackendAn
LLMBackendinstance → returned as-isA
callable→ wrapped in CallableBackend
**kwargs –
Forwarded verbatim to the backend constructor. Supported kwargs:
kwarg
backends
default
api_keygemini, openai, groq, together, …
(required)
modelall
provider default
timeoutall
60 s
temperaturegemini, openai-compat, huggingface
0.0
max_tokensgemini, openai-compat, huggingface
512
hostollama
localhost:11434
optionsollama
{“temperature”: 0}
tokenhuggingface
(required)
use_localhuggingface
False
devicehuggingface (local)
”cpu”
base_urlopenai-compat
provider default
provideropenai-compat
”openai”
extra_headersopenai-compat
None
Examples
>>> from hypotestx.core.llm import get_backend >>> b = get_backend("gemini", api_key="AIza...", model="gemini-2.0-flash-lite") >>> b = get_backend("groq", api_key="gsk_...", model="llama-3.3-70b-versatile") >>> b = get_backend("openai", api_key="sk-...", model="gpt-4o", temperature=0.2) >>> b = get_backend("ollama", model="mistral", host="http://localhost:11434") >>> b = get_backend("huggingface", token="hf_...", model="HuggingFaceH4/zephyr-7b-beta") >>> b = get_backend("huggingface", model="microsoft/Phi-3.5-mini-instruct", ... use_local=True, device="cuda") >>> b = get_backend("together", api_key="...", model="meta-llama/Llama-3-70b-chat-hf") >>> b = get_backend("mistral", api_key="...", model="mistral-large-latest")
- hypotestx.core.llm.build_schema(df) SchemaInfo[source]¶
Build a
SchemaInfosnapshot from a DataFrame (pandas or polars). Works without importing pandas/polars at module level.