LLM Backends
HypoTestX uses LLM backends to parse plain-English questions into structured
routing decisions. All backends implement the LLMBackend abstract base class, so
you can swap them with a single keyword argument or build your own.
Backend Summary
| Backend | Provider | Cost | Default model | Extra deps |
|---|---|---|---|---|
| | Built-in regex | Free, offline | n/a | None |
| ollama | Local Ollama | Free, offline | llama3.2 | Ollama app |
| gemini | Google Gemini | Free (1,500 req/day) | | None |
| groq | Groq Cloud | Free tier | | None |
| openai | OpenAI | Paid | | None |
| azure | Azure OpenAI | Paid | (deployment name) | None |
| together | Together AI | Free tier | | None |
| mistral | Mistral AI | Free tier | | None |
| perplexity | Perplexity AI | Free tier | | None |
| huggingface | HF Inference API / local | Free tier / Local | | |
Common kwargs
Pass these keyword arguments to hx.analyze(); each one applies only to the backends listed in the second column:
| kwarg | applicable backends | description |
|---|---|---|
| api_key | gemini, groq, openai, together, mistral, perplexity, azure | Required for cloud providers |
| model | all | Override the default model name / ID |
| temperature | gemini, openai-compat, huggingface | Sampling temperature; … |
| max_tokens | gemini, openai-compat, huggingface | Max tokens in the LLM response |
| timeout | all | HTTP timeout in seconds (default: …) |
| host | ollama | Server URL (default: …) |
| | ollama | Dict forwarded to Ollama model options |
| token | huggingface | HF access token for Inference API |
| use_local | huggingface | Load model locally via … |
| device | huggingface local | |
| base_url | openai-compat, azure | Override the API base URL |
| api_version | azure | Azure API version (default: …) |
| | openai-compat | Additional HTTP headers dict |
| | all | Dict of extra backend-specific kwargs (passthrough) |
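Because the backend is selected purely through keyword arguments, the choice can be made at runtime. The following is a minimal sketch, not part of HypoTestX: the helper name `pick_backend_kwargs` and the preference order are assumptions made for illustration, while the backend names and environment-variable names are the ones used in the examples on this page.

```python
import os

# Backend names and environment-variable names follow this page's examples.
# The preference order is an arbitrary choice for this sketch.
_CANDIDATES = [
    ("gemini", "GEMINI_API_KEY"),
    ("groq", "GROQ_API_KEY"),
    ("openai", "OPENAI_API_KEY"),
]


def pick_backend_kwargs(env=None):
    """Return kwargs for hx.analyze() based on which API key is set.

    An empty dict means "pass nothing extra", which leaves hx.analyze()
    on its default regex fallback.
    """
    env = os.environ if env is None else env
    for backend, var in _CANDIDATES:
        key = env.get(var)
        if key:
            return {"backend": backend, "api_key": key, "temperature": 0.0}
    return {}


kwargs = pick_backend_kwargs({"GROQ_API_KEY": "example-key"})
print(kwargs["backend"])  # groq
```

The result would then be unpacked into the call, for example `hx.analyze(df, "Do groups differ?", **kwargs)`.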
Code Examples
Regex Fallback (default, offline, no API key)
import hypotestx as hx
result = hx.analyze(df, "Do males earn more than females?")
# Uses FallbackBackend automatically — no API key needed
# routing_confidence = 0.6
To suppress the routing warning:
result = hx.analyze(df, "Do males earn more?", warn_fallback=False)
Google Gemini
import os
import hypotestx as hx

result = hx.analyze(
    df,
    "Is there a salary difference between engineering and sales?",
    backend="gemini",
    api_key=os.environ["GEMINI_API_KEY"],
    model="gemini-2.0-flash",  # or "gemini-2.0-flash-lite"
    temperature=0.0,
    max_tokens=512,
)
Groq (free tier, very fast)
result = hx.analyze(
    df,
    "Is employee satisfaction correlated with tenure?",
    backend="groq",
    api_key=os.environ["GROQ_API_KEY"],
    model="llama-3.3-70b-versatile",  # or "mixtral-8x7b-32768"
    temperature=0.0,
)
OpenAI
result = hx.analyze(
    df,
    "Is salary correlated with years of experience?",
    backend="openai",
    api_key=os.environ["OPENAI_API_KEY"],
    model="gpt-4o-mini",  # or "gpt-4o"
    temperature=0.0,
    max_tokens=256,
)
Ollama (local, offline, free)
result = hx.analyze(
    df,
    "Are there differences in performance scores across teams?",
    backend="ollama",
    model="phi4",  # default: llama3.2
    host="http://localhost:11434",
    timeout=120,
)
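Since the Ollama backend depends on a local server, it can help to confirm the server is reachable before calling hx.analyze(). The sketch below uses only the standard library and Ollama's `/api/tags` endpoint, which lists locally installed models. The helper name `ollama_is_up` is hypothetical and not part of HypoTestX.

```python
import urllib.request


def ollama_is_up(host="http://localhost:11434", timeout=2.0):
    """Return True if an Ollama server answers at `host`."""
    url = host.rstrip("/") + "/api/tags"  # Ollama's "list local models" endpoint
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, DNS failure, timeouts, and HTTP error statuses.
        return False


if not ollama_is_up():
    print("Ollama is not reachable; start the Ollama app or use another backend.")
```

A check like this fails fast with a readable message instead of waiting for the `timeout=120` in the example above to expire.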
Azure OpenAI
result = hx.analyze(
    df,
    "Do departments differ in performance?",
    backend="azure",
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    base_url="https://<resource>.openai.azure.com",
    model="<deployment-name>",
    api_version="2024-02-01",
)
Together AI
result = hx.analyze(
    df,
    "Do groups differ?",
    backend="together",
    api_key=os.environ["TOGETHER_API_KEY"],
    model="meta-llama/Llama-3-70b-chat-hf",
)
Mistral AI
result = hx.analyze(
    df,
    "Is there an association between region and sales tier?",
    backend="mistral",
    api_key=os.environ["MISTRAL_API_KEY"],
    model="mistral-small-latest",
)
Perplexity AI
result = hx.analyze(
    df,
    "Compare satisfaction across customer segments",
    backend="perplexity",
    api_key=os.environ["PERPLEXITY_API_KEY"],
    model="llama-3.1-sonar-small-128k-online",
)
HuggingFace Inference API (cloud, free tier)
result = hx.analyze(
    df,
    "Are gender and department related?",
    backend="huggingface",
    token=os.environ["HF_TOKEN"],
    model="HuggingFaceH4/zephyr-7b-beta",
)
HuggingFace Local
Install the local-inference dependencies first:
pip install transformers torch

Then load the model locally:
result = hx.analyze(
    df,
    "Is income different across regions?",
    backend="huggingface",
    model="microsoft/Phi-3.5-mini-instruct",
    use_local=True,
    device="cuda",  # or "cpu"
)
Custom callable
Wrap any callable(messages: list) -> str as a backend:
result = hx.analyze(
    df,
    "Is height correlated with weight?",
    backend=lambda msgs: my_llm_function(msgs[-1]["content"]),
)
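The lambda above forwards only the last message, so any system message is dropped. If the wrapped model takes a single prompt string, an adapter can flatten the whole list instead. This is a sketch under assumptions: `messages_to_prompt` and `backend` are hypothetical names, the `[role]` layout is an arbitrary choice, and `my_llm_function` is a stub standing in for a real model call.

```python
def messages_to_prompt(messages):
    """Flatten chat messages into one prompt string, keeping the system text."""
    parts = []
    for msg in messages:
        role = msg.get("role", "user")
        content = msg.get("content", "")
        parts.append(f"[{role}]\n{content}")
    return "\n\n".join(parts)


def my_llm_function(prompt):
    """Hypothetical stand-in for a real model call; returns a fixed stub string."""
    return '{"note": "replace with a real model call"}'


def backend(messages):
    # Signature matches callable(messages: list) -> str.
    return my_llm_function(messages_to_prompt(messages))
```

The function would be passed the same way as the lambda, as `backend=backend`.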
Custom LLMBackend subclass
Subclass LLMBackend to integrate any LLM that is not yet built in:
import hypotestx as hx

class MyCompanyLLM(hx.LLMBackend):
    name = "my_llm"

    def chat(self, messages: list[dict]) -> str:
        """
        messages: [{"role": "system", "content": ...},
                   {"role": "user", "content": ...}]
        Must return a JSON string matching the RoutingResult schema.
        """
        prompt = messages[-1]["content"]
        # my_internal_api is a placeholder for your own client
        return my_internal_api.complete(prompt)

result = hx.analyze(df, "Is satisfaction higher in Q4?", backend=MyCompanyLLM())
The chat() method only needs to return a valid JSON routing response; prompt
construction, JSON extraction, and validation are all handled by the base-class
route() method.
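Before wiring a custom chat() into hx.analyze(), it may be worth checking that it really returns a JSON string. The helper below is a hypothetical test aid, not part of HypoTestX. It assumes a routing response is a JSON object and does not check any RoutingResult fields, since the schema is not shown here. It is also stricter than the base class, which performs its own JSON extraction.

```python
import json


def check_chat_output(chat_fn, question="Do groups differ?"):
    """Call chat_fn with sample messages and return the parsed JSON object.

    Raises TypeError or ValueError if the reply is not a JSON object string.
    """
    messages = [
        {"role": "system", "content": "Reply with JSON only."},  # placeholder text
        {"role": "user", "content": question},
    ]
    reply = chat_fn(messages)
    if not isinstance(reply, str):
        raise TypeError(f"chat() must return str, got {type(reply).__name__}")
    parsed = json.loads(reply)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(parsed, dict):
        raise ValueError("chat() must return a JSON object, not a list or scalar")
    return parsed
```

For a subclass instance, the bound method can be passed directly, for example `check_chat_output(MyCompanyLLM().chat)`.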
Custom OpenAI-compatible Endpoint
For self-hosted models (vLLM, LiteLLM, Ollama OpenAI mode, …):
result = hx.analyze(
    df,
    "Compare groups",
    backend="openai",
    api_key="any-string",  # required field even if unused
    base_url="https://my-vllm/v1",
    model="my-fine-tuned-model",
)
Security: API Key Best Practices
Never hard-code API keys in source code or commit them to version control.
import os

# Load from environment
result = hx.analyze(
    df, "Do groups differ?",
    backend="gemini",
    api_key=os.environ["GEMINI_API_KEY"],
)
With python-dotenv:
import os
import hypotestx as hx
from dotenv import load_dotenv

load_dotenv()  # reads .env file into os.environ

result = hx.analyze(df, "...", backend="groq",
                    api_key=os.environ["GROQ_API_KEY"])
Add .env to your .gitignore to prevent key leaks.
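Indexing `os.environ[...]` raises a bare `KeyError` when the variable is missing, which can be confusing for users. A small wrapper can produce a clearer message. This is a minimal sketch, and the name `require_env` is hypothetical rather than something HypoTestX provides.

```python
import os


def require_env(name):
    """Return the value of environment variable `name` or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; export it or add it to a .env file "
            "that is listed in .gitignore"
        )
    return value
```

It would replace the direct lookup in the calls above, for example `api_key=require_env("GROQ_API_KEY")`.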