LLM Backends

HypoTestX uses LLM backends to parse plain-English questions into structured routing decisions. All backends implement the LLMBackend abstract base class, so you can switch between them with the backend= keyword argument of hx.analyze(), or plug in your own.

Backend Summary

`backend=` string	Provider	Cost	Default model	Extra deps
`None` / `"fallback"`	Built-in regex	Free, offline	—	None
`"ollama"`	Local Ollama	Free, offline	`llama3.2`	Ollama app
`"gemini"`	Google Gemini	Free (1 500 req/day)	`gemini-2.0-flash`	None
`"groq"`	Groq Cloud	Free tier	`llama-3.3-70b-versatile`	None
`"openai"`	OpenAI	Paid	`gpt-4o-mini`	None
`"azure"`	Azure OpenAI	Paid	(deployment name)	None
`"together"`	Together AI	Free tier	`meta-llama/Llama-3-70b-chat-hf`	None
`"mistral"`	Mistral AI	Free tier	`mistral-small-latest`	None
`"perplexity"`	Perplexity AI	Free tier	`llama-3.1-sonar-small-128k-online`	None
`"huggingface"`	HF Inference API / local	Free tier / Local	`zephyr-7b-beta`	`transformers` (local only)
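The table above can be read as a lookup from the backend= string to a default model. The sketch below encodes it that way, assuming name-based dispatch. DEFAULT_MODELS and resolve_backend are hypothetical helpers for illustration, not part of the HypoTestX API.

```python
from typing import Dict, Optional, Tuple

# Hypothetical lookup built from the Backend Summary table (default models only).
DEFAULT_MODELS: Dict[str, Optional[str]] = {
    "fallback": None,                      # regex routing, no model involved
    "ollama": "llama3.2",
    "gemini": "gemini-2.0-flash",
    "groq": "llama-3.3-70b-versatile",
    "openai": "gpt-4o-mini",
    "azure": None,                         # the model is your deployment name
    "together": "meta-llama/Llama-3-70b-chat-hf",
    "mistral": "mistral-small-latest",
    "perplexity": "llama-3.1-sonar-small-128k-online",
    "huggingface": "zephyr-7b-beta",
}


def resolve_backend(backend: Optional[str] = None,
                    model: Optional[str] = None) -> Tuple[str, Optional[str]]:
    """Map a backend= string (or None) to (backend name, model to use)."""
    name = "fallback" if backend is None else backend.lower()
    if name not in DEFAULT_MODELS:
        raise ValueError(f"unknown backend {backend!r}; choose from {sorted(DEFAULT_MODELS)}")
    if name == "azure" and model is None:
        raise ValueError("azure has no default model: pass model='<deployment-name>'")
    # An explicit model= always overrides the table default.
    return name, model or DEFAULT_MODELS[name]
```

Keeping the defaults in one mapping means a per-call model= always wins, and adding a provider is a one-line change.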

Common kwargs

hx.analyze() forwards these keyword arguments to the selected backend; the second column lists which backends use each one:

kwarg	applicable backends	description
`api_key`	gemini, groq, openai, together, mistral, perplexity, azure	Required for cloud providers
`model`	all	Override the default model name / ID
`temperature`	gemini, openai-compat, huggingface	Sampling temperature; `0` = deterministic
`max_tokens`	gemini, openai-compat, huggingface	Max tokens in the LLM response
`timeout`	all	HTTP timeout in seconds (default: `60`)
`host`	ollama	Server URL (default: `http://localhost:11434`)
`options`	ollama	Dict forwarded to Ollama model options
`token`	huggingface	HF access token for Inference API
`use_local`	huggingface	Load model locally via `transformers`
`device`	huggingface local	`"cpu"` or `"cuda"`
`base_url`	openai-compat, azure	Override the API base URL
`api_version`	azure	Azure API version (default: `"2024-02-01"`)
`extra_headers`	openai-compat	Additional HTTP headers dict
`backend_options`	all	Dict of extra backend-specific kwargs (passthrough)
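The passthrough behaviour of backend_options can be pictured as a dictionary merge. This is a minimal sketch under two assumptions: the defaults come from the table above, and explicitly named kwargs take precedence over entries in backend_options. build_backend_kwargs is a hypothetical helper, not a HypoTestX function.

```python
from typing import Any, Dict

# Defaults listed in the Common kwargs table.
COMMON_DEFAULTS: Dict[str, Any] = {"timeout": 60}
PER_BACKEND_DEFAULTS: Dict[str, Dict[str, Any]] = {
    "ollama": {"host": "http://localhost:11434"},
    "azure": {"api_version": "2024-02-01"},
}


def build_backend_kwargs(backend: str, **kwargs: Any) -> Dict[str, Any]:
    """Combine defaults, backend_options, and explicit kwargs into one dict."""
    passthrough = kwargs.pop("backend_options", None) or {}
    merged: Dict[str, Any] = dict(COMMON_DEFAULTS)
    merged.update(PER_BACKEND_DEFAULTS.get(backend, {}))
    merged.update(passthrough)   # extra backend-specific settings
    merged.update(kwargs)        # assumption: explicit kwargs win on conflicts
    return merged
```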

Code Examples

Regex Fallback (default, offline, no API key)

import hypotestx as hx

# df holds your data; its columns should match the names used in the question
result = hx.analyze(df, "Do males earn more than females?")
# Uses FallbackBackend automatically: no API key needed
# Regex routing reports routing_confidence = 0.6

To suppress the warning emitted when the regex fallback handles routing, pass warn_fallback=False:

result = hx.analyze(df, "Do males earn more?", warn_fallback=False)
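To see why regex routing carries a lower confidence, here is a toy keyword router in the same spirit. The patterns, test names, and first-match-wins ordering are invented for illustration; the actual rules inside FallbackBackend are not documented on this page. Only the 0.6 confidence value comes from the example above.

```python
import re
from typing import Dict, List, Optional, Pattern, Tuple

# Hypothetical keyword rules, checked in order; the first match wins.
RULES: List[Tuple[Pattern[str], str]] = [
    (re.compile(r"\b(correlat\w*|associat\w* with|relationship between)\b", re.I), "correlation"),
    (re.compile(r"\b(across|among)\b", re.I), "anova"),
    (re.compile(r"\b(more|less|higher|lower|differ\w*|compare\w*)\b", re.I), "two_sample_t_test"),
]


def regex_route(question: str) -> Dict[str, Optional[object]]:
    """Pick a test from keywords alone; no model, no network."""
    for pattern, test in RULES:
        if pattern.search(question):
            return {"test": test, "routing_confidence": 0.6}
    return {"test": None, "routing_confidence": 0.0}
```

Rule order matters: "differences in performance scores across teams" contains both "across" and "differ", so the ANOVA rule must be checked first. Questions that match no pattern, such as "Is there an association between region and sales tier?", fall through entirely, which is where an LLM backend earns its keep.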

Google Gemini

import os

import hypotestx as hx

result = hx.analyze(
    df,
    "Is there a salary difference between engineering and sales?",
    backend="gemini",
    api_key=os.environ["GEMINI_API_KEY"],
    model="gemini-2.0-flash",        # or "gemini-2.0-flash-lite"
    temperature=0.0,
    max_tokens=512,
)
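Gemini's REST API does not take OpenAI-style messages directly: system text goes into systemInstruction, the assistant role is called model, and sampling settings live under generationConfig. The sketch below shows one way to translate the message list, assuming a generateContent call. to_gemini_request is a hypothetical helper; whether HypoTestX builds its request this way is not stated here.

```python
from typing import Any, Dict, List

# generateContent endpoint; the API key is sent in the x-goog-api-key header.
GEMINI_URL = "https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent"


def to_gemini_request(messages: List[Dict[str, str]], temperature: float = 0.0,
                      max_tokens: int = 512) -> Dict[str, Any]:
    """Translate OpenAI-style chat messages into a Gemini request body."""
    system = [m["content"] for m in messages if m["role"] == "system"]
    contents = [
        {"role": "model" if m["role"] == "assistant" else "user",
         "parts": [{"text": m["content"]}]}
        for m in messages
        if m["role"] != "system"
    ]
    body: Dict[str, Any] = {
        "contents": contents,
        "generationConfig": {
            "temperature": temperature,
            "maxOutputTokens": max_tokens,           # counterpart of max_tokens
            "responseMimeType": "application/json",  # ask for a JSON-only reply
        },
    }
    if system:
        body["systemInstruction"] = {"parts": [{"text": "\n\n".join(system)}]}
    return body
```

GEMINI_URL.format(model="gemini-2.0-flash") fills in the model chosen with model=.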

Groq (free tier, very fast)

result = hx.analyze(
    df,
    "Is employee satisfaction correlated with tenure?",
    backend="groq",
    api_key=os.environ["GROQ_API_KEY"],
    model="llama-3.3-70b-versatile",  # or "llama-3.1-8b-instant"
    temperature=0.0,
)

OpenAI

result = hx.analyze(
    df,
    "Is salary correlated with years of experience?",
    backend="openai",
    api_key=os.environ["OPENAI_API_KEY"],
    model="gpt-4o-mini",              # or "gpt-4o"
    temperature=0.0,
    max_tokens=256,
)

Ollama (local, offline, free)

result = hx.analyze(
    df,
    "Are there differences in performance scores across teams?",
    backend="ollama",
    model="phi4",                     # default: llama3.2
    host="http://localhost:11434",
    timeout=120,
)
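The host, model, and options arguments correspond naturally to Ollama's /api/chat endpoint. This sketch builds and sends such a request using only the standard library. The function names are hypothetical, and setting format to "json" is an assumption about how a routing backend would request structured output, not documented HypoTestX behaviour.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional, Tuple


def ollama_chat_request(messages: List[Dict[str, str]], model: str = "llama3.2",
                        host: str = "http://localhost:11434",
                        options: Optional[Dict[str, Any]] = None) -> Tuple[str, Dict[str, Any]]:
    """Return the URL and JSON body for a single, non-streaming Ollama chat call."""
    url = host.rstrip("/") + "/api/chat"
    body: Dict[str, Any] = {
        "model": model,
        "messages": messages,
        "stream": False,   # one complete response instead of streamed chunks
        "format": "json",  # ask Ollama to constrain the reply to valid JSON
    }
    if options:
        body["options"] = dict(options)  # e.g. {"temperature": 0, "num_ctx": 4096}
    return url, body


def send_ollama(url: str, body: Dict[str, Any], timeout: float = 120.0) -> str:
    """POST the request and return the assistant's reply text (needs a running server)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["message"]["content"]
```

A generous timeout, like the timeout=120 in the example above, helps because the first request to a local model can include model loading time.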

Azure OpenAI

result = hx.analyze(
    df,
    "Do departments differ in performance?",
    backend="azure",
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    base_url="https://<resource>.openai.azure.com",
    model="<deployment-name>",
    api_version="2024-02-01",
)
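Azure OpenAI addresses a deployment rather than a model, which is why model= takes the deployment name: the name becomes part of the request path, and api_version is sent as a query parameter. A minimal sketch of how those pieces combine; azure_chat_url and azure_headers are hypothetical helpers, not HypoTestX functions.

```python
from typing import Dict
from urllib.parse import urlencode


def azure_chat_url(base_url: str, deployment: str, api_version: str = "2024-02-01") -> str:
    """Build the chat-completions URL for one Azure OpenAI deployment."""
    query = urlencode({"api-version": api_version})
    return f"{base_url.rstrip('/')}/openai/deployments/{deployment}/chat/completions?{query}"


def azure_headers(api_key: str) -> Dict[str, str]:
    """Azure key auth uses an api-key header instead of an Authorization bearer token."""
    return {"api-key": api_key, "Content-Type": "application/json"}
```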

Together AI

result = hx.analyze(
    df,
    "Do groups differ?",
    backend="together",
    api_key=os.environ["TOGETHER_API_KEY"],
    model="meta-llama/Llama-3-70b-chat-hf",
)

Mistral AI

result = hx.analyze(
    df,
    "Is there an association between region and sales tier?",
    backend="mistral",
    api_key=os.environ["MISTRAL_API_KEY"],
    model="mistral-small-latest",
)

Perplexity AI

result = hx.analyze(
    df,
    "Compare satisfaction across customer segments",
    backend="perplexity",
    api_key=os.environ["PERPLEXITY_API_KEY"],
    model="llama-3.1-sonar-small-128k-online",
)

HuggingFace Inference API (cloud, free tier)

result = hx.analyze(
    df,
    "Are gender and department related?",
    backend="huggingface",
    token=os.environ["HF_TOKEN"],
    model="HuggingFaceH4/zephyr-7b-beta",
)

HuggingFace Local

pip install transformers torch

result = hx.analyze(
    df,
    "Is income different across regions?",
    backend="huggingface",
    model="microsoft/Phi-3.5-mini-instruct",
    use_local=True,
    device="cuda",   # or "cpu"
)

Custom callable

Any callable with the signature callable(messages: list) -> str can be passed directly as backend=:

result = hx.analyze(
    df,
    "Is height correlated with weight?",
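    # my_llm_function is a placeholder for your own function (prompt str -> reply str);
    # joining all messages keeps the system message as well as the question
    backend=lambda msgs: my_llm_function("\n\n".join(m["content"] for m in msgs)),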
)
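One way a plain callable can satisfy the backend contract is an adapter class that forwards chat() to it and checks the return type. The sketch below uses a local stand-in for hx.LLMBackend reduced to chat(); CallableBackend is a hypothetical name, not necessarily the wrapper HypoTestX uses internally.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List

Message = Dict[str, str]


class LLMBackend(ABC):
    """Local stand-in for hx.LLMBackend, reduced to the chat() contract."""

    name = "base"

    @abstractmethod
    def chat(self, messages: List[Message]) -> str:
        ...


class CallableBackend(LLMBackend):
    """Adapter: forwards chat() to any callable(messages) -> str."""

    name = "callable"

    def __init__(self, fn: Callable[[List[Message]], str]) -> None:
        if not callable(fn):
            raise TypeError("backend must be callable")
        self._fn = fn

    def chat(self, messages: List[Message]) -> str:
        reply = self._fn(messages)
        # Fail early with a clear message instead of a confusing JSON parse error later.
        if not isinstance(reply, str):
            raise TypeError(f"backend callable must return str, got {type(reply).__name__}")
        return reply
```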

Custom LLMBackend subclass

Subclass LLMBackend to integrate any LLM that’s not yet built-in:

import hypotestx as hx

class MyCompanyLLM(hx.LLMBackend):
    name = "my_llm"

    def chat(self, messages: list[dict]) -> str:
        """
        messages: [{"role": "system", "content": ...},
                   {"role": "user",   "content": ...}]
        Must return a JSON string matching the RoutingResult schema.
        """
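        # Keep every message: the system message may carry the routing instructions
        prompt = "\n\n".join(m["content"] for m in messages)
        return my_internal_api.complete(prompt)  # my_internal_api: placeholder for your own client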

result = hx.analyze(df, "Is satisfaction higher in Q4?", backend=MyCompanyLLM())

The chat() method only needs to return a valid JSON routing response: prompt construction, JSON extraction, and validation are all handled by the base class's route() method.
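That division of labour can be sketched as follows. This is a minimal stand-in, not HypoTestX's route(): the prompt text, the required keys (test, confidence), and the parsing rules are assumptions chosen for illustration, since the real RoutingResult schema is not listed on this page.

```python
import json
import re
from abc import ABC, abstractmethod
from typing import Any, Dict, List

# Hypothetical schema: the real RoutingResult fields are not listed on this page.
REQUIRED_KEYS = {"test", "confidence"}
# Models often wrap JSON in a Markdown code fence; capture what is inside it.
FENCE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


class LLMBackend(ABC):
    """Stand-in base class: subclasses implement chat(), route() does the rest."""

    @abstractmethod
    def chat(self, messages: List[Dict[str, str]]) -> str:
        ...

    def route(self, question: str, columns: List[str]) -> Dict[str, Any]:
        # 1. Prompt construction
        messages = [
            {"role": "system",
             "content": "Choose a statistical test. Reply with a JSON object with keys: "
                        + ", ".join(sorted(REQUIRED_KEYS))},
            {"role": "user", "content": f"Columns: {columns}\nQuestion: {question}"},
        ]
        # 2. JSON extraction and 3. validation
        return self._parse(self.chat(messages))

    @staticmethod
    def _parse(raw: str) -> Dict[str, Any]:
        match = FENCE.search(raw)
        text = match.group(1) if match else raw
        start, end = text.find("{"), text.rfind("}")
        if start == -1 or end < start:
            raise ValueError("no JSON object found in the model reply")
        data = json.loads(text[start:end + 1])
        missing = REQUIRED_KEYS - set(data)
        if missing:
            raise ValueError(f"routing reply is missing keys: {sorted(missing)}")
        return data


class CannedBackend(LLMBackend):
    """Test double: always answers with a fenced JSON reply."""

    def chat(self, messages: List[Dict[str, str]]) -> str:
        ticks = "`" * 3
        return f'Sure!\n{ticks}json\n{{"test": "pearson", "confidence": 0.9}}\n{ticks}'
```

Because chat() is the only abstract method, a canned test double like CannedBackend lets you exercise routing offline, for example in unit tests.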

Custom OpenAI-compatible Endpoint

For self-hosted servers that expose an OpenAI-compatible API (vLLM, LiteLLM, Ollama OpenAI mode, …), use the openai backend and point base_url at the server:

result = hx.analyze(
    df,
    "Compare groups",
    backend="openai",
    api_key="any-string",              # required, even if the server ignores it
    base_url="https://my-vllm/v1",
    model="my-fine-tuned-model",
)
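An OpenAI-compatible call is a POST to {base_url}/chat/completions with a bearer token, which is why api_key must be set even for servers that do not check it (vLLM, for example, only enforces a key when started with --api-key). The sketch below assembles such a request without sending it. openai_compat_request is a hypothetical helper, and the extra_headers handling is an assumption based on the Common kwargs table.

```python
import json
from typing import Dict, List, Optional, Tuple


def openai_compat_request(base_url: str, api_key: str, model: str,
                          messages: List[Dict[str, str]], temperature: float = 0.0,
                          max_tokens: int = 256,
                          extra_headers: Optional[Dict[str, str]] = None
                          ) -> Tuple[str, Dict[str, str], bytes]:
    """Return (url, headers, body bytes) for a chat-completions request."""
    url = base_url.rstrip("/") + "/chat/completions"
    # The Authorization header is always sent, so some key string is always needed.
    headers = {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}
    headers.update(extra_headers or {})
    body = {"model": model, "messages": messages,
            "temperature": temperature, "max_tokens": max_tokens}
    return url, headers, json.dumps(body).encode("utf-8")
```

The resulting tuple can be handed to urllib.request.Request(url, data=data, headers=headers, method="POST") or any other HTTP client.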

Security: API Key Best Practices

Never hard-code API keys in source code or commit them to version control.

import os

import hypotestx as hx

# Load from environment
result = hx.analyze(
    df, "Do groups differ?",
    backend="gemini",
    api_key=os.environ["GEMINI_API_KEY"],
)

With python-dotenv:

from dotenv import load_dotenv
load_dotenv()   # reads .env file into os.environ
import os, hypotestx as hx

result = hx.analyze(df, "...", backend="groq",
                    api_key=os.environ["GROQ_API_KEY"])

Add .env to your .gitignore to prevent key leaks.
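os.environ["GEMINI_API_KEY"] fails with a bare KeyError when the variable is missing. A small helper can give a clearer message without ever echoing the key itself. require_env is a hypothetical helper, not part of HypoTestX or python-dotenv.

```python
import os


def require_env(name: str) -> str:
    """Return a non-empty environment variable, or fail without revealing any secret."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; export it or add it to a .env file listed in .gitignore"
        )
    return value
```

Usage: pass api_key=require_env("GROQ_API_KEY") to hx.analyze().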