# LLM Backends

HypoTestX uses LLM backends to parse plain-English questions into structured
routing decisions.  All backends implement the `LLMBackend` abstract base class —
you can swap them with a single keyword argument, or build your own.

---

## Backend Summary

| `backend=` string | Provider | Cost | Default model | Extra deps |
|---|---|---|---|---|
| `None` / `"fallback"` | Built-in regex | Free, offline | — | None |
| `"ollama"` | Local Ollama | Free, offline | `llama3.2` | Ollama app |
| `"gemini"` | Google Gemini | Free (1 500 req/day) | `gemini-2.0-flash` | None |
| `"groq"` | Groq Cloud | Free tier | `llama-3.3-70b-versatile` | None |
| `"openai"` | OpenAI | Paid | `gpt-4o-mini` | None |
| `"azure"` | Azure OpenAI | Paid | *(deployment name)* | None |
| `"together"` | Together AI | Free tier | `meta-llama/Llama-3-70b-chat-hf` | None |
| `"mistral"` | Mistral AI | Free tier | `mistral-small-latest` | None |
| `"perplexity"` | Perplexity AI | Free tier | `llama-3.1-sonar-small-128k-online` | None |
| `"huggingface"` | HF Inference API / local | Free tier / Local | `zephyr-7b-beta` | `transformers` (local only) |

---

## Common kwargs

All backends accept these keyword arguments via `hx.analyze()`:

| kwarg | applicable backends | description |
|---|---|---|
| `api_key` | gemini, groq, openai, together, mistral, perplexity, azure | Required for cloud providers |
| `model` | all | Override the default model name / ID |
| `temperature` | gemini, openai-compat, huggingface | Sampling temperature; `0` = deterministic |
| `max_tokens` | gemini, openai-compat, huggingface | Max tokens in the LLM response |
| `timeout` | all | HTTP timeout in seconds (default: `60`) |
| `host` | ollama | Server URL (default: `http://localhost:11434`) |
| `options` | ollama | Dict forwarded to Ollama model options |
| `token` | huggingface | HF access token for Inference API |
| `use_local` | huggingface | Load model locally via `transformers` |
| `device` | huggingface local | `"cpu"` or `"cuda"` |
| `base_url` | openai-compat, azure | Override the API base URL |
| `api_version` | azure | Azure API version (default: `"2024-02-01"`) |
| `extra_headers` | openai-compat | Additional HTTP headers dict |
| `backend_options` | all | Dict of extra backend-specific kwargs (passthrough) |

---

## Code Examples

### Regex Fallback (default, offline, no API key)

```python
import hypotestx as hx

result = hx.analyze(df, "Do males earn more than females?")
# Uses FallbackBackend automatically — no API key needed
# routing_confidence = 0.6
```

To suppress the routing warning:

```python
result = hx.analyze(df, "Do males earn more?", warn_fallback=False)
```

---

### Google Gemini

```python
import os, hypotestx as hx

result = hx.analyze(
    df,
    "Is there a salary difference between engineering and sales?",
    backend="gemini",
    api_key=os.environ["GEMINI_API_KEY"],
    model="gemini-2.0-flash",        # or "gemini-2.0-flash-lite"
    temperature=0.0,
    max_tokens=512,
)
```

---

### Groq (free tier, very fast)

```python
result = hx.analyze(
    df,
    "Is employee satisfaction correlated with tenure?",
    backend="groq",
    api_key=os.environ["GROQ_API_KEY"],
    model="llama-3.3-70b-versatile",  # or "mixtral-8x7b-32768"
    temperature=0.0,
)
```

---

### OpenAI

```python
result = hx.analyze(
    df,
    "Is salary correlated with years of experience?",
    backend="openai",
    api_key=os.environ["OPENAI_API_KEY"],
    model="gpt-4o-mini",              # or "gpt-4o"
    temperature=0.0,
    max_tokens=256,
)
```

---

### Ollama (local, offline, free)

```python
result = hx.analyze(
    df,
    "Are there differences in performance scores across teams?",
    backend="ollama",
    model="phi4",                     # default: llama3.2
    host="http://localhost:11434",
    timeout=120,
)
```

---

### Azure OpenAI

```python
result = hx.analyze(
    df,
    "Do departments differ in performance?",
    backend="azure",
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    base_url="https://<resource>.openai.azure.com",
    model="<deployment-name>",
    api_version="2024-02-01",
)
```

---

### Together AI

```python
result = hx.analyze(
    df,
    "Do groups differ?",
    backend="together",
    api_key=os.environ["TOGETHER_API_KEY"],
    model="meta-llama/Llama-3-70b-chat-hf",
)
```

---

### Mistral AI

```python
result = hx.analyze(
    df,
    "Is there an association between region and sales tier?",
    backend="mistral",
    api_key=os.environ["MISTRAL_API_KEY"],
    model="mistral-small-latest",
)
```

---

### Perplexity AI

```python
result = hx.analyze(
    df,
    "Compare satisfaction across customer segments",
    backend="perplexity",
    api_key=os.environ["PERPLEXITY_API_KEY"],
    model="llama-3.1-sonar-small-128k-online",
)
```

---

### HuggingFace Inference API (cloud, free tier)

```python
result = hx.analyze(
    df,
    "Are gender and department related?",
    backend="huggingface",
    token=os.environ["HF_TOKEN"],
    model="HuggingFaceH4/zephyr-7b-beta",
)
```

### HuggingFace Local

```bash
pip install transformers torch
```

```python
result = hx.analyze(
    df,
    "Is income different across regions?",
    backend="huggingface",
    model="microsoft/Phi-3.5-mini-instruct",
    use_local=True,
    device="cuda",   # or "cpu"
)
```

---

### Custom callable

Wrap any `callable(messages: list) -> str` as a backend:

```python
result = hx.analyze(
    df,
    "Is height correlated with weight?",
    backend=lambda msgs: my_llm_function(msgs[-1]["content"]),
)
```

---

### Custom LLMBackend subclass

Subclass `LLMBackend` to integrate any LLM that's not yet built-in:

```python
import hypotestx as hx

class MyCompanyLLM(hx.LLMBackend):
    name = "my_llm"

    def chat(self, messages: list[dict]) -> str:
        """
        messages: [{"role": "system", "content": ...},
                   {"role": "user",   "content": ...}]
        Must return a JSON string matching the RoutingResult schema.
        """
        prompt = messages[-1]["content"]
        return my_internal_api.complete(prompt)

result = hx.analyze(df, "Is satisfaction higher in Q4?", backend=MyCompanyLLM())
```

The `chat()` method only needs to return a valid JSON routing response — all
prompt construction, JSON extraction, and validation is handled by the base class
`route()` method.

---

## Custom OpenAI-compatible Endpoint

For self-hosted models (vLLM, LiteLLM, Ollama OpenAI mode, …):

```python
result = hx.analyze(
    df,
    "Compare groups",
    backend="openai",
    api_key="any-string",              # required field even if unused
    base_url="https://my-vllm/v1",
    model="my-fine-tuned-model",
)
```

---

## Security: API Key Best Practices

**Never hard-code API keys in source code or commit them to version control.**

```python
import os

# Load from environment
result = hx.analyze(
    df, "Do groups differ?",
    backend="gemini",
    api_key=os.environ["GEMINI_API_KEY"],
)
```

With `python-dotenv`:

```python
from dotenv import load_dotenv
load_dotenv()   # reads .env file into os.environ
import os, hypotestx as hx

result = hx.analyze(df, "...", backend="groq",
                    api_key=os.environ["GROQ_API_KEY"])
```

Add `.env` to your `.gitignore` to prevent key leaks.