Quick Start

Get COREtex running locally in minutes with Docker and Ollama.

Prerequisites: Ollama running on the host machine, and Docker or Podman with Compose.
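
If you want to confirm Ollama is reachable before starting the stack, listing the installed models is enough (standard Ollama commands, nothing COREtex-specific):

ollama list
# or over HTTP:
curl http://localhost:11434/api/tags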

1. Pull a model

ollama pull llama3.2:3b

2. Start the stack

docker compose up --build

| Service | URL |
|---|---|
| OpenWebUI | http://localhost:3000 |
| Ingress API | http://localhost:8000 |
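
To check that both services came up, list the Compose services (standard Docker Compose command, not specific to COREtex):

docker compose ps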

3. Send a request

curl -X POST http://localhost:8000/ingest \
  -H "Content-Type: application/json" \
  -d '{"input": "Compare Kubernetes and Nomad"}'
# → {"intent":"analysis","confidence":0.9,"response":"..."}
# Request file reading via tool call
curl -X POST http://localhost:8000/ingest \
  -H "Content-Type: application/json" \
  -d '{"input": "Read the file /etc/hostname"}'

If your input contains an apostrophe (I'm, don't), it closes the single-quoted shell string early; the shell then waits for a closing quote and the command appears to hang. Escape the apostrophe as '\'' , or write the payload to a file and pass -d @body.json.
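
For example, to send an input containing "What's" the apostrophe can be escaped inline, or the payload can be written to a file with a quoted heredoc so the shell never interprets it (the question text here is just an illustration):

curl -X POST http://localhost:8000/ingest \
  -H "Content-Type: application/json" \
  -d '{"input": "What'\''s the difference between Kubernetes and Nomad?"}'

# Or avoid shell quoting entirely by reading the payload from a file:
cat > body.json <<'EOF'
{"input": "Don't summarize, give me the full comparison"}
EOF
curl -X POST http://localhost:8000/ingest \
  -H "Content-Type: application/json" \
  -d @body.json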

Using OpenWebUI

Browse to http://localhost:3000, create a local account, select the agentic model from the dropdown, and type any message.

Single-turn only: the /v1/chat/completions shim extracts only the most recent user message. Prior turns are visible in the chat history but are not sent to the pipeline — each request is processed independently. This is deliberate.
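
To see the single-turn behavior directly, you can send an OpenAI-style request to the shim yourself. This sketch assumes the shim is exposed by the ingress service on port 8000, and the model name is illustrative:

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "coretex-agentic",
        "messages": [
          {"role": "user", "content": "Compare Kubernetes and Nomad"},
          {"role": "assistant", "content": "..."},
          {"role": "user", "content": "Now compare Nomad and ECS"}
        ]
      }'
# Only the final user message ("Now compare Nomad and ECS") is forwarded to the pipeline.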

Configuration options

COREtex v0.5 routes inference through the ModelProvider abstraction. The default provider is "ollama", so existing deployments keep working without any new configuration.

Use a remote Ollama instance:

OLLAMA_BASE_URL=http://192.168.1.50:11434 docker compose up --build

Change models:

CLASSIFIER_MODEL=llama3.2:3b WORKER_MODEL=llama3.1:8b docker compose up --build

This lets you use different models for classification and response generation while still going through the same registered provider.

All settings are overridable via environment variables or a .env file.

| Variable | Default | Purpose |
|---|---|---|
| OLLAMA_BASE_URL | http://host.docker.internal:11434 | Ollama endpoint |
| CLASSIFIER_MODEL | llama3.2:3b | Model used for intent classification |
| WORKER_MODEL | llama3.2:3b | Model used for response generation |
| CLASSIFIER_TIMEOUT | 60 | Classifier HTTP timeout (seconds) |
| WORKER_TIMEOUT | 300 | Worker HTTP timeout (seconds) |
| MAX_TOKENS | 256 | Max tokens generated by the worker |
| LOG_LEVEL | INFO | DEBUG, INFO, or WARNING |
| DEBUG_ROUTER | false | Log event=router_decision at DEBUG |
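
As an illustration, a .env file that points at a remote Ollama host and raises the worker limits might look like this (values are examples, not recommendations):

OLLAMA_BASE_URL=http://192.168.1.50:11434
WORKER_MODEL=llama3.1:8b
WORKER_TIMEOUT=600
MAX_TOKENS=512
LOG_LEVEL=DEBUG
DEBUG_ROUTER=true

Docker Compose picks up a .env file in the project directory automatically, so no extra flags are needed.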

Run tests (no Docker required)

pip install -r requirements.txt
python3 -m pytest tests/test_smoke.py -v