# Quick Start
Get COREtex running locally in minutes with Docker and Ollama.
Prerequisites: Ollama running on the host machine, and Docker or Podman with Compose.
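A quick way to verify both prerequisites before continuing (the exact output does not matter as long as the commands succeed):

```bash
# Confirm the Ollama daemon is reachable on the host
ollama list

# Confirm a Compose-capable container runtime is installed
docker compose version    # or: podman compose version
```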
1. Pull a model
```bash
ollama pull llama3.2:3b
```
2. Start the stack
```bash
docker compose up --build
```
| Service | URL |
|---|---|
| OpenWebUI | http://localhost:3000 |
| Ingress API | http://localhost:8000 |
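To check that both containers are running, open another terminal in the project directory:

```bash
# List the Compose services and their current state
docker compose ps
```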
3. Send a request
```bash
curl -X POST http://localhost:8000/ingest \
  -H "Content-Type: application/json" \
  -d '{"input": "Compare Kubernetes and Nomad"}'
# → {"intent":"analysis","confidence":0.9,"response":"..."}

# Request file reading via tool call
curl -X POST http://localhost:8000/ingest \
  -H "Content-Type: application/json" \
  -d '{"input": "Read the file /etc/hostname"}'
```
If your input contains an apostrophe (`I'm`, `don't`), it closes the shell's single-quoted string and the shell waits for a matching quote, so the command appears to hang. Use `'\''` to escape the apostrophe, or write the payload to a file and pass `-d @body.json`. Both workarounds are shown below.
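A sketch of both workarounds, using an illustrative prompt:

```bash
# Escape the apostrophe: close the quoted string, add \', and reopen it
curl -X POST http://localhost:8000/ingest \
  -H "Content-Type: application/json" \
  -d '{"input": "Explain why I'\''m seeing high latency"}'

# Or sidestep shell quoting entirely by sending the payload from a file
cat > body.json <<'EOF'
{"input": "Explain why I'm seeing high latency"}
EOF
curl -X POST http://localhost:8000/ingest \
  -H "Content-Type: application/json" \
  -d @body.json
```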
## Using OpenWebUI
Browse to http://localhost:3000, create a local account, select the agentic model from the dropdown, and type any message.
Single-turn only: the `/v1/chat/completions` shim extracts only the most recent user message. Prior turns are visible in the chat history but are not sent to the pipeline; each request is processed independently. This is deliberate.
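For illustration, here is roughly the kind of request the shim receives; the port and the model name `coretex` are assumptions, so substitute whatever your deployment actually exposes. Only the final user message would reach the pipeline:

```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "coretex",
    "messages": [
      {"role": "user", "content": "What is Nomad?"},
      {"role": "assistant", "content": "Nomad is a workload orchestrator..."},
      {"role": "user", "content": "Compare it with Kubernetes"}
    ]
  }'
# Only the last user message ("Compare it with Kubernetes") is forwarded to the pipeline.
```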
## Configuration options
COREtex v0.5 routes inference through the ModelProvider abstraction. The default provider is "ollama", so existing deployments keep working without any new configuration.
Use a remote Ollama instance:
```bash
OLLAMA_BASE_URL=http://192.168.1.50:11434 docker compose up --build
```
Change models:
```bash
CLASSIFIER_MODEL=llama3.2:3b WORKER_MODEL=llama3.1:8b docker compose up --build
```
This lets you use different models for classification and response generation while still going through the same registered provider.
All settings are overridable via environment variables or a .env file.
| Variable | Default | Purpose |
|---|---|---|
| `OLLAMA_BASE_URL` | `http://host.docker.internal:11434` | Ollama endpoint |
| `CLASSIFIER_MODEL` | `llama3.2:3b` | Model used for intent classification |
| `WORKER_MODEL` | `llama3.2:3b` | Model used for response generation |
| `CLASSIFIER_TIMEOUT` | `60` | Classifier HTTP timeout (seconds) |
| `WORKER_TIMEOUT` | `300` | Worker HTTP timeout (seconds) |
| `MAX_TOKENS` | `256` | Max tokens generated by the worker |
| `LOG_LEVEL` | `INFO` | `DEBUG`, `INFO`, or `WARNING` |
| `DEBUG_ROUTER` | `false` | Log `event=router_decision` at DEBUG |
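For example, a `.env` file next to the compose file might look like this; the values are illustrative, and any variable you omit keeps its default:

```env
OLLAMA_BASE_URL=http://192.168.1.50:11434
CLASSIFIER_MODEL=llama3.2:3b
WORKER_MODEL=llama3.1:8b
WORKER_TIMEOUT=120
LOG_LEVEL=DEBUG
```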
## Run tests (no Docker required)
```bash
pip install -r requirements.txt
python3 -m pytest tests/test_smoke.py -v
```