# Quick Start
Get COREtex running locally in minutes with Docker and Ollama.
Prerequisites: Ollama running on the host machine, and Docker or Podman with Compose.
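A quick way to verify both prerequisites before continuing (the exact output does not matter as long as the commands succeed):

```bash
# Confirm the Ollama daemon is reachable on the host
ollama list

# Confirm a Compose-capable container runtime is installed
docker compose version    # or: podman compose version
```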
1. Pull a model
```bash
ollama pull llama3.2:3b
```
2. Start the stack
```bash
docker compose up --build
```
| Service | URL |
|---|---|
| OpenWebUI | http://localhost:3000 |
| Ingress API | http://localhost:8000 |
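To check that both containers are running, open another terminal in the project directory:

```bash
# List the Compose services and their current state
docker compose ps
```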
3. Send a request
```bash
curl -X POST http://localhost:8000/ingest \
  -H "Content-Type: application/json" \
  -d '{"input": "Compare Kubernetes and Nomad"}'
# → {"intent":"analysis","confidence":0.9,"response":"..."}

# Request file reading via tool call
curl -X POST http://localhost:8000/ingest \
  -H "Content-Type: application/json" \
  -d '{"input": "Read the file /etc/hostname"}'
```
If your input contains an apostrophe (`I'm`, `don't`), it closes the shell's single-quoted string and the shell waits for a matching quote, so the command appears to hang. Use `'\''` to escape the apostrophe, or write the payload to a file and pass `-d @body.json`. Both workarounds are shown below.
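A sketch of both workarounds, using an illustrative prompt:

```bash
# Escape the apostrophe: close the quoted string, add \', and reopen it
curl -X POST http://localhost:8000/ingest \
  -H "Content-Type: application/json" \
  -d '{"input": "Explain why I'\''m seeing high latency"}'

# Or sidestep shell quoting entirely by sending the payload from a file
cat > body.json <<'EOF'
{"input": "Explain why I'm seeing high latency"}
EOF
curl -X POST http://localhost:8000/ingest \
  -H "Content-Type: application/json" \
  -d @body.json
```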
## Using OpenWebUI
Browse to http://localhost:3000, create a local account, select the agentic model from the dropdown, and type any message.
Single-turn only: the `/v1/chat/completions` shim extracts only the most recent user message. Prior turns are visible in the chat history but are not sent to the pipeline; each request is processed independently. This is deliberate.
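For illustration, here is roughly the kind of request the shim receives; the port and the model name `coretex` are assumptions, so substitute whatever your deployment actually exposes. Only the final user message would reach the pipeline:

```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "coretex",
    "messages": [
      {"role": "user", "content": "What is Nomad?"},
      {"role": "assistant", "content": "Nomad is a workload orchestrator..."},
      {"role": "user", "content": "Compare it with Kubernetes"}
    ]
  }'
# Only the last user message ("Compare it with Kubernetes") is forwarded to the pipeline.
```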
## Configuration options
COREtex v0.5 routes inference through the ModelProvider abstraction. The default provider is "ollama", so existing deployments keep working without any new configuration.
Use a remote Ollama instance:
```bash
OLLAMA_BASE_URL=http://192.168.1.50:11434 docker compose up --build
```
Change models:
```bash
CLASSIFIER_MODEL=llama3.2:3b WORKER_MODEL=llama3.1:8b docker compose up --build
```
This lets you use different models for classification and response generation while still going through the same registered provider.
All settings are overridable via environment variables or a .env file.
| Variable | Default | Purpose |
|---|---|---|
| `OLLAMA_BASE_URL` | `http://host.docker.internal:11434` | Ollama endpoint |
| `CLASSIFIER_MODEL` | `llama3.2:3b` | Model used for intent classification |
| `WORKER_MODEL` | `llama3.2:3b` | Model used for response generation |
| `CLASSIFIER_TIMEOUT` | `60` | Classifier HTTP timeout (seconds) |
| `WORKER_TIMEOUT` | `300` | Worker HTTP timeout (seconds) |
| `MAX_TOKENS` | `256` | Max tokens generated by the worker |
| `LOG_LEVEL` | `INFO` | `DEBUG`, `INFO`, or `WARNING` |
| `DEBUG_ROUTER` | `false` | Log `event=router_decision` at DEBUG |
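For example, a `.env` file next to the compose file might look like this; the values are illustrative, and any variable you omit keeps its default:

```env
OLLAMA_BASE_URL=http://192.168.1.50:11434
CLASSIFIER_MODEL=llama3.2:3b
WORKER_MODEL=llama3.1:8b
WORKER_TIMEOUT=120
LOG_LEVEL=DEBUG
```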
## Run tests (no Docker required)
```bash
pip install -r requirements.txt
python3 -m pytest tests/test_smoke.py -v
```