Local-First and Fully Traced: ... Note

Local-First and Fully Traced: Routing Between Ollama, Foundry Local, and Microsoft Foundry

Publishing agent projects involves a tension between powerful cloud-based AI behaviors and the limited patience of users who will only try a project for a few minutes. To address this, a hybrid approach routes requests to different tiers of models based on availability, under a single contract. This ensures that even if cloud services fail, a local fallback with the same schema and code path is used. Forkability, or the ability to run a project on another's machine, is made reliable through this approach. Observability, through detailed logging and tracing, builds user trust by making it clear which path served each request and why.The system prioritizes local models but can seamlessly fall back to cloud-based Foundry models if local options are unavailable or encounter errors. This resilience is managed automatically within functions like create_chat_completion, which handles multiple failure modes without the caller needing to intervene. When a fallback occurs, it is explicitly logged and made visible in the replay log, providing a transparent record of the process. The system allows for per-role routing, enabling different agents within the system to utilize specific models, whether cloud or local. Runtime configuration can be adjusted through a settings console, permitting changes to routing modes and model assignments without restarting the application. Timeouts and retries are strictly bounded to prevent the system from stalling, ensuring a fast and informative error experience for users.