Dominic Williams has thrown open the doors to a fresh AI experiment on the Internet Computer, inviting developers to engage with what he calls ‘AI workers’. This feature, still in beta, lets smart contract developers integrate AI-driven capabilities directly into their applications via ICP APIs. The goal is to provide deterministic inference today, with trustless replicated inference and custom models in the pipeline.
AI workers function as stateless nodes dedicated solely to processing large language model (LLM) prompts. The system operates in a straightforward sequence: canisters send prompts to the LLM canister, which holds them in a queue; AI workers poll that canister for tasks, execute the prompts, and return the responses; the LLM canister then relays each response to the originating canister, completing the loop. This first iteration is an MVP, so developers should expect changes as the feature evolves in response to user feedback: DFINITY is taking an iterative approach rather than perfecting the API before opening it to developers. The LLM canister and AI workers remain under DFINITY’s control for now, but decentralisation is on the roadmap.
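To make the sequence concrete, the sketch below mirrors that loop from the worker’s side. The LLM canister’s interface is not public, so every name here (poll_tasks, run_llama, submit_response) is illustrative rather than DFINITY’s actual API.

```rust
/// A queued prompt pulled from the LLM canister (illustrative shape).
struct Task {
    id: u64,
    prompt: String,
}

/// Stub: ask the LLM canister for pending tasks (in reality a canister call).
fn poll_tasks() -> Vec<Task> {
    Vec::new() // placeholder: a real worker would issue a query here
}

/// Stub: run a prompt through the local Llama 3.1 8B model.
fn run_llama(prompt: &str) -> String {
    format!("response to: {prompt}") // placeholder inference
}

/// Stub: hand the response back to the LLM canister, which forwards it
/// to the originating canister.
fn submit_response(_task_id: u64, _output: String) {}

fn main() {
    // The worker keeps no state between iterations: poll, execute, return.
    loop {
        for task in poll_tasks() {
            let output = run_llama(&task.prompt);
            submit_response(task.id, output);
        }
        std::thread::sleep(std::time::Duration::from_millis(500));
    }
}
```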
To make adoption easier, DFINITY has rolled out supporting libraries in Rust and Motoko. Contributions in other languages, such as TypeScript and Python, are encouraged, opening the door to community-driven development. Developers eager to test the waters can access source code and demos to get started.
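For a flavour of the developer experience, here is a minimal canister endpoint built on the Rust library. The call shape follows the published ic-llm crate, though exact signatures may shift while the feature is in beta.

```rust
use ic_llm::Model;

// Sketch of a canister endpoint using the ic-llm crate; signatures may
// change as the beta evolves.
#[ic_cdk::update]
async fn ask(question: String) -> String {
    // Forwards the prompt to the LLM canister and awaits the model's reply.
    ic_llm::prompt(Model::Llama3_1_8B, question).await
}
```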
At present, only the Llama 3.1 8B model is supported, but this is set to expand based on community demand. The service is free during the early phase, with pricing to be determined once the system and its applications mature. There are some restrictions, including a cap of ten messages per chat request, a 10KiB maximum prompt length, and a 200-token output limit. These constraints will be refined over time to improve usability.
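Canisters can check these limits before issuing a call rather than waiting for a rejection. Below is a small hypothetical guard, assuming the 10KiB cap applies to the combined message text; the helper name and error handling are illustrative.

```rust
/// Beta limits from the announcement: at most ten messages per chat request
/// and a 10 KiB maximum prompt length. (The 200-token output cap is applied
/// on the service side.)
const MAX_MESSAGES: usize = 10;
const MAX_PROMPT_BYTES: usize = 10 * 1024;

/// Hypothetical client-side guard: reject an oversized request before
/// sending it to the LLM canister rather than letting it fail there.
fn validate_chat(messages: &[String]) -> Result<(), String> {
    if messages.len() > MAX_MESSAGES {
        return Err(format!(
            "too many messages: {} > {MAX_MESSAGES}",
            messages.len()
        ));
    }
    let total_bytes: usize = messages.iter().map(|m| m.len()).sum();
    if total_bytes > MAX_PROMPT_BYTES {
        return Err(format!(
            "prompt too large: {total_bytes} bytes > {MAX_PROMPT_BYTES}"
        ));
    }
    Ok(())
}
```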
Privacy remains a work in progress. While AI workers do not log individual prompts, someone running an AI worker could, in theory, see them—albeit without being able to trace them back to a user. DFINITY is exploring confidential computing solutions, such as trusted execution environments (TEEs), though their feasibility depends on community interest and practical implementation costs.
For now, the source code for the LLM canister and AI workers is not publicly available, as DFINITY considers the current implementations to be experimental prototypes. Open-sourcing will happen as the technology matures. In the meantime, efforts are underway to improve request latency and offer non-replicated processing modes to enhance speed. The broader vision includes fully decentralised AI workers, with options ranging from deployment across node providers to enabling individuals to run AI workers from home or private data centres.
DFINITY has previously explored running LLMs inside canisters but hit performance bottlenecks. Even with optimisations such as enhanced input/output handling and refined matrix multiplication, only relatively small models of up to around one billion parameters would be feasible on-chain. The AI worker approach sidesteps these limits, allowing far larger models (8B, 70B, and beyond) to be used without compromising the Internet Computer’s trust principles.
By rolling out AI workers now, DFINITY aims to enable developers to start building AI-driven applications immediately. The team remains keen to hear from users, especially regarding decentralisation, performance improvements, and potential new AI models. Those interested can explore the system, test different applications, and share feedback to shape the next steps in decentralised AI evolution.