Developers working with the Internet Computer (IC) can now integrate large language models (LLMs) directly into their canisters with just a few lines of code. The latest update from the DeAI working group introduces a streamlined way to access AI-powered agents, marking a significant step towards more intelligent, decentralised applications.
This development means that AI interactions on the IC no longer require external services or complex integrations. With this update, developers can prompt LLMs from their canisters using Rust or Motoko, the two primary programming languages for the IC. This includes simple queries as well as dynamic chat interactions, where multiple messages can be exchanged within a session.
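For example, a single-shot prompt from a Rust canister could look roughly like the sketch below. It assumes the `ic-llm` Rust library released with this update, with `ic_llm::prompt` and `Model::Llama3_1_8B` as the entry points; check the published crate for the exact names and signatures.

```rust
use ic_llm::Model;

// Update method that forwards a question to the LLM canister and
// returns the model's reply. The call is asynchronous: the prompt is
// queued, picked up by an AI worker, and the response is awaited here.
#[ic_cdk::update]
async fn ask(question: String) -> String {
    ic_llm::prompt(Model::Llama3_1_8B, question).await
}
```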
AI workers are at the heart of this system, ensuring that prompts are processed efficiently. These stateless nodes retrieve queued prompts, execute them, and return responses. While the system is currently managed by the DFINITY team, the long-term plan is to transition control to the protocol and fully decentralise the AI worker network. Feedback from developers will play a crucial role in shaping this process.
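The worker protocol itself is not a public API, but the flow just described can be pictured as a simple polling loop. The sketch below is purely illustrative: every type and method in it is hypothetical and stands in for internals currently managed by DFINITY.

```rust
// Purely illustrative: models the poll -> execute -> respond cycle
// described above. None of these names correspond to a public API.

struct PromptTask {
    id: u64,
    prompt: String,
}

// Hypothetical handle to the LLM canister's worker-facing queue.
struct QueueClient;

impl QueueClient {
    fn next_prompt(&self) -> Option<PromptTask> {
        // In reality: an authenticated call to the LLM canister.
        None
    }
    fn submit_response(&self, _id: u64, _output: String) {}
}

// Stateless loop: nothing is retained between tasks, which is what
// lets workers be added or replaced freely.
fn run_worker(queue: &QueueClient, generate: impl Fn(&str) -> String) {
    while let Some(task) = queue.next_prompt() {
        let output = generate(&task.prompt);
        queue.submit_response(task.id, output);
    }
}

fn main() {
    let queue = QueueClient;
    run_worker(&queue, |p| format!("echo: {p}"));
}
```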
At present, the LLM canister supports the Llama 3.1 8B model, with more models planned based on user demand. The service is currently free to use; a pricing model will be determined as the system matures. There are limits on message length and output size, but these constraints will be adjusted over time to improve usability.
Privacy remains a key consideration. While individual prompts cannot be linked to specific users, AI workers currently lack full confidentiality guarantees. The team is exploring options such as trusted execution environments (TEEs) to address this issue. In the meantime, DFINITY does not log individual prompts but does track overall usage metrics to refine the system.
For those eager to experiment, Rust and Motoko libraries are already available, making it easy to integrate AI into new or existing canisters. Contributions in other languages, such as TypeScript or Python, would be welcome additions to the ecosystem.
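For multi-message chat sessions, the Rust library also exposes a chat-style call that takes the conversation history. A minimal sketch, assuming `ic_llm::chat` with the `ChatMessage` and `Role` types as in the library's examples (again, verify against the current crate):

```rust
use ic_llm::{ChatMessage, Model, Role};

// Sends a system instruction plus the latest user message in one call.
// Because AI workers are stateless, the canister resends the relevant
// history with each request rather than relying on a server-side session.
#[ic_cdk::update]
async fn chat_demo(user_message: String) -> String {
    let messages = vec![
        ChatMessage {
            role: Role::System,
            content: "You are a concise assistant.".to_string(),
        },
        ChatMessage {
            role: Role::User,
            content: user_message,
        },
    ];
    ic_llm::chat(Model::Llama3_1_8B, messages).await
}
```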
Looking ahead, improving latency is a top priority. Current efforts focus on reducing response times and enabling non-replicated modes for lower-latency experiences. Another area of exploration is the decentralisation of AI workers, potentially allowing node providers or even individuals with suitable hardware to contribute to the network.
The worker-based design reflects a pragmatic shift in strategy for AI on the IC. Early research into running LLMs directly inside canisters revealed performance bottlenecks that limited scalability. While optimisations for I/O handling and matrix multiplication are in progress, AI workers provide a faster and more scalable solution in the interim.
Developers are encouraged to test the system, build AI-powered agents, and share feedback. This iterative approach ensures that improvements are driven by real-world use cases, ultimately leading to a more robust and decentralised AI ecosystem on the Internet Computer.