A new release from Onicai is aiming to simplify how developers run large language models directly on blockchain infrastructure, with the launch of llama_cpp_canister v0.9.0.
The update builds on earlier versions by refining deployment tools and improving compatibility with the latest icp-py-core framework. At its core, the canister allows developers to run models from llama.cpp as smart contracts on the Internet Computer, enabling AI systems to operate fully on-chain.
The approach centres on the GGUF file format, which supports a range of open-source language models. Developers can upload these models to a canister and run inference directly within the network, positioning the tool as part of a broader effort to move AI workloads away from traditional off-chain infrastructure.
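In practice, uploading a model file of this size to a canister means splitting it into pieces small enough to fit within the network's per-message size limit, then sending them in order. The sketch below shows only that chunking step; the chunk size and helper name are illustrative assumptions, not the project's actual upload tooling.

```python
# Sketch: split a .gguf model file into canister-sized chunks.
# CHUNK_SIZE and gguf_chunks are illustrative assumptions, not the
# actual llama_cpp_canister upload scripts.

CHUNK_SIZE = 1_900_000  # stay safely under the ~2 MB ingress message limit

def gguf_chunks(path: str, chunk_size: int = CHUNK_SIZE):
    """Yield (offset, bytes) pairs for sequential upload calls."""
    with open(path, "rb") as f:
        offset = 0
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield offset, chunk
            offset += len(chunk)
```

Each (offset, chunk) pair would then be passed to the canister's upload endpoint, one update call per chunk, until the full model is assembled on-chain.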
Onicai highlights security and transparency as key motivations behind the release. Running models on-chain can reduce reliance on external servers, which may appeal to projects that prioritise data control and verifiability. At the same time, the system is designed to remain accessible, with prebuilt files included in the release to simplify setup for developers who may not want to compile components from scratch.
The project is open source under an MIT licence and includes documentation, testing frameworks and continuous integration checks. A smoke testing setup using pytest is also available, giving developers a way to validate deployments before moving into production environments.
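A smoke test in that spirit might look like the following pytest sketch. The health query and the injectable `call_canister` helper are illustrative assumptions standing in for whatever endpoint a deployed canister actually exposes:

```python
# Illustrative pytest-style smoke test for a deployed canister.
# `call_canister` is a stand-in for a real agent call; it is passed in
# so the test structure is clear without a live network connection.

def check_health(call_canister):
    """Return True if the canister answers a basic readiness query."""
    response = call_canister("health")
    return response.get("status") == "ok"

def test_canister_is_ready():
    # A real smoke test would issue a query against the deployed
    # canister; here a stub simulates a healthy reply.
    fake_call = lambda method: {"status": "ok"}
    assert check_health(fake_call)
```

Running checks like this before promoting a deployment is what gives the pytest setup its value: a broken upload or misconfigured canister fails fast, before reaching users.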
Several early-stage projects are already using the canister as the core engine for on-chain AI agents. These include funnAI, IConfucius and ICGPT, each experimenting with different use cases ranging from token-driven ecosystems to conversational interfaces powered by decentralised models.
While the release introduces new capabilities, it also highlights the technical constraints of running AI directly on-chain. Constraints such as per-call instruction caps and memory limits mean developers need to manage how models are loaded and executed, often breaking tasks into multiple steps to stay within those bounds.
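Because a single call cannot exceed the instruction limit, inference is typically driven as a loop: each update call does a bounded amount of generation, and the client keeps calling until the model reports it is finished. A simplified sketch of that pattern, with the method name and response shape as illustrative assumptions rather than the canister's real interface:

```python
# Sketch of multi-step on-chain inference: repeat bounded update calls
# until generation completes. `run_update` and its response fields
# ("tokens", "done") are illustrative assumptions.

def generate(run_update, prompt: str, max_steps: int = 100) -> str:
    """Accumulate output across repeated bounded update calls."""
    output = []
    for _ in range(max_steps):
        resp = run_update({"prompt": prompt, "continue": bool(output)})
        output.append(resp["tokens"])
        if resp["done"]:  # canister signals generation is complete
            break
    return "".join(output)

# A stub canister that emits one word per call, finishing after three:
def make_stub():
    words = iter(["On-chain ", "inference ", "works."])
    def run_update(args):
        w = next(words, None)
        return {"tokens": w or "", "done": w is None or w.endswith(".")}
    return run_update
```

The same client-side loop is what lets long generations fit under the network's per-call limits, at the cost of extra round trips.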
There are also practical considerations around performance and scalability. On-chain inference can offer transparency and control, though it may not yet match the speed or efficiency of traditional cloud-based systems. This leaves room for ongoing experimentation as the technology matures.
The update reflects a wider push within the blockchain space to integrate AI more directly into decentralised environments. By making it easier to deploy and run models within the network, tools like llama_cpp_canister suggest a future where AI agents operate with fewer external dependencies, though adoption will likely depend on how these systems perform under real-world conditions.
Dear Reader,
Ledger Life is an independent platform dedicated to covering the Internet Computer (ICP) ecosystem and beyond. We focus on real stories, builder updates, project launches, and the quiet innovations that often get missed.
We’re not backed by sponsors. We rely on readers like you.
If you find value in what we publish—whether it’s deep dives into dApps, explainers on decentralised tech, or just keeping track of what’s moving in Web3—please consider making a donation. It helps us cover costs, stay consistent, and remain truly independent.
Your support goes a long way.
🧠 ICP Principal: ins6i-d53ug-zxmgh-qvum3-r3pvl-ufcvu-bdyon-ovzdy-d26k3-lgq2v-3qe
🧾 ICP Address: f8deb966878f8b83204b251d5d799e0345ea72b8e62e8cf9da8d8830e1b3b05f
Every contribution helps keep the lights on, the stories flowing, and the crypto clutter out.
Thank you for reading, sharing, and being part of this experiment in decentralised media.
—Team Ledger Life