Builder Spotlight: NodeGhost
Builder Spotlight
What the Acurast tunnel and NodeGhost do, and what happens when you put them together
The usual way to add an LLM to a product is to point an OpenAI-compatible client at a hosted API and send the prompt to a third-party provider’s servers. The model isn’t yours, and the machine isn’t yours. Self-hosting normally just moves the workload onto rented cloud GPUs.
This example, from the Acurast app-tunnel family, runs the model on a single attested smartphone instead. The Acurast tunnel makes that locally-bound model reachable at a public HTTPS URL, and NodeGhost serves it as a drop-in OpenAI-compatible endpoint through its Bring Your Own Model support. Two capabilities, and one registration step between them.
It’s an ordinary OpenAI call. The model answering it runs on a phone.
Project At A Glance
01 /
What It Is
A quantized Qwen2.5-3B model running on one attested Acurast processor, consumed as an ordinary OpenAI-compatible endpoint.
02 /
What Makes It Possible
Two capabilities: the Acurast tunnel for reachability, and NodeGhost’s BYOM gateway for a standard API surface. One endpoint registration joins them.
03 /
Status
A working reference example, deployable on Acurast canary today, and honest about the throughput of a 3B model on a mobile CPU.
01 / SECTION
What the Acurast Tunnel Does
Make a locally-bound service on a NAT’d device reachable from anywhere, with no inbound connectivity.
An Acurast processor sits behind carrier NAT with no public IP. A service bound to 127.0.0.1 on it is, by default, invisible from outside. The tunnel is what changes that.
Inbound reachability with no inbound connection. The phone never accepts an incoming connection. It dials out to a set of relays and keeps that connection open, and traffic sent to the tunnel’s public URL travels back down the same line to the local port. Carrier NAT and the lack of a public IP don’t get in the way, because nothing on the phone is waiting for inbound traffic.
A public HTTPS URL with automatic TLS. Each tunnel comes up at
https://<clientId>.<your-domain>:8443, with certificates issued automatically over ACME. The caller talks plain HTTPS to an ordinary subdomain.A fresh identity on every run. Each deployment generates a new P-256 key as its tunnel identity, and that key derives a new subdomain. The URL is per-deployment: it works only while the deployment runs and stops when it ends, and the deployment reports its actual URL back through a callback as it starts up. If you want a stable address, you can persist the key in the bundle (a leaked bundle then carries the address) or have the consumer re-register when the URL changes.
It forwards a port, not a protocol. The tunnel has no idea it’s carrying LLM traffic; it forwards a TCP port. The sibling app-tunnel/cargo example runs SSH over the same tunnel client, which shows the tunnel is a general-purpose primitive and the service behind it is interchangeable.
It runs on attested devices. The deployment only matches attested Acurast devices, and the tunnel client runs in the Shell runtime’s proot sandbox alongside the workload, coordinating with the processor over a local JSON-RPC bridge.
02 / SECTION
What NodeGhost Does, and the BYOM Hand-off
A drop-in OpenAI API in front of a backend that can live anywhere, including on a phone.
The README describes NodeGhost in narrow terms, and that is all this example relies on: it routes inference over Pocket Network, exposes an OpenAI-compatible API, and supports Bring Your Own Model.
A drop-in OpenAI-compatible API. NodeGhost exposes the same interface as the OpenAI API, so existing clients, libraries, and tooling keep working after a base-URL change.
Routed over Pocket Network. Requests route over the POKT decentralized layer rather than hitting a single provider’s endpoint directly.
Bring Your Own Model. Instead of a model NodeGhost hosts, an operator points an API key at their own OpenAI-compatible backend. In this example that backend is the llama-server on the Acurast processor, reached over the tunnel.
The hand-off. Once the tunnel reports its URL, you register it with NodeGhost as a BYOM endpoint. (The example’s register call omits the endpoint key, since llama-server runs unauthenticated by default.) After that, a standard chat-completions request routes through NodeGhost and over the tunnel to the model on the phone, and the response carries the model name, which confirms where it was served.
03 / SECTION
How the Two Compose
One registration turns reachability plus a standard API into a request that lands on a phone.
On their own, tunnels and OpenAI-compatible gateways are both ordinary. The point of the example is the join: with one extra step, registering the tunnel URL as a BYOM backend, a phone-hosted model becomes a drop-in inference endpoint, and the calling application doesn’t change at all.
Device · Acurast Processor (Shell runtime / proot)
llama-server :8080
Qwen2.5-3B-Instruct-Q4_K_M
Qwen2.5-3B-Instruct-Q4_K_M
↓
Tunnel · Acurast Tunnel
dial-out to relays -> https://<clientId>.<your-domain>:8443
(ephemeral P-256 identity, ACME TLS, callback-reported URL)
(ephemeral P-256 identity, ACME TLS, callback-reported URL)
↓
Gateway · NodeGhost (OpenAI-compatible, POKT-routed)
tunnel URL registered as a BYOM endpoint
POST https://<gateway>/v1/chat/completions
-> routed over the tunnel -> inference on the phone
POST https://<gateway>/v1/chat/completions
-> routed over the tunnel -> inference on the phone
04 / SECTION
What’s Working Today
A deployable example that lands an OpenAI request on a phone-hosted model.
On deploy, the processor sets up a proot Ubuntu environment, builds a small loopback shim so the local server binds correctly inside the sandbox, downloads llama.cpp and the Qwen2.5-3B model at runtime, and starts the server behind a health-gated readiness loop. Only once the model has loaded does the tunnel open.
You follow the sequence through the callback receiver: environment setup, the downloads, the model load, the model going ready, and finally a
started event carrying the public tunnel URL. Register that URL with NodeGhost, send a normal chat request, and the answer comes back from the phone.Try it now.
Explore the example, deploy it on canary, and watch a request land on a phone: github.com/Acurast/acurast-example-apps/tree/main/apps/app-tunnel/llm
05 / SECTION
The Honest Ceiling
A capability has limits, and this one is easy to name.
A 3B model quantized to Q4, running on a mobile ARM CPU, generates around 3 tokens per second. A reply of roughly 120 tokens takes about 45 seconds, and the model makes basic mistakes on multi-step reasoning. That’s the ceiling of this model on this hardware, not a limit of the tunnel, the runtime, or the network.
In practice the pattern fits asynchronous, batch, or short-context work: background summarization, classification, queued generation, agent steps that aren’t on a human’s critical path. It isn’t built for interactive chat. Picking a smaller model for throughput, or accepting the latency because you want the model on hardware you control, are both fair calls.
Where This Generalizes
The tunnel doesn’t care what it forwards, and NodeGhost doesn’t require the backend to be hosted. The cargo sibling proves the point by running SSH over the same tunnel. Any locally-bound service on an attested device can be reached this way, and any OpenAI-shaped workload can sit in front of it unchanged. The model here is just what happened to be on the other end.
About the Stack
NodeGhost is a decentralized AI inference gateway that routes over Pocket Network, exposes an OpenAI-compatible API, and supports Bring Your Own Model. Read more at nodeghost.ai.
Pocket Network is the decentralized layer NodeGhost routes requests over. Read more at pocket.network.
Acurast is a decentralized network of attested smartphones providing distributed edge compute, where workloads are deployed to attested devices across the network. Read more at acurast.com.
This example is an Acurast reference build, published in the acurast-example-apps repository.
Building on Acurast?
If you’re running your own models, agent backends, or other services on attested devices, come talk to us. Join the Discord.