slaveOftime

Open Relay node federation only matters if supervision survives across machines

2026-04-21

Open Relay gets more interesting when it stops being tied to one terminal on one machine.

That is the part I want to push harder now.

The README says oly is for running interactive CLIs and AI agents like managed services: start a command once, detach, inspect logs later, send input only when needed, reattach when you want full control. It also says you can control it from more than one place and route work to connected nodes.

I think that last part is easy to misread.

Node federation is not interesting to me because "distributed systems" sounds impressive. It is interesting because real work is messy:

  • maybe the fastest GPU box is not the one I am sitting at
  • maybe the build or installer only makes sense on a Windows machine
  • maybe a long-running agent should stay next to the repo cache, the browser session, or the credentials it already needs

So yes, I want to route work to other machines.

But routing work alone is not the value.

If all I can say is "the process is now running over there," I have not solved much. I just moved the fragility to another box.

What matters is whether one operator can still supervise the session cleanly after the work moves:

  • see what is running
  • inspect logs without attaching
  • wait for likely human checkpoints
  • send the missing input at the right moment
  • reattach and take over when needed

That is why the README examples around federation matter:

oly start --node worker-1 --title "nightly task" --detach claude
oly logs --node worker-1 --wait-for-prompt <id>
oly send <id> "continue" key:enter

That is not a cluster scheduler story. It is a supervision story.

I care about this distinction because Open Relay is not trying to be a flashy terminal replacement. The project is a control layer for long-running CLI and agent sessions. The promise is simple: the daemon owns the session, not one fragile terminal tab, and the human stays able to step back in when the work actually needs judgment.

Once connected nodes enter the picture, that promise gets tested for real.

Suppose I launch a coding agent on another machine because that node has the right environment. The useful question is not whether the process can start there. The useful question is whether I can still manage it like it is part of one coherent system:

  • does the primary still show me the session clearly
  • do logs still come back in a way I can trust
  • can I notice the input-needed checkpoint before the run stalls forever
  • can I intervene surgically instead of opening a full remote desktop just to press Enter

If the answer is no, then federation is mostly theater. The job moved, but control did not.

That is why I increasingly describe Open Relay as "run interactive CLIs and AI agents like managed services" instead of just "a tool to detach terminal jobs." The service feeling comes from supervision, recoverability, and human handoff staying intact even when the underlying work is long-running, approval-heavy, or happening on another node.

I want one operator to be able to start on a local machine, route the next session to a connected node, check logs from the web UI or CLI, and still stay firmly in the loop when the session needs a decision. That is the real value proposition behind routing work across machines.

So my current view is simple:

Node federation in Open Relay only matters if supervision survives across machines.

If it does, then connected nodes stop being a demo and start becoming a practical way to place work where it belongs without giving up human control.

Project: https://github.com/slaveOftime/open-relay

Do you like this post?