The Human Harness

By Joel Dare - Written April 28, 2026

The way I interact with open weight AI models is different, today, than the way I interact with foundation models.

I haven’t found a harness that works well with agentic processes in small models that I can run on constrained systems. My MacBook Air only has 16GB of RAM (a decision I regret). The model weights and context must fit into RAM to maintain speed.

There’s a common prediction that the costs of foundation models will rise sharply and it feels like we’re already starting to see that today.

So, that takes me back to the “old school” method of being the Human Harness for the AI model. Doing so, however, may help me learn these models and, maybe, apply those lessons to a future software harness.

This means breaking the instructions down to very small pieces that I can instruct the AI model with.

Write an HTML page using pure HTML…
Add a chat box at the bottom…
Add a label to the chat box…
When the user hits enter in the chat box…

This is an iterative approach, which I prefer anyway, but it also tends to be a copy/paste approach, which is where I really feel the pain.

One the plus side, when I’m having the model write the code in very small pieces, I tend to understand the code. When I let the AI agents write the code, I really only understand the UI and the behavior of the software.

We have good AI models that can run on modest hardware like Qwen 3.6:35b (24GB), Gemma 4:e4b (9.6GB) or the older but smaller Qwen 3.5:9b (6.6GB). The Qwent 3.5:9b model is probably the sweet spot for my laptop right now.

I even built my own experimental harness, called Peen, to see if I could improve on it. It works, but just barely. Given enough time and attention I’m sure it could be improved, but it takes a lot of time and experimentation.