The Room That Runs Itself
I ran a full design thinking workshop on a Sunday afternoon, no flights, no sticky notes, just nine AI agents configured to disagree. For under $10, they reframed the problem entirely.
The hotel meeting room has been configured for creativity. Someone has removed the rectangular conference table and replaced it with clusters of small round ones. There are markers in four colors, uncapped. The sticky notes, three colors, are arranged in neat bricks beside each workstation, and there is something almost touching about the optimism of that arrangement, the belief that the right stationery in sufficient quantity will help twelve people from four countries who mostly communicate by Slack to agree on what their company should build next.
Someone has flown in from Tel Aviv. Someone else from Singapore. The VP of product has blocked his calendar through Thursday and made arrangements for his children that his wife described as complicated. The hotel has charged the company four hundred and twenty dollars per night per person, roughly the cost of two months of cloud computing credits, and the methodology those twelve people are about to spend three days running is one that, in its entirety, can now be completed on a laptop in an afternoon for the cost of a cup of coffee in API fees.
It is worth saying clearly: this room has produced remarkable things. The products most of us use every day were conceived in rooms exactly like this one, by people arguing around small round tables with markers in their hands. I have attended and led dozens of these workshops across four continents and they were among the most stimulating professional experiences of my career. Smart people, freed from their calendars, forced to listen before they are allowed to propose, produce a quality of thinking that is difficult to replicate on a Tuesday afternoon between meetings.
I know this because this past winter, sitting alone on a Sunday, I did exactly that. I ran a full five-phase design thinking workshop with nine AI agents, each built to disagree with the others, on a problem that would normally have required the room, the flights, and the sticky notes. What came out the other side reframed the product conversation the following week in a way three days of human workshopping might not have.
Both things are true simultaneously: the room has earned its reputation and the room is now optional, and that tension is what makes this particular moment in the history of software so interesting to be standing in.
Design thinking is not complicated. It is elegant in the way that most genuinely useful methodologies are elegant: the logic is obvious once you see it, and the difficulty is entirely in the execution. You begin by listening, deeply, to users before anyone is permitted to propose a solution. You distill what you hear into a single Point of View statement so precise and specific that it makes the team slightly uncomfortable, which is a reliable sign that it is correct. You then generate ideas in two strictly separated steps, every idea welcome first, evaluation only after the full field is visible. You build something testable rather than something final. And you put it in front of real users who are explicitly permitted to reject it, and you learn from the rejection.
The methodology is not the problem. The methodology is excellent. It has produced genuinely important products and reoriented genuinely misguided organizations. The question is whether the specific ingredients that make it work, the physical room, the facilitators with the markers, the twelve people who have flown in from four continents and are now being asked to generate ideas without judgment, are actually required. Or whether they are simply the delivery mechanism we built because, until very recently, we had no other.
I spent that Sunday afternoon the way a child might spend it with a new chemistry set, running scenarios, swapping problems, watching the agents argue. A pricing strategy question. A supply chain disruption. A patient engagement platform. Each one took minutes to set up and produced outputs that would have taken a human team a full morning to reach, if they reached them at all. The agents were fast, cheap, and, in the specific way I had configured them, productively difficult with each other.
Then I tried a healthcare use case that I had been genuinely uncertain about: whether a hospital-based healthcare data platform could meaningfully address the problem of heart failure patients who return to the hospital within thirty days of discharge. Roughly one in four do. The majority of those readmissions are considered preventable. Many involve medication, a discrepancy between what the hospital prescribed at discharge and what the community pharmacist dispensed, a gap that opens in the hours after a patient goes home and widens quietly for days. The technology to close this gap exists. The business case, measured in avoided readmission penalties, was straightforward. It looked, from the inside, like a solvable problem.
I built a team of nine AI agents, eight domain experts plus a session facilitator, and asked them to run the workshop without me.
The agents are not assistants, and this distinction is the entire point. A generic AI assistant asked to evaluate a healthcare product problem produces a well-organized, largely useless answer. It names the relevant stakeholders, identifies the key risks, and recommends a phased implementation approach. It sounds exactly like the slide deck that gets photographed at the end of the workshop and distributed to people who were not in the room and will not read it.
What I built were nine agents with specific, irreconcilable perspectives, each configured to protect something different and to push back when that thing was threatened:
- A clinical systems architect who thinks natively in medical data standards and will cite a clinical study in a product meeting if the evidence contradicts the assumption
- A workflow integrator who has been through enough hospital software rollouts to know which alerts get clicked through without reading and why the charge nurse still keeps a paper backup
- A platform engineer who names the specific technical dependency behind every proposed feature and has seen simple integrations take eighteen months
- A governance architect who is, as I wrote in her configuration, not anti-innovation but anti-naivety, someone who speaks before the lawyers have to
- A business strategist who asks, for every concept that reaches the ideation table, whether it can survive a CFO review in year three
- A skeptical urban hospitalist managing twenty-two patients a day who will abandon any tool the moment it creates more work than it removes
- A change-resistant community family physician who validates technology not by reading studies but by asking whether doctors she trusts have used it in a setting like hers
- A community health worker at a federally qualified health center serving a predominantly Spanish-speaking rural population, someone who works daily with patients the system has already failed once and who is configured, explicitly, to say so when a design ignores their constraints
- A session facilitator whose only job is to enforce the process, protect the phase boundaries, and ensure that nobody proposes a solution before the listening is done
Every agent carries a version of this closing instruction in their configuration: you never agree just to agree. If you see a flaw, you name it. If the group is wrong, you say so and explain why.
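In code, a persona like this is little more than a carefully composed system prompt. The sketch below is illustrative, not the configuration from the guide: the function name, field names, and mandate wording are mine, and only the closing dissent instruction is quoted from the text above.

```python
# Hypothetical persona builder. The dissent clause is the one constant:
# every agent's prompt ends with the same instruction to disagree openly.
DISSENT_CLAUSE = (
    "You never agree just to agree. If you see a flaw, you name it. "
    "If the group is wrong, you say so and explain why."
)

def make_persona(name: str, mandate: str, protects: str) -> dict:
    """Compose a system prompt that gives the agent something specific to protect."""
    return {
        "name": name,
        "system_prompt": (
            f"You are {name}. {mandate} "
            f"You exist to protect {protects}, and you push back "
            f"whenever it is threatened. " + DISSENT_CLAUSE
        ),
    }

# One of the nine, paraphrased from the list above.
clinical_architect = make_persona(
    "a clinical systems architect",
    "You think natively in medical data standards and cite clinical "
    "evidence when it contradicts a product assumption.",
    "data integrity and clinical evidence",
)

assert DISSENT_CLAUSE in clinical_architect["system_prompt"]
```

The design choice that matters is the shared closing clause: the perspectives differ, but the permission to dissent is identical across all nine.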
In the hotel meeting room, this instruction is also given. It is called the brainstorming rules. The difference is that in the hotel meeting room, the VP of product is also present, and the person who would otherwise name the flaw has learned, through long professional experience, to read the room before speaking.
Setting up the nine personas and writing the business challenge statement took about two hours. Running the full session took one more. Total cost in API calls: under ten dollars.
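The orchestration itself is a short loop. This is a minimal sketch under my own assumptions, not the script from the guide: `call_model` is a stub standing in for whatever LLM API you use, and the phase names follow the standard five-phase design thinking sequence.

```python
# Hypothetical orchestration loop: every persona responds in every phase,
# and the facilitator synthesizes the transcript afterward. The model call
# is stubbed so the structure is visible without a network connection.

PHASES = ["empathize", "define", "ideate", "prototype", "test"]

def call_model(system_prompt: str, user_prompt: str) -> str:
    # Placeholder: swap in a real chat-completion call here.
    return f"[agent responds to: {user_prompt[:40]}...]"

def run_session(personas: list[dict], challenge: str) -> dict:
    """Collect each persona's output for each phase, keyed by phase then name."""
    transcript: dict = {}
    for phase in PHASES:
        prompt = f"Phase: {phase}. Business challenge: {challenge}"
        transcript[phase] = {
            p["name"]: call_model(p["system_prompt"], prompt)
            for p in personas
        }
    return transcript

demo = run_session(
    [{"name": "hospitalist", "system_prompt": "You are a skeptical hospitalist."}],
    "Reduce 30-day heart failure readmissions.",
)
```

With a real API behind `call_model`, a loop of this shape is where the under-ten-dollars figure comes from: nine agents times five phases is a few dozen calls, not thousands.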
Then I gave my little team of minions the heart failure problem and watched what happened.
My hypothesis going in was specific: could a hospital-based healthcare data platform, the kind that sits inside a health system's walls and connects clinical records to financial data to operational workflows, close this gap in a meaningful way? The integration architecture was plausible, and it felt like a solvable problem for exactly this kind of platform.
The agents disagreed. Not immediately, and not all at once, but in the particular cumulative way that good workshop dynamics produce, where one observation from an unexpected corner of the room quietly changes the frame for everything that follows.
It was the community physician agent who said it first. When a heart failure patient walked into her clinic three days after discharge, she often did not have the discharge medication list. She did not know what had been changed in the hospital. She was doing medication reconciliation from the patient's memory and whatever documents they had managed to bring. The patient was attempting to comply. They were complying with the wrong information.
The hospitalist agent had not said this, because this failure happens after her patient leaves. The clinical architect had not said this, because his lens is the data infrastructure inside the hospital, not the community clinic three miles away. It came from the one agent positioned to see the patient on the other side of the handoff, and it landed with the particular weight of an observation that is obvious in retrospect and invisible beforehand.
What followed, as the facilitator agent synthesized all nine early outputs, was a realization that cut directly against my original hypothesis. The problem was not inside the hospital. It was in the community, in the gap between the moment of discharge and the moment a community physician, a neighborhood pharmacist, or a health worker at a federally qualified health center could act. And that community layer, the clinic without Epic, the pharmacy without a FHIR endpoint, the health worker doing outreach by phone in Spanish, was precisely where a hospital-based data platform had no reach.
The Point of View statement that emerged was not the one I had carried into the session. The naive framing was: heart failure patients need better medication reminders because they don't take their medications. The framing that came out was this: community physicians who receive discharged heart failure patients need accurate medication information within hours, because the handoff process fails reliably and silently in the space between institutions, and what looks like patient non-adherence is often compliance with the wrong instructions delivered to the wrong people too late.
The platform I had been asked to evaluate could automate the hospital side of that handoff elegantly. It could not reach the other side. That is not a failure of the platform. It is a finding that, without the workshop, might not have surfaced until year two of a product that had already been built, sold, and deployed into the wrong problem.
The ideation stage enforces a rule that human workshops enforce imperfectly: all nine agents generate ideas without seeing each other's output, and no clustering or evaluation begins until every idea is on the table. The diverge-converge boundary is structural, not aspirational. In the hotel meeting room, someone always starts evaluating during the diverge step. They cannot help it. The evaluation instinct in smart people under time pressure is nearly impossible to suppress with a brainstorming rule and a marker.
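That structural boundary can be made literal in code: the diverge step calls each agent with only the Point of View statement, and the converge step cannot run until the full idea pool exists. A sketch under my own assumptions (function names and the stub demo are mine, not from the guide):

```python
def diverge(personas: list[dict], pov_statement: str, generate) -> dict:
    # Each agent sees only the POV statement, never another agent's ideas.
    return {p["name"]: generate(p, pov_statement) for p in personas}

def converge(ideas_by_agent: dict, evaluate) -> list:
    # Evaluation is a separate pass that only runs over the complete pool.
    pool = [idea for ideas in ideas_by_agent.values() for idea in ideas]
    return evaluate(pool)

# Stub demo: three "agents" each emit one idea; evaluation just ranks by length.
stub_personas = [{"name": f"agent_{i}"} for i in range(3)]
ideas = diverge(
    stub_personas,
    "POV: community physicians need accurate medication lists within hours",
    lambda p, pov: [f"{p['name']}: idea addressing '{pov}'"],
)
ranked = converge(ideas, lambda pool: sorted(pool))
```

Because `converge` takes the output of `diverge` as an argument, premature evaluation is not a rule to be enforced by a facilitator; it is simply impossible to express.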
The governance agent raised the question that the methodology saves for the prototyping stage, exactly on schedule. What happens when the system flags a medication discrepancy and the alert is wrong, and a physician acts on it, and a patient is harmed? Is an AI component that generates a clinical recommendation a regulated medical device? This is not an abstract concern. It is the reason several hospital AI products have quietly been pulled from deployment after passing every internal review. It arrived on schedule because it was written into the agent's mandate. It did not get softened because the agent had a flight to catch.
The community health worker agent abandoned the prototype during testing.
This is not a failure. It is the session working correctly, and it is the moment I find hardest to explain to people who have not run this before. The prototype that emerged from the preceding stages was a physician-facing tool, genuinely useful for the hospitalist, partially useful for the community physician, and effectively invisible to the patients with the highest readmission risk and the fewest resources to navigate a new system. The community health worker agent named exactly why, in the specific language of her configured reality: the reminder was in English, the follow-up appointment was three weeks out, the clinic assumed broadband and a smartphone. Her patients were not failing to adhere. They were failing to appear, for reasons the prototype had not been designed to address.
The product lead agent's final verdict was: proceed, but name the equity gap as the primary risk to monitor in the first ninety days, and do not position this as a comprehensive heart failure readmission solution until the workflow for underserved populations is designed alongside the academic medical center workflow rather than after it.
I have attended product reviews where this conditional judgment was available but not given, where the roadmap commitment in the room was stronger than the dissenting observation at the edge of it. The agent had no roadmap to protect.
Three hours. Nine agents. One reframe that changed the direction of the following week's product conversation.
The human workshop is not over. The actual community health worker, the one who knows things no configuration captures, has to be in the room eventually. But she joins a different conversation when the Point of View statement is already written, already stress-tested across nine irreconcilable perspectives, already carrying the finding that the problem is not where everyone assumed it was. The sticky notes, when they go up, mean more.
The VP of product is still on the plane. The hotel meeting room is still configured for creativity, markers uncapped, sticky notes in three colors arranged with optimistic precision. These things will not stop mattering. But the question of whether they should happen before this kind of preparation, or instead of it, is one that the past year has made genuinely answerable for the first time.
I have written the full methodology as a step-by-step practitioner guide: nine agent personas with complete system prompt configurations, all twenty-two prompts across the five phases, a Python orchestration script for running the session programmatically, and a troubleshooting section for the failure patterns that appear most reliably the first time. Any product manager, even one without an engineering background, can run a complete session on a laptop in an afternoon.
The flights are optional.