DARK (RE)FACTORY · ISSUE 01
THE MODEL IS ONE ARTIFACT
Mapping the Dark (re)Factory, Issue 1
Mohammad Moradi, MD, on why clinical AI governance keeps missing the actual clinical decision. Convergent evidence from inside the clinic: the audit-trail break happens nowhere near the model.
The AI governance industry is busy auditing the model. A physician just told me the model isn’t where the evidence goes missing.
Mohammad Moradi is a physician working at the intersection of digital health measurement, decision-making under uncertainty, and human-centered AI. I asked him three questions about clinical AI governance that had been sitting wrong in my head. His answers reframed something I’d been getting wrong about my own product.
He reviewed the piece before publication. The quotes below are his words. The framing around them is mine.
Where the evidence actually goes missing
I’d assumed the audit-trail break was a single technical point. The model spits out a probability. Someone records it. Someone overrides it. The override is where the chain breaks.
Moradi pushed back hard:
“The audit-trail break does not happen at one clean technical point. It is rarely only the model output. Evidence often becomes diluted when it moves from the AI system into real clinical practice: clinician judgement, incomplete patient history, fragmented laboratory systems, inconsistent reporting formats, resource limitations, patient communication, family pressure, affordability, and local availability of care.”
“The weak point is often not the AI output itself, but the translation of that output into human clinical action.”
I’ve been saying “govern the artifact, not the agent.” Moradi added the sharper version: the artifact is the entire chain, not the model output. Same diagnosis, finer resolution.
He framed the chain as four layers worth preserving:
- What the model suggested.
- What evidence, data quality, and uncertainty the model exposed.
- How the clinician interpreted, accepted, rejected, delayed, or modified the recommendation.
- How the final decision was communicated to the patient.
Replace “patient” with “team,” “downstream service,” or “auditor” and you have the same four layers for any AI-assisted decision. In code governance: PR diff. Risk classification and uncertainty. Reviewer reasoning. Merge and deploy. The same chain, in a different vertical.
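A minimal sketch of the four layers as one record, in Python. The class name, field names, and sample values are hypothetical, chosen only to mirror the list above; this is not a schema Moradi or any product defines.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DecisionChainRecord:
    """Hypothetical record holding the four layers of one AI-assisted decision."""
    suggestion: Optional[str] = None           # what the model suggested
    exposed_uncertainty: Optional[str] = None  # evidence, data quality, uncertainty exposed
    human_response: Optional[str] = None       # accepted, rejected, delayed, or modified, and why
    communication: Optional[str] = None        # how the final decision was communicated


def missing_layers(record: DecisionChainRecord) -> List[str]:
    """Return the names of layers that were never recorded."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


# Illustrative values only: a record that kept the output and the sign-off,
# but nothing about uncertainty or communication.
record = DecisionChainRecord(
    suggestion="approve PR",
    human_response="accepted without changes",
)
print(missing_layers(record))  # ['exposed_uncertainty', 'communication']
```

The point of the sketch is that a record holding only the first and third layers looks complete on a dashboard while half the chain is absent.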
A clean output from dirty input is dangerous if the uncertainty is hidden
This is the line that has stayed with me most.
Moradi was talking about real clinical data: lab values from different machines with different reference ranges, incomplete patient histories, fragmented record systems. An AI model that produces a confident recommendation against that input is hiding instability behind a clean output.
“If an AI system produces a confident recommendation from unstable input data, the audit trail should show that instability. It should not hide it behind a clean-looking output.”
The principle carries. The contexts do not. Clinical decisions sit on top of human, ethical, and communication layers that have no clean analog in code governance. Patient autonomy, informed consent, follow-up feasibility, communication framing. Those concerns shape clinical evidence in ways code review never has to consider. What carries across is the narrower point that polished output can mask upstream instability and the evidence trail should surface it.
In code governance, it looks like this: a static analyzer passes a PR, an LLM reviewer signs off, and the evidence bundle records “approved.” What it does not record is that the file had zero test coverage, the model that wrote the change was sampling at temperature 1.2, and the diff touched a path the team had previously listed as high-risk. Clean output. Dirty input. Uncertainty hidden behind the green checkmark.
The evidence bundle’s job is to surface what the green checkmark hides. Not to replace it. To require that it carry its receipts.
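As a rough illustration of an approval that carries its receipts, here is a small Python sketch. Everything in it is an assumption made for the example: the `ReviewInputs` fields, the `surface_caveats` helper, the temperature threshold of 1.0, and the `billing/` path. It is not how GuardSpine or any specific tool works.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReviewInputs:
    """Hypothetical snapshot of the inputs behind an approval."""
    approved: bool
    test_coverage: float         # fraction of the changed file covered, 0.0 to 1.0
    sampling_temperature: float  # temperature of the model that wrote the change
    touched_paths: List[str] = field(default_factory=list)


def surface_caveats(inputs: ReviewInputs, high_risk_paths: List[str],
                    max_temperature: float = 1.0) -> List[str]:
    """List the instabilities that a bare 'approved' would hide."""
    caveats = []
    if inputs.test_coverage == 0.0:
        caveats.append("no test coverage on changed file")
    if inputs.sampling_temperature > max_temperature:  # threshold is an assumption
        caveats.append(f"generated at temperature {inputs.sampling_temperature}")
    for path in inputs.touched_paths:
        if any(path.startswith(risky) for risky in high_risk_paths):
            caveats.append(f"touches high-risk path {path}")
    return caveats


inputs = ReviewInputs(
    approved=True,
    test_coverage=0.0,
    sampling_temperature=1.2,
    touched_paths=["billing/ledger.py"],
)
caveats = surface_caveats(inputs, high_risk_paths=["billing/"])
# The approval is kept; the caveats travel with it.
print({"approved": inputs.approved, "caveats": caveats})
```

Note that the sketch never flips `approved` to false. It leaves the checkmark in place and attaches what the checkmark would otherwise hide.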
Decision quality under uncertainty is the governance question
This is the reframe I most needed.
“A good decision is one that is transparent, proportionate, clinically defensible, reversible when possible, and reviewable based on what was known at the time. Governance should evaluate decision quality under uncertainty, not only outcome correctness after the fact.”
Doctors are not judged solely by whether the patient turned out fine. They are judged by whether the decision was reasonable given what was knowable at the time. SR 11-7, the Federal Reserve’s supervisory guidance on model risk management, already points the same way for financial models: documented assumptions, known limitations, and effective challenge. Moradi is saying clinical AI needs the same lens.
So does code governance.
The auditor’s question is not “was the model right.” The auditor’s question is “was the decision reasonable given what was knowable, and can you reconstruct it.” That requires a different artifact than the one most tools produce today. It requires a record of the uncertainty state at decision time: what was known, what was missing, what alternatives existed, why some were rejected.
A PR-approved checkbox does not capture that. It captures the outcome. The reasoning is in the comments if anyone wrote it, in the reviewers’ heads if they did not.
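To make “a record of the uncertainty state at decision time” concrete, here is one possible shape in Python. The `UncertaintyState` class, its fields, and the `reconstruction_gaps` check are hypothetical names for illustration, and the sample values are invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class UncertaintyState:
    """Hypothetical snapshot of what the decider knew at decision time."""
    known: List[str] = field(default_factory=list)
    missing: List[str] = field(default_factory=list)
    # alternative -> reason it was rejected ("" if no reason was recorded)
    rejected_alternatives: Dict[str, str] = field(default_factory=dict)


def reconstruction_gaps(state: UncertaintyState) -> List[str]:
    """Return reasons an auditor could not reconstruct the decision."""
    gaps = []
    if not state.known:
        gaps.append("nothing recorded as known")
    for alternative, reason in state.rejected_alternatives.items():
        if not reason.strip():
            gaps.append(f"no reason recorded for rejecting '{alternative}'")
    return gaps


state = UncertaintyState(
    known=["diff reviewed", "CI passed"],
    missing=["load test results"],
    rejected_alternatives={
        "feature flag rollout": "",
        "revert": "fix is smaller than revert",
    },
)
print(reconstruction_gaps(state))
```

A non-empty `missing` list is deliberately not treated as a gap. Recording what was unknown is the point; the gap is a rejected alternative with no reasoning attached.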
That gap is what auditors are starting to point at.
What changed in my thinking
Two things shifted from this conversation.
First, the artifact is bigger than I had been pitching it. In code governance, the artifact isn’t only the PR diff and the approval timestamp. It is the diff, the risk classification, the uncertainty the system exposed, the reviewers’ reasoning, the merge event, the rollback path, the post-deploy outcome. The chain. Moradi made me see that the chain is the artifact, not the diff.
Second, decision quality under uncertainty is a governance category that applies to every AI-assisted vertical I care about. Clinical decisions. Code changes. Loan denials. Claim approvals. Hiring decisions. The model produces an output. The decision happens around it. Most of what an auditor needs lives in the around, not in the output.
The line
“We may govern the model while missing the real clinical decision.”
Replace “clinical decision” with “code change,” “deployment,” “claim approval,” or “loan denial.” Same gap, different vertical.
The model is one artifact. The decision is a chain. Govern the chain.
If you’re working on AI governance and this lens helps, reply. The next issue is in conversation. I’m collecting the people who see this from inside the work, not from outside it.
David Youssef, Founder, GuardSpine · cal.com/davidyoussef/guardspine
Mohammad Moradi, MD, is a physician focused on digital health measurement, decision-making under uncertainty, and human-centered AI. He reviewed the piece before publication.
Reply if the lens helps. Skip if it doesn't.
Interviews are 20 to 30 minutes. Writeup goes to the interviewee for sign-off before publish. If you're inside the deployment chain and you see something the dashboards don't yet show, the door is open.
Evidence over opinions. Every time.
David Youssef. Founder of GuardSpine, an open-core code governance platform. guardspine.com