Psychological Motion Capture: The Missing Framework for AI-Ready Knowledge Work
Everyone's building AI workflows with code metaphors. The better model is motion capture—recording expert thinking as auditable, replicable processes that AI can execute.
Everyone’s talking about breaking down thinking into code. Functions, pipelines, prompts chained together. It’s the dominant paradigm for building AI workflows. But I think we’re using the wrong metaphor.
Code assumes you already know exactly what steps to take. It assumes the process is deterministic. But knowledge work isn’t like that. When an expert reviews a clinical trial protocol, writes a legal brief, or diagnoses a manufacturing defect, they’re doing something messier. Something more like movement than calculation.
Here’s what I’ve been calling it: Psychological Motion Capture.
The idea is borrowed from industrial engineering’s original time-and-motion studies, but applied to cognitive processes. Instead of filming someone shoveling coal and counting arm movements, you’re capturing the micro-movements of thought. The goal: break down your thinking into processes and steps that are auditable, replicable, and ready for AI agents to execute.
This isn’t a new idea. It has deep roots. But nobody’s synthesized these threads specifically for the AI workflow era. Let me walk through what I’ve found.
The 120-Year History You Didn’t Know About
Frederick Taylor and the Original Motion Capture (1880s-1910s)
In 1911, Frederick Winslow Taylor published The Principles of Scientific Management. His insight was simple but radical: work that looks like a single activity is actually a sequence of discrete motions, and most of those motions are inefficient.
Taylor broke down pig iron loading at Bethlehem Steel into individual arm movements. He found that workers moving 12.5 tons per day could move 47.5 tons with optimized rest intervals—a 280% improvement without working harder. His method: observe, decompose into atomic motions, time each one, eliminate waste, reassemble.
Frank and Lillian Gilbreth extended this with actual motion capture. They filmed bricklayers and discovered that the standard 18 motions per brick could be reduced to 5. Their cameras were the first tools for making invisible physical movements visible and optimizable.
The lesson for knowledge work: what feels like a single cognitive act—“reviewing a document”—is actually a sequence of micro-decisions. Until you capture that sequence, you can’t systematically improve it or hand it to an AI.
Cognitive Task Analysis: Taylor’s Method Applied to Thinking (1980s-Present)
By the 1980s, researchers in human-computer interaction and expert systems development faced a problem: how do you extract the knowledge from an expert’s head and put it into a computer? Physical motion studies didn’t work because the “motions” were invisible. They were mental.
The answer was Cognitive Task Analysis (CTA). The core methodology: hierarchically decompose a cognitive task into sub-tasks, map the decision points, identify the information required at each step, and document the mental models experts use to make judgments.
A 2021 study from the Naval Air Systems Command (NAVAIR) developed an enhanced version called CTAWC (Cognitive Task Analysis and Workload Classification). The framework decomposes cognitive tasks “in enough depth to allow for precise identification of sources of cognitive workload.” They found the process is iterative—you start with high-level tasks, break them into physical sub-tasks, then decompose those into cognitive actions including “recalling basic facts” up to “evaluating criteria and decision-making.”
Here’s the key finding that matters for AI: conscious access to one’s own knowledge is estimated at only about 30% of what we actually know. Experts automate their decision processes through practice, and those automated processes “operate outside of conscious awareness.” This means you can’t just ask an expert how they do their job. You have to extract it systematically.
Think-Aloud Protocols: Capturing Thought in Real Time
The think-aloud protocol was developed by psychologists K. Anders Ericsson and Herbert Simon in the 1980s. The method is exactly what it sounds like: you ask someone to verbalize their thoughts continuously while performing a task. “Say whatever comes into your mind as you complete the task. What you are looking at, thinking, doing, and feeling.”
This creates what researchers call a “verbal protocol”—a transcript of cognitive processing. Unlike retrospective interviews (“How did you solve that problem?”), think-aloud captures the actual sequence of mental states as they happen. And the research is reassuring on the obvious objection: when done properly, asking people to verbalize their thinking does not meaningfully impair task performance or distort the reports they give.
Jakob Nielsen, the usability researcher, calls think-aloud “the single most valuable usability engineering method.” But its real power goes beyond usability testing. It’s a form of psychological motion capture—recording the invisible movements of the mind so they can be analyzed, improved, and replicated.
Knowledge Elicitation: The Expert Systems Era
In the 1980s and 1990s, the expert systems movement tried to encode human expertise into rule-based computer programs. The goal was ambitious: capture what doctors, lawyers, and engineers know and make it available in software.
The bottleneck was always knowledge elicitation. As one 1990 paper from the American Association for Knowledge Engineers put it: “The knowledge engineer has a dual task. This person should be able to elicit knowledge from the expert, gradually gaining an understanding of an area of expertise. Intelligence, tact, empathy, and proficiency in specific techniques of knowledge acquisition are all required.”
The techniques developed included structured interviews, observation of experts at work, protocol analysis, sorting techniques, and concept mapping. Each method addresses different aspects of knowledge: declarative (facts), procedural (how-to), and conceptual (relationships between ideas).
The expert systems era largely failed. The technology wasn’t ready. But the knowledge elicitation frameworks remain valuable—and they’re exactly what we need for training AI agents today.
The Gap: Why Code Isn’t the Right Paradigm
Today’s AI workflow tools—LangChain, AutoGPT, Claude’s artifacts—use code as the organizing metaphor. You define functions, chain prompts, build pipelines. The assumption is that knowledge work can be represented as deterministic sequences of operations.
But this creates several problems:
First, code requires you to know the steps before you start. In knowledge work, the steps often emerge from the content. A lawyer reviewing a contract doesn’t follow a fixed sequence—they adapt based on what they find. The “algorithm” exists in their head, shaped by years of pattern recognition that they can’t fully articulate.
Second, code obscures the mental model. When you write a function called review_contract(), you’ve abstracted away exactly the thing you need to capture: the sequence of judgments, the criteria for flagging issues, the weight given to different factors. The implementation details are hidden inside the function, but those details are the knowledge.
Third, code isn’t auditable by domain experts. A lawyer can’t review Python. A clinical researcher can’t debug a LangChain pipeline. The people who know whether the process is correct are locked out of the verification process.
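Both problems are easiest to see in a stub. Here's a hypothetical version of that review_contract() function (the name comes from the example above; everything else is illustrative). The signature is all anyone outside the code ever sees:

```python
def review_contract(document: str) -> list[str]:
    """Return a list of flagged issues."""
    issues: list[str] = []
    # Which clauses get read first? What counts as unusual indemnification
    # language? When is an ambiguity worth escalating to a partner?
    # All of that judgment lives here, as hidden implementation detail
    # that the lawyer who could verify it will never read.
    ...
    return issues
```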
Motion capture offers a different paradigm. You’re not writing instructions for a computer—you’re recording the actual movements (physical or mental) that an expert makes. The output is a capture file, not a program. It can be analyzed, edited, replayed, and taught. It’s auditable by anyone who understands the domain.
The Tacit Knowledge Problem
Michael Polanyi, the philosopher, famously said: “We can know more than we can tell.” This is the tacit knowledge problem. Experts hold knowledge in their bodies and intuitions that they cannot articulate, even when asked directly.
Nonaka and Takeuchi’s 1995 book The Knowledge-Creating Company distinguished tacit knowledge (personal, context-specific, hard to formalize) from explicit knowledge (codified, transmittable, systematic). They argued that organizational innovation comes from converting tacit to explicit and back again—a spiral of knowledge transformation.
Harry Collins, a sociologist of science, refined this further. He identified three types of tacit knowledge:
Relational tacit knowledge (RTK): Knowledge that could be articulated but isn’t, for contingent reasons. Maybe no one ever asked. Maybe there are motivational barriers to sharing. Maybe it’s just never been written down. This is the low-hanging fruit for AI capture.
Somatic tacit knowledge (STK): Knowledge embodied in physical skills. A surgeon’s hand movements. A musician’s finger positions. This can’t be captured in words, but it might be captured in motion data or replicated by machines.
Collective tacit knowledge (CTK): Knowledge that exists only in social contexts and can never be made fully explicit because it depends on shared understandings, norms, and relationships. This is the hardest to automate.
Most knowledge work contains all three types. Psychological motion capture aims primarily at the first—making RTK explicit so it can be systematized. But it also provides scaffolding for the second and signals when the third is in play.
What the AI Research Is Actually Showing
The AI research community has been converging on similar ideas, though they use different language.
Chain-of-Thought Prompting
In 2022, Google researchers showed that prompting LLMs with worked, step-by-step reasoning examples dramatically improves performance on complex tasks. This is Chain-of-Thought (CoT) prompting. The key finding: “CoT transforms big tasks into multiple manageable tasks and sheds light into an interpretation of the model’s thinking process.”
This is psychological motion capture in reverse. Instead of capturing a human’s thought process and encoding it for AI, you’re prompting the AI to externalize its “thought process” so humans can verify it. Researchers at the University of Tokyo later found that simply adding “Let’s think step by step” to a prompt improved accuracy on reasoning tasks—forcing the model to generate intermediate steps rather than jumping to conclusions.
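To make that concrete, here's a minimal sketch of the zero-shot version. The question and numbers are made up, and nothing here is tied to a specific API; the point is that the only change between the two prompts is the request to show intermediate steps:

```python
# Zero-shot chain-of-thought: the same question, with and without the nudge.
question = (
    "A site enrolled 50 patients in January and 20% more in February. "
    "How many patients were enrolled in February?"
)

# Direct prompt: the model is free to jump straight to an answer.
direct_prompt = f"Q: {question}\nA:"

# CoT prompt: ask the model to externalize its intermediate steps first,
# so a human can audit the reasoning (50 * 1.2 = 60) before trusting the answer.
cot_prompt = f"Q: {question}\nA: Let's think step by step."

# Send cot_prompt to whatever model you use, and read the steps, not just the answer.
```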
Task Decomposition in AI Agents
Modern AI agent frameworks—LangChain, AutoGPT, BabyAGI—all use task decomposition as their core mechanism. Lilian Weng’s influential blog post on LLM agents describes the planning module: “The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.”
But here’s the gap: these systems assume the task decomposition happens automatically (via the LLM) or is specified by a programmer. Neither approach captures domain expertise systematically. A biotech protocol reviewer knows which sections of a document matter most, which combinations of findings signal problems, and which corner cases require escalation. That knowledge doesn’t emerge from the LLM and isn’t obvious to a programmer.
Psychological motion capture fills this gap. You extract the expert’s actual decomposition strategy—the mental moves they make when approaching a task—and encode that as the agent’s planning framework.
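Here's a minimal sketch of what that looks like, assuming a hypothetical run_llm(instruction, context) helper in place of whatever LLM client you actually use. The plan isn't invented by the model or hard-coded by a programmer; it's the expert's captured decomposition, executed one move at a time:

```python
def run_llm(instruction: str, context: str) -> str:
    """Placeholder for whatever LLM client you use (hypothetical)."""
    raise NotImplementedError

# The expert's captured decomposition strategy, written as standalone instructions.
EXPERT_PLAN = [
    "Locate the primary endpoint and confirm it matches the statistical analysis plan.",
    "Check the dosing schedule against the ranges used in earlier phases.",
    "List every adverse-event category mentioned and note any missing follow-up window.",
    "Flag any section where the checks above disagree, for specialist escalation.",
]

def review_protocol(document: str) -> list[str]:
    findings: list[str] = []
    for step in EXPERT_PLAN:
        # Each step is one captured move; earlier findings become context for later ones.
        context = document + "\n\nFindings so far:\n" + "\n".join(findings)
        findings.append(run_llm(instruction=step, context=context))
    return findings
```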
Process Mining and Digital Twins
In enterprise software, process mining tools like Celonis create “digital twins” of business processes by analyzing event logs from ERP systems. They discover how work actually flows—not how it’s supposed to flow, but what really happens.
A 2024 paper in the International Journal of Production Research proposes a “cognitive digital twin”—extending the concept to capture workers’ knowledge and experience. Their three-layer model includes: an ontology layer (foundational knowledge structures), a knowledge layer (real-time data mapping), and a cognitive layer (machine learning, reasoning, and knowledge mining).
This is exactly where psychological motion capture fits. It’s the methodology for building the cognitive layer—the part that represents human judgment and expertise, not just transactional data.
A Practical Framework for Psychological Motion Capture
Based on the research and my own work training 200+ professionals in AI workflows, here’s a framework for capturing cognitive processes in a form that AI agents can execute.
Step 1: Identify the Capture Target
Not all knowledge work is equally suitable for capture. Look for tasks that meet these criteria:
- Repeatable: The task happens frequently enough that systematization pays off.
- Bounded: There’s a clear start and end point. “Write a novel” is too open; “review a clinical trial adverse event report” is bounded.
- Consequential: Errors have real costs. If mistakes don’t matter, there’s less value in systematic capture.
- Expert-dependent: The task is currently done better by experienced people than by novices or naive AI prompts.
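A candidate that passes all four criteria is worth the capture effort; one that fails any of them probably isn't. Here's a rough screening sketch. The criteria names come from the list above; the yes/no scoring is illustrative, not a validated instrument:

```python
from dataclasses import dataclass

@dataclass
class CandidateTask:
    name: str
    repeatable: bool        # happens often enough to pay back the capture effort
    bounded: bool           # clear start and end point
    consequential: bool     # errors carry real cost
    expert_dependent: bool  # experts currently outperform novices and naive prompts

    def is_capture_ready(self) -> bool:
        return all([self.repeatable, self.bounded, self.consequential, self.expert_dependent])

candidates = [
    CandidateTask("Review clinical trial adverse event report", True, True, True, True),
    CandidateTask("Write a novel", True, False, True, True),
]
ready = [task.name for task in candidates if task.is_capture_ready()]  # only the first survives
```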
Step 2: Conduct Think-Aloud Sessions
Record experts performing the task while verbalizing their thoughts. The key rules from the research:
- Use concurrent verbalization (during the task), not retrospective (after). Memory is unreliable, and experts rationalize their decisions after the fact.
- Ask them to verbalize what they’re doing, not why. “I’m checking section 4.2… looking for the dosing schedule… this seems higher than typical…” Not: “I’m checking this because in my experience, dosing errors are common.”
- Capture multiple experts on the same task. Experts often have different strategies, and the differences reveal what’s essential versus idiosyncratic.
- Record everything: screen capture, audio, even eye tracking if available. The more data, the richer the analysis.
Step 3: Decompose into Atomic Moves
Analyze the recordings and break down the task into “atomic cognitive moves”—the smallest units of judgment that can be isolated. Think of these like the arm movements Taylor identified in shoveling, but for thinking.
Categories of atomic moves I’ve found useful:
- Retrieve: Pull information from the source material or from memory. “Find the primary endpoint in section 3.”
- Compare: Evaluate something against a criterion. “Is this within the normal range?”
- Classify: Assign something to a category. “This is a safety signal, not an efficacy issue.”
- Infer: Draw a conclusion from evidence. “Given X and Y, this suggests Z.”
- Flag: Mark something for further attention. “This needs specialist review.”
- Prioritize: Order things by importance. “Check dosing before checking labeling.”
The depth of decomposition matters. The NAVAIR research suggests going “deep enough that the hypothesis can be tested” but not so deep that “the analysis explodes into too many variables.” A good rule: decompose until each atomic move could be given to a competent junior person (or an AI) as a standalone instruction.
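One way to record the output of this step is as a flat log of moves, each tagged with one of the categories above. The field names here are illustrative, not a standard schema:

```python
from dataclasses import dataclass
from enum import Enum

class MoveType(Enum):
    RETRIEVE = "retrieve"
    COMPARE = "compare"
    CLASSIFY = "classify"
    INFER = "infer"
    FLAG = "flag"
    PRIORITIZE = "prioritize"

@dataclass
class AtomicMove:
    move_type: MoveType
    instruction: str    # standalone instruction a competent junior person or AI could execute
    inputs: list[str]   # what the move needs: a document section, a prior move's output
    output: str         # what the move produces: a value, a category, a flag

moves = [
    AtomicMove(MoveType.RETRIEVE, "Find the primary endpoint in section 3.",
               ["protocol section 3"], "primary endpoint"),
    AtomicMove(MoveType.COMPARE, "Is the dosing schedule within the typical range?",
               ["dosing schedule", "typical range reference"], "within / outside range"),
]
```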
Step 4: Map the Decision Tree
Atomic moves don’t happen in isolation. They’re connected by decision logic. Map the sequence and branching:
- What triggers each move? (Completion of prior move, finding specific content, etc.)
- What are the possible outcomes? (Pass/fail, categories, scores)
- What does each outcome lead to? (Next move, escalation, completion)
- Are there loops or iterations? (“Re-check if X changes”)
The output here looks less like code and more like a clinical decision tree or a troubleshooting flowchart. It should be readable by domain experts who’ve never seen a programming language.
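One way to keep it that readable is to write the captured logic down as plain data rather than branching code, so the expert can point at a node and say “that’s wrong.” Everything below is illustrative:

```python
# The captured decision logic as plain data: each node names a question or action,
# and each answer points to the next node. No programming knowledge needed to review it.
DECISION_TREE = {
    "check_dosing": {
        "question": "Is the dosing schedule within the typical range?",
        "within range": "check_labeling",
        "outside range": "flag_for_specialist",
    },
    "check_labeling": {
        "question": "Does the labeling match the dosing schedule?",
        "matches": "complete_review",
        "mismatch": "flag_for_specialist",
    },
    "flag_for_specialist": {"action": "Escalate with the findings gathered so far."},
    "complete_review": {"action": "Record the review as passed."},
}
```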
Step 5: Identify the Knowledge Requirements
For each atomic move, document what knowledge is required to perform it:
- Declarative knowledge: Facts that need to be known. “Normal range for X is 10-20.”
- Procedural knowledge: Steps to follow. “To check Y, first do A, then B.”
- Contextual knowledge: Information that modifies the task. “If this is a Phase I trial, the tolerance is wider.”
This creates the knowledge base that an AI agent needs access to. Some of it can be provided as context in prompts; some needs to be retrieved (RAG); some needs to be hard-coded as constraints.
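In practice the split looks something like this sketch. The specific facts and rules are made up; the point is where each kind of knowledge lives:

```python
knowledge_base = {
    # Declarative facts small enough to sit directly in the prompt.
    "prompt_context": ["Normal range for X is 10-20."],
    # Procedural and reference material too large for the prompt: retrieved on demand (RAG).
    "retrieval_corpus": ["SOP: to check Y, first do A, then B", "historical review reports"],
    # Contextual rules enforced outside the model, as hard constraints on the workflow.
    "constraints": {
        "phase_1_trial": "apply the wider tolerance",
        "missing_dosing_section": "always escalate to a human reviewer",
    },
}
```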
Step 6: Validate and Iterate
The capture isn’t complete until experts confirm it. Use “teachback”—have the knowledge engineer explain the process back to the expert. Where did we get it wrong? What’s missing? What would make a novice fail?
Then test on real cases. Run the captured process (manually or with AI assistance) on new examples and compare the outputs to expert judgments. Track where divergences occur and refine.
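A minimal sketch of that loop, assuming you have held-out cases with expert judgments and some way to run the captured process (run_captured_process here is a placeholder):

```python
def validate(cases, run_captured_process):
    """Run the captured process on held-out cases and log every divergence from the expert."""
    divergences = []
    for case in cases:
        predicted = run_captured_process(case["input"])
        if predicted != case["expert_judgment"]:
            divergences.append({
                "case": case["id"],
                "expected": case["expert_judgment"],
                "got": predicted,
            })
    agreement = 1 - len(divergences) / len(cases)
    # Refine the capture wherever divergences cluster, then run the loop again.
    return agreement, divergences
```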
This validation loop is what separates psychological motion capture from simple documentation. You’re not writing a description of how work should be done—you’re creating an executable model that can be empirically verified.
Why This Matters Now
LLMs changed the game for knowledge work automation. Before, you needed to encode expert knowledge in formal rules—which was expensive, brittle, and incomplete. Now, you can encode it in natural language and let the LLM handle the execution.
But this makes the capture problem more important, not less. A bad rule in an expert system fails obviously. A bad prompt in an LLM fails subtly—the output looks plausible but is wrong in ways that only experts notice. The stakes of capture quality have gone up.
A 2025 study on tacit knowledge conversion, published in an MDPI journal, found that NLP pipelines can now “identify, extract, and organize tacit knowledge embedded in textual data, such as expert interviews, discussion transcripts, or informal communications.” The technology is ready. The bottleneck is the input—systematic capture of expert processes.
Another piece of research put it bluntly: for the last 30 years, IT systems automated explicit knowledge. Now AI can automate tacit knowledge too. But only if that tacit knowledge is first made visible.
That’s what psychological motion capture does. It takes the invisible movements of expert thought and makes them visible, auditable, teachable, and executable by AI.
The Management Implication
Here’s the uncomfortable truth: AI adoption is a management problem, not a technical one.
Most organizations are trying to adopt AI by training employees on ChatGPT prompts. That’s like trying to adopt manufacturing automation by teaching workers how to press buttons on machines. It misses the point. The hard part isn’t operating the tools—it’s designing the workflows that the tools execute.
Psychological motion capture is workflow design for the AI era. It requires:
Management commitment: This takes time from your best people. Experts need to be relieved of some production work to participate in capture sessions. That’s a resource allocation decision, not a technology decision.
Process ownership: Someone has to own the captured workflow, maintain it as requirements change, and be accountable for its accuracy. This is a management role, not a technical role.
Quality systems: How do you know the AI is doing the task correctly? You need verification frameworks, sample audits, feedback loops. This is operations management, not prompt engineering.
The organizations that will win with AI are the ones that treat it like workforce onboarding—systematic, managed, measured—not like installing new software.
What I’m Still Figuring Out
This framework isn’t complete. Some things I’m still testing:
Capture efficiency: Think-aloud sessions are time-intensive. I’m experimenting with lighter-weight methods—structured interviews, worked examples, exception-based capture (“only document when you do something non-obvious”). Early results suggest a hybrid approach works best, but I don’t have enough data yet.
LLM-assisted capture: Can you use an LLM to interview experts and extract their knowledge? Some early research suggests yes, with caveats. AI-led interviews are more structured but miss nuance. The current answer seems to be: use AI for initial structure, humans for depth.
Transfer limits: How much expertise can actually be captured this way? There’s ongoing debate in the knowledge management literature about whether tacit knowledge can ever be fully externalized. My working assumption: you can capture enough to get 80% of the value, but the last 20% requires human judgment and may always require it.
Maintenance burden: Captured workflows need updates as the domain changes. I don’t yet have good frameworks for tracking when a captured process is out of date or for efficiently updating it. This is a known problem in knowledge management and I don’t have a novel solution yet.
The Bottom Line
Everyone in AI is talking about agents, pipelines, prompts. That’s the code paradigm. It assumes the hard part is execution.
Psychological motion capture offers a different starting point. The hard part isn’t execution—it’s capture. The hard part is making the invisible visible. Once you’ve recorded the motion, the playback is comparatively easy.
Taylor transformed manufacturing by treating motion as something that could be seen, measured, and optimized. The same transformation is possible for knowledge work. But it requires taking seriously what the cognitive science literature has known for decades: expertise is mostly invisible, even to experts themselves.
The organizations that learn to capture psychological motion—systematically, rigorously, with the same seriousness that manufacturers brought to time-and-motion studies—will have a durable advantage. They’ll be able to scale expertise without scaling headcount. They’ll be able to automate judgment, not just procedure.
That’s the prize. And unlike most of what’s hyped in AI, this one requires management discipline more than technical sophistication. Which means the bottleneck isn’t the technology.
It’s whether you’re willing to do the work.
David Youssef trains teams to build AI workflows in regulated industries where mistakes are expensive. Book an AI Readiness call to explore what systematic knowledge capture could look like for your team.
Key Sources and Further Reading
- Taylor, F. W. (1911). The Principles of Scientific Management.
- Ericsson, K. A. & Simon, H. A. (1984). Protocol Analysis: Verbal Reports as Data.
- Nonaka, I. & Takeuchi, H. (1995). The Knowledge-Creating Company.
- Collins, H. (2010). Tacit and Explicit Knowledge.
- Knisely, B. M. et al. (2021). Cognitive Task Analysis and Workload Classification. MethodsX.
- Wei, J. et al. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models.
- Su, C. et al. (2024). Cognitive digital twin in manufacturing process. Int. J. Production Research.