Clayton Christensen's How Will You Measure Your Life? asks Harvard Business School graduates a question: are you measuring the right things?
Christensen wrote it for people trying to figure out how to live: how to build a marriage and how to make the ethical calls that add up over a career. The subject matter has nothing to do with roadmaps, but its foundation in business theory serves as a mirror for how we build products. If Christensen's frameworks can expose whether you're living according to your values, they can certainly expose whether your product roadmap reflects your stated priorities.
At dualoop, we work with teams who know their frameworks and still struggle with the gap between strategy and what really gets built. That gap often comes from measurement choices that feel reasonable in the moment but accumulate into a product that no longer resembles what the team set out to create. Christensen's work lives in that gap. What follows draws on four of his central ideas: deliberate vs. emergent strategy, resource allocation, Jobs to be Done, and marginal thinking. Each one, applied to product decisions, surfaces something that standard PM frameworks tend to paper over.
Deliberate vs. emergent strategy: the roadmap paradox
The distinction Christensen draws between deliberate and emergent strategy is one of his most frequently misused ideas.
Deliberate strategy is the plan you make. It reflects your informed hypotheses about market needs and about which sequence of product investments will build into competitive advantage. Emergent strategy is what develops when reality arrives: a competitor launches something you didn't anticipate, a key enterprise account threatens to leave unless a specific capability is delivered, a regulation passes that restructures your compliance picture, a technology shift moves faster than your planning cycle.
Product managers live in this tension constantly. The teams that handle it well are the ones who have, in advance, built a process for deciding which surprises deserve a response and which deserve to be noted and set aside.
The distinction matters because the default, for teams that haven't built that process, is urgency. Whoever shouts loudest gets the sprint. Calling that emergent strategy is generous; it is reactivity with a better vocabulary. And over time, it produces a product that reflects the accumulated weight of the loudest voices rather than a coherent set of decisions about what users need.
What distinguishes adaptive judgment from reactive drift
Christensen's insight is that great companies don't just respond to change. They have a deliberate mechanism for evaluating what to respond to. A genuine emergent strategy response involves a structured evaluation: does this new information change your understanding of who your users are and what they need, or does it only change the surface features of how you serve them?
A competitor launching an AI powered querying feature in 2023 did not necessarily mean every product team should build one. It meant product teams needed to ask whether their users' jobs to be done had changed, and whether the new capability addressed those jobs better than existing solutions. Many teams that skipped that question spent quarters building AI features their users found impressive for about two weeks and then ignored.
A product team with a functioning strategy process keeps a short list of things that, if they changed, would actually alter the strategy, alongside a longer list of things that, if they changed, would warrant monitoring but not immediate action. Without that distinction written down somewhere and revisited deliberately, every new signal becomes equally urgent, and emergent strategy becomes a polite word for chronic distraction.
Building the process for deciding what matters
What does this look like in practice? The most functional version involves keeping a live document alongside the roadmap, something pulled out specifically when a surprise arrives. The document answers two questions: what would have to be true about your users or your market for this new information to require a strategic response, and is that true right now?
A team building a B2B analytics product, midway through a planned quarter of performance improvements, received word that a well funded competitor had just shipped a natural language querying feature. The immediate instinct was to pause and pivot. The question they asked instead was whether their users' core jobs, specifically getting data into the hands of people who need it to make decisions quickly, required natural language querying. The answer, after two days of customer calls rather than two weeks of strategy offsites, was no. Their users' bottleneck was getting the right data model in place to make any query useful at all. The performance work continued. Six months later, the competitor had good adoption among new users, and the team's existing users had stopped complaining about the thing that had been slowing them down.
The decision was right because the team had a specific enough picture of their users' situation to evaluate the signal against something concrete. That specificity is what distinguishes emergent strategy from reactive drift.
Where your resources go
Christensen's most uncomfortable observation is that your actual strategy is not the one you present at company all hands or write in your product vision document. Your actual strategy is revealed by where you allocate your resources: where engineering capacity and the team's attention go over a sustained period.
He frames this through a personal lens. Parents who say their family is the priority but consistently work weeks that run past seventy hours are not failing to live their values through laziness or bad faith. Their calendar reveals the real priority structure, whatever their stated values might be. Product teams fall into the same pattern with striking regularity.
What the audit usually reveals
The exercise is simple. Take the last two to four completed quarters. Break down where engineering capacity went across categories: new features, technical debt fixes, infrastructure work, internal tooling, compliance requirements, custom work for specific customers, and bug fixes. Then compare that breakdown to your stated product strategy.
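If it helps to make the split concrete, the arithmetic behind the audit is trivial and a spreadsheet is usually enough; the short Python sketch below simply makes the comparison explicit. The category names follow the list above, while the capacity figures and the stated targets are invented for illustration, not drawn from any real team.

```python
# Minimal sketch of the resource allocation audit described above.
# Capacity figures and stated targets are illustrative assumptions;
# replace them with your own numbers from the last two to four quarters.

from collections import defaultdict

# Engineer-weeks logged per category across the completed quarters (assumed data).
capacity_log = [
    ("new features", 34),
    ("technical debt", 22),
    ("infrastructure", 11),
    ("internal tooling", 9),
    ("compliance", 18),
    ("custom customer work", 12),
    ("bug fixes", 14),
]

# The split the stated strategy implies (assumed targets).
stated_share = {
    "new features": 0.55,
    "technical debt": 0.10,
    "infrastructure": 0.10,
    "internal tooling": 0.05,
    "compliance": 0.05,
    "custom customer work": 0.05,
    "bug fixes": 0.10,
}

# Sum actual capacity per category and compare against the stated share.
actual_weeks = defaultdict(float)
for category, weeks in capacity_log:
    actual_weeks[category] += weeks
total_weeks = sum(actual_weeks.values())

rows = [
    (category, weeks / total_weeks, stated_share.get(category, 0.0))
    for category, weeks in actual_weeks.items()
]

print(f"{'category':<22}{'actual':>8}{'stated':>8}{'gap':>8}")
for category, actual, stated in sorted(rows, key=lambda r: -abs(r[1] - r[2])):
    print(f"{category:<22}{actual:>7.0%}{stated:>8.0%}{actual - stated:>+8.0%}")
```

The output is not the point; the conversation it forces is. What matters is whether the gap it shows is environmental or a values mismatch, which is the distinction taken up below.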
Most teams that do this discover a gap they already sensed but hadn't put a number on. A team claiming to be deeply focused on customer outcomes often finds that somewhere between thirty and fifty per cent of its engineering capacity went to things that were never on the publicly visible roadmap. Technical debt from a previous growth phase and internal tooling the sales team needs for demos are common findings. So is compliance work that nobody wanted to put on a publicly visible roadmap because it sounds unglamorous.
One head of product at a B2B fintech ran this exercise after her team had spent months expressing frustration about never making meaningful progress on the product areas that mattered most to customers. The audit revealed that 58% of engineering capacity over the previous six months had gone to compliance work: GDPR remediation, a new open banking certification, an audit trail feature required by a large enterprise customer's procurement team, and a set of data residency changes demanded by a second customer in Germany. None of this was on the product roadmap, but all of it was necessary. The frustration, she realised, came from the mismatch between the strategy the team had been told and the work they could see themselves doing.
The allocation didn't change. Compliance work that keeps the product legal and enterprise contracts that fund the business are not optional. What changed was the stated strategy. For the following quarter, the product narrative was built around what the team was doing: consolidating the regulatory foundation to enable a sustainable product build phase afterwards. That framing, which required honesty about where the company was rather than where it wanted to be, reduced team frustration significantly. People can accept working on unglamorous things when they understand why.
When the gap is justified, and when it isn't
The resource allocation audit forces honesty, but it's worth distinguishing between gaps that are justified and gaps that point to a real problem.
Some allocation gaps are environmental. A team in a regulated industry will always spend more on compliance than a consumer app team, just as a team that shipped fast during a growth phase and accumulated technical debt will need to address it before moving quickly again. Teams serving a small number of enterprise customers with specific requirements face a different version of the same pressure: more custom work, less standardised product investment. These gaps are real, and acknowledging them honestly is more useful than pretending they don't exist.
A different kind of gap emerges when the allocation reflects a values mismatch: where the team's stated priorities and its actual behaviour diverge not because of external constraints but because of internal choices made under pressure that were never revisited. A team that spent 60% of its capacity on internal tooling for two consecutive quarters, not because it was forced to but because the engineering team found it more interesting than product features, has a problem that roadmap revision alone won't solve without also addressing the underlying incentive structure.
Christensen's point, translated for product teams: the audit's purpose is getting an accurate picture of what the organisation values, not assigning blame for the gap. Once you have that picture, you can decide whether it matches what you intend to value, and if not, what specifically needs to change.
The job beneath the feature request
Christensen's Jobs to be Done framework is well known in product circles, which is partly why it gets applied poorly. The core insight is that the job the customer describes is almost never the real job.
His milkshake research is the classic example. A fast food chain wanted to sell more milkshakes. Customers, when asked in traditional surveys, said they wanted thicker shakes, more flavours, lower prices, and faster service. When researchers observed actual purchasing behaviour, they found that the majority of morning milkshake purchases were made by commuters who wanted something to occupy their hands and their attention during a long, boring drive. They needed to get to mid-morning without getting hungry again, and without making a mess. The job was not "I want a delicious treat." It was "I need something that helps me survive a tedious commute without eating again until lunch."
The survey data was not wrong. Customers did prefer thicker shakes and more flavours when asked. But those preferences described the milkshake as an object. They said nothing about the job the milkshake was being hired to do. Optimising for expressed preferences would have produced a better milkshake that solved the wrong problem.
The gap between what users ask for and what they need
The practical application for product managers is that feature requests are almost always solution descriptions. When a user asks for a chatbot, they are not asking for a chatbot. They are describing one possible solution to a problem they have not put into words, often because they are not sure how to articulate it, or because they assume you already understand the context they're operating in.
A team building a project management SaaS for enterprise clients received consistent requests for better reporting functionality. The job users described was "we need better reports for our stakeholders." The team invested a quarter in building a flexible report builder with customisable templates and multiple export formats. Adoption was poor. A follow-up research cycle, which involved sitting with project managers during the week before their monthly steering committee meetings rather than interviewing them about product preferences in a conference room, revealed the actual job: project managers needed to look on top of things in front of their steering committees without spending three hours on Thursday evening manually copying data into PowerPoint slides.
The actual job wasn't "better data presentation." It was "looking confident in an important meeting without the preparation becoming its own project." The solution that worked was an export that generated a clean, stakeholder ready slide with the right metrics already formatted, available in a single click. That feature took two weeks to build. The report builder took a quarter. The simplified export had five times the usage within a month of launch, and support tickets about reporting dropped significantly within two months.
The gap between the job a user can describe and the job they need addressed is where most product effort goes to waste, and closing that gap requires research methods that observe behaviour rather than just collect preferences.
What Jobs to be Done means when AI enters the picture
This distinction matters more, not less, as AI capabilities become part of product decisions. There is currently significant pressure on product teams to add AI features, and that pressure often manifests as requests that follow the same pattern as any other feature request: "Can we add an AI assistant?" or "Can we build an AI summarisation layer?"
Requests like these are solution descriptions. The question that needs to come before any of them is: what job is the user trying to do, and does AI address that job better than the existing approach? We explored this further in our piece on whether AI is reinventing or replacing product management.
Some AI features do address a real job better than alternatives. AI powered anomaly detection in a monitoring product addresses the job of "find the thing that's wrong before it becomes a customer incident" better than manual threshold alerting, because it can identify patterns across dimensions that a human analyst would need days to correlate. The job is clear and the improvement is measurable.
Other AI features address the job of looking innovative. They exist because a competitor shipped something similar, or because an executive saw a demo at a conference and came back energised. These are organisational signal responses dressed up as product decisions. The Jobs to be Done lens cuts through that: if you cannot describe the specific job the feature addresses and show evidence that users encounter that job at the moment the feature would appear, you are likely building the wrong thing, regardless of how impressive the underlying technology is.
The core principle holds across every capability shift. Start with the job, then evaluate whether the new tool addresses it better than what exists. The product teams that get AI decisions right are generally not the ones with the best access to models. They are the ones who haven't abandoned the habit of asking what problem they're actually solving before writing a single line of code.
How integrity erodes through marginal decisions
Christensen's warning about marginal thinking is the framework with the most direct relevance to AI product decisions, and probably the one that receives the least attention. The "just this once" decisions that seem harmless in isolation add up into something unrecognisable over time.
The personal version in the book is familiar: the executive who skips one important commitment because of a genuine emergency, then skips another because the first absence didn't seem to matter, until the pattern has set without any single decision feeling decisive. Each individual choice was defensible. The accumulated pattern was something else entirely.
How small decisions add up over time
In product work, marginal thinking shows up in decisions about data collection and testing quality that each seem reasonable under pressure and collectively produce something nobody would have signed off on if asked to approve it in advance.
A team ships a feature without adequate testing because a key customer has given a hard deadline and the business development team has built expectations around it. The following quarter, a different team ships a different feature on a similar timeline, citing the same logic. Within a year, "we'll expedite testing for strategic customers" has become an informal standard that nobody officially authorised but everyone accepts. The bugs that result get attributed to complexity, not to the quiet erosion of a practice.
The slope is not visible at any single step. What makes it a slope rather than a series of discrete decisions is that each new exception shifts what normal looks like. Product managers are exposed to this because so many of their decisions involve real trade-offs with legitimate arguments on multiple sides. The specific risk is making individually defensible calls that, accumulated over time, produce a product and a team culture they did not intend to build.
Why AI raises the stakes on marginal decisions
The marginal thinking framework matters especially in AI product decisions because the consequences of each individual decision are less immediately visible.
Collecting an additional data signal for model training seems harmless when the Terms of Service technically permit it and competitors are presumably doing the same. Shipping a feature that performs well on benchmark tests but hasn't been evaluated for bias in edge cases seems acceptable when the release date is fixed and the edge cases are uncommon. Each of these decisions, made in isolation, has a plausible justification. What they create over time, if not interrupted deliberately, is a product that misrepresents its reliability and that the team cannot clearly defend when something eventually goes wrong.
The test Christensen proposes for personal decisions translates directly: if this became our standard practice, what kind of product team would we be? The question is not whether the policy technically permits it this once, but whether you would be proud of the product you were building if it became the norm. That question is harder to answer, and harder to avoid, than it looks.
Where these frameworks break down in practice
Christensen's frameworks are useful, which means they are also susceptible to the kind of misuse that sounds rigorous but produces nothing. Several failure modes are worth naming explicitly, because each one is common enough to be recognisable and subtle enough to be mistaken for sound practice.
Emergent strategy as a cover for missing conviction
The most common failure mode is using emergent strategy to explain away a lack of conviction in the deliberate strategy. Teams that haven't done the hard work of identifying a clear thesis about their users and their market will respond to almost any external signal by treating it as strategically significant. They call this being responsive to the market. In practice, it means the roadmap changes every time something appears in a competitor's changelog or a customer complaint thread.
The giveaway is that the team cannot articulate what it would take for a signal not to warrant a response. When the deliberate strategy is too vague to act as a filter, every new piece of information looks potentially strategy-altering. The emergent strategy framework only works as a complement to a deliberate strategy concrete enough to guide decisions. Without that foundation, it becomes a more sophisticated way of describing a strategy that isn't clear enough to act on.
When audit findings don't lead to change
The resource allocation audit is most useful when it leads to a decision. When it doesn't, it tends to produce a shared understanding of the gap without any accountability for closing it, which can harden cynicism in the team and reinforce the belief that measurement is performative.
The audit only produces value if someone with authority over resource decisions is willing to either change the allocation or change the stated strategy. Both of those changes have political costs. The engineering team has commitments. The product strategy has been communicated externally. The compliance work nobody wants to acknowledge publicly is still legally necessary. These constraints are real, and the audit alone cannot resolve them.
The insight has a political dimension that the analysis alone cannot address. Someone with actual authority over resource decisions needs to make a call that is uncomfortable in the short term in order to close a gap that will otherwise grow. For a deeper look at how this connects to product operating model design, including who owns which decisions and how accountability is structured, it's worth reading our full guide on the topic.
Jobs to be Done as a research ritual
The most technically well-executed misuse of Jobs to be Done is doing contextual research, correctly identifying the real job, and then building a solution that addresses a different job anyway. This happens when the research findings run into a product decision that was already made for reasons that had nothing to do with product: a partnership commitment, a feature promised during a sales cycle, an executive's conviction about what users need based on intuition rather than evidence.
Jobs to be Done does not protect against this failure mode on its own. What it requires is that the person with authority to approve the product decision is willing to change course based on what the research reveals. Without that, the research becomes a compliance activity, something done to demonstrate rigour rather than to inform decisions. The job is identified correctly, the insight is shared in a review meeting, and the already-decided feature ships anyway. The framework fails not because of how it was applied, but because of the organisational dynamic it was applied within.
How will you measure your product?
Christensen closes his book by dismissing the standard markers of professional success as proxies. The question that matters is how many people you helped and what their lives looked like because of that. What you made possible for others is the measure, not what you accumulated along the way.
The parallel for product managers is exact, and just as difficult to act on.
Most product teams measure velocity, story points, features shipped, and OKR completion rates. These are not wrong to measure. You cannot manage what you cannot see, and proxy metrics serve a real function in keeping teams honest about pace and output. But they are proxies, and the question worth sitting with is: what do they proxy for?
Story points proxy for engineering throughput, which is a means rather than an end. OKR completion rates proxy for strategic execution, but only if the OKRs were correctly specified in the first place, which requires knowing who your users are and what they need in real detail. Features shipped look like product progress, but only if those features address jobs that users have. Each of these metrics tells you whether the team is doing the work efficiently. None of them tell you whether the work is worth doing.
Engagement metrics might reflect dependency as much as value, and NPS scores measure satisfaction at a moment in time rather than sustained improvement in the user's core task. What you want is evidence specific to the job the product was hired to do: are the people who use this product better at the thing they hired it for, and can you show evidence of that? That question requires a clear theory of change connecting product decisions to user outcomes, and a commitment to measuring those outcomes directly rather than through proxies.
The way you answer that question, or whether you try to answer it at all, is probably the most honest signal of what kind of product team you are building.
Frequently asked questions
How do I know when an emergent signal warrants a strategic response, rather than just monitoring?
A useful threshold: a signal warrants a strategic response when it changes your understanding of who your users are or what job they are trying to do, not just when it changes the landscape of available solutions. A competitor shipping a new feature changes the competitive landscape. It warrants a response only if that feature addresses your users' actual job better than your existing approach, or if it attracts users whose job you were planning to serve. If neither of those things is true, the signal is worth noting and monitoring, but not acting on. The test is whether the new information updates your model of the user, not whether it creates urgency in a board meeting.
How detailed should a resource allocation audit be?
Detailed enough to be honest, but not so detailed that it becomes its own project. You want to distinguish between broad categories of work: new feature development, technical debt work, infrastructure, internal tooling, compliance and regulatory requirements, custom work for specific customers, and bug fixes. A rough sense of the percentage split over the last two to four quarters is enough. You are not auditing individual sprint tickets. You are getting a picture of where the team's attention went at the category level, which is usually sufficient to reveal whether the allocation reflects the stated strategy. If it doesn't, the gap is your next conversation, not a more detailed spreadsheet.
Is Jobs to be Done still useful when evaluating AI features, or does the framework break down for something this new?
Jobs to be Done is more useful for AI evaluation, not less. The mistake most product teams make with AI is starting from the capability, asking "we can now do X, what do we build with it?", rather than starting from the job. AI capabilities are new in some dimensions, but the jobs users need to do have not fundamentally changed. What has changed is the set of solutions available to address those jobs. The framework works best when you treat AI as one possible solution to be evaluated against an already-identified job, rather than as a category of features to be added because competitors are adding them.
How do you apply the marginal thinking test in a team setting, where each decision involves multiple stakeholders?
The question "if this became our standard practice, what would we be?" works best as a team conversation rather than an individual decision. Present the decision context explicitly and ask: if this was how we handled this type of situation every time, what would our product and the way the team makes decisions look like in two years? Most teams that run this exercise find that individual decisions which seemed reasonable become obviously problematic when stated as a policy. If the answer to "what would we be?" makes people uncomfortable, that discomfort is information worth taking seriously.
What if the resource allocation audit reveals a gap we cannot close because of external constraints?
Then the right response is to update the stated strategy to reflect reality, not to continue describing the strategy as something it currently cannot be. Teams that honestly communicate "we are in a consolidation and stabilisation phase for the next two quarters, and here is why, and here is what that enables us to do afterwards" tend to get more patience from both the team and stakeholders than teams that maintain an innovation narrative while shipping almost no new customer-facing capability. The gap between stated and actual strategy creates frustration. Closing that gap honestly, even when the honest answer is unglamorous, tends to reduce it.
What does dualoop's work with product teams on strategy and operating model look like?
Dualoop's Snapshot engagement typically starts with exactly this kind of audit: mapping where the team's resources went against where the stated strategy said they should go, then identifying the gap and its causes. From there, depending on what the audit reveals, the work might involve redefining the product operating model: who owns which decisions and how the team distinguishes between signals worth responding to and signals worth monitoring. The goal is to give teams a clearer picture of what they are building, so they can decide whether that matches what they intend to build, and if not, what specifically needs to change.
Related reading: The product operating model explained: from feature factory to empowered teams · OKR lessons for product teams · Is AI killing or reinventing product management?