Many teams now run constant experiments. Forms are tweaked, buttons are tested, copy is swapped. Dashboards fill up with charts and winners.
What is less clear is what all of this is supposed to be teaching.
Without a shared model of how people behave and why, experiments turn into a stream of local tweaks. Some changes help, some do not, but very little of it builds understanding. The same problems reappear in new places. Decisions remain hard to explain.
WHO IS THIS NOTE FOR
This note is for product, marketing and digital teams who:
Are running a steady volume of tests but cannot clearly state what they have learned this year
See local wins that do not add up to meaningful movement in the measures that matter
Feel pressure to “keep the tests running” even when the pipeline is thin on real questions
This field note looks at how experiments without a model show up in practice, why they are so common and what to do if you suspect your experimentation programme has drifted into busywork.
By a model we mean a simple, explicit view of how the people you serve behave and why: what they are trying to do, what you believe drives the behaviours that matter, and what you believe gets in the way.
The model does not need to be perfect. It does need to exist.
Experiments are then ways of testing parts of that model. If you do not have one, tests tend to answer shallow questions: which button wins, which piece of copy converts slightly better this week.
The results are hard to apply anywhere else.
Even teams that start well can drift. Common patterns include:
Over time, experiments become a series of unrelated micro changes. The backlog is full, but the story about users is thin.
The symptoms are familiar.
From a distance it looks like a busy programme. From close up it feels like turning a crank.
NOTE
A high volume of experiments can make an organisation feel evidence led. If no one can explain the assumptions behind those tests, it is not evidence. It is decoration.
When teams suspect something is off, there are a few usual responses.
More templates are added. Hypotheses are forced into standard wording. Review steps multiply.
Process discipline is useful. It does not create a model by itself. You can run very neat experiments that still answer shallow questions.
Teams focus on test ideas that are likely to “win” and avoid those that may be neutral or negative.
This can improve local metrics, but often at the cost of learning. Many important questions do not have tidy positive outcomes in the short term.
Ownership of experimentation is moved to a central team that approves or rejects tests.
That can reduce noise, but it does not guarantee that tests are linked to a shared understanding of behaviour. It can also slow things down without improving quality.
Instead of asking “how do we run more tests”, it is often more useful to ask: what do we believe about how our users behave, which of those beliefs is each test meant to examine, and what would we do differently if one of them turned out to be wrong?
If you cannot answer those questions clearly, the problem is the model, not the tooling.
A product team responsible for sign ups and trial conversion had a healthy looking experimentation programme.
Over eighteen months they had:
The win rate was respectable. Local conversion metrics improved by a few percentage points.
When they tried to summarise what they had learned, the story was thin. They had:
Under pressure to keep “showing impact”, the backlog tilted toward low risk cosmetic tests.
When the team stepped back and looked at sign up behaviour by segment and upstream source, they discovered that a significant share of trials that looked healthy at the funnel level never activated in the product. Most tests had been aimed at the wrong part of the problem.
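If it helps to see what that kind of check involves, here is a minimal sketch in Python with pandas. It assumes a hypothetical export of trial records with illustrative column names (segment, source, trial_id, activated); none of these come from the team described above.

import pandas as pd

# Hypothetical export: one row per trial sign up, with the segment and
# upstream source it came from and whether it ever activated in the product.
trials = pd.read_csv("trials.csv")

summary = (
    trials
    .groupby(["segment", "source"])
    .agg(
        trials=("trial_id", "count"),           # volume at the funnel level
        activation_rate=("activated", "mean"),  # share that went on to activate
    )
    .sort_values("activation_rate")
)

print(summary.head(10))

A table like this tends to make the mismatch visible quickly: segments that look healthy at the funnel level but rarely activate are exactly where the backlog had nothing pointed.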
A healthier pattern is simple, if not always easy.
The model will never be perfect. It will change as you learn. The point is to have something that experiments can refine.
If you suspect your experiments have drifted, a modest reset can help.
Start by listing the behaviours that matter most. For example: first purchase, repeat purchase, trial activation, plan change, cancellation.
For each behaviour, capture your working beliefs. What do you think drives it? What do you think blocks it?
Look at the past few months of experiments. For each one, note which belief or behaviour it was meant to address.
Identify behaviours with lots of tests but shallow understanding, and important behaviours with very few tests.
Use the gaps to shape the next round of test ideas. Add research where needed, instead of forcing an experiment where there is no clear question.
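The mapping itself can stay lightweight. As one hedged illustration, the Python sketch below records behaviours, working beliefs and the experiments aimed at each, then prints where coverage is thin; every behaviour, belief and test name in it is a made up placeholder.

from dataclasses import dataclass, field


@dataclass
class Behaviour:
    name: str
    beliefs: list[str]                                     # working beliefs about drivers and blockers
    experiments: list[str] = field(default_factory=list)   # tests aimed at this behaviour


behaviours = [
    Behaviour(
        "trial activation",
        beliefs=["people stall at the first data import", "value is unclear in the first session"],
    ),
    Behaviour(
        "first purchase",
        beliefs=["pricing page copy creates hesitation"],
        experiments=["pricing headline test", "plan comparison layout test"],
    ),
]

# Surface the gaps: important behaviours with few or no tests, and
# behaviours carrying a large share of tests relative to the rest.
for b in behaviours:
    status = "no tests yet" if not b.experiments else f"{len(b.experiments)} tests"
    print(f"{b.name}: {len(b.beliefs)} beliefs, {status}")

Anything that shows “no tests yet” against an important behaviour, or a long list of tests against a cosmetic one, is the gap the reset is looking for.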
This is not about rewriting your entire programme. It is about gently pointing experiments back at questions that matter.
EXAMPLE
One organisation realised that almost all of its experiments were focused on the top of the purchase funnel, because those were the easiest journeys to test.
When they mapped tests against behaviours, it became obvious that there were almost none aimed at first use or early repeat behaviour, even though retention was the real concern. The next quarter of work shifted a portion of the backlog toward that early usage window, supported by fresh research instead of yet another homepage test.
Experimentation is often sold as a way to de-risk decisions. In practice it can become another source of pressure.
When teams are judged by the number of tests they run or the size of short term gains, it is hard to protect space for model building. Yet without that space, the tests become less useful over time.
A small number of well designed experiments, linked to a clear model, will usually do more for performance and understanding than a large number of disconnected tweaks.
From a Corpus perspective, experiments are one part of a wider system of learning.
When we work with teams in this area, we typically:
The goal is not to run more experiments. It is to build enough shared understanding that when you do run them, you know why, and you know what to do with the results.
