Let’s play out a scene that’s painfully familiar in EdTech. The Product Manager runs a classic A/B test on a new feature.
The results come in. Version A is the clear winner. It has a 40% higher completion rate and users spend 60% less time on the page. It’s a slam dunk. The team celebrates, high-fives over Slack, rolls out Version A, and moves on to the next ticket in the 'Feel-Good Feature' epic.
The problem? We didn't answer the only question that matters: Which version made the student learn more?
We’re all so addicted to the clean, quantitative validation of traditional software that we’ve forgotten we’re not building a CRM. We’re building for the human brain. And in EdTech, ‘Does it work?’ (like a well-oiled machine) is the wrong question.
The right question is: ‘Does it teach?’
We’re a data-driven culture, but we’re often worshipping the wrong metrics. Let’s start by separating out two important buckets of data:

- Usability data: completion rates, time on page, satisfaction scores. It tells you whether the feature is easy to use.
- Efficacy data: pre/post-test gains, retention over time, the ability to apply a skill. It tells you whether the feature teaches.
Here’s the hard truth: These two are often in direct conflict.
Another example, to illustrate: In our house, we’re big fans of edutainment. We can spend hours on YouTube watching farmfluencers building goat shelters and milking stands, believing that what we’re doing is productive because we’re ‘learning’ how to apply these methods on our own farm. It’s fun, it’s easy, and it feels like time well spent.
But when we stand in the hardware shop a week later, ready to buy the materials and get going, we suddenly have no clue where to start. Why? Because we didn’t actually learn anything from those hours of content.

The most effective learning is hard. Cognitive science calls it ‘desirable difficulty’ (Bjork, 1994).
It’s the struggle of retrieving a memory, the frustration of applying a new concept, the mental friction that actually makes learning stick. But our validation methods are designed to eliminate friction.
We tend to grab the easiest data, which is almost always the wrong data. This leads us straight into the usability trap, where we obsess over metrics that feel productive but tell us nothing about learning. Here are some things we tend to over-emphasise in our industry.

Completion rate is a sugar-rush metric. It feels great to report '85% of users completed the new module!' But what does that mean? Did they 'complete' it by just clicking 'Next' five times? The easier, less effective option will always win on completion.
If we care too much about completion rates, we're optimising for clicks, not cognition.
Then there's learner satisfaction. Here's an unpopular opinion of mine: learners are awful judges of what helps them learn. We want to believe the user is always right, but in learning, they often aren't.
Studies (like Soderstrom & Bjork, 2015) show that learners consistently prefer passive, easier methods (like watching videos) because they feel fluent and easy. They mistake that feeling of fluency for mastery. But 'desirable difficulty' (like forcing them to retrieve information) is what actually works, even though it feels harder and less satisfying.
When you ask 'Did you like this feature?' you are, in some ways, running a poll on which feature was the least effective.
Time on page is the most meaningless metric of all. More time could mean deep engagement or hopeless confusion; less time could mean efficiency or skimming. Without an outcome to anchor it, the number tells you nothing. Stop reporting it.

Okay, so the standard SaaS playbook is out. What now? We can't all run multi-year, randomised controlled trials (RCTs) for every feature.
You don't have to. You just have to be more creative. You need practical, scrappy EdTech validation. Here are three methods you can start using this week.

The pre/post-test is the most basic, powerful tool we have for measuring learning efficacy. Quiz learners before they touch the feature, quiz them again afterwards with the same (or equivalent) questions, and treat the delta, not the completion rate, as your success metric.
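As a sketch, with invented scores and 'normalized gain' as one reasonable choice of metric, the pre/post comparison can be this simple:

```python
# A scrappy pre/post-test analysis: did scores actually move after the feature?
# All data here is invented for illustration, not from any real study.

def normalized_gain(pre: float, post: float, max_score: float) -> float:
    """Hake's normalized gain: the fraction of *possible* improvement realised."""
    room_to_grow = max_score - pre
    if room_to_grow == 0:
        return 0.0  # learner was already at ceiling
    return (post - pre) / room_to_grow

# Each tuple is one learner's (pre, post) score on a 5-question quiz.
cohort = [(2, 4), (3, 5), (1, 3), (4, 4)]
gains = [normalized_gain(pre, post, max_score=5) for pre, post in cohort]
avg_gain = sum(gains) / len(gains)
print(f"average normalized gain: {avg_gain:.2f}")  # → average normalized gain: 0.54
```

A cohort of 20 is plenty for a directional signal; you're looking for a gain clearly above zero, not a publishable p-value.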

The teach-back interview is my favourite way to blow up a useless user-interview script. Instead of asking 'Did you like it?', ask learners to explain the concept back to you in their own words, or to apply it to a problem they haven't seen. Retrieval is the test a satisfaction survey can't fake.

Your quiz and assessment logs are a goldmine, but you're probably looking at the wrong thing: first-attempt scores straight after a lesson. Performance in the moment is not learning. Look at how the same learners do on the same material a week or a month later; retention is where efficacy shows up.
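To make that concrete, here's a hedged sketch; the log schema and the numbers are invented, and a simple 'retention ratio' is just one way to slice it:

```python
# Mining assessment logs for retention rather than first-attempt scores.
# Hypothetical schema: (learner_id, days_since_lesson, score in [0, 1]).
log = [
    ("ana", 0, 0.90), ("ana", 7, 0.80),
    ("ben", 0, 0.95), ("ben", 7, 0.50),
    ("cho", 0, 0.70), ("cho", 7, 0.65),
]

immediate, delayed = {}, {}
for learner, days, score in log:
    (immediate if days == 0 else delayed)[learner] = score

# Retention ratio: what fraction of the immediate score survives a week later.
retention = {l: delayed[l] / immediate[l] for l in immediate if l in delayed}
avg_retention = sum(retention.values()) / len(retention)
print(f"average 7-day retention: {avg_retention:.0%}")  # → average 7-day retention: 78%
```

Notice that 'ben' aced the quiz on day zero and forgot half of it by day seven; first-attempt scores alone would have hidden that entirely.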
I know what you're thinking. 'This is all great. But my CPO just wants to see the engagement chart go up and to the right.'
This is the unglamorous, political part of the job. Your role isn't just to build features; it's to educate your own organisation.

Translate 'learning' into 'business.'
If your product actually works (i.e., learners get smarter, get promotions, pass their exams), they will stay, and they will become evangelists. Efficacy is your single best long-term retention and growth strategy. It's not 'fluffy' pedagogy; it's your core business asset.

Don't fight data with feelings. Fight bad data with better data. Build a simple dashboard with your new metrics.
This looks as serious and data-driven as any usability dashboard, but it's 100x more valuable.
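If it helps, the dashboard can literally start as a script. Every metric name and number below is illustrative, not from a real product:

```python
# A minimal 'efficacy dashboard' sketch: the same kind of numbers a BI tool
# would show, computed from learning outcomes instead of clicks.
metrics = {
    "avg normalized gain (pre/post)": 0.54,
    "7-day retention ratio": 0.78,
    "teach-back success rate": 0.60,
}
for name, value in metrics.items():
    print(f"{name:<32} {value:>5.0%}")
```

Three outcome numbers, tracked release over release, will do more for your credibility than any engagement chart.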

Don't try to boil the ocean. Pick one upcoming feature. Tell your boss: 'Let me run our usual usability test, but I'm also going to run a scrappy efficacy test in parallel. Let's compare the findings.' This is a low-risk, high-reward proposal they can't refuse.
Before you start wireframing, can you confidently answer ‘yes’ to these questions?

- Do I know what the learner should be able to do after this feature that they couldn’t do before?
- Do I have a way to measure that change (a pre/post-test, a teach-back question, a delayed assessment)?
- Would I still ship this if satisfaction went up but learning outcomes didn’t?
Validating for usability is easy. It gives you clean charts, happy stakeholders, and the illusion of progress.
Validating for learning is messy. It's qualitative, it's slower, and it often gives you uncomfortable answers. It forces you to admit that your beautiful, 'intuitive,' Dribbble-worthy feature taught absolutely nothing.
But this is the job. You’re not just a feature-pusher. You are, whether you like it or not, an educator.
So the next time you're reviewing a feature, stop asking 'Does it work?'
Start asking 'Does it teach?'
Q: What's the main difference between usability and learning efficacy?
A: Usability is about ease of use: Is the feature intuitive, fast, and frictionless? Efficacy is about the outcome: Does the feature cause a measurable change in the learner's knowledge or skill? The most effective learning often has more friction, not less.
Q: Can't I just use A/B testing for EdTech?
A: You can, but you must measure the right thing. An A/B test that measures completion rate (a usability metric) will almost always favour the easiest, least effective option. A good EdTech A/B test would measure the post-test score between Group A and Group B.
Q: How long should a learning validation test take?
A: It can be very scrappy. You can run a 5-question pre/post-test with a cohort of 20 users. You can add 'teach-back' questions to 5-10 of your normal user interviews. The goal isn't to get a statistically perfect sample size for an academic paper; it's to get a directional signal that's stronger than a simple 'like' button.
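To make the A/B answer above concrete, here's a minimal sketch with invented post-test scores, using an equal-sample-size approximation of Cohen's d as a rough effect size:

```python
# A/B testing on the right metric: compare post-test scores, not completion rates.
# Hypothetical scores out of 10 for two small cohorts.
from statistics import mean, stdev

group_a = [6, 7, 5, 8, 6, 7]   # frictionless version
group_b = [8, 9, 7, 8, 9, 7]   # retrieval-practice version

diff = mean(group_b) - mean(group_a)
# Equal-n approximation of Cohen's d, for a rough directional read only.
pooled_sd = ((stdev(group_a) ** 2 + stdev(group_b) ** 2) / 2) ** 0.5
d = diff / pooled_sd
print(f"mean difference: {diff:.2f} points, effect size d ≈ {d:.2f}")
# → mean difference: 1.50 points, effect size d ≈ 1.54
```

With samples this small the exact d is noise; what matters is that you measured the outcome, not the clicks.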