Find out how to combine qualitative and quantitative sources to effectively understand what is happening and why.
Here’s what they talked about:
- The most used research methods by product and data teams
- The need for thorough documentation of research and experiments
- Defining a failed experiment
- Collaboration in experimentation
- When data is valid, and when an experiment is complete
- The biggest use cases of successful research collaboration
Subscribe on your favorite streaming platform for more episodes: Apple Podcasts, Spotify, Google Podcasts, or Stitcher.
The most used research methods by product and data teams
[A]: When we hear the word data, our minds typically go to numbers and statistics. But really, there’s data throughout the organization, and data can live in our support and feedback tickets, in the user interviews that we conduct, and can even live in sales calls and onboarding experiences that we have with our users.
Pretty often, product teams will use tools like surveys and open-ended questionnaires that they send out. There are also heat maps and session recordings, and then we get to the quantitative side: A/B testing and the statistical modeling around it.
So there are many different tools that we can use when we go about conducting research.
But the true power lies in being able to combine qualitative and quantitative sources to effectively understand what is happening and why it’s happening.
So the data and numbers tell you what’s going on in your product or service, and the qualitative side, like user interviews and feedback, tells you why those actions are happening. When we combine the two, we see the real power of user research.
The need for thorough documentation of research and experiments
[A]: There are two sides to it – the first being user research that we conduct all along. It’s important for us to be able to share this knowledge throughout our team.
One of the things I noticed at Hotjar was that we were using OKRs (objectives and key results) as a framework. Typically, the company had maybe three main objectives for six months of the year. And we were all going after the same metrics. In our own individual practices, we were trying to move the needle on those numbers. So when we did conduct research, that information that we were finding was relevant to all teams.
This was actually the trigger that got me to start Avrio. One quarter, product did an amazing job pulling together bits and pieces of user feedback; they’d scoured review sites, done the research, and pulled the data and insights together. That was some really useful information about how they were going to go about moving that specific needle. But nobody could take advantage of it, because it was sitting on a static page that nobody else had access to, and that was a real shame.
What you mentioned with experimentation is the next level – you start to have this information, and now you want to go out and learn more.
When it comes to experimentation, the key is the culture within the organization, starting with:
Why do we run experiments to begin with?
At the core premise, and this is something we believed in deeply at Hotjar, experimentation wasn’t necessarily about seeing the needles moving up, but rather, experimentation was about learning.
When you approach it from this mindset, you can understand how important documentation becomes. An experiment isn’t a failure because you didn’t hit your targets – you’re not really trying to prove your hypothesis. You’re trying to learn something new from your own experiment.
So if you’re learning something new, you’re documenting it, and you can take action from that, you have a successful experiment on your hands. If you don’t learn something new, you have a failed experiment.
Failed experiments are rare. Normally, if you do the post-analysis and dig into your experiments and your research enough, you can learn something new each time that will inform your next steps.
Documenting these learnings then really helps us to take a look back and say,
“Okay, why did we take the actions we took?”
and you can see a clear path of where we are today. And now, if you want to start designing new experiments, if you want to try new layouts or new onboarding paths, you have the learnings of the past to understand what worked and what didn’t.
What’s a failed experiment?
[A]: A failed experiment is when the test shows no impact, so you can’t really determine a winner or a loser (otherwise known as a no-impact test).
In some cases, you do learn something – that one variant isn’t better than the other. But more often, it happens when there are data mistakes in the experiment. So you might ask,
“What are some of the common mistakes and challenges we have with experimentation?”
And a common answer is having good, clean data.
But it typically only happens when there’s a failure in the setup: when you haven’t set things up correctly, or you don’t have the metrics needed to really understand the learnings you wanted to derive.
So very rarely do you have failed experiments. It’s really only a failure of the initial setup, or of not having access to the information or data you wanted, because you’re always learning something new when you experiment. Even if what you thought was going to be a clear, outright winner turns out not to be, you still learned something new.
Failed experiments don’t have to do with the results themselves, but rather with the preparation.
[C]: I would add that a failed experiment for a survey is asking the wrong questions because if you ask the wrong questions, you’ll look at the answers and won’t know what to make of them.
So actually preparing a good experiment is a lot harder than most people expect, and it’s harder precisely because they don’t properly document, prepare, or write things down.
[A]: Totally agree.
One of the key elements of effective experimentation is having a really good, solid template that you can start with and that you know and understand. That way, each time you go to run a new experiment, you start with the end result in mind and know how you’ll determine effectiveness.
What we have at Avrio now, and what we had at Hotjar, was an experiment template that really helped in guiding us as to how we’d go about setting up. And it started with a few different areas.
- The first area was: what is the duration and sample size needed? We had a standard set of calculators – there are different calculators that you can get online, like AB Tasty and a few others – that allow you to calculate things like:
- How long do you need to run your experiment for?
- What is the start and end date going to be?
- What are the sample sizes needed?
- What are the baseline conversion rates you expect to have statistical confidence?
- What is going to be the minimal detectable effect?
The MDE is really critical in experimentation: it tells you how big a lift, up or down, you need to see in order to call the experiment successful with statistical significance.
- Then we’d start with a hypothesis based on the evidence we had – maybe evidence we’d learned from previous experiments: we believe that changing xxx will lead to condition xxx. So, moving the navigation from x to y. We’d also specify for whom this was relevant (in Hotjar’s case: users, sites, or accounts). And then,
- What would be the expected outcome? So in our minds, we’d already decided what we thought was going to happen. And we’d know it was true when we saw effects xxx happen to this metric. And we would be that specific before we even got started with experiments.
So we had a really clear outline of:
- What we were trying to achieve from the start,
- What success looked like, and
- The evidence that supported that.
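The duration and sample-size step above can be sketched in code. This is an illustrative Python version of the standard two-proportion sample-size formula that online calculators like the ones mentioned implement – it is not the actual calculator Hotjar or Avrio used, and the 5% baseline and one-point MDE in the example are made-up numbers:

```python
from math import sqrt, ceil
from statistics import NormalDist

def sample_size_per_variant(baseline_rate, mde, alpha=0.05, power=0.8):
    """Approximate visitors needed per variant for a two-proportion A/B test.

    baseline_rate: expected control conversion rate (e.g. 0.05 for 5%)
    mde: minimal detectable effect as an absolute lift (e.g. 0.01 for +1 point)
    alpha: significance level; power: desired statistical power
    """
    p1 = baseline_rate
    p2 = baseline_rate + mde
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    pooled = (p1 + p2) / 2
    n = ((z_alpha * sqrt(2 * pooled * (1 - pooled))
          + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2) / (p2 - p1) ** 2
    return ceil(n)

# Hypothetical numbers: 5% baseline, +1 percentage point MDE.
per_variant = sample_size_per_variant(0.05, 0.01)
# Dividing per_variant by expected daily traffic per variant
# gives a rough experiment duration, hence the start and end dates.
```

Note how a smaller MDE blows up the required sample size: halving the detectable lift roughly quadruples the traffic you need, which is why deciding the MDE upfront matters so much.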
Let’s talk about collaboration when doing experiments
We all have these internal biases of wanting to see a “winner” (even though winning, again, is really about learning); you want to see a successful, effective outcome from your experiments.
Having different team members chime in to this process really helps remove some of your own internal biases. But let’s not forget that the goal is to learn, it’s not necessarily to call the experiment “successful” or “a winner.”
Another thing to note is that if we had really high confidence in something, we wouldn’t run an experiment for that. Because experimentation is expensive, it takes time, and it takes resources to set up and perform properly. So when there was high conviction within the team because we had a lot of supporting evidence, we wouldn’t consider running that experiment. Typically, we might just launch something and then monitor the metrics and data to make sure that we didn’t really break anything significant.
Having a tool like Avrio, where you can actually run this process with your team – collaborate on experiments and hypotheses, pull in the different data sources from across your tool stack, and bring those insights together – gives your team the full perspective: why you’re running the experiment, what problems you’re trying to solve, which OKR you expect to move (if you’re using OKRs), and whether there’s any evidence that supports your assumptions.
It removes that bias that you have of your own internal use case and wanting to win, and brings a new perspective for your team.
[C]: We used to do experiments in silos. The product team would do their own experiments, while marketing would do theirs, and so on. But starting this quarter, we’ve decided to go cross-functional. Experimenting is now a company function, and when we run an experiment, everybody knows about it and can participate, even though it may run on the product or the marketing side.
I tend to be very confident when I communicate with my team, so I’ll say
“Hey, this is what’s happening, I talked to the customer, and here’s what needs to be done.”
But I have a really great team member who’ll ask:
“What data validates that?”
Being in a data company, I can only appreciate that question. They’re really focused on validation, and they’ll look at both weak and strong validation points. Even if I’ve already talked to the customer, they’ll say, “That alone isn’t valid; we need another validation point from somebody else.”
When do you consider data to be valid? What do you consider to be the end of an experiment?
There are a few different aspects to consider here.
When it comes to the type of data, the amount you need varies. With quantitative data, you typically need quite a lot to reach statistical significance.
A lot of early-stage startups that are just getting off the ground typically don’t have enough data to rely on statistically significant results. So when we see data and patterns, we can use those as signals, but not necessarily as the truth.
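To make “statistically significant results” concrete, here is a minimal sketch of the two-sided two-proportion z-test commonly used to read A/B results, using only the Python standard library. The conversion counts are hypothetical:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates.

    conv_a / conv_b: number of conversions in each variant
    n_a / n_b: number of visitors in each variant
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)          # rate under H0: no difference
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))          # two-sided p-value

# Hypothetical counts: 500/10,000 control vs 560/10,000 variant.
p = two_proportion_p_value(500, 10_000, 560, 10_000)
significant = p < 0.05
```

With these made-up numbers, a 5.6% vs 5.0% difference over 10,000 visitors per variant is still not conclusive at the usual 5% level – which is exactly the point about early-stage traffic: plausible-looking lifts are often just noise until the sample is large enough.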
We rely a lot more, then, on the qualitative side: trying to understand and listen to customers. You don’t really need to speak to more than 10 people to get a good signal from that perspective. If you’re hearing something from two or three people, the likelihood is that 20, 30, or 40 people feel the same way; there’s a multiplying effect.
When we think about the types of feedback we receive, people are much more likely to complain than to praise – and yet very few people actually bother to complain at all. So when somebody does complain, you can almost immediately multiply that five or ten times over.
Recently, we had people reaching out saying “Hey, I haven’t received my authentication email. What’s going on?” This happened once and I thought, I think this is an issue. It happened again and I thought, I think this is a big issue. And as it turned out, it was a serious issue that people weren’t receiving these emails through a specific path.
So it depends on the severity of the situation, and the types of feedback that are coming through in order to be able to understand, “Is this something we need to be taking seriously now, or is this just another data point that we can add to the arsenal and come back to?”
You can and should document these things. With Avrio, it’s quick and easy to capture a piece of feedback or create a highlight from your user interview that you’ve conducted, give it a tag, and then later, you can come back and actually search through the repository and say, “Show me every time somebody mentioned a specific bug,” either through feedback, user interview, or an email.
In that sense, you can draw on “now, it’s not only two or three users, we’ve heard it in some of the user onboarding sessions, we’ve had some of the feedback,” and you can start to build a more powerful story that way.
There’s no black and white answer that says: this is enough data; it does require a bit of intuition and feeling. But there are certain areas where we can understand that some things are more serious than others.
[C]: The way I address data quantity, especially for startups, is: when you look at the data set and ask yourself,
“Is this enough?”
double it, and if the numbers don’t change, it’s enough.
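That “double it” heuristic can be approximated in code. Since you can’t literally double real data, one hedged interpretation is to resample a set of twice the size (with replacement) and check how much the metric moves; everything below – the function, the tolerance, the resampling scheme – is an illustrative assumption, not a formal statistical test:

```python
import random

def metric_is_stable(values, metric=lambda xs: sum(xs) / len(xs),
                     tolerance=0.01, n_boot=200, seed=0):
    """Rough 'double it' check: resample a set twice the original size
    and see whether the metric shifts by more than `tolerance` on average.

    values: observed data points (e.g. 0/1 conversion outcomes)
    metric: summary statistic to check (defaults to the mean)
    """
    rng = random.Random(seed)                 # fixed seed for reproducibility
    base = metric(values)
    shifts = []
    for _ in range(n_boot):
        doubled = rng.choices(values, k=2 * len(values))   # "doubled" resample
        shifts.append(abs(metric(doubled) - base))
    return sum(shifts) / n_boot <= tolerance

# 5% conversion over 1,000 users: the rate barely moves when "doubled".
outcomes = [1] * 50 + [0] * 950
stable = metric_is_stable(outcomes)
```

The same check on a 20-user sample would come back unstable: the resampled rate jumps around by several percentage points, which is the heuristic’s way of saying you don’t have enough data yet.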
But more importantly, focus on the right questions. I can talk with three customers, but if I handpick them, I can get any story I want out of the conversation. I can get: my product is amazing, or I can get: my product needs to be redone immediately, if I’m guiding that discussion.
That’s why documenting upfront, creating your template and determining what questions you’re going to ask and who you’re going to ask those questions is much more important than the answers themselves, because they set the context.
Is it true that people in our industry try to validate the first proof they see of something being right?
[A]: This goes back to the discussion of what experimentation is. It’s not about a win or a loss.
Many people go in thinking,
“I want to prove that I’m right,”
instead of,
“We need to try to invalidate this experiment so we can understand the case and learn from it.”
If the results are valid, that’s great. But it’s not about that; it’s about the learning that happens.
There’s a really good saying about data: “If you torture data long enough, it will confess to anything.”
The same principle applies when it comes to good survey design and bringing things together; it’s a very powerful skill to know and understand what the right questions to be asking are, and avoid your own internal biases.
What are the biggest use cases? What’s the impact you see from companies that do this well?
[A]: There are quite a few good cases now, and there are a lot of companies putting good content out. Spotify is one of the companies leading in the space of bringing together mixed methods research, combining data science and analytics with user research.
The really powerful thing is when your data tells you what’s happening with your products, and your customers tell you why.
For example, say you have high traffic to your pricing page but a high bounce rate. The data is telling you what’s going on: lots of traffic, high bounce rate. But it’s not necessarily telling you why. By bringing in more qualitative sources – session recordings, or a poll on the page asking, “Is there anything we can help you with on this pricing page?” or “Is something stopping you from proceeding today?” – you get a qualitative understanding of what is or isn’t happening.
This is where it’s really powerful: it gives you the full picture of what’s happening and why, and it helps you form new hypotheses and understand what you can do to improve those experiences for your customers.
[C]: I’m a data person who believes in following your gut feeling. Most data people go against gut feeling, but I think your gut is important because it tells you what should be analyzed. So you go to the data, analyze it, and from that data you find correlations.
And then you need to find causation (the why), and you find causation through experimentation. Data alone never gives you causation; it only gives you correlation (the what). So you go into an experiment and you do an A/B test, or you go into user interviews, and you find the causation, which is the why you are talking about. You close the circle – and when you close the circle, your gut feeling becomes stronger in asking better questions next time.
- Experimentation is about learning.
- Document your experiments upfront to make sure you’re setting things up properly and asking the right questions to avoid your own internal biases.
- Encourage team collaboration.
- Combine qualitative / user research (why / causation) with quantitative / data (what / correlation) research. Validate qualitative insights with quantitative research, and vice-versa.
- Know which tools to use and when for the best results – sometimes, a long and expensive experiment can be easily replaced by a simple survey or poll.
About our guest, Andrew Michael
Andrew Michael is an entrepreneur with 12 years of experience in digital growth companies as a founder or senior manager with a focus on customer retention, product development, strategic planning, and marketing.
Have any additional questions about the role data plays in proactive customer success, or any other topics you would like to hear covered on The Data-Led Podcast? Comment below! We can’t wait to hear from you.
Subscribe for more episodes on your favorite streaming platform – Apple Podcasts, Spotify, Google Podcasts, or Stitcher – or subscribe for InnerTrends updates here.