Hey Data PMs!
Welcome back to another edition of the Data PM Gazette. Things have been busy at work, hence the late post. However, this edition is special: it combines two things that I like, product and analytics. I got into product because I love doing analytics and telling stories from data that otherwise wouldn’t be obvious or possible. So, product analytics work in general is my sweet spot as a PM.
When I first came across tools such as Mixpanel, Amplitude, and FullStory, I thought the product analytics space had matured a lot, especially for the kind of products I was managing at the time: ones that users interact with directly through a visual interface. However, my perspective changed completely when I had to find data and metrics for platform products.
In this post, I talk about how you can define and use the right metrics for analyzing your platform.
What layers constitute a data platform?
Depending on what makes up your platform, you can have one or more of the following “layers” or sub-parts in your stack:
Administrative Tools: Internal/Customer-facing tools that help you configure data sources, flows, users, etc.
Querying Layer: A dedicated query engine or an internally grown mechanism (such as a library) to query data correctly.
ETL / Data Transformation Tools: Internal services or off-the-shelf tools for conducting ETL jobs.
Data Store: Either a simple database (Postgres, Mongo, etc.) or a data warehouse to store data.
Storage / Compute / Cloud Services: Basic cloud services such as AWS that host the data and provide compute, along with other supporting tools.
There could be more services, such as a data catalog or analytics tools for data science teams. Together, these layers create a data platform, and as the PM of a platform, it’s your job to understand how your platform is doing and what you can fine-tune from time to time.
What kind of data does a product person want?
How do you tell whether a data platform that powers multiple workflows is performing well? In general, one of two things happens with data platform products: either they become invisible (if the performance is on par with or better than what users expect), or they become the biggest thorn in users’ sides (if users are not getting the performance they need). As a PM, your job is to make the platform invisible. Therefore, your platform should be performant, scalable, and reliable. In short, this is known as PSR: performance, scale, and reliability.
So, as a PM, you need to set up the PSR metrics to know how your platform contributes to the business. You must work out the following:
(a) What constitutes the platform?
(b) Which major services contribute to the performance of the platform? There will be hundreds of APIs, but some APIs and services are more crucial than others.
(c) How will you best measure the performance, and how will you summarize it?
(d) What slices will you make of the data? Do you want to see the data per customer? Or is per service enough? Or by category of customer? Or by volume/size of data?
(e) Who will you review that performance with?
(f) How can that data make a business impact?
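To make questions (c) and (d) a bit more concrete, here is a minimal sketch of summarizing latency per slice. Everything in it is an assumption for illustration: the event fields (`customer`, `service`, `latency_ms`), the sample values, and the choice of p50/p95 (computed with the nearest-rank method) as the summary. Your platform may call for different summaries.

```python
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    ordered = sorted(values)
    rank = (len(ordered) * pct + 99) // 100  # integer ceil(n * pct / 100)
    return ordered[max(rank, 1) - 1]


def summarize(events, slice_key):
    """Group request latencies by one slice (e.g. customer or service)."""
    groups = defaultdict(list)
    for event in events:
        groups[event[slice_key]].append(event["latency_ms"])
    return {
        key: {
            "count": len(latencies),
            "p50_ms": percentile(latencies, 50),
            "p95_ms": percentile(latencies, 95),
        }
        for key, latencies in groups.items()
    }


# Hypothetical request log; field names and values are made up.
events = [
    {"customer": "acme", "service": "query", "latency_ms": 120},
    {"customer": "acme", "service": "query", "latency_ms": 300},
    {"customer": "acme", "service": "ingest", "latency_ms": 80},
    {"customer": "globex", "service": "query", "latency_ms": 900},
    {"customer": "globex", "service": "query", "latency_ms": 1100},
    {"customer": "globex", "service": "ingest", "latency_ms": 95},
]

by_customer = summarize(events, "customer")
by_service = summarize(events, "service")
print(by_customer)
print(by_service)
```

The same events give a different story depending on the slice: per service, the query path looks slow overall, while per customer it becomes clear that one customer is carrying most of that slowness.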
How can you measure platform metrics?
Most of the data that’s available is either logged by engineers or comes from metrics created in some tool, and it usually caters to an Engineering perspective. You need to put a product twist on the same information. The use case for both Engineering and Product is “diagnostic,” but the lens is different. Engineering looks at whether or not services are working as intended, while Product usually thinks about “what can be improved” and “where can we get the maximum ROI from the improvement.”
Here’s a list of things that you can measure, how the Engineering perspective differs from the Product perspective, and what you need to know across the different categories:
How can you set up something like this?
Having done a similar exercise recently, I would do the following:
Step 1: Dirty Manual Work
First, do the dirty manual labor. Bring in some data from across your sources yourself (so that engineers don’t have to tweak APIs, build new tools, etc.).
Bring all the data that you need into an Excel sheet for each customer and marry it with some business data (size of the customer, # of users, # of active users, current FY ARR, etc.) so that you know the business knobs. Then create the performance, scale, and reliability metrics.
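If a spreadsheet gets unwieldy, the same “marrying” can be sketched in a few lines of Python. This is only an illustration under assumptions: the field names (`requests`, `errors`, `p95_latency_ms`, `active_users`, `arr_usd`), the customers, and the three derived metrics are all hypothetical stand-ins for whatever your own exports contain.

```python
# Hypothetical per-customer platform numbers, e.g. exported from logs.
platform_stats = {
    "acme": {"requests": 12000, "errors": 60, "p95_latency_ms": 300},
    "globex": {"requests": 4000, "errors": 200, "p95_latency_ms": 1100},
    "initech": {"requests": 500, "errors": 1, "p95_latency_ms": 210},
}

# Hypothetical business data, e.g. exported from a CRM or finance sheet.
business = {
    "acme": {"active_users": 40, "arr_usd": 250000},
    "globex": {"active_users": 10, "arr_usd": 900000},
}


def build_rows(platform_stats, business):
    """Join platform and business data into one row per customer."""
    rows = []
    for customer, stats in platform_stats.items():
        biz = business.get(customer)
        if biz is None:
            continue  # no business record: skip rather than guess
        requests = stats["requests"]
        users = biz["active_users"]
        rows.append({
            "customer": customer,
            "arr_usd": biz["arr_usd"],
            # Performance
            "p95_latency_ms": stats["p95_latency_ms"],
            # Reliability
            "error_rate_pct": round(100 * stats["errors"] / requests, 2) if requests else 0.0,
            # Scale
            "requests_per_active_user": round(requests / users, 1) if users else 0.0,
        })
    # Largest accounts first, so business impact is visible at a glance.
    return sorted(rows, key=lambda row: row["arr_usd"], reverse=True)


rows = build_rows(platform_stats, business)
for row in rows:
    print(row)
```

Sorting by ARR is just one design choice; it puts the “where is the maximum ROI” question front and center, since a slow or flaky experience for the largest account shows up on the first row.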
Step 2: Automate the metrics and monitoring in different tools
Whether your teams use Sumo, DataDog, Grafana, or something else, first try to set these metrics up automatically in those tools. Even though having the data spread across tools is inconvenient for the product person, it makes sure that you get Engineering bandwidth to at least start tracking this data regularly, so that you can start making informed decisions.
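The monitoring tools would normally evaluate rules like this for you, so the snippet below is not how any of those tools work. It is just a tool-agnostic sketch of the logic you would be asking Engineering to automate: compare the latest metrics per service against targets and list the breaches. The service names, metric names, and target values are all hypothetical.

```python
# Hypothetical targets agreed between Product and Engineering.
TARGETS = {"p95_latency_ms": 500, "error_rate_pct": 1.0}


def find_breaches(snapshot, targets):
    """Return (service, metric, value, limit) for every metric above target."""
    breaches = []
    for service, metrics in snapshot.items():
        for name, limit in targets.items():
            value = metrics.get(name)
            if value is not None and value > limit:
                breaches.append((service, name, value, limit))
    return breaches


# Hypothetical latest values, as they might be pulled from each tool.
snapshot = {
    "query-engine": {"p95_latency_ms": 820, "error_rate_pct": 0.4},
    "etl-jobs": {"p95_latency_ms": 310, "error_rate_pct": 2.5},
    "admin-api": {"p95_latency_ms": 120, "error_rate_pct": 0.1},
}

breaches = find_breaches(snapshot, TARGETS)
for service, name, value, limit in breaches:
    print(f"{service}: {name} is {value}, target is {limit}")
```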
Step 3: Set up a custom solution to bring the data together
Next, consider a data lake or data warehouse-based solution to bring all the data together in one tool, especially if you want to start exposing this data to execs. That way, you are not making multiple presentations every month just to report on the current state of affairs.
Step 4: Close product workflow loops (will almost never happen)
Take control of the workflows that engineers own so that you can control the experience for the end customer: scaling pipelines, creating indexes, creating requests for specific queries that are taking too long, etc.
🔗 Links of the month
- ’s take on Starburst’s journey is a must-read.
- ’s great deep-dive on “Where to build that metric?” could be a handy resource to pass on to your data science team if you are working with them to set up metrics.
And, lastly, heard
’s latest podcast about ’s new book Make Time - it’s about focusing on a “highlight” and how to build “laser focus” to get to that highlight every day.
Hope this helps in understanding how to do Product Analytics for a Data Platform. :)
Cheers,
Richa
Your Chief Data Obsessor, The Data PM Gazette.