The Data PM's Guide to Conquering Data Chaos with Unified Data Models
Hi Data PMs,
Hope you are well! Thank you for being so supportive of the recent technical posts. These posts take a ton of research, writing, and revision; this one, for example, has been sitting in my drafts for a month now. I am glad to be back with another important technical concept: Unified Data Models.
If you've ever felt overwhelmed by disjointed data sources as you develop your product, buckle up! We're about to make sense of the chaos.
What Exactly is a Unified Data Model?
Think of a Unified Data Model (UDM) as the maestro of an orchestra, where the orchestra is all the databases serving your applications. Each section (or data source) has its own sound, but under the direction of a UDM, it all comes together in harmony. A UDM integrates various data types and sources into a single, organized framework, making data coherent and actionable across your entire organization.
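To make this concrete, here is a minimal sketch in Python of two sources being mapped into one shared shape. The source systems ("crm", "billing"), their field names, and the UnifiedCustomer class are all hypothetical, made up for illustration; a real UDM would cover many more objects and would live in your data platform rather than in a script.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnifiedCustomer:
    """One shared shape for a customer, whatever system the row came from."""
    customer_id: str
    email: str
    source: str


def from_crm(row: dict) -> UnifiedCustomer:
    # Hypothetical CRM export that uses "AccountId" and "Email".
    return UnifiedCustomer(
        customer_id=str(row["AccountId"]),
        email=row["Email"].strip().lower(),
        source="crm",
    )


def from_billing(row: dict) -> UnifiedCustomer:
    # Hypothetical billing table that uses "cust_no" and "contact_email".
    return UnifiedCustomer(
        customer_id=str(row["cust_no"]),
        email=row["contact_email"].strip().lower(),
        source="billing",
    )


# Illustrative rows only: the same customer, described differently by each system.
crm_rows = [{"AccountId": 101, "Email": "Ana@Example.com "}]
billing_rows = [{"cust_no": "101", "contact_email": "ana@example.com"}]

unified = [from_crm(r) for r in crm_rows] + [from_billing(r) for r in billing_rows]

# After mapping, both records share the same key type and email format.
same_customer = (
    unified[0].customer_id == unified[1].customer_id
    and unified[0].email == unified[1].email
)
```

The point of the sketch is the mapping step: each source keeps its own sound, but everything downstream only ever sees UnifiedCustomer.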
Here’s a visual depiction of the world with and without a unified data model:
As you would expect, building a unified data model takes considerable time, dedicated resources, and foresight. It is an investment, and as a PM you should evaluate whether it is worth undertaking. Let's first discuss the benefits of Unified Data Models, and then discuss when an organization needs one.
The Strategic Advantage of a Unified Data Model
1. Cohesion & consistency: A UDM breaks down barriers between data silos, making cross-functional analytics as smooth as your favorite jazz ensemble. It also brings cohesion to your application queries. So, if your product has scaled to lots of complex queries and you are now dealing with 2-3 databases powering multiple application modules, it is probably a good time to start thinking about a UDM.
2. Agility: When the data model is laid out and well understood by the team, teams can deliver new features and build new analytics reports faster. However, thinking through how a new data object (that a feature demands) fits into the existing unified data model can sometimes slow a team down. In other words, while a unified data model can bring efficiency, implementing it too early, when product teams are still experimenting with multiple use cases, can slow you down. A data model only reaches maturity once you have tested enough use cases.
3. Cost Efficiency: With a UDM in place, different product teams are less likely to come up with their own data requirements and maintain their own databases. That reduces data duplication and creates storage and compute efficiencies.
As a PM, you should think through whether you want to trade off growth for cost: in the early days, the speed you lose to a UDM may matter more than the cost it saves.
Key Qualities of a Stellar Unified Data Model
Based on insights from Mixpanel, here's what makes a unified data model stand out technically in terms of architecture. Bake these qualities into your requirements:
1. Scalable: As your data grows in volume and diversity, your model should expand to accommodate more objects, or even added columns in your schema, without major rework. Remember, schema migrations can be a pain (reference).
2. Flexible and Integratable: It should seamlessly mesh with new platforms and data sources as your tech stack evolves, minimizing manual effort and maintaining synchrony with critical analytics tools. If your UDM doesn’t support Flexibility and Integrability, gear yourself up for a 5-year migration project ahead.
3. Structured and Intuitive: Despite its comprehensive nature, the model must remain navigable and clear, enabling all relevant teams to extract value without getting lost in data complexity. If it doesn't, a new feature can create havoc in your UDM, or you can waste months implementing your analytical features. Keeping a visual map or implementing a data catalog can be helpful for dev teams.
4. Accessible: Make sure your data is not just collected in one place but also readily available across your organization. This democratization ensures that all teams can leverage data insights for informed decision-making. A data catalog or an observability tool can again be helpful in this pursuit.
5. Automatable: To keep up with the growing data and its demands, your model should support automation across all stages, from data collection to analysis. If something in your UDM breaks, or is about to, your system should alert the right teams.
6. Business Oriented: Keep the unified data model oriented to your business needs; otherwise, you can end up with a complex model that slows your organization down.
These qualities ensure that your unified data model is not just a repository, but a dynamic and effective tool that empowers your entire organization.
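As one illustration of the "Automatable" quality, here is a small Python sketch that checks an incoming record against the fields the model expects and builds an alert message when they drift apart. The field list, the team name, and the alert_owners function are assumptions for the example; in practice the alert would go to whatever paging or chat tool your team already uses.

```python
from typing import List

# Hypothetical contract for one object in the unified model.
EXPECTED_FIELDS = {"customer_id": str, "email": str, "plan": str}


def find_schema_drift(record: dict) -> List[str]:
    """Return human-readable problems; an empty list means the record conforms."""
    problems = []
    for name, expected_type in EXPECTED_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            actual = type(record[name]).__name__
            problems.append(
                f"wrong type for {name}: expected {expected_type.__name__}, got {actual}"
            )
    # Fields the model does not know about are reported too.
    for name in sorted(set(record) - set(EXPECTED_FIELDS)):
        problems.append(f"unexpected field: {name}")
    return problems


def alert_owners(problems: List[str]) -> str:
    # Stand-in for a real notification; here it only formats the message.
    return "ALERT for data-platform team: " + "; ".join(problems)


# Illustrative record from an upstream source that changed its shape.
incoming = {"customer_id": 101, "email": "ana@example.com", "region": "EU"}
problems = find_schema_drift(incoming)
message = alert_owners(problems) if problems else "ok"
```

A check like this is cheap to run on every load, which is what lets the right team hear about a break before a dashboard or feature does.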
B2B Tech Pioneers Utilizing Unified Data Models
Many big data teams have started using unified data models for the benefits cited above. Here are two examples, one from the B2B and one from the B2C product space.
Salesforce’s Customer 360 data model. Salesforce prescribes a Customer 360 data model for its customers to help them take full advantage of their data in Salesforce. It defines how different “data objects” should be organized under different “subject areas” around a customer, such as sales, loyalty, market, engagement, etc. Study the image below to understand how well it solves the needs of Salesforce as a business, as well as of sales leaders as consumers of this information:
Netflix creates a unified data model to bring agility to the team. Netflix engineers explained why they created their unified data model and how they use their internal metadata management tool, Metacat (read more here), to ensure conformance and visibility into their unified data model.
Netflix is probably the better example to study if you are on a journey to give B2B or B2C teams some common definitions.
Data Product Managers' Checklist for Implementing Unified Data Models
This is a good checklist to follow as you are thinking through the requirements for your Unified Data Model project.
Before writing requirements:
Audit Existing Data Infrastructure: Know what you have and what you need.
Prioritize Data Needs: Not all data is created equal. Focus on what drives value. Make sure you have a prioritized list of the use cases you are supporting and the ones you are likely to take on next, and describe them exhaustively for your dev team. The more you tell them, the better the data model will be.
Engage with IT and Data Science Teams: They’re your best friends in the journey to data unification.
Know the constraints around the tools they use and the infra restrictions they have. Your team is not going to move from Postgres to ClickHouse even though it makes sense for your project.
Consult the Security team too, so that you can follow the best practices internally.
As you are writing requirements:
Work with Engineers across teams. Work with engineers who work on the platform to understand the feasibility of the UDM, work with the engineers on application/data science team to understand whether their needs will be fulfilled, and only then propose a UDM.
Prototype and Test: Before finalizing requirements, implement a small-scale model to see real-world applications and tweak them as necessary.
Align stakeholders and gain buy-in across the organization, lest you face the wrath of disgruntled data deities.
Show business value in terms of efficiency gains for the platform and apps team, cost optimization, as well as the velocity of the org to ship products.
Plan for Continuous Improvement: Set up processes for regular updates and refinements to your requirements.
For the requirements themselves:
Define various objects, interaction models, clear keys, and relationships in a well-defined way, and test your assumptions across different use cases.
Define querying, storage, and integration possibilities as well.
Define a comprehensive data governance strategy, establishing the laws that govern your data kingdom. Define clear data ownership and stewardship roles for the unified data model to safeguard its integrity.
Implement robust data quality and validation processes, for even the mightiest unified data model is rendered useless by corrupted data.
Define scalability, reliability, accessibility, and integrability metrics.
Plan to define an SLO (Service Level Objective) and an owner for service requests such as adding a new object, changing an existing object, introducing a new data source, updating the data model, updating documentation, etc.
Experiment and suggest a few tools to help with data model observability.
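To show what "objects, clear keys, and relationships" plus data quality validation can look like when written down, here is a small Python sketch. The Customer and Order objects, their fields, and the three validation rules are hypothetical examples, not a recommended schema; your own objects and rules should come from the prioritized use cases.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Customer:
    customer_id: str  # primary key
    email: str


@dataclass(frozen=True)
class Order:
    order_id: str      # primary key
    customer_id: str   # foreign key pointing at Customer.customer_id
    amount_cents: int


def validate(customers: List[Customer], orders: List[Order]) -> List[str]:
    """Check key uniqueness, the order-to-customer relationship, and basic quality rules."""
    errors = []
    known_ids = set()
    for c in customers:
        if c.customer_id in known_ids:
            errors.append(f"duplicate customer_id: {c.customer_id}")
        known_ids.add(c.customer_id)
        if "@" not in c.email:
            errors.append(f"invalid email for customer {c.customer_id}")
    for o in orders:
        if o.customer_id not in known_ids:
            errors.append(f"order {o.order_id} references unknown customer {o.customer_id}")
        if o.amount_cents < 0:
            errors.append(f"order {o.order_id} has negative amount")
    return errors


# Illustrative data with deliberate problems in it.
customers = [Customer("c1", "ana@example.com"), Customer("c2", "not-an-email")]
orders = [Order("o1", "c1", 2500), Order("o2", "c9", -100)]
errors = validate(customers, orders)
```

Even a toy version like this is useful in a requirements review: it forces the team to agree on which field is the key, which object points to which, and what counts as bad data.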
When to Shift Towards a Unified Data Model
Timing is Everything: Consider transitioning to a unified data model when:
Your data landscape becomes too complex to manage efficiently.
You need faster insights to keep up with market demands.
Your data-driven decisions need to be more accurate and timely.
In essence, if your data environment feels more like herding cats than conducting an orchestra, it’s time to consider a unified approach.
That’s all, folks, on the Unified Data Model! What other technical concepts would you like to know more about? Reply to this email, and your response will reach me :)
Cheers,
Richa
Your Chief Data Obsessor, The Data PM Gazette.