
Reflections on Single Source of Truth for Data

  • Writer: Nitish Mathew
  • Oct 24, 2024
  • 5 min read

Updated: Apr 5


Clean data is like pure water from a single stream

Achieving a Single Source of Truth (SSOT) for data is a difficult but achievable organizational challenge. It requires leaders to collaborate on choosing the most suitable data provider, work closely together to retire redundant data pipelines smoothly, celebrate and communicate efficiency wins, and, over time, make SSOT part of the organization's data culture. The solution may lie not in adding more technology, but in streamlining the technology you already have.


Achieving SSOT via Simplification


This problem needs to be solved one table or subject area at a time. Mutual trust is a prerequisite. Trust is built over time through consistent demonstration of competence, transparent communication, and actions that support the common good. If you are one of the leads contributing to Multiple Versions of Truth (MVOT), here is an approach:

  1. Start with a list of widely accepted organizational pain points due to MVOT.

  2. Shortlist cases that have only one duplicate source, if possible.

  3. Select the simplest use case and kick off a discussion to figure out the best go-forward option. Remember that this is not a negotiation you enter with a vested interest in keeping your pipeline. Communicate and act in a way that demonstrates your true intent of optimizing for the whole company. Some questions to help drive the discussion:

    1. If the company were to start from scratch, what would the ideal architecture be? This helps people step back and look at the problem from a fresh perspective. Often we are prisoners of the current-state architecture. Taking a carte blanche approach can help new ideas emerge.

    2. Are there existing economies of scale? For example, if another team has already built deep integrations with, say, 10 third-party risk-processing vendors and you have integrated with one, it may make sense to shut down your pipeline and transition it to them. This also reduces security risk, since there are fewer points of entry, not to mention the savings in engineering effort and processing costs.

    3. Respect Conway's Law and play to its strengths. If the company leadership has taken the step to reorganize in a certain way, your job is to make that effort successful. So, move things to new teams if those teams have been clearly set up to solve specific challenges. The faster you can do this, and support them to be successful, the better your reputation will be.

    4. What can improve the data consumer experience? From a downstream consumer's perspective, is there an opportunity to give them a single team to talk to? You can ask the consumer teams directly for their preference. Giving them a say in the end state earns their buy-in for any migration that requires them to spend effort updating downstream SQL scripts, dashboards, and so on. For example, if one team already serves 90% of finance's needs, moving the rest to that team gives finance a single team to interact with.

    5. What are the current, widely known pain points with data? Look at the data pipelines with the most data quality concerns, stability issues, or cost problems, and discuss how they can be solved. If it emerges that one team is better placed to handle a certain class of problems, that may be an idea worth exploring.

  4. Once you reach an agreement, initiate the migration and decommission the redundant pipeline. Some downstream processes may take years to move off the old tables entirely; to decommission the pipeline without breaking them, you can replace the materialized tables with views that point to the new SSOT. This means intentionally taking on technical debt, with a plan to pay it off over time.

  5. Upon completion, communicate the successful transition. Highlight the associated advantages, such as reduced storage, processing, and backup costs, saved engineering time, and greater efficiency for analysts. Call out the person who agreed to shut down their pipeline as an example of a true leader. This communication is crucial for inspiring the wider community to consider whether this is a pattern they can also adopt.
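The table-to-view swap in step 4 can be sketched with Python's built-in sqlite3 module standing in for a data warehouse. This is a minimal illustration under stated assumptions, not a production migration: the table names finance_orders (the new SSOT) and legacy_orders (the redundant copy that downstream queries still reference) are hypothetical, and real warehouses have their own DDL, permissions, and performance trade-offs for views.

```python
import sqlite3


def replace_table_with_view(conn: sqlite3.Connection) -> None:
    """Swap the redundant table for a same-named view over the SSOT.

    Names are hypothetical examples. The drop and create run inside one
    explicit transaction, so queries never see the name missing.
    """
    # executescript commits any pending transaction first, then runs
    # the script as written, including the explicit BEGIN/COMMIT.
    conn.executescript(
        """
        BEGIN;
        DROP TABLE legacy_orders;
        CREATE VIEW legacy_orders AS
            SELECT order_id, amount FROM finance_orders;
        COMMIT;
        """
    )


conn = sqlite3.connect(":memory:")
# Hypothetical new single source of truth.
conn.execute("CREATE TABLE finance_orders (order_id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO finance_orders VALUES (?, ?)", [(1, 10.0), (2, 25.5)])
# Hypothetical redundant copy, loaded by a separate pipeline and now stale.
conn.execute("CREATE TABLE legacy_orders (order_id INTEGER, amount REAL)")
conn.execute("INSERT INTO legacy_orders VALUES (1, 10.0)")
conn.commit()

replace_table_with_view(conn)

# A downstream query written against the old name keeps working,
# but now reads from the SSOT.
rows = conn.execute("SELECT order_id, amount FROM legacy_orders ORDER BY order_id").fetchall()
kind = conn.execute("SELECT type FROM sqlite_master WHERE name = 'legacy_orders'").fetchone()[0]
print(rows, kind)  # [(1, 10.0), (2, 25.5)] view
```

The view is the deliberate technical debt: consumers keep their existing SQL, the redundant pipeline can stop running, and the view can be dropped once the last consumer has moved to the SSOT table name.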


Repeat the steps until you have addressed all items on the shortlist. Don't get carried away and attempt to extend this to all datasets in the company for the sake of analytical purity. Move on to more relevant things.
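Steps 1 to 3, together with the advice to stop once the shortlist is exhausted, amount to a simple triage rule. The sketch below assumes each pain point can be summarized by a count of duplicate sources and a rough complexity score agreed by the leads; the PainPoint class, its fields, and the sample values are all hypothetical. In practice, judging what is "simplest" comes out of the discussion, not out of a number.

```python
from dataclasses import dataclass


@dataclass
class PainPoint:
    """One widely acknowledged MVOT problem (hypothetical fields)."""
    subject_area: str
    duplicate_sources: int  # competing pipelines besides the go-forward one
    complexity: int         # rough effort score; lower means simpler


def shortlist(pain_points: list[PainPoint]) -> list[PainPoint]:
    """Return the work order: fewest duplicates (ideally one), simplest first.

    Cases with no duplicate source are not MVOT problems and are skipped.
    Anything not returned stays out of scope; the loop ends with this list.
    """
    candidates = [p for p in pain_points if p.duplicate_sources >= 1]
    if not candidates:
        return []
    fewest = min(p.duplicate_sources for p in candidates)  # 1 if possible
    selected = [p for p in candidates if p.duplicate_sources == fewest]
    return sorted(selected, key=lambda p: p.complexity)


# Hypothetical input for illustration only.
cases = [
    PainPoint("revenue", duplicate_sources=1, complexity=3),
    PainPoint("customers", duplicate_sources=3, complexity=1),
    PainPoint("refunds", duplicate_sources=1, complexity=1),
    PainPoint("inventory", duplicate_sources=0, complexity=2),
]
print([p.subject_area for p in shortlist(cases)])  # ['refunds', 'revenue']
```

Here "customers" is deliberately left for later because it has three competing sources, which matches the advice not to extend the effort to every dataset for the sake of analytical purity.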


Eliminating Data Pipelines to Support SSOT Helps Data Engineering Teams Elevate Their Value


Our world keeps changing. The market your company operates in will evolve. Technology evolves. As Fivetran eloquently put it in 2020 in Data engineers don’t want to do ETL, data engineers who evolve with technology will move on to higher-value work. Gen AI is the Fivetran-like disrupter of 2024. What should data engineers do now? As Barr Moses argued in How to avoid being replaced by a robot, lean into Gen AI rather than fear it. That requires upskilling. How can you make time for learning? Get rid of pipelines that are not needed, or hand them over to teams that have already solved the problem or would like to own it. You have already built and run them, and there is nothing more for you to learn there! So partner with your manager, do your job the best you can, be curious about emergent problems, and pick up new skills. This is The Magic Loop for career growth (hat tip: Ethan Evans).


Engineering Manager - The Most Critical Person in Bringing This Mindset Shift


In April 2021, Nature published a fascinating article, Adding is favoured over subtracting in problem solving. Researchers asked people to make a Lego structure stronger and observed that, even when given cues, many participants did not think of removing blocks, which was the simpler and more effective solution. See also Operations: The Power Of Subtraction. Motivating people to eliminate things is therefore hard, because it goes against not just typical organizational culture but also how we are hardwired to think as humans. This is where managers can help. Instead of tackling this natural human inclination head-on, coach people to develop life skills, to approach change as an opportunity, and to embrace the following mindset (Source: The Art Of Continuous Growth - Reinvent Yourself And Kill Your Ego).


In a world where the pace of change is accelerating, the ability to reinvent oneself becomes a strategic advantage. It fosters resilience, creativity, and an entrepreneurial spirit—qualities that are increasingly valuable in navigating the complexities of modern life.


Some Reasons for Multiple Versions of Truth (MVOT)


There are various reasons for this phenomenon, depending on a company's stage of development and its organizational culture. In start-ups and scale-ups without a dedicated central team for data provisioning, different teams build their own pipelines from the source out of necessity. In some organizations, a central team exists but struggles to manage expectations effectively, which leads other teams to develop their own pipelines. In other cases, individuals choose to build end-to-end pipelines themselves to keep full control over the process, to make changes more easily, or because they do not trust data produced by others. Redundant pipelines may also emerge because of compliance, legal, privacy, and security considerations.


Behavioral factors also play a role. Teams that have run a data pipeline for a long time develop a sense of ownership, pride, and fulfillment, and that emotional attachment makes it hard for them to give up control. This sentiment extends beyond data to tooling as well. Conway's Law is equally relevant. These strong organizational motivations keep redundant pipelines alive. Understanding these human incentives and organizational behaviors is crucial for the success of any SSOT project.


Is MVOT always bad?


No. The primary measure of success for a data program is how effectively it helps the company achieve its business objectives over its operational lifetime, and circumstances evolve. The Defense vs Offense strategy framework outlined in the influential 2017 HBR article What's your Data Strategy? provides a helpful perspective. If MVOT is working, maintain the status quo. As Prof. Michael Porter emphasized in his groundbreaking 1996 HBR article What is Strategy?, 'The essence of strategy is choosing what not to do.'



Photo by zhang kaiyv on Unsplash


