Why is advanced data lineage fundamental for Financial Services organizations?
Understand why mapping data flows through your business is critical for success
Cloud migrations, regulatory compliance and AI are all business-critical initiatives that rely heavily on data. AI in particular is high on the agenda of CIOs and CDOs, and it is the fastest-growing priority for organizations. Yet accurate data is key to its success. Analyst firm Gartner says in a blog on AI readiness, “For data to be AI-ready, it must meet five criteria. It is secure, enriched, fair, accurate and is governed by the lighthouse principles.” The challenge is that most companies can’t see which systems use their data, or their AI for that matter, and so can’t know whether the data meets these criteria. Before using AI, and in order to attest to its accuracy, you need an overall picture of all your systems and of how data flows between them. You need to understand the datasets that flow into your AI models, as well as what has happened to that data before it reaches a model and is used for business use cases. The same applies to data used in core reports such as annual reports, in core decision making, in cloud migrations, and in staying compliant with constantly evolving data governance regulations.
Those working on machine learning and AI products might identify datasets and train models, and only then realize that there is a quality issue or that the data contains information they shouldn’t use. It’s hard for those using AI to understand where data comes from, where it is used and where the issue lies, and projects are delayed as a result. Then, once you have gone live, someone five departments away might make a change, and your products stop working. Poor data is often the cause of the problem, and yet most companies have little insight into the exact root cause of such an issue, because they can’t trace the data back through its journey from source, through transformations, to use case.
But there is a solution – and it’s advanced data lineage from Solidatus. Solidatus helps you discover, assess and prove the complete journey of your data from its source, through multiple systems, to business use case – so you can understand and truly trust the data at the foundation of your business decisions, governance, transformation projects, AI, and more. We do this through a visual map of your data’s journey and transformations through all systems in your organization:
- From a complete view of the source and destination of data through every system – not just a subset (end-to-end visualization)
- So you can drill deep into a column in a table, with full insight into every transformation – so you can analyze the root cause and business impact of issues and changes (fine-grain lineage)
- And look back to show how it was in the past, how you’ve resolved issues, which systems were affected – and how it might be with new systems and technologies in future (bi-temporal version control)
This isn’t a static snapshot. It’s interactive and constantly updated, enabling tasks to be assigned, alerts received and more.
Not just tech for tech’s sake, but data lineage as a fundamental business enabler
What is important, however, is that IT departments working to understand their data flows recognize how much this matters to business stakeholders. This isn’t just a technology project for IT. Key business stakeholders need to understand where data comes from, how it flows, its quality and ownership, and whether it’s fit for purpose, so that they can make their business decisions.
For example, how do you know, in a bank, that the margins you’re reporting are accurate? These are the end results of calculations drawing on multiple, disparate sources. You need to prove that your margin calculations are correct, and they rely on accurate data from the IT team. If someone makes a change in a single application, it may affect the end result. This isn’t just an IT issue; it is crucial for business stakeholders and decision makers. Data lineage can be used for impact analysis, querying, reporting and more.
Impact analysis is particularly useful in cloud transformation projects. Some customers want to find out what will be impacted when a new system is introduced, which system owners will be affected, and which data needs to be protected. For most cloud transformations, the first step is to understand your current data map. Solidatus allows you to scan all your systems and data flows as they currently are, and to plan a future estate so that you can be sure of not breaking things downstream. You can model future states of data flows: fork off the current version, make the changes, see what happens on this parallel track and then merge it back. It’s forward-looking and predictive, as well as able to look backwards.
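To make the idea concrete, here is a minimal sketch of downstream impact analysis and of forking a data map to test a planned change. It is not Solidatus' API or internal model: the column names, the `LINEAGE` dictionary and both helper functions are hypothetical, and the map is assumed to be a simple set of column-to-column edges.

```python
import copy
from collections import deque

# Hypothetical column-level lineage: source column -> columns derived from it.
LINEAGE = {
    "crm.customers.id": ["dwh.accounts.customer_id"],
    "core.loans.rate": ["dwh.loans.rate"],
    "dwh.loans.rate": ["finance.margin_report.net_margin"],
    "dwh.accounts.customer_id": ["finance.margin_report.customer_id"],
}


def downstream_impact(edges, changed):
    """Return every column reachable downstream of `changed` (breadth-first)."""
    seen = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for child in edges.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)


def impacted_systems(columns):
    """Reduce column names of the form system.table.column to system names."""
    return sorted({column.split(".")[0] for column in columns})


# Current state: what is affected if core.loans.rate changes?
current = downstream_impact(LINEAGE, "core.loans.rate")
print(current)
print(impacted_systems(current))

# Fork the map, add a planned cloud feed, and compare before merging it back.
planned = copy.deepcopy(LINEAGE)
planned["dwh.loans.rate"].append("cloud.risk_model.rate_feature")
newly_impacted = set(downstream_impact(planned, "core.loans.rate")) - set(current)
print(sorted(newly_impacted))
```

Because the planned map is a deep copy, changes on the parallel track leave the current map untouched until you decide to adopt them.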
Many regulations, including BCBS 239, the EU AI Act and GDPR, require you to have control over your data and your customers’ data, and over how it is used. BCBS 239 stipulates categorically that you must have a ‘complete’ view of your data, as well as be able to see it down to a granular, or ‘attribute’, level. We’ll cover this briefly in the next section, but you can read more in our blog about BCBS 239 and data lineage.
Regarding regulations around AI: in the UK, the previous government adopted a framework rather than a legislative approach. But it is understood from the King’s Speech that the new government intends stricter rules for AI. In Europe, the EU AI Act was approved by the European Parliament in March 2024 and received final approval from the Council of the EU in May 2024. It lists AI uses under various risk headings, including unacceptable risk, where uses are banned, and high risk. For systems not considered high risk, such as generative AI, there are transparency requirements. These include disclosing that content was created using generative AI, designing the model to prevent it from generating illegal content, and publishing summaries of copyrighted data used for training. These regulations require you to be transparent about your use of AI, as well as about the datasets used in it. To know where you use AI, you can trace it with data lineage. You can classify and tag AI models just as you would your data, and trace any information relating to them. For example, you may want to label an AI model as high or low risk, or have Solidatus treat a certain number of uses in critical business systems as indicating high risk. Advanced lineage will be critical in supporting AI governance and trust, and it will give you confidence that you know where in your business AI is being used.
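As an illustration of tagging AI models by risk, the sketch below labels a model as high risk once it is used in a set number of critical business systems. Everything here is an assumption made for the example: the `AIModel` class, the system names, the threshold of one critical use and the `risk_tag` function are hypothetical, and the rule is not a classification defined by the EU AI Act or by Solidatus.

```python
from dataclasses import dataclass, field


@dataclass
class AIModel:
    """A hypothetical catalog entry for an AI model and the systems that use it."""
    name: str
    used_in: list = field(default_factory=list)


# Assumption: which systems count as critical is decided by the business.
CRITICAL_SYSTEMS = {"credit_scoring", "regulatory_reporting"}

# Assumption: one or more uses in a critical system means high risk.
HIGH_RISK_THRESHOLD = 1


def risk_tag(model, critical=CRITICAL_SYSTEMS, threshold=HIGH_RISK_THRESHOLD):
    """Tag a model 'high' or 'low' by counting its uses in critical systems."""
    critical_uses = sum(1 for system in model.used_in if system in critical)
    return "high" if critical_uses >= threshold else "low"


models = [
    AIModel("churn_predictor", ["marketing_dashboard"]),
    AIModel("default_estimator", ["credit_scoring", "marketing_dashboard"]),
]

for model in models:
    print(model.name, risk_tag(model))
```

In practice the list of systems using each model would come from the lineage map itself rather than being typed in by hand.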
What is Solidatus’ approach and how does it differ from others?
Historically it was deemed sufficient to understand your data flows and systems only at a table-to-table level. Now you need to understand them at a column, or field, level, which is far more granular. Without going to this depth, you can’t do proper root cause analysis and rapidly find the cause of a data issue. You could spend days, weeks or months trying to find it, and businesses these days don’t have that amount of time. You also need to see the data’s transformations. Some systems hide them from you, but you need to see what has changed in order to resolve any issues. Without insight, there is no solution.
You also need to see all your systems. Seeing just a few near each other isn’t enough for accuracy in business decisions and reporting, in AI and more. You need to be able to trace all the way back from, say, an error in an annual report to the source, to see the full view of the data’s flow to that specific use through every system. BCBS 239 refers to this requirement as a ‘complete’ view, and some call it ‘end-to-end’.
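Tracing back from a report figure to its sources can be sketched as a walk over column-level lineage that records each transformation along the way. This is a simplified illustration, not how Solidatus implements it: the `UPSTREAM` mapping, the column names, the transformation descriptions and the `trace_to_sources` function are all hypothetical, and the lineage is assumed to contain no cycles.

```python
# Hypothetical lineage: target column -> list of (source column, transformation).
UPSTREAM = {
    "annual_report.net_margin": [
        ("dwh.margin.net_margin", "round to 2 decimal places"),
    ],
    "dwh.margin.net_margin": [
        ("core.loans.interest_income", "sum by quarter"),
        ("core.deposits.interest_expense", "sum by quarter, then subtract"),
    ],
}


def trace_to_sources(upstream, target, path=()):
    """Yield each path from `target` back to a column with no upstream.

    Each path is a tuple of (target, transformation, source) hops.
    Assumes the lineage is acyclic.
    """
    parents = upstream.get(target, [])
    if not parents:
        yield path
        return
    for source, transformation in parents:
        hop = (target, transformation, source)
        yield from trace_to_sources(upstream, source, path + (hop,))


for path in trace_to_sources(UPSTREAM, "annual_report.net_margin"):
    for target, transformation, source in path:
        print(f"{target} <- [{transformation}] <- {source}")
    print()
```

A table-to-table map would only show that the report depends on the warehouse and the core systems. The column-level paths show which fields and which transformations to inspect when a number looks wrong.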
Solidatus has always approached lineage from this advanced perspective of fine-grain, end-to-end lineage. We always believed it was not good enough to simply show table-to-table lineage, referred to as coarse-grain format in many data catalog systems, where lineage is more of an afterthought. We always understood the importance of looking at a field level within a table, to see all the transformations and to create and visualize that view all the way from the source of data to its targeted use, across hundreds of systems. Additionally, because we know the inherent business value of this, we’ve always layered business context over it, such as:
- Which data policies are applied to certain technical systems?
- Do any of these have personally identifiable information (PII) or sensitive data?
- What kinds of regulations are impacting certain systems and data flows?
So we have always had a product that lets you view hundreds of systems linked together with the business context of policy, sensitivity and so on, as well as the ability to drill down deep.
Some might wonder how long it takes to get this going. Customers are used to the creation of map views of data flows being a manual process. With Solidatus, however, much more of it is automated, and implementations can be completed in as little as three months. That’s because we have connectors that extract data from a host of systems and technologies, and what we call ‘auto mappers’, which then suggest the data flows between the systems, making the process far less manual.
A business user can very quickly have this picture of everything, bringing rapid value to the business and the ability to quickly resolve any issues in underlying data. And that’s why data lineage is so important to the business – be it cloud migrations, AI or regulatory compliance.
Summary
Advanced data lineage is vital for business-critical initiatives like AI, cloud migrations, and regulatory compliance. Solidatus delivers granular, end-to-end visibility into data flows, enabling businesses to ensure data accuracy, traceability, and governance. With automation and rapid implementation, it empowers decision-makers to trust their data.