Belgian software company helps digital businesses use data with confidence
Today, the value of data can hardly be overstated, especially from a business perspective. Data tells us a lot about a company’s processes and activities: it can show whether the business is heading in the right direction, reveal which features fall short, and point to where improvements can be made.
Such is its importance that entire businesses are now built on data. Companies like Spotify, Amazon, and Deliveroo have shown that it is possible to disrupt markets at scale with software and data. But in a world producing some 5 exabytes of data every day, how can companies be sure that all the data they collect, process, and retain is fit for purpose? Being data-driven is one thing, but what if the data is inaccurate or otherwise wrong?
And what is the potential impact on the consumers whose data many of these companies now depend on? According to a recent Prosper Insights & Analytics survey, a growing number of adults aged 18 or older are taking steps to protect their digital privacy, whether it’s the 30.5% who have turned off mobile tracking (up from 5.1% YoY) or the 28% who are posting less on social media.
To answer these questions, I spoke with Maarten Masschelein, CEO and co-founder of Soda, a data reliability and observability platform, to understand how data-driven decision-making and trust fuel the digital world.
Gary Drenik: Tell me a bit about your background and why you started Soda.
Maarten Masschelein: I was one of the first employees of a data governance company when I started thinking about the importance of understanding data health, especially for companies that were increasingly reliant on data for decision making.
Back then, roles like chief data officer didn’t exist, and organizations struggled to understand what was happening with their data. The tools were outdated and there was no real way to have observability of data quality and reliability, which meant that the underlying issues were left unresolved.
These challenges inspired us to launch Soda, a data reliability and observability platform that helps companies define what good data looks like and resolve issues quickly, before they have an impact.
Drenik: How would you define data quality and how has that definition evolved as digital businesses have become increasingly dependent on data?
Masschelein: A good way to contextualize this is to take the example of Deliveroo, the food delivery company. Deliveroo is essentially a “data product”, which means that it has created software that uses data to generate results such as automated decision making, algorithms, derived data, etc. The data allows Deliveroo to give its customers real-time updates on delivery times that, certainly at first, were unique and integral to their business model. It is not a service that can be built on human instinct, but on the confident use of quality data.
For a system like this to work, there are two key considerations, and the first is that the data must be reliable. Deliveroo uses datasets and historical data from restaurants on things like how long meals take to cook, how many orders they have on a given day, and how many riders are nearby. These datasets must be reliable, recent, and accurate, or the product will fail.
The second aspect is to take a proactive view and look at new things we can do with data. This is where data quality comes in. Is the data we have today suitable for creating a new product or starting to automate? If there isn’t a good process in place to collect and store the data, chances are it won’t be ready for use.
Data quality is about enabling new insights and aligning data producers, in other words, the people who collect the data, with data consumers, i.e. the people who use the data, to make sure it works for them and continues to work for them.
Drenik: In what ways could organizations operating in the digital world use data more confidently to drive their business?
Masschelein: There are a few good use cases. Let’s start with a retail example: Black Friday. The goal of Black Friday is to sell as much as possible to as many people as possible, which requires automated processes for recommending products and classifying customers. Retailers are able to develop a very precise classification of their customers as they click through websites, which helps maximize sales opportunities, but if the controls and checks in place are not monitored, the data used to inform the recommendation engine may be wrong, which means it will not work properly.
Another example is in financial services where there is currently huge interest in green investments based on data that was not available before. Until recently, we didn’t have data on things like a company’s carbon footprint, but now there’s an entire industry based on providing that kind of data. This is an area that is very “patchy” in terms of data quality, as it constantly changes as we find new ways to collect data. As companies become more data-driven, they’re going to have to improve their “explainability” as more and more people make decisions to invest or divest based on this kind of data.
Drenik: What is data observability and how can it help these organizations in their quest to get quality data they can trust?
Masschelein: Data observability is fundamentally about having data available and ready to use, and about being less reactive and more proactive with it. To do this, there must be a discipline of knowing there is a problem with a dataset before it can impact the business.
We created Soda because we believe that most organizations today don’t have the tools to facilitate this discipline or the systems in place to identify and resolve issues with their data. As data moves through an organization, it inevitably changes as a result of things like software upgrades, schema changes, or even just human error. Over time, it becomes more and more difficult to coordinate these changes, and the problems worsen as systems continue to operate and process bad data.
By giving data quality teams a platform to track the health of their data, and a workflow to collaboratively identify and resolve issues as they arise, we can address this problem prescriptively: a common framework to set and manage expectations for changing data behavior. This creates consistency, clarity, and trust between the teams collecting the data and those consuming it, eliminating guesswork and continuously improving data quality.
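The proactive monitoring Masschelein describes can be illustrated with a simple sketch. The metric (daily row count), the tolerance threshold, and the function name below are illustrative assumptions, not Soda’s actual product or API: the idea is just to flag a dataset whose behavior suddenly deviates from its history, before downstream consumers are affected.

```python
# Hypothetical sketch of a data observability check: track a simple metric
# (daily row count) over time and flag days that deviate sharply from the
# running mean. Thresholds and names are illustrative assumptions.

def flag_anomalies(daily_row_counts, tolerance=0.5):
    """Flag days whose row count deviates more than `tolerance`
    (as a fraction) from the mean of all preceding days."""
    alerts = []
    for i, count in enumerate(daily_row_counts[1:], start=1):
        mean_so_far = sum(daily_row_counts[:i]) / i
        if abs(count - mean_so_far) > tolerance * mean_so_far:
            alerts.append((i, count))  # (day index, anomalous count)
    return alerts

# Day 4's volume collapses: a likely sign of an upstream schema change or
# broken ingestion job, surfaced before anyone consumes the bad data.
history = [1000, 1020, 980, 1010, 250]
print(flag_anomalies(history))  # → [(4, 250)]
```

In a real platform this kind of rule would run continuously against many metrics (freshness, null rates, schema shape) and route alerts to the producing team.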
Drenik: And finally, how have you seen data management evolve by learning from other fields such as system observability, software engineering, etc.?
Masschelein: That’s a great question. It’s really fascinating to see how principles and practices from site reliability engineering, software engineering, and system observability are being integrated into data management. Most important, I think, is how software engineering has transformed itself to support the creation of highly reliable software through its adoption of best practices and a readily available tool ecosystem.
Software engineers have been automating software delivery processes for quite some time, with things like “as-code” automation, which originated with the DevOps movement and infrastructure-as-code (IaC), now being adopted in the data management space to orchestrate tasks that are too large to be run manually. We are now seeing “as-code” applied in areas such as “testing-as-code” to enable data teams to anticipate data issues in a much more sophisticated way. This means it’s easy for data engineers (producers) to test data quality at the point of ingestion, and data product managers (consumers) can validate data before it’s used.
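The “testing-as-code” idea can be sketched in a few lines. This is a minimal, hypothetical illustration, not Soda’s API: the check functions, result shape, and dataset fields (`order_id`, `ingested_at`) are invented for the example. The point is that expectations about data are expressed as code and run at ingestion, so producers catch bad records before consumers see them.

```python
# Minimal, hypothetical "testing-as-code" sketch: data quality checks
# written as ordinary functions and run as a suite at ingestion time.
from datetime import datetime, timedelta

def check_not_null(rows, column):
    """Fail if any row is missing a value in `column`."""
    missing = [i for i, r in enumerate(rows) if r.get(column) in (None, "")]
    return {"check": f"{column} not null", "passed": not missing,
            "failing_rows": missing}

def check_freshness(rows, column, max_age):
    """Fail if the newest timestamp in `column` is older than `max_age`."""
    newest = max(datetime.fromisoformat(r[column]) for r in rows)
    return {"check": f"{column} fresh", "passed": datetime.now() - newest <= max_age}

def run_checks(rows):
    """Run the suite; producers gate bad data before consumers use it."""
    return [
        check_not_null(rows, "order_id"),
        check_freshness(rows, "ingested_at", timedelta(hours=1)),
    ]

orders = [
    {"order_id": "A1", "ingested_at": datetime.now().isoformat()},
    {"order_id": "",   "ingested_at": datetime.now().isoformat()},
]
results = run_checks(orders)
print(all(r["passed"] for r in results))  # the null check fails here
```

Because the checks are code, they can be version-controlled, reviewed, and wired into the same CI pipelines software engineers already use.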
The ability to unify around a common language or code, just as software engineers have done for years, is beneficial because it means data teams can specify what good data looks like throughout the data value chain, from ingestion to consumption, regardless of role, skills, or subject matter expertise.
Drenik: Thank you, Maarten, for sharing your perspective on the ever-changing data management landscape. In our brave new data-driven world, collecting huge volumes of data is one thing, but being able to maintain it and use it with confidence to create new products and services is how we could see the next Spotify, Amazon and Deliveroo.