- Alice Parker
Popular (ok, tech) culture likes to depict Data Scientists as the gods (and goddesses) of AI. Data Analysts have super-human abilities to extract important business insights. And Data Engineers can magic datasets from one part of the organisation to another (spaghetti pipelines are just fairy dust, right?).
Except, wait, these technically capable people are still people. They are still humans interacting with computers (rather large computer systems for that matter). They are still limited by what is humanly possible: their memory and cognitive capacity are nothing compared to the storage and CPU capabilities of the computers they work with.
Neither is our biology going to change at the same exponential rate as technology. Moore's Law just doesn't have jurisdiction over the human brain, unfortunately.
We have Human Factors teams that design for usability of complex systems such as nuclear power plants and train control rooms. Yet designing for a usable data architecture is little talked about. How often would you say that the user requirements of a data architecture are given the same priority as the business and technical requirements?
Usability has its own ISO standard. It's about 'enabling users to achieve their goals effectively, efficiently and with satisfaction.' But it is also about 'minimising the risk and the undesirable consequences of use errors'. By improving the usability of the systems and the data we work with, we can improve the quality and accuracy of the results we deliver.
Designing for usability in data architecture has multiple aspects. We can design usable data products for downstream consumers, and we can design user-friendly self-serve data platforms for data producers and data consumers to interact with. The first is for data producers to act on. The second is for the data platform team, and getting it right also enables data producers to design more usable data products.
The tl;dr: understand your data (product and platform) users, their limitations and their aims, by opening a dialogue with them.
Data Product Usability: Know, Understand, Work, Trust
By speaking with people in DNB who work with data (data engineers to data analysts to senior managers), we found a trend in how people consume data. That is to say, they themselves are not the owners or producers of the data, but have received it from elsewhere. Data users need to be able to know, understand, work with and trust their data in order to interact with it.
In order to interact with data, one must first know that the data exists. How can you interact with data if you do not know it exists? How can we help our downstream users be aware of data that may be useful for them, and how can we intentionally expose relevant data in a safe and secure manner? Now that we have established that data consumers have challenges in discovering data, we can create a task force to explore how people discover data today and how we can optimise discoverability.
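To make discoverability a little more concrete, here is a minimal sketch of a searchable catalogue of data products. It is not a description of DNB's platform: the `CatalogueEntry` fields, the `search` helper and the sample entries are all hypothetical, invented purely to illustrate the idea that a consumer should be able to find data by typing the words they already use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogueEntry:
    """One data product as a consumer would see it (hypothetical fields)."""
    name: str
    domain: str
    description: str
    owner: str
    tags: tuple = ()


# Invented sample entries, for illustration only.
SAMPLE_CATALOGUE = [
    CatalogueEntry("card_payments", "payments",
                   "Settled card transactions per day",
                   "team-payments", ("cards", "transactions")),
    CatalogueEntry("customer_profile", "customer",
                   "Core customer attributes",
                   "team-crm", ("customer",)),
    CatalogueEntry("loan_applications", "lending",
                   "Incoming loan applications",
                   "team-lending", ("loans",)),
]


def search(catalogue, term):
    """Return entries whose name, domain, description or tags mention the term."""
    needle = term.casefold()
    matches = []
    for entry in catalogue:
        # Search across every field a consumer might plausibly remember.
        haystack = " ".join(
            (entry.name, entry.domain, entry.description, *entry.tags)
        ).casefold()
        if needle in haystack:
            matches.append(entry)
    return matches


if __name__ == "__main__":
    for hit in search(SAMPLE_CATALOGUE, "transactions"):
        print(hit.name, "owned by", hit.owner)
```

The point of the sketch is the owner field as much as the search: knowing that data exists is more useful when you also know who to talk to about it.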
There is a difference between reading instructions on a screen about how to ride a bike and actually being shown how to ride the bike. When we provide documentation about the data for other teams to use, how can we improve the transfer of knowledge about the data? Understanding the data comes at different levels of granularity: the domain as a whole, the schema, or perhaps a single column name.
What format is the data in, and how easy is it to get the data into a format that allows the user to interact with it using the tools they are experienced with? Kafka streams are great for some users, but an "Export to Excel" button is necessary for others. Keeping data locked in silos with an unnecessarily steep learning curve means the data remains out of reach for those who may need it.
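As a small illustration of meeting users in the tools they know, the sketch below turns a handful of records (as they might look after being read from a stream) into CSV text, which Excel can open directly. The `to_csv` helper and the record fields are assumptions made for this example, not part of any real platform API.

```python
import csv
import io


def to_csv(records):
    """Serialise a list of dict records to CSV text.

    Column order follows the keys of the first record; this assumes
    every record has the same keys.
    """
    if not records:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    # Invented records standing in for decoded stream messages.
    events = [
        {"customer_id": "c1", "amount": 100.0},
        {"customer_id": "c2", "amount": 20.5},
    ]
    print(to_csv(events))
```

The same data serves both audiences: the stream stays available for those who want it, and the spreadsheet user is not asked to learn a new toolchain first.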
Trust is the forgotten element, but the most important when wanting to deliver reliable and accurate results. How can data producers reassure data consumers that the data they provide is trustworthy? From discussions with data consumers at DNB we found that by delivering on the three points above, we can already enhance the trust in the data they use before we even get to discuss the quality of the data.
An empathetic data architecture
By opening up discussions with data consumers on how they experience data interaction in terms of knowing, understanding, working with and trusting data, data producers are able to take steps towards designing data products that their data consumers can use. Research from the likes of IBM and Microsoft shows that data users need domain expertise, time and conversation with their data in order to interact with it efficiently.
This research was also conducted using traditional UX methods, the same ones that we now apply. These human factors of data interaction provide the boundaries within which we can design data products. In a domain-oriented data architecture, we need to facilitate the transfer of knowledge so that data consumers can take on the domain expertise that data producers have about their domain.
Data producers are able to facilitate the conversation with data that data consumers need, for example by providing sample SQL queries.
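What might shipping sample queries alongside a data product look like? The sketch below is one possible shape, using Python's built-in `sqlite3` module and an in-memory database so it runs anywhere. The `transactions` table, its columns, the query titles and the helper functions are all hypothetical, made up for this example.

```python
import sqlite3

# Sample queries a producer might publish with the data product.
# Table and column names are invented for illustration.
SAMPLE_QUERIES = {
    "Total amount per customer": """
        SELECT customer_id, SUM(amount) AS total_amount
        FROM transactions
        GROUP BY customer_id
        ORDER BY customer_id
    """,
    "Most recent booking date": """
        SELECT MAX(booked_on) AS latest_booking
        FROM transactions
    """,
}


def build_demo_database():
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transactions "
        "(customer_id TEXT, amount REAL, booked_on TEXT)"
    )
    conn.executemany(
        "INSERT INTO transactions VALUES (?, ?, ?)",
        [
            ("c1", 100.0, "2023-01-05"),
            ("c1", 50.0, "2023-02-01"),
            ("c2", 20.0, "2023-01-20"),
        ],
    )
    return conn


def run_sample(conn, title):
    """Run one of the published sample queries and return all rows."""
    return conn.execute(SAMPLE_QUERIES[title]).fetchall()


if __name__ == "__main__":
    conn = build_demo_database()
    for title in SAMPLE_QUERIES:
        print(title, "->", run_sample(conn, title))
```

Giving each query a plain-language title is deliberate: the consumer can start from the question they want answered and read the SQL second, picking up the producer's domain knowledge along the way.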
Data Platform Usability
Now that we have dumped all these demands onto the data producers to provide better, more usable, higher quality, faster, richer data, we perhaps ought to think about their own cognitive load and use of the data platform that allows them to create and share data products.
DNB has a self-serve data platform that serves hundreds of data scientists across the bank and we're working on designing a platform that promotes reliable and effortless self-serve use for both the data producers and data consumers. This means that not only does the platform need to be designed for discoverability, understandability and the rest of Zhamak Dehghani's data product usability characteristics, but it needs to be designed in a way that makes it easy for those making the data products. Easy to discover, easy to become discoverable. Easy to understand, easy to make explainable.
Producers of data are often the software teams of source applications, or perhaps the human team has come and gone and the source application lives on, spewing data downstream to frustrated data consumers. A platform that provides usable data products to data consumers will only be populated with data products if it is easy to create and deploy data products on that platform. Being considerate of each specific team, their skillsets, their limitations and their contexts will increase the likelihood that the platform is used.
Going forward, you'll find us in the data platform team working to make our platform product more usable for both the data producers and data consumers, and also enabling our data producers to create usable data products for their data consumers. In this way, we'll truly create an empathetic data architecture for our amazing data scientists, data engineers and data analysts.