Enterprise Knowledge Graph in DNB

Thoughts about the life and place of data in our digital journey.

By Einar Clementz
Nov. 15, 2022 · 9 min. read time
vestbygget.jpg

DNB: One of the three buildings DNB uses in the Barcode district.

When the meaning of data is defined and that meaning, the semantics, travels with the data, data integration is simplified. And more.

Our organisation has a long history with data in digital stores, some 60 years of it.

We have a large number of operational, customer-facing and supplier-facing processes with supporting IT solutions. We have financial accounting to sum up operational events. We have analytics needs encompassing financial, risk and other data to support reporting. We have reporting in many variations, internal as well as externally required. How is the data organised? Mostly in a relational paradigm, in specialised local data stores with local data models. And nothing changes at the same pace.

Sounds familiar? We think so. And data integration is the nagging pain at the back of your head. How have we approached these scenarios over time? Each decade has had its architecture paradigm, good at the time, and sometimes data organisation was touched upon as well. Just to name a few: 4GL, ESB with local/global messages, SOA, messaging with MQ and later Kafka, big enterprise models, XML, APIs, JSON. The result is a wide set of solution landscapes, part of every contemporary “xyz” architect’s pain.

Our organisation is part of this journey: more solutions, more local specialised data models, more silos, more integration needed. Most things in our world are organised in a network paradigm, a multi-faceted taxonomy (a networked hierarchy): language, people, organisations, computers, countries. Why not data? Early on, network databases were used with success too, but the relational way took precedence.

Can we go beyond the integration paradigm of a relational-table data world? Can we take a step forward to data viewed as networks? Another word for network is graph. A child does not start by drawing a table to explain its understanding of life; it starts by drawing some graphical representation. Parents are proud of their child’s first drawings, most likely visual objects with lines connecting them: a graph. A natural way for our brains.

Our organisation is on a journey with many facets to make financial accounting, reporting, data science and analytics a better place. Better for people, better for reporting, better for data, better for quality. Here we take on many contemporary best practices. We go to the cloud with its infrastructure flexibility, we adopt agile or lean work practices, we use many external services, and we bring in new tools with visual UIs to help us forward. We look in detail into data mesh, domain orientation and data products, and we should add data contract concepts as well. We have most things on some change path.

Regulatory and statutory reporting is a chapter in itself. The bank and finance industry is regulated in many aspects, which makes for a good number of reports in many facets. Often partly the same base facts are combined into different data sets, with various aggregations for each report. Many regulators make their own data models, so data silos arise here as well. Large efforts and years of cooperation by the regulators help, but new parts are added with each turbulence in world finance. EBA with the Data Point Model (DPM) is one key regulatory area, and the use case we are working on is part of this space. The DPM has a large number of data points and a good set of financial dimensions; the resulting cubes are complex and many.

The meaning in data

Did you link attributes to the concepts they relate to ten or twenty years ago? Do you do that today? In banking, account balance is not one unified definition; a lot of context information is needed to make it unambiguous. Account balance is a concept, not the specific figure in your use case. Your use case should link to the concept term and to your contexts. Having such context information as part of your data, in a definition set that both humans and machines can read, helps a lot to clarify what you are talking about, and others can follow the contexts for their own understanding.

We are working on a better way to express the knowledge in data, its meaning, concisely. We are investigating the semantic linked data approach, the semantic web.

The standards for semantic linked data had their first round about 25 years ago. Tim Berners-Lee, James Hendler and Ora Lassila wrote a key article in Scientific American in May 2001, “The Semantic Web”. A Google blog post in May 2012 introduced the term knowledge graph; the info box to the right in Google searches is a knowledge graph. Many best practices exist, and products are available and mature.

A key part of the semantic approach is standards. The World Wide Web Consortium, W3C, headed by Tim Berners-Lee, is the main standards body. Here are standards for data (RDF), for models (RDFS), for constraints (OWL) and for queries (SPARQL). It is all built on the web technology frame, with its flexibility in linking things and its distributed nature.
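To make those building blocks concrete, here is a minimal sketch using the Python rdflib library. The ex: namespace, the AccountBalance class and the property names are hypothetical illustrations for this article, not part of any standard model.

from rdflib import Graph, Literal, Namespace, RDF, RDFS, XSD

EX = Namespace("http://example.org/banking/")  # hypothetical namespace for the sketch

g = Graph()
g.bind("ex", EX)

# Model: a class and a property, expressed as ordinary RDFS triples
g.add((EX.AccountBalance, RDF.type, RDFS.Class))
g.add((EX.amount, RDFS.domain, EX.AccountBalance))

# Data: one instance of the concept, with its context attached
g.add((EX.balance42, RDF.type, EX.AccountBalance))
g.add((EX.balance42, EX.amount, Literal("1250.00", datatype=XSD.decimal)))
g.add((EX.balance42, EX.currency, Literal("NOK")))

# Query the data with SPARQL
q = """
SELECT ?balance ?amount ?currency WHERE {
  ?balance a ex:AccountBalance ;
           ex:amount ?amount ;
           ex:currency ?currency .
}
"""
for row in g.query(q, initNs={"ex": EX}):
    print(row.balance, row.amount, row.currency)

The same four standards cover the whole snippet: the triples are RDF, the class and domain statements are RDFS, constraints could be layered on with OWL, and the query is SPARQL.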

With the standards come a large set of ready-made base models for many areas. To mention a few: DCAT (Data Catalogue, think cataloguing and sharing data), FOAF (Friend Of A Friend, think Facebook and LinkedIn), RDF Data Cube (statistics and finance data, built on the SDMX definitions). Many business areas have their own domain-specific models; in bank and finance, one central model is FIBO (Financial Industry Business Ontology).

The drawing below shows the capacity of the EKG model-and-query approach with FIBO as the domain area. It bridges details (data) and terms (concepts). Data is mapped from each system of record into a core/common ontology, which makes changes in form less important. Transformations use semantic queries (SPARQL) to assemble information for the receiving side. The drawing is from one of Elisa Kendall’s presentations about FIBO; Elisa is one of the central drivers in the many-years-long story of FIBO.

Bilde_crop_2.png
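As an illustration of that mapping step, here is a small sketch of how a SPARQL CONSTRUCT query can lift data from a source-system vocabulary into a common ontology. The src: and core: namespaces and the property names are made up for the example; a real mapping would target FIBO or an internal core model.

from rdflib import Graph, Literal, Namespace, XSD

SRC = Namespace("http://example.org/source-system/")   # hypothetical source vocabulary
CORE = Namespace("http://example.org/core-ontology/")  # hypothetical common ontology

g = Graph()
g.bind("src", SRC)
g.bind("core", CORE)

# Data as it arrives from one system of record
g.add((SRC.acc_001, SRC.saldo, Literal("1250.00", datatype=XSD.decimal)))
g.add((SRC.acc_001, SRC.kundenr, Literal("C-7431")))

# Map the source form into the common ontology with CONSTRUCT
mapping = """
CONSTRUCT {
  ?acc core:balanceAmount ?amount ;
       core:heldBy ?customer .
}
WHERE {
  ?acc src:saldo ?amount ;
       src:kundenr ?customer .
}
"""
core_graph = Graph()
for triple in g.query(mapping, initNs={"src": SRC, "core": CORE}):
    core_graph.add(triple)

print(core_graph.serialize(format="turtle"))

If another system of record delivers the same facts in yet another shape, only the WHERE part of its mapping changes; the receiving side keeps seeing the common ontology.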

Constraints can be simple and few, or many and advanced; with good use of constraints you can express much of the business rules in the model. The result is better definitions and less code. Think about joins in SQL: each join is a relation. In the semantic graph, that join becomes a relation defined in the model, so the WHERE clause of your semantic query gets simpler. Complex queries gain the most.
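A small sketch of that point, with a hypothetical hasBalance relation standing in for a relation defined in the model: where SQL spells out the join on key columns, the SPARQL pattern simply follows the named relation.

from rdflib import Graph, Literal, Namespace, XSD

EX = Namespace("http://example.org/banking/")  # hypothetical namespace

g = Graph()
g.bind("ex", EX)
g.add((EX.customer1, EX.name, Literal("Kari")))
g.add((EX.customer1, EX.hasBalance, EX.balance42))
g.add((EX.balance42, EX.amount, Literal("1250.00", datatype=XSD.decimal)))

# SQL would need:  SELECT c.name, b.amount FROM customer c
#                  JOIN balance b ON b.customer_id = c.id
# In SPARQL the relation ex:hasBalance is part of the model, and the
# property path follows it; no key columns to line up.
q = """
SELECT ?name ?amount WHERE {
  ?customer ex:name ?name ;
            ex:hasBalance/ex:amount ?amount .
}
"""
for row in g.query(q, initNs={"ex": EX}):
    print(row.name, row.amount)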

Data is called Instances in semantic graph terminology; an Ontology plus Instances is a Knowledge Graph. How can we make the meaning in data part of our data ecosystem? We need definitions in some organised form, and the semantic form is a good one. How can we connect data and semantics? We need to tag and link data to give it meaning, via the concepts and semantic terms.

Can we make the meaning part of the data packages that flow? Yes, and there is a straightforward way to do it. We use JSON as the data format, and there is a standard named JSON-LD, LD for Linked Data. The @context in a JSON-LD document defines the connections between the data in the payload object and the ontology that gives the data its meaning. Such things can easily be added to the data engineers’ working environment, and then data engineers work with knowledge creation.
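A minimal sketch of what that looks like, with a hypothetical vocabulary URL in the @context. Parsing the document with rdflib (version 6 or newer, which bundles the JSON-LD parser) turns the tagged JSON straight into graph triples.

import json
from rdflib import Graph

# An ordinary JSON payload, plus an @context that links each field to a term
# in a (here hypothetical) ontology at example.org
doc = {
    "@context": {
        "amount": "http://example.org/banking/amount",
        "currency": "http://example.org/banking/currency",
    },
    "@id": "http://example.org/banking/balance42",
    "amount": 1250.00,
    "currency": "NOK",
}

g = Graph()
g.parse(data=json.dumps(doc), format="json-ld")

# The payload is now triples with full URIs, carrying its meaning with it
for s, p, o in g:
    print(s, p, o)

The payload stays plain JSON for any consumer that does not care about semantics; the ones that do can follow the @context to the definitions.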

A few key capabilities in the EKG approach

It is built on the basic organisation of data and model facts as subject, predicate, object; that triple is the smallest building block. Each part is given a global URI as its key, and each part can be queried in any combination. When stored in a graph store, a triple database, both data and model are stored together as a list of triples. An EKG can also be a virtual knowledge graph (VKG), which gives an easy approach to having the EKG as a (virtual) connection layer over any siloed database.
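Because model and data live in the same triple store, one query can combine them. A small sketch, again with hypothetical example.org terms, pulling instance data together with the human-readable label the model gives its class:

from rdflib import Graph, Literal, Namespace, RDF, RDFS

EX = Namespace("http://example.org/banking/")  # hypothetical namespace

g = Graph()
g.bind("ex", EX)

# Model triples
g.add((EX.AccountBalance, RDF.type, RDFS.Class))
g.add((EX.AccountBalance, RDFS.label, Literal("Account balance", lang="en")))

# Data triples, in the same store
g.add((EX.balance42, RDF.type, EX.AccountBalance))
g.add((EX.balance42, EX.currency, Literal("NOK")))

# One query across both: the instance, its class, and the class's label
q = """
SELECT ?thing ?cls ?label WHERE {
  ?thing a ?cls .
  ?cls a rdfs:Class ;
       rdfs:label ?label .
}
"""
for row in g.query(q, initNs={"rdfs": RDFS}):
    print(row.thing, row.cls, row.label)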

The drawing below shows that we have added data, named a few classes and pointed out the key data for this use case: the segmentation values.

Bilde0_crop2.png

The drawing below shows classes, or concepts, with the relationships that tell their purpose. The report we are working on has the sector code as grouping and a given set of financial dimensions. The coloured part on the right-hand side shows the bank’s data components, as the data stands in the general ledger space. The left-hand side shows the data point model’s data components. A data point cell will have an amount filtered by the grouping and the dimensions specific to each cell.

Bilde1_3.png
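To illustrate what a data point cell amounts to in query terms, here is a sketch of a SPARQL aggregation that sums amounts per sector code while filtering on one dimension. The terms are hypothetical stand-ins, not the actual FINREP/DPM vocabulary.

from rdflib import Graph, Literal, Namespace, XSD

EX = Namespace("http://example.org/banking/")  # hypothetical namespace

g = Graph()
g.bind("ex", EX)

# A few booked amounts with a sector code grouping and one dimension
facts = [("S11", "AmortisedCost", 100.0),
         ("S11", "AmortisedCost", 250.0),
         ("S12", "AmortisedCost", 75.0),
         ("S12", "FairValue", 40.0)]
for i, (sector, measure, amount) in enumerate(facts):
    fact = EX[f"fact{i}"]
    g.add((fact, EX.sectorCode, Literal(sector)))
    g.add((fact, EX.measurement, Literal(measure)))
    g.add((fact, EX.amount, Literal(amount, datatype=XSD.decimal)))

# One "cell": the sum of amounts per sector, filtered on the measurement dimension
q = """
SELECT ?sector (SUM(?amount) AS ?total) WHERE {
  ?fact ex:sectorCode ?sector ;
        ex:measurement "AmortisedCost" ;
        ex:amount ?amount .
}
GROUP BY ?sector
"""
for row in g.query(q, initNs={"ex": EX}):
    print(row.sector, row.total)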

What we do now

Enterprise Knowledge Graph (EKG) and its possibilities are on our organisation’s mind. To better understand and build experience, we take a small part of a value chain and put EKG practices and tools to the test. We have key resources from our financial accounting area and from the information modelling and technical areas.

We select a small amount of data in the financial reporting space, from our normalisation of the data to the data content for a given report. In this area we have many long data travel routes, and they can vary across data providers. Can we make this a better place for the data and for the people handling it? Items we have in the works:

• Let the tool read the source data and make a graph and model of it. This is our IFRS 9 data on a detailed level.

• Let the graph setup find data connections.

• Building semantic models.

• Connecting our data to a financial domain semantic concept model. We use a small part of the FIBO model in the PoC.

• Connecting a small part of a larger regulatory report to the same financial domain semantic concept model. It is one part of one report in the EBA FINREP, financial assets at amortised cost. FINREP and COREP reports are based on the DPM.

• We use classification methods in the modelling to make data segmentation flexible.

• Using the knowledge graph query language (SPARQL), with the concept terms, to move data through the transform steps.

• Transforming the finance data in question from our internal data organisation to the external data organisation.

• Making visual dashboards and a business-friendly path to the data.

• Using the graph approach for the data catalogue and the rich linking a data catalogue needs. A natural network/graph fit, and an intrinsic part of semantic graph solutions (see the sketch after this list).
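As a sketch of that last item, here is how a dataset could be described with the standard DCAT vocabulary; the dataset URI and the literal values are made up for the example.

from rdflib import Graph, Literal, Namespace, RDF
from rdflib.namespace import DCTERMS

DCAT = Namespace("http://www.w3.org/ns/dcat#")   # standard DCAT vocabulary
EX = Namespace("http://example.org/catalogue/")  # hypothetical catalogue namespace

g = Graph()
g.bind("dcat", DCAT)
g.bind("dcterms", DCTERMS)

# A catalogue entry is just more triples, linked with the same mechanisms as the data itself
g.add((EX.ifrs9_detail, RDF.type, DCAT.Dataset))
g.add((EX.ifrs9_detail, DCTERMS.title, Literal("IFRS 9 detail data", lang="en")))
g.add((EX.ifrs9_detail, DCTERMS.publisher, EX.financeDataTeam))
g.add((EX.ifrs9_detail, DCAT.keyword, Literal("financial reporting")))

print(g.serialize(format="turtle"))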

We use tooling from a vendor with comprehensive functional capabilities, rather than taking many small tools and having to stitch them together ourselves; we are not in the tool-integration business. We have strong knowledge support from a focused knowledge house in the Knowledge Graph community. We can do all parts with one tool vendor.

The work is done in a proof-of-concept scope and manner. We have a set of key objectives to measure against: wider, shorter and more flexible paths for data integration, and better, more focused business term definitions are among them. The PoC work group has the direction to show this. Part of the work is a show & tell each week during the period, keeping the wider community informed.

We are looking into the path of having the EKG as a virtual data integrator and an important rule provider for data transformations: helping the relational tooling with the EKG’s flexible and dynamic way to take data across without the details of the actual storage model, using concept terms to define the result set and point to the source sets. The EKG graph model helps with this.

What we expect

A flexible, efficient and clearly shorter path from our internally organised data to the data organisation at the reporting layer. We can have fewer internal data layers and take wider data sources into the data concentration point for the reporting layer. Included in this is the practice of adding data and modelling data in a build-as-you-go manner, where the result blends nicely into what you did the previous round, without the need to revamp your current data to fit the new data into its organisation. The Enterprise Knowledge Graph has this capability built into it. The structure encourages building models, queries and transformations stepwise. It is build-as-you-go, and by that pay-as-you-go: a win-win for both business needs and IT delivery flexibility and capacity.

In conclusion

Making data available as graphs, networks, gives us flexible delivery of many data sets and products, both inbound and source-near, where our domain is the consumer for domain-internal use, and outbound, near the next consumer. It is the bread and butter of our daily work. Graphs level up the flexibility of data organisation. What we do here is part of how we move from a world where the application is in the centre to a world where the data is in the centre. When you cook a good meal, the food is the key part, not the knife or the stove; you can easily change the knife and the stove. It is food-centric. A data-centric world.

Disclaimer: The views and opinions expressed in this article are those of the author and do not necessarily reflect the official policy or position of DNB.
