Sharing Data Better - Apples to Apples

The digital world is fueled by data sharing. For instance, Google Maps, Yelp and Nike+ Run Club don’t directly identify where you are when giving directions, finding a nearby Mexican restaurant or tracking how far you ran. Your device has a built-in system that captures your location and provides it to each app.

We’ve become so accustomed to our apps sharing data that it appears seamless. In reality, it’s anything but.

Harkening back to the first post in this series, Data Sharing, a Recipe for Success, this whole business of data sharing started with human communication. One person tells another the important details of their life. Those details are data. Systems do the same thing by providing important details to other systems that need them.

We are born with the capacity to communicate. We use several methods of communication including speaking, gesturing, writing and even dirty looks, all of which have been in development for thousands of years. If we’ve learned nothing else, we know that speaking with a common language is critical to communication.

HOW DATA FLOWS

Although in relative infancy, communication between applications, or data sharing, is no different. Engineers have developed standard methods for apps to share data, forming virtual pipes through which data can travel. Honestly, I’m no expert in the technical aspects of data sharing. For more information on this topic, I recommend checking out the Data Integration Information website, which lays out the technical aspects and challenges of data sharing in a well-organized and well-described manner.

For now, suffice it to say that a virtual pipe has been created between two applications. Let the flow of data commence! Imagine the data marching into each end of the pipe, to meet in the middle, ready for a synergistic, brave new digital world. Instead, when they meet, they find that they don’t speak the same language. They raise their voices, they gesture wildly, they begin to sob watching their dream fade away.

DATA MISMATCHES

I’d like to say that it doesn’t have to be like this, that engineers can standardize the language the same way that they standardize the pipe. But, it doesn’t work that way. At the end of the day, data is a people thing, not a technology thing. People have to clearly name and define the data, thus defining the language.

Why is that? I cover this in some detail in a previous article, Data Definitions, The Problem with Lisa. In short, computers are inanimate objects, as dumb as the rocks from which they were smelted. Every decision and every relationship has to be precisely spelled out for them. So, applications require labels on data to match exactly, to be precisely as expected; Apples to Apples, not Apples to Pippins.

Unfortunately, we’re talking about human language which, on its own, is anything but precise. Add to that the fact that there are thousands of software applications, developed separately by different teams at different companies in different countries. Standardizing the language used across them all simply ain’t gonna happen.

MATCHING DATA

What to do. What to do. This is, bar none, the biggest challenge in sharing data. No two applications label their data the same way. So, as our data languishes mid-pipe in an existential slump, how do we solve this challenge?

At the end of the day, the only way to link the data from two applications is for someone to sit down with the data from both applications and, field by field, label by label, manually match each field to its mate. The document used to capture each combination is called the Source to Target Map.

CUSTOMER EXAMPLE

Let’s look at an example of customer data being piped between SAP, “the market leader in enterprise application software”, and Salesforce, “the world’s #1 Customer Relationship Management (CRM)” solution.

First, note that we’re talking about Customer data. Customer is defined by our organization as ‘a retailer that buys and sells products to individual consumers.’ In Salesforce, a Customer Number is referred to as an ‘Account Number.’ In SAP, on the other hand, it’s labeled ‘KUNNR’ (abbreviated German for Customer Number). They each have their own label, neither of which matches what we call it, but, as we learned earlier, that’s to be expected.

In this case, we would map the Account Number and KUNNR to each other; lather, rinse, and repeat for all of the other data fields representing street address, contact name and so on. Then, the engineers creating the new pipe would use that Source to Target Map to translate the data back and forth between applications.
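For the curious, here is a minimal sketch of what a Source to Target Map might look like once the engineers get hold of it. Only the Account Number to KUNNR pairing comes from the example above. The other field labels, the sample record and the helper names (translate, invert) are made up for illustration, and a real integration would carry far more detail than a simple lookup table.

```python
# Hypothetical Source to Target Map: Salesforce label -> SAP label.
# Only "Account Number" -> "KUNNR" comes from the article; the other
# labels are invented placeholders, not real SAP field names.
SOURCE_TO_TARGET = {
    "Account Number": "KUNNR",
    "Street Address": "SAP_STREET_FIELD",
    "Contact Name": "SAP_CONTACT_FIELD",
}


def translate(record, field_map):
    """Rename every field in record using field_map.

    Raises KeyError if a field has no mate in the map, because an
    unmapped label is exactly the mismatch we want to catch early.
    """
    unmapped = [name for name in record if name not in field_map]
    if unmapped:
        raise KeyError(f"No mapping for fields: {unmapped}")
    return {field_map[name]: value for name, value in record.items()}


def invert(field_map):
    """Flip the map so data can flow back the other way."""
    return {target: source for source, target in field_map.items()}


# Sample record with made-up values.
salesforce_record = {"Account Number": "10042", "Contact Name": "Pat Doe"}

sap_record = translate(salesforce_record, SOURCE_TO_TARGET)
round_trip = translate(sap_record, invert(SOURCE_TO_TARGET))

print(sap_record)
print(round_trip == salesforce_record)
```

Because every label has exactly one mate, the record survives the trip there and back. That only holds while the map is one-to-one, which is the part people have to get right.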

TELEPHONE ON STEROIDS

This is cumbersome, yet simple enough to do between two applications. But, it doesn’t stop there. We know that sharing data is good, that the more you share data the more powerful the data becomes. So, once the pipe is flowing data from application A to application B, it makes sense that if application C needs the data, we’d create another pipe to application C, creating another Source to Target Map in the process.

Here’s where that becomes a problem. Say, application C gets its data from application B which gets its data from application A. This sort of “daisy-chain” scenario is common in the real world of application landscapes. Most of us have played the game Telephone, and seen how garbled a message becomes as it’s communicated from person to person. Imagine playing that game with each participant in the chain speaking a different language. Boom. That’s what happens when you create a chain of applications sharing data. As you progress, from one translation to the next, the message inevitably degrades.

CORE THE APPLE

Let’s do a simple exercise. You can do this for yourself on Google Translate. Take the phrase, “core the apple,” a simple apple pie recipe instruction. Translate it from English to Italian, then back again. It translates forward and backward cleanly. Our recipe holds up pretty well. Great.

Now start with the same phrase, translating it first to Hungarian. Next, translate it from Hungarian directly to Italian. Finally, translate it from Italian back to English. Hmmm, what started as “core the apple” winds up as “apple seed.” Our recipe just fell apart. With each new translation in the chain, it becomes more and more difficult to maintain the original intent or meaning.

English to Hungarian to Italian to English

This is the risk of sharing data from one system to the next to the next. Even if you are doing Source to Target Maps between each application, if you aren’t starting from the original labels, the understanding of that data can get lost or diluted.
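The same thing can be sketched with data labels instead of recipes. The snippet below is only an illustration under invented assumptions: applications A, B and C and all of their labels are hypothetical. Application B happens to keep a single Address field, so two distinct labels from A collapse into one on the way through, and C has no way to tell them apart again.

```python
# Hypothetical label maps for a daisy chain A -> B -> C.
# All application names and labels here are invented for illustration.
A_TO_B = {
    "Billing Street": "Address",
    "Shipping Street": "Address",  # B has only one address label
}
B_TO_C = {
    "Address": "Billing Address",  # whoever mapped B to C had to pick one
}


def compose(first, second):
    """Chain two label maps: follow first, then second."""
    return {src: second[mid] for src, mid in first.items() if mid in second}


a_to_c_via_b = compose(A_TO_B, B_TO_C)

for source_label, final_label in a_to_c_via_b.items():
    print(f"{source_label} -> {final_label}")

# Two different meanings in A now share one label in C.
distinct_in_a = len(a_to_c_via_b)
distinct_in_c = len(set(a_to_c_via_b.values()))
print(distinct_in_a, "labels in A became", distinct_in_c, "in C")
```

Each individual map looks reasonable on its own. The damage only shows up when you follow a label down the whole chain.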

GOLD STANDARD

If each copy of data passed from one system to the next is getting further and further from the original message, how can an organization successfully share data? The answer is to standardize on the data labels as defined by the organization, not on whatever labels each system chooses to use. Instead of translating between two applications, insist that your data always translate to and from your language as the gold standard.

Back to the original example of sharing data between Salesforce and SAP. Initially, Customer Number, our term, is translated to Account Number in Salesforce. Instead of translating Account Number directly to KUNNR, connect it to the original term name Customer Number, even if this means back-tracking from the Salesforce application’s terminology.

When you standardize on your own terms, you make sure that each application that gets the data is getting its original and intended meaning without the risk of degradation. Admittedly, this does add an extra step, but once a business glossary of labels and definitions is created, most of the work is already done and reusable.
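Here is one way the gold standard approach could be sketched, assuming the business glossary is kept as a simple lookup from our term to each application’s label. The Customer Number, Account Number and KUNNR labels come from the example above. The structure, the function names (to_glossary, from_glossary, share) and the sample value are my own assumptions, not a prescribed design.

```python
# Hypothetical business glossary: our term -> each application's label.
# The three labels come from the article; the layout is an assumption.
GLOSSARY = {
    "Customer Number": {"Salesforce": "Account Number", "SAP": "KUNNR"},
}


def to_glossary(record, app):
    """Translate an application's labels into the organization's terms."""
    lookup = {
        labels[app]: term
        for term, labels in GLOSSARY.items()
        if app in labels
    }
    return {lookup[name]: value for name, value in record.items()}


def from_glossary(record, app):
    """Translate the organization's terms into an application's labels."""
    return {GLOSSARY[term][app]: value for term, value in record.items()}


def share(record, source_app, target_app):
    """Always pass through the glossary, never app-to-app directly."""
    return from_glossary(to_glossary(record, source_app), target_app)


# Sample value is made up.
sap_record = share({"Account Number": "10042"}, "Salesforce", "SAP")
print(sap_record)
```

Adding a third application then means adding one more label per glossary term, rather than a fresh Source to Target Map against every application already in the chain.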

CONCLUSION

What you need to know is that the strength of the technology used to share data between applications only takes you so far. When data is moving between applications, the labels on either end of the pipe will vary dramatically, requiring an exercise to manually translate and link them together. This translation is done by people and captured in a Source to Target Map.

By starting with a well-defined set of labels for your organization’s data and using that as the gold standard by which all other translations take place, you decrease the risk of mapping incorrect links. Better yet, you ensure that your data maintains strong, consistent meaning across all applications that share it.