Data Definition: The Problem with Lisa
Last year, my husband and I were browsing quaint little shops on the Oregon coast. He suggested that I get a gift for Lisa; my friend Lisa White’s birthday was the following week. I started looking at leather cuffs and beaded jewelry. He kept steering me toward bars of soap, votive candles and other tchotchkes. I was going for personal and unique. He was pushing for generic and inexpensive. When I finally lost it and suggested through bared teeth that maybe he should “shop for his best friend instead of mine,” he started laughing. Apparently, he’d meant a different Lisa: Lisa Green, the neighbor who was watching our cat for the weekend.
Not every communication breakdown ends in laughs and salt water taffy. The high divorce rate alone proves the breadth and impact of verbal disconnects. We’re imperfect communicators with an imperfect language.
Data is fundamentally tied to language. In the same way people use language to communicate, computers use data. The difference is that where people can navigate communication breakdowns, software can’t.
HOW COMPUTERS THINK
Let’s take a walk through the brain of a software app to understand why this is. There are thousands of programming languages to choose from, but they are all trying to simulate the same thing – thinking. Their job is to make the necessary choices, in the correct order, to complete a set of tasks. The basis of all software thinking boils down to one very simple statement:
if ‘x’, then ‘y’, otherwise ‘z’
It’s as powerful as it is simple. “If you’re hungry, then eat a banana, otherwise wait for dinner”, “If you ate the banana, then throw away the peel”, and so on. You can expand the basic statement with “and” and “or”. You can send it off on tangents, then tie them all back together in a final, grand result. It’s a building block that can be used for everything, from adding ‘2+2’ to determining the odds of an economic recession, depending on the skill and imagination of the team wielding it.
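As a rough sketch, here is how the banana example might look in Python. The function name and its two yes/no inputs are invented for illustration; the point is only that the whole decision is built from "if", "then", "otherwise" and an "and".

```python
def next_step(hungry: bool, ate_banana: bool) -> str:
    """Pick the next action using nothing but if / then / otherwise."""
    if hungry and not ate_banana:
        # "If you're hungry, then eat a banana..."
        return "eat a banana"
    elif ate_banana:
        # "If you ate the banana, then throw away the peel"
        return "throw away the peel"
    else:
        # "...otherwise wait for dinner"
        return "wait for dinner"


# Walk through the three situations from the text.
for hungry, ate_banana in [(True, False), (False, True), (False, False)]:
    print(hungry, ate_banana, "->", next_step(hungry, ate_banana))
```

Here `hungry` and `ate_banana` are the data. The code can only choose well if those two values mean exactly what it assumes they mean.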
In this super-simplified code, x, y and z are the inputs and outputs, the data. How would the conversation about “Lisa’s” gift transpire in app-speak? Perhaps something like this (fig. 1):
IF you are shopping for LISA, THEN the price range is… TILT!
The app would fall over because it doesn’t have a clear understanding of who Lisa is. We know there are two different Lisas, but we haven’t made that clear; we haven’t provided a definition of Lisa. Software engineers spend an inordinate amount of time writing code to either prevent or manage errors, many of which are caused by data that is misused because it lacks clear definition.
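To make the "TILT!" concrete, here is a small Python sketch of the Lisa problem. The names come from the story, but the record layout, the `id` values and the gift budgets are all made up for illustration. Looking Lisa up by first name is ambiguous; looking her up by an identifier that is defined to be unique is not.

```python
# Hypothetical records: names are from the story, ids and budgets are invented.
PEOPLE = [
    {"id": "lisa-white", "first_name": "Lisa", "last_name": "White",
     "relation": "best friend", "budget": 60},
    {"id": "lisa-green", "first_name": "Lisa", "last_name": "Green",
     "relation": "neighbor", "budget": 15},
]


def gift_budget(first_name: str) -> int:
    """Look up a budget by first name; fails when the name is ambiguous."""
    matches = [p for p in PEOPLE if p["first_name"] == first_name]
    if len(matches) != 1:
        raise LookupError(
            f"{first_name!r} matches {len(matches)} people; which one?"
        )
    return matches[0]["budget"]


def gift_budget_by_id(person_id: str) -> int:
    """Look up a budget by an identifier that is unique per person."""
    for person in PEOPLE:
        if person["id"] == person_id:
            return person["budget"]
    raise LookupError(f"no person with id {person_id!r}")


try:
    gift_budget("Lisa")
except LookupError as err:
    print("TILT!", err)

print(gift_budget_by_id("lisa-green"))
```

The fix is not cleverer code. It is agreeing up front on what identifies a person, which is a definition problem rather than a programming problem.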
All of this is to say that applications are stuck with the data we give them, so we need to have high confidence that what we give them is both accurate and precise. A strong data definition is key to that confidence. I mean, which Lisa are we talking about anyway?
DEFINING DATA
In a previous post, Data is a Black Bean, I talked about how a data point is nothing more than a noun, a thing, a black bean for example. I also wrote about how a data point such as a black bean can plug into and be used in any number of scenarios. This reusability and connectedness is data’s superpower; ambiguity, on the other hand, is data’s kryptonite.
Okay, so a strong definition is the solution to ambiguity, but what does that look like? There is a broader view of data definition which is actually pretty big, and has a scary name to prove it, “metadata”. We’ll break that down over time, and several posts. Here, I’m going to focus on the narrower view of the term “data definition” which has to do with a text description.
Anatomy of a Data Definition
There are two key elements of a data definition, is-ness and about-ness. Think of is-ness as the overly serious and buttoned-down sibling while about-ness is the fun, flamboyant, but sometimes flaky one. You may want both of them as friends but for different reasons. As data definitions go, it’s important to consider both. It’s also important to understand how to know when a definition has hit the right balance of each.
Is-ness
Is-ness describes a thing’s essence, its DNA. Going back to the black bean, is-ness would be its innate characteristics such as its nutritional values, which country it originated from, and what plant family it comes from (fig. 2). The black bean is a legume that originated in the Americas and is a good source of protein, check, check and check.
These are the facts of the black bean; they are objective and rarely raise controversy. You don’t hear reasonable people arguing about whether or not a black bean is edible. Is-ness is what you will most likely find in the dictionary; it will be core to any data definition and provide an accurate picture of your data. However, lacking the richness provided by about-ness, you will struggle to provide the precision that makes your data truly powerful.
About-ness
Imagine for a moment that you are a kindergarten teacher, and that I am picking up some groceries for you. You’ve asked for black beans and I get you a 16 oz. can of Mexico’s finest frijoles negros. I didn’t know that you wanted black beans for an art project with your students, a use that requires dried black beans. I didn’t have the context for the black beans. The outcome is that one of us is making another trip to the grocery store.
About-ness describes a thing in context. It can be broad or narrow, shallow or deep. These characteristics are situational and subjective; they are colored by perspective, highly variable and potentially controversial. It’s far more likely that you will hear people arguing over whether black beans or pinto beans are better in a burrito.
While there may be no limit to the ways data’s about-ness can be defined, the reality is that you only really need to care about your immediate context. What is its purpose in your world? How does it serve you? For a black bean, is it an ingredient in your burrito or an art supply waiting to get stuck up a 5-year-old’s nose?
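One way to picture the two halves of a definition is as a small record with a field for each. This is only a sketch: the `DataDefinition` class and its field names are my own invention, not a standard. Both examples share the same is-ness and differ only in about-ness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataDefinition:
    """A hypothetical text definition with both halves spelled out."""
    name: str
    is_ness: str     # objective, dictionary-style facts
    about_ness: str  # what the data is for in this particular context

    def describe(self) -> str:
        return f"{self.name}: {self.is_ness} {self.about_ness}"


BEAN_FACTS = (
    "A legume that originated in the Americas and is a good source of protein."
)

burrito_bean = DataDefinition(
    name="black bean",
    is_ness=BEAN_FACTS,
    about_ness="Used here as a burrito ingredient.",
)

art_bean = DataDefinition(
    name="black bean",
    is_ness=BEAN_FACTS,
    about_ness="Used here, dried, as a supply for a kindergarten art project.",
)

print(burrito_bean.describe())
print(art_bean.describe())
```

Same bean, same facts, two different definitions. Whoever does the shopping needs the second field as much as the first.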
The Sweet Spot
Accuracy and precision are very important, but there is a balance to be struck. If you need to create a data definition, you want to find the sweet spot of both is-ness and about-ness in your definition. Why? First of all, you could really go deep in both areas, and come up with an encyclopedia-length definition that no one is going to take the time to read, so not useful.
The second reason is that the more detailed you get with your definition, the narrower and less flexible it becomes. If you specify one thing in your definition, you may inadvertently exclude another. A really good, current example of this is gender. Stating that gender is the equivalent of “male” and “female” immediately excludes nonbinary people and others who experience their lives somewhere on a spectrum outside of those values.
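The same trade-off shows up with the black beans from the grocery run. In this sketch, the two sets of allowed forms are hypothetical definitions invented for illustration. The narrower one is more specific, and it quietly rejects the dried beans the art project needed.

```python
# Two hypothetical definitions of which forms count as a "black bean".
NARROW_FORMS = {"canned"}            # written with only the burrito in mind
BROADER_FORMS = {"canned", "dried"}  # leaves room for the art project too


def accepts(allowed_forms: set, item_form: str) -> bool:
    """Return True if the item's form fits the given definition."""
    return item_form in allowed_forms


for label, forms in [("narrow", NARROW_FORMS), ("broader", BROADER_FORMS)]:
    print(label, "definition accepts dried beans:", accepts(forms, "dried"))
```

Neither definition is wrong on its own. The question is whether the one you wrote down still fits everyone who will rely on it.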
CONCLUSION
It’s not uncommon, even today, for data to be pulled into applications and used based on assumptions that everyone is on the same page about what it is. I have seen robust apps brought to their knees by data that had unexpected or unconsidered characteristics. Having a strong, simple, documented definition for everyone to key off will go a long way toward minimizing issues wherever the data is used.