That’s So Meta! Make Sense of Metadata
What is metadata? It’s generally described as “data about data” or “data that describes other data.” When I hear these explanations, I immediately think of Russian nesting dolls; I imagine looking into two opposing mirrors, seeing myself repeated in smaller and smaller reflections into some obscure infinity. In reality, metadata isn’t all that hard to understand. Let’s break it down and make sense of metadata.
META
Let’s start with “meta.” What does “meta” mean? It’s something that is self-referencing. It’s like a guy who always talks about himself in the third person, e.g., “Dan loves overripe bananas, but Dan can’t stand the smell of green bananas,” said Dan (fig. 1).
The word “about” is the key to knowing that something is meta. If Dan says, “Dan ate the banana,” that isn’t meta. If Dan says, “Let Dan tell you about the banana he ate,” he’s going to describe the banana. He’s moving into meta.
DATA
What is “data”? That’s actually a pretty big question. I answered it, to the best of my ability, in “What is Data? Data is a Black Bean.” Check it out if you want more detail. Basically, data = nouns: people, places, things and ideas. In other words, data is anything that can be counted or analyzed.
METADATA
Essentially, metadata characterizes other data. It paints a picture. We learned something about Dan and his bananas (whether we cared to or not). In addition, we’ve learned other things about Dan through simple observation. In this case, we learned that he’s a fast-talking raccoon in a teal cardigan. This is all metadata.
THE RIGHT BALANCE
Though quality metadata creates clarity, you can get too much of a good thing. For instance, how much do you need to know about Dan? If you’re his feeder, then his food preferences are the extent of your scope. Perhaps you’re interviewing Dan for a job? Then his taste in bananas is just an awkward piece of metadata you can discard. It isn’t just the amount of metadata that’s important. You want to get the right metadata for your situation.
TYPES OF METADATA
There’s a lot of metadata, and it’s worthwhile to categorize it. I’ll focus on three types that best describe metadata in general. They are administrative, descriptive and structural.
Administrative Metadata
Administrative Metadata is what it sounds like. Generally, it covers things required to manage data properly. This includes who’s responsible for the data (i.e., who defines it and creates it?), when the data was created, and the type of data it is (i.e., is it text, a number, or a dollar amount?).
Something that’s unique to this type of metadata is that it tends to be applicable across all data. “Last Update Date” is a common example. It identifies when the data was last stored or changed.
For instance, maybe you want to know the last time Dan’s food preference was captured. Was it yesterday or ten years ago? “Last Update Date” provides that information. At the end of the day, the freshness of data is relevant regardless of whether it’s Dan’s food preference or the cost of tea in China.
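To make that concrete, here is a minimal Python sketch of Administrative Metadata sitting next to the data it describes. The class name, field names, owner and dates are all invented for illustration; they are assumptions, not a standard.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class AdministrativeMetadata:
    """Hypothetical management details kept alongside a piece of data."""
    owner: str              # who defines and creates the data
    data_type: str          # e.g. "text", "number", "dollar amount"
    created_date: date      # when the data was created
    last_update_date: date  # when the data was last stored


def days_since_update(meta: AdministrativeMetadata, today: date) -> int:
    """Return how many days have passed since the data was last stored."""
    return (today - meta.last_update_date).days


# The data itself: Dan's food preference.
food_preference = "overripe bananas"

# The metadata about it (every value here is made up for the example).
food_preference_meta = AdministrativeMetadata(
    owner="Dan's feeder",
    data_type="text",
    created_date=date(2015, 3, 1),
    last_update_date=date(2025, 3, 1),
)

# Was the preference captured yesterday or ten years ago?
print(days_since_update(food_preference_meta, today=date(2025, 3, 2)))  # 1
```

Notice that nothing in the metadata class mentions bananas. The same four fields would work just as well for the cost of tea in China.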
Descriptive Metadata
Descriptive metadata describes the contents and context of data. A recent post, “Data Definitions: The Trouble with Lisa,” goes in-depth on what makes a strong definition. Just like a dictionary definition, this provides a written explanation of your data and why it’s important. This is critical to the quality of data.
Keep in mind that language is fluid. It naturally bends to the needs of a culture, large or small. Take the word “pie.” In a bakery it’s one thing. In a pizzeria, it’s another. What if Dan is ordering from a pizzeria that offers dessert? If the two different pies on the menu are poorly defined, he could be confused. Identify your data with a strong definition to make sure that others understand your language.
Reference Data is important enough to have a post of its own. However, a good rule of thumb is that if data = nouns, then Reference Data = adjectives. Think menus, on paper or online. A menu is a standardized and predefined list that can make what you are describing clearer. Reference Data is nothing new. Traditionally, you find menus in restaurants and catalogs. Now they show up online as dropdown lists and multi-select lists (fig. 2).
This is data that anyone who has ordered food from a website has worked with. We want to limit Dan’s pizza order to ingredients like mushrooms and pepperoni, instead of the rotten bananas, burnt hair and chutney preferred by raccoons.
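In code, Reference Data often boils down to a predefined list that incoming values are checked against. The sketch below is only an illustration: the `ALLOWED_TOPPINGS` set and the `validate_order` function are hypothetical names, and the menu is deliberately tiny.

```python
# Reference Data: a standardized, predefined list of allowed values.
# (A hypothetical menu; a real one would be longer.)
ALLOWED_TOPPINGS = {"mushrooms", "pepperoni"}


def validate_order(toppings: list[str]) -> list[str]:
    """Return the requested toppings that are NOT on the menu."""
    return [t for t in toppings if t not in ALLOWED_TOPPINGS]


dans_order = ["mushrooms", "rotten bananas", "pepperoni", "burnt hair"]
rejected = validate_order(dans_order)
print(rejected)  # ['rotten bananas', 'burnt hair']
```

A dropdown list does the same job up front: it never lets the off-menu values into the order in the first place.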
Structural Metadata
A Software Architect once asked me, “Why are all of the ‘data dudes’ women?” At that time our team was me and one other woman. My response? “It’s all about relationships.” I admit it’s a lame stereotype, but it spoke to him. Where am I going with this? Structural Metadata represents the relationships between one piece of data and another.
Things are connected in the real world. Pizza dough is dependent on flour and eggs. Mushrooms are dependent on damp conditions. Pepperoni is dependent on the pig. Dan’s order is dependent on the pizzeria owner remembering to order these ingredients. Likewise, data is connected. Structural Metadata explicitly represents these dependencies.
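One simple way to write these dependencies down is a mapping from each item to the things it depends on. This is a rough sketch, not a full data model: the `DEPENDS_ON` name is made up, and treating Dan's order as depending directly on three ingredients is a simplifying assumption.

```python
# Structural Metadata as a mapping: each item lists what it depends on.
DEPENDS_ON = {
    "Dan's order": ["pizza dough", "mushrooms", "pepperoni"],  # assumed
    "pizza dough": ["flour", "eggs"],
    "mushrooms": ["damp conditions"],
    "pepperoni": ["pig"],
}


def all_dependencies(item, depends_on):
    """Collect everything an item depends on, directly or indirectly.

    Assumes the dependencies never loop back on themselves.
    """
    found = []
    for dep in depends_on.get(item, []):
        if dep not in found:
            found.append(dep)
        # Follow the chain one level further down.
        for sub in all_dependencies(dep, depends_on):
            if sub not in found:
                found.append(sub)
    return found


print(all_dependencies("pizza dough", DEPENDS_ON))  # ['flour', 'eggs']
print(all_dependencies("Dan's order", DEPENDS_ON))
```

The second call walks the whole chain, so the pig and the damp conditions show up even though Dan never asked for either.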
META-METADATA
Speaking of dependencies, the simplistic “data about data” definition implies something about the relationship between data and metadata that can be important. Data can have metadata and data can be metadata. Pizza dough is made of flour and eggs. Likewise, flour is made from wheat or various other grains. Pizza dough is also an ingredient in pizza. Anything that exists is data. Any data that can be an ingredient in something else can be metadata.
This may seem a little brain-bending. It doesn’t have to be. This level of layered data and layered dependencies is important at times. And, in your own familiar context, it will be as natural as the pizzeria owner knowing that he needs to order flour to make the dough to make the pizza.
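If it helps, the layering can be checked with two small questions: does something describe this item, and does this item describe something else? The Python sketch below uses hypothetical names (`MADE_OF`, `has_metadata`, `is_metadata`) and only the ingredients mentioned above.

```python
# Each entry maps a piece of data to the ingredients that describe it.
MADE_OF = {
    "pizza": ["pizza dough"],
    "pizza dough": ["flour", "eggs"],
    "flour": ["wheat"],
}


def has_metadata(item):
    """True if we have recorded what this item is made of."""
    return item in MADE_OF


def is_metadata(item):
    """True if this item is an ingredient in something else."""
    return any(item in parts for parts in MADE_OF.values())


for name in ["pizza", "pizza dough", "flour", "wheat"]:
    print(name, has_metadata(name), is_metadata(name))
# pizza True False
# pizza dough True True
# flour True True
# wheat False True
```

Pizza dough and flour answer yes to both questions. They have metadata and they are metadata, depending on which direction you look.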
CONCLUSION
When it comes to understanding and working with data, metadata is the whole story. Explaining what it is, who created it and why, how it should be used, and when it was created or changed, metadata brings data to life.
In conclusion, while it’s true that metadata is “data about data,” it’s clearly more than a repeating mirror image of itself. Metadata is what brings meaning and richness to data and what makes it useful at all.