By Rich Grady
On September 28, 2011, self-described free-range archivist Jason Scott gave a talk at the New York Public Library on “The Rise of the Metadata Warrior.” In his talk, he called metadata “a love note to the future.” You can even get a t-shirt with that emblazoned on it! In geospatial metadata terms, 2011 was not the “rise of the metadata warrior” era, but already more like the Fonz “jumping the shark” [See Smithsonian, September 2017]. Geospatial metadata warriors had seen better days, perhaps “happier days” if you actually cared about metadata and were religious about it, pursuing the Holy Grail. For better or worse, those days are over in the geospatial community, where phrases such as “don’t duck metadata” or “mind over metadata” have long lost their popular appeal, if they ever had any such appeal in the first place, even back 10-20 years ago (when some funding incentives were spread around by FGDC to encourage attention to metadata). To me, those slogans always sounded like “take your cod liver oil, it’s good for you.” Yes, it is; and it also tastes bad to most people.
We all know how useful good metadata is for discovering and perhaps accessing geospatial data and resources. And many of us have spent a lot of time compiling good metadata, despite difficult-to-use tools and tedious standards to learn and apply. When you combine a task that seems like a chore (i.e., metadata compilation) with tools that are not easy to use, and with no incentives for doing it or consequences for ignoring it, a less-than-stellar result is predictable. And hounding people for metadata is often a nuisance for the sender of the request and an annoyance to the receiver. Even as the tools improve, it still feels like putting a new blanket on an old horse when it comes to the FGDC Content Standard for Digital Geospatial Metadata (CSDGM) or the ISO 191** series of geospatial metadata standards. These standards have a place, but in a way, they are relics of a consensus-building process by religious warriors who were outpaced by technological developments and user expectations. And yes, I was once one of the faithful.
With the widespread adoption of the Web and general-purpose search engines, such as Google, expectations have changed about what it takes to find some potentially useful data, and then decide if you want to use it. Let’s face it — other than those of us in the geo-profession who may be familiar with Clearinghouse nodes and FGDC/ISO metadata vocabulary, most users (including non-geo-professionals) prefer to use common vernacular and their favorite search engine to find data they are seeking — and it might be stored in GitHub or an Open Data website, or another repository more modern than a typical GIS Clearinghouse. [See NYC GIS metadata on GitHub]
And with the growth of data science as a profession, data detective work is bigger than it ever was — gumshoes are making a comeback! They are willing to work with potentially useful data regardless of how thoroughly it might be described with detailed metadata, or not described. In fact, they are more likely to apply techniques from IT and computer science, such as data profiling, to derive metadata from the data itself. And then that metadata becomes data, and it can become its own map!
Visualization of Twitter metadata for NYC area showing use by locals (blue) vs. tourists (red):
Published jointly by Gnip & Mapbox and data artist Eric Fischer
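For readers who have not seen data profiling in practice, here is a minimal sketch of the idea in Python, using only the standard library. The function names (`infer_type`, `profile`) and the sample records are hypothetical and invented for illustration; real profiling tools do far more. The point is only that a few lines of code can derive basic descriptive metadata (types, missing values, ranges) from the data itself.

```python
from collections import Counter


def infer_type(value):
    """Guess a simple type label for one raw string value."""
    if value is None or value.strip() == "":
        return "missing"
    try:
        int(value)
        return "integer"
    except ValueError:
        pass
    try:
        float(value)
        return "float"
    except ValueError:
        return "text"


def profile(rows):
    """Derive per-column metadata from a list of dict rows (values as strings)."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            col = columns.setdefault(name, {"types": Counter(), "values": []})
            kind = infer_type(value)
            col["types"][kind] += 1
            if kind != "missing":
                col["values"].append(value.strip())

    summary = {}
    for name, col in columns.items():
        present = {k for k in col["types"] if k != "missing"}
        if not present:
            dominant = "missing"
        elif present <= {"integer", "float"}:
            dominant = "float" if "float" in present else "integer"
        else:
            dominant = "text"
        entry = {
            "type": dominant,
            "missing": col["types"]["missing"],
            "distinct": len(set(col["values"])),
        }
        if dominant in ("integer", "float"):
            numbers = [float(v) for v in col["values"]]
            entry["min"] = min(numbers)
            entry["max"] = max(numbers)
        summary[name] = entry
    return summary


# Hypothetical sample records; the column names are made up for illustration.
sample = [
    {"station": "A1", "lon": "-73.98", "lat": "40.75", "riders": "1200"},
    {"station": "B2", "lon": "-73.95", "lat": "40.78", "riders": ""},
    {"station": "C3", "lon": "-74.01", "lat": "40.71", "riders": "860"},
]
report = profile(sample)
```

Note that the minimum and maximum of the `lon` and `lat` columns in `report` amount to a bounding box, which is one of the familiar elements of formal geospatial metadata, here recovered without anyone having to type it in.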
While the “metadata of old” in our geospatial world has certainly not disappeared, it has faded in importance, and is often poorly funded and understaffed as higher priorities get the resources, and often rightfully so. If you were paying attention in GIS 101 or have been in the profession for a while, then you already know that metadata used to describe geospatial data has a long history dating back to the 1990s (FGDC CSDGM) and even earlier (think of a map legend), but it is now part of a much broader set of tools and standards for describing data in general, and how to find and use it. Some of these other standards include: Dublin Core for identifying and describing resources available on the Web; Data Catalog Vocabulary (DCAT) for the interoperability of data catalogs; Asset Description Metadata Schema (ADMS) for describing the assets within a data catalog rather than the catalog itself (ADMS is a profile of DCAT); and Google’s Dataset Publishing Language (DSPL). This is not an exhaustive list, but it is intended to illustrate the point that there are other metadata standards besides those that pertain specifically to geospatial data, and that some level of convergence is occurring. This is particularly true given the trend toward open data and geospatial data being organizationally related within Departments of Information Technology, in both state and local governments. [See “Analyze Boston”]
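To give a feel for how lightweight these general-purpose vocabularies can be, here is a simplified sketch of a dataset description that borrows terms from DCAT and Dublin Core, written as JSON-LD from Python. The dataset, publisher name, and URL are all made up for illustration, and the record has not been validated against the DCAT specification; treat it as a rough picture rather than a conformant example.

```python
import json

# A hypothetical dataset description using DCAT and Dublin Core terms.
# All titles, names, dates, and URLs below are invented for illustration.
record = {
    "@context": {
        "dcat": "http://www.w3.org/ns/dcat#",
        "dct": "http://purl.org/dc/terms/",
        "foaf": "http://xmlns.com/foaf/0.1/",
    },
    "@type": "dcat:Dataset",
    "dct:title": "Example street centerlines",
    # A short narrative: what the data is, why it was made, and roughly how.
    "dct:description": (
        "Street centerlines compiled from aerial imagery to support "
        "emergency dispatch routing."
    ),
    "dcat:keyword": ["streets", "centerlines", "transportation"],
    "dct:issued": "2017-09-01",
    "dct:publisher": {
        "@type": "foaf:Agent",
        "foaf:name": "Example City GIS Office",
    },
    "dcat:distribution": [
        {
            "@type": "dcat:Distribution",
            "dcat:downloadURL": {
                "@id": "https://example.org/data/centerlines.geojson"
            },
            "dct:format": "GeoJSON",
        }
    ],
}

# Serialize to text, as it might be published alongside the data.
text = json.dumps(record, indent=2)
print(text)
```

Most of the descriptive weight in this sketch sits in a title, a few keywords, and a couple of plain sentences, which is much closer to what a general-purpose search engine or open data portal works with than a full CSDGM record.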
In closing, I think metadata is indeed “a love note to the future,” but it will be more appreciated by both the sender and the receiver if it is a brief narrative that conveys the provenance of the data and why it was created, and a little bit about how — the intentions of the heart! For most purposes, detailed formatted metadata is “jumping the shark” — the leather coat is still cool, but the rest of it is waxing nostalgic.