This image from the NASA Visible Earth collection shows the confluence of the Mississippi and Ohio Rivers at Cairo, Illinois.  It’s not a satellite image, but a handheld camera photo taken by an astronaut aboard the International Space Station on January 12, 2006.  What I find especially striking is how it shows the two rivers coming together as distinctly separate flows.  The flows are separate, yet combined.  They’re in the same channel, flowing at the same speed.  They carry suspended sediment and floating commercial cargo with equal ease.  To a barge pilot, the only difference is the change in color from one side to the other.  Otherwise, it’s all the same river highway flowing south.  Notice that the visible boundary between the two flows starts to blur at the bottom of the image.  The blurring and blending continue downstream until eventually no separation remains.  

This image nicely illustrates a situation I see occurring in the realm of geospatial data: the blending of government data with private-sector or commercial data.  For those of you working in the government sector, that may sound like heresy.  I will admit that I observed plenty of that thinking in my 31 years working for the State of New York, and I may have held some of it myself.  There’s a natural tendency to think that government data is, by default, the best data for government operations and policy-making.  There are certainly points in favor of that position, but I’m now convinced that it’s not viable in the long term.  In my estimation, the future will increasingly involve a blend of government and commercial data.  

What is GIS data blending?

Let’s take a deeper look at the concept of data blending between the government and private sectors.  When I say private-sector or commercial data, I am referring to data offered for sale in the marketplace, typically through a licensing agreement where ownership of the data stays with the commercial firm and terms of use are granted to the buyer.  Let’s also dispel the notion that a contractor performing “work for hire” to collect data for a government agency, such as through a Request For Proposals procurement, is producing any form of blended data.  In such situations, the contractor is simply extending the capabilities of the government agency, and the result is contractor-produced government data belonging to the agency just as if it were produced by agency staff.  

I see at least four different models of blended data, and there are surely more:

1. Commercial data to fulfill requirements unmet by government data

The simplest blend is where commercial data is introduced to meet requirements where suitable government data is not available.  When I was Geographic Information Officer for the State of New York, we used a licensed data source for geocoded business listings to support homeland security requirements.  In that instance, state law prohibited the Labor Department from releasing its employment data containing similar information, so a commercial data source made good sense for us.  

2. Commercial data as a replacement for government data

The next type of blended data is where data from the private sector is a direct substitute or replacement for government data.  This implies that a government agency can make a choice that weighs cost, resource requirements, quality, timeliness, and other factors.  In a growing number of cases, commercial data is the logical choice.  In these cases, the commercial data blends into the various government workflows alongside government data.  An example is licensed aerial imagery as a replacement for custom contracting for government-owned imagery.  A growing number of states have already switched to licensed imagery.  

3. Commercial data to augment or trigger a government data update

The third form of blending is a bit more complex.  Here, commercial data serves as a source that informs the creation or update of government data.  Commercial data could trigger a review that corrects an inaccurate record or adds a missing one.  In this case, the commercial data may never appear in the resulting government data, even though it was instrumental in the process.  This approach fits situations where the government imprint on authoritative data is important.  One example might be using a licensed dataset of mobile phone movement data to detect a likely new address that needs to be added to a 9-1-1 database.  
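To make the trigger idea a little more concrete, here is a minimal Python sketch built on my own simplifying assumptions: the licensed feed has already been reduced to candidate addresses with an observation count, and the government 9-1-1 data is a plain list of address strings.  The names `Candidate`, `normalize`, `review_queue`, and `min_observations` are hypothetical, not part of any product.  The key design point is that the commercial data only produces a review queue; nothing from it is written into the authoritative dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A likely address inferred from a licensed commercial source (hypothetical schema)."""
    address: str
    observations: int  # assumed: e.g., count of distinct device visits


def normalize(address: str) -> str:
    """Very rough normalization: uppercase and collapse whitespace.

    Real address matching would need proper standardization of
    street suffixes, directionals, unit numbers, and so on.
    """
    return " ".join(address.upper().split())


def review_queue(candidates, authoritative, min_observations=5):
    """Return normalized candidate addresses absent from the authoritative list.

    The commercial data only triggers a human review; it never edits
    the authoritative (government) dataset directly.
    """
    known = {normalize(a) for a in authoritative}
    flagged = set()
    for c in candidates:
        key = normalize(c.address)
        # Ignore weak signals and addresses the government data already has.
        if c.observations >= min_observations and key not in known:
            flagged.add(key)
    return sorted(flagged)


if __name__ == "__main__":
    gov_911 = ["12 Main St", "14 Main St"]
    commercial = [
        Candidate("12 main st", 40),  # already in the 9-1-1 data
        Candidate("16 Main St", 12),  # strong signal, not in the 9-1-1 data
        Candidate("99 Elm Rd", 2),    # too few observations to act on
    ]
    print(review_queue(commercial, gov_911))  # ['16 MAIN ST']
```

Whoever reviews the flagged address would then verify it and create the record through the normal government workflow, which keeps the authoritative imprint intact.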

4. Commercial data operating as a service for a government function

A fourth model involves a role reversal: a government agency supplies its data to a private-sector firm, which then provides value-added services.  This could be transaction processing in which the firm draws on its own data to deliver those services without ever granting the agency direct access to that data.  For example, the firm could combine government data with its proprietary data to verify or complete a transaction for the agency.  Another variation could be monitoring data feeds from the Internet of Things and applying analytics to notify the agency of particular events or needs.  I think of this as a blended workflow, to distinguish it from the blended data described in the other models.  Road Weather Information Systems are an example, using pavement sensors to notify transportation agencies of icing and other hazards.  
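A blended workflow can be pictured as the firm running the analytics and handing the agency only the results.  The Python sketch below is a deliberately simplified, hypothetical illustration of that division of labor: the `PavementReading` schema and the "wet surface at or below freezing" rule are my own assumptions, not how any actual Road Weather Information System works.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class PavementReading:
    """One reading from a roadside pavement sensor (hypothetical schema)."""
    station_id: str
    surface_temp_c: float
    surface_wet: bool


@dataclass(frozen=True)
class Alert:
    """What the agency actually receives: a notification, not the raw feed."""
    station_id: str
    message: str


def icing_alerts(readings: Iterable[PavementReading],
                 freeze_threshold_c: float = 0.0) -> List[Alert]:
    """Vendor-side analytics: turn raw sensor readings into agency alerts.

    The simple freezing-point rule is an assumption for illustration;
    real systems also weigh chemical treatment, forecasts, and dew point.
    """
    alerts = []
    for r in readings:
        if r.surface_wet and r.surface_temp_c <= freeze_threshold_c:
            alerts.append(Alert(
                r.station_id,
                f"Possible icing: {r.surface_temp_c:.1f} C, wet surface",
            ))
    return alerts


if __name__ == "__main__":
    feed = [
        PavementReading("RWIS-01", -1.5, True),   # wet and below freezing
        PavementReading("RWIS-02", -3.0, False),  # cold but dry
        PavementReading("RWIS-03", 2.0, True),    # wet but above freezing
    ]
    for alert in icing_alerts(feed):
        print(alert.station_id, alert.message)
```

Only RWIS-01 produces an alert here.  The agency's side of the workflow starts with the `Alert` objects; the sensor feed and the analytics stay with the firm.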

Other IT trends are accelerating data blends

Cutting across all of these models is the trend of data being bundled and delivered as part of a service.  This is increasingly prevalent with application programming interfaces (APIs) that combine software and computing power with data to seamlessly perform a specific function, such as geocoding a street address.  Data is often an invisible part of the function that occurs when a software application makes a request to the API.  There are already examples of commercial data embedded in services that align with each of the models I described above.  Licensed aerial imagery, for example, is often delivered via a cloud-hosted streaming service.  

Another trend is for software licenses to include commercial data sources bundled in as part of the deal.  Esri includes many commercial data sources in the embedded base maps and other datasets that are included with software licenses.  This is now a common practice in the software industry.  These “included at no extra cost” commercial datasets have quietly introduced blended data in countless government agencies.  I suspect that in many cases the resulting shift to blended data was made without a thoughtful or deliberate decision.  

Government geodata in the transportation sector

I’d like to illustrate how this data blending trend is unfolding by looking at what’s been happening in the transportation sector over the span of my career.  I started at the New York State Department of Transportation in 1984 and worked there until 2000.  We had lots of data at the DOT and a big slice of the organization was dedicated to the collection and management of that data to support its use for planning, capital program development, federal reporting, maintenance, and operations.  

We collected data on everything you can think of for the state highway system: physical characteristics of the roadways, pavement condition, bridge inventory, bridge condition, traffic volumes, functional classification of each road segment, motor vehicle accidents, maintenance history, sign inventories, weight and height restrictions for oversize loads, a photolog with roadway photos every 1/100th mile, and vast amounts of operations management data covering pothole repair, roadkill removal, herbicide application, snow and ice removal, mowing, guide rail replacement, sign maintenance, drainage ditch cleaning, pavement striping, and much more.  And that’s just for the state highway system.  We also managed railroads, transit systems, airports, and the state canal system, each with its own data needs.  

That was, and is, a lot of data for a single state agency, and all of it was collected and managed in-house.  My early years involved building GIS base data and figuring out how to link the DOT data to the map.  In the process, we introduced new efficiencies and new benefits from the magic of geospatial.  It was an exciting time for an ambitious young GIS-er, and it was also, in those days, a closed system.  The DOT was in control of every part of the data lifecycle.  There was no thought of blending government data with commercial data.  

Non-government GIS data sources for transportation are proliferating

Fast forward to today.  Much of the closed system I just described now has active counterparts in the private sector.  Very active counterparts.  Competing consortiums of car companies are racing to collect huge volumes of high-resolution roadway data for autonomous vehicles.  The big Internet companies have run a parallel race to offer richly detailed mapping applications on their platforms.  A growing array of platforms is gathering spatial data, including satellites in low earth orbit, inexpensive drones with high-resolution cameras, Internet-connected mobile devices, and specialized data collection vehicles laden with cameras, GPS, and LiDAR.  Machine learning and artificial intelligence are mining this trove of raw data to produce detailed mapping databases.

Beyond that, many of the companies involved in transportation logistics (trucking, parcel delivery, ride-hailing, and so on) track the movement of their vehicles and freight, amassing large volumes of data to optimize their operations.  Other firms use sensors along roadways to read license plates and sell the data to law enforcement agencies.  The list goes on.  It won’t be long before virtually every movement and interaction in our public transportation space will be sensed, measured, and stored in a database.  How much of this private-sector data replicates data historically collected by the DOTs?  How much of it might be better, more recent, higher-resolution, or more innovative?  

Some of this data is already available through your favorite online mapping app or through APIs.  These let you determine the fastest driving route, see upcoming road signs on your smartphone screen before they appear through your windshield, find out where road construction is occurring, and much more.  But that’s just the tip of the iceberg.  The private sector is creating a highly detailed digital twin of the roadways to enable fully autonomous vehicles in the not-too-distant future.  And not just one digital twin, but many of them, each closely guarded as a proprietary asset.  Some of this data will certainly be available through licensing models.  Some of it already is.  

How do government and commercial data blend for transportation?

Our public transportation spaces are now among the most data-rich environments in the world.  What a shift from the early days of my career!  Have we reached the point where it no longer makes sense for the DOTs to continue all of their own data collection programs?  Couldn’t they instead blend data licensed from this vast commercial trove with their own?  DOTs will continue to collect and use specialized data for many purposes, but I expect the ready availability of commercial roadway data to reduce or replace some current inventories.  The factors driving this shift will include cost savings, higher temporal and spatial resolution, higher accuracies of sensor-based collections over legacy field data collection methods, advances in machine learning to harvest value from the data, and a growing awareness of wasteful duplication if DOTs continue to collect data that is already available.  

Other sectors are ripe for transition to blended government and non-government data

I singled out the transportation sector to describe the trend I’m seeing toward blended data, but similar analyses could be made for many other sectors.  For example, there is a lot of work under way around the country to actively manage statewide street address databases, particularly for Next-Generation 9-1-1 systems.  I can envision services in which licensed data from mobile phone movement traces, package delivery services, automated imagery change detection, and other sources serve as early triggers that flag street addresses missing from a government address database.  These sources go well beyond those normally available to government agencies for maintaining address data.  Couldn’t we blend their use into authoritative government workflows? 

Come on in, the water’s fine!

Returning to the NASA image of the confluence of the Mississippi and Ohio Rivers, let’s envision data flows.  The confluence of data from the government and private sectors is already starting to happen.  We’re still in the early stages where the flows are largely parallel and distinct in the same channel.  As we move downstream, the flows will continue to blend and mix.  A barge pilot on the river doesn’t care that the flows beneath the barge came from different sources.  Likewise, most users of our GIS capabilities care mainly that the services work well and deliver accurate results.  

That’s going to be the future for geospatial data in the government sector: a data river of ever more blending.  I find that prospect energizing and exciting, unlike my earlier self, who might have considered it heresy.  

A future blog post will explore some related issues that arise with data blending, such as authoritative data, liability, metadata, and governance. 

Bill Johnson, Carpe Geo Evangelist, AppGeo