How Sensor-Generated Data Enhances Your Data Warehouse's Value
We are all familiar with typical data warehouse subject areas such as products, customers, employees, vendors, sales, and financial items, but many of us will soon be involved in integrating other subject areas that result from collecting data from the "Internet of things" or IoT. Although RFID (radio frequency identification) tags for supply chain tracking may have been one of the earliest IoT sources, real-time sensors in smart buildings and homes, phones and tablets, employee badges, security cameras, watches and personal fitness monitors, automobiles, appliances, and even smart clothing will certainly generate vast amounts of additional information. This will include a wide variety of data such as temperatures, physical locations, geocodes, call details, weather conditions, medical biosensor readings, automobile and driver operational data, E-ZPass toll charges, airline flight data, and equipment status.
The addition of all this new data will likely eclipse today's "big data lakes" (or, if uncontrolled, they might be better called big data "swamps" or "dumps") in both size and management complexity. Fortunately, this new data will increase the value of our data warehouses and serve to enhance their overall analytic capabilities.
For example, we will be able to incorporate new variables into our analysis and predictive analytic techniques to determine how specific weather conditions, physical activity, traffic conditions and driver habits, and currency and trade fluctuations might respectively affect consumer behavior, medical conditions and athletic performance, insurance and warranty claims, or stock market behavior.
This will go far beyond obvious hypotheses such as "umbrella sales increase when it starts to rain" to yield new insights that are far less obvious and of far greater value. For example, although data captured by personal fitness devices can lead to personalized exercise and diet regimens, this data might also be mined to produce insights into factors that lead to a variety of health problems such as high blood pressure, diabetes, or even the likelihood of future cognitive disorders.
Integration of sensor data will face the same data quality and consistency issues that traditional data integration projects are subject to. For example, temperature data from multiple sensors should use (or be transformed into) the same unit of measure. If some sensors report in Celsius, others in Fahrenheit, and still others on the Kelvin scale, the results of any analysis will be worthless unless the readings are converted to the same scale.
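The unit conversion described above can be sketched in Python as follows. This is only a minimal illustration, not a prescribed approach: the `Reading` record, the single-letter unit codes ("C", "F", "K"), and the function names are hypothetical, and the sketch assumes each reading arrives already tagged with the unit its sensor uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """Hypothetical sensor record; field names are illustrative only."""
    sensor_id: str
    value: float
    unit: str  # assumed codes: "C" (Celsius), "F" (Fahrenheit), "K" (Kelvin)


def to_celsius(value: float, unit: str) -> float:
    """Convert a temperature to Celsius from one of the assumed unit codes."""
    code = unit.strip().upper()
    if code == "C":
        return value
    if code == "F":
        return (value - 32.0) * 5.0 / 9.0
    if code == "K":
        return value - 273.15
    # Fail loudly rather than load a reading whose scale is unknown.
    raise ValueError(f"unrecognized temperature unit: {unit!r}")


def normalize(readings: list[Reading]) -> list[Reading]:
    """Return new readings with every temperature expressed in Celsius."""
    return [
        Reading(r.sensor_id, to_celsius(r.value, r.unit), "C")
        for r in readings
    ]


if __name__ == "__main__":
    mixed = [
        Reading("lobby", 21.5, "C"),
        Reading("roof", 70.0, "F"),
        Reading("lab", 295.0, "K"),
    ]
    for reading in normalize(mixed):
        print(reading.sensor_id, round(reading.value, 2), reading.unit)
```

Rejecting an unknown unit code, instead of passing the value through unchanged, is a deliberate choice in this sketch: a reading on an unidentified scale is a data quality problem that should surface during integration rather than during analysis.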
The vast amounts of data generated by the Internet of things will raise several storage issues, including how to physically store it, where to store it (e.g., in-house or in the cloud), and how long to retain it. Although the answers to these questions will likely depend on how the organization collecting the data plans to use it, compliance concerns may require that some data be retained (perhaps in archival storage) long after it is of any value for analytics.
Sensor devices may be subject to hacking, and personal privacy will definitely be an issue. From a data warehousing perspective, we need to consider the security impact of direct feeds of sensor data into our data warehouses. Each entry point could potentially represent a new security vulnerability. Security breaches harm consumers whose identities have been compromised and harm the organizations that collected the compromised data. Just ask Target, Home Depot, or Anthem about the negative consequences of security breaches and consider what the impact would be on your own organization if your data warehouses were compromised.
The Bottom Line
The ability to incorporate sensor-based data into our data warehouses will provide us with new and greatly expanded analysis capabilities. However, we cannot ignore issues such as data quality, storage, and security. Above all, we must take steps to ensure that personal privacy and organizational data security are preserved.