Over the past 10 years, we have witnessed the proliferation of data discovery tools, particularly products developed by QlikTech and Tableau. The ability to connect to and discover insight from multiple data sources without modeling the data environment and creating complicated ETL processes liberated business users who wanted quick data access and instant analytic enlightenment.
Rita Sallam, research vice president at Gartner, recently noted that "Data preparation is one of the most difficult and time-consuming challenges facing business users of BI and data discovery tools, as well as advanced analytics platforms." Eliminating the need for expensive ETL developers to prepare the data sets was Breakthrough Number One. Equally important, though, was removing the dependency on BI developers to model the visualization and reporting layer based on business user feedback about what they wanted to see. Two traditional bottlenecks were removed at once, creating a seemingly ideal BI and data discovery experience.
But not quite.
Organizations soon began to question the reliability of the insight these tools provided because end users could access and manipulate their own data -- sometimes from unreliable sources. "As a result of the limited governance of self-service BI implementations, we see few examples of those that are materially successful -- other than in satisfying end-user urges for data access," according to Doug Laney, research vice president at Gartner. This is a strong statement, and many business end users who are productively leveraging these tools will surely disagree. Even so, Doug is right: most deployments ultimately are not successful.
Data governance is not a "nice to have" -- it is a "must have." Whether a business is concerned with regulatory compliance, fraud prevention, protection against security breaches, privacy, or just the old-fashioned authenticity of its data, it has to insist that its BI tools provide at least a minimal amount of governance. Tracing the data lineage back to the source and creating logs of how the data was manipulated or transformed is a basic requirement, yet very few tools perform that function. Data preparation is always someone else's business. In businesses that have a fully managed data warehouse, this isn't a big deal until they try to join that high-quality data with lower-quality data. Gartner refers to "smart data preparation" as the solution, and it is expected to be in effect in 2017.
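To make the lineage requirement concrete, here is a minimal Python sketch of what tracing data back to its source and logging each transformation could look like. It is an illustration only, not how any particular BI product works: the `TrackedDataset` class, its `apply` method, the sample rows, and the `edw.claims` source name are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


@dataclass
class TrackedDataset:
    """Rows plus a record of where they came from and what was done to them."""
    rows: list
    source: str  # where the data originated
    lineage: list = field(default_factory=list)  # one entry per transformation

    def apply(self, description: str, step: Callable[[list], list]) -> "TrackedDataset":
        """Run a transformation and return a new dataset with the step logged."""
        new_rows = step(self.rows)
        entry = {
            "step": description,
            "at": datetime.now(timezone.utc).isoformat(),
            "rows_in": len(self.rows),
            "rows_out": len(new_rows),
        }
        # The original dataset is left untouched; the log travels with the result.
        return TrackedDataset(new_rows, self.source, self.lineage + [entry])


# Hypothetical sample data standing in for an extract from a warehouse table.
claims = TrackedDataset(
    rows=[
        {"id": 1, "amount": 120.0},
        {"id": 2, "amount": None},
        {"id": 3, "amount": 75.25},
    ],
    source="edw.claims",  # assumed name, for illustration only
)

cleaned = claims.apply(
    "drop rows with missing amount",
    lambda rows: [r for r in rows if r["amount"] is not None],
)
rounded = cleaned.apply(
    "round amount to whole dollars",
    lambda rows: [{**r, "amount": round(r["amount"])} for r in rows],
)

# Anyone reviewing the result can see the source and every step applied to it.
for entry in rounded.lineage:
    print(rounded.source, "|", entry["step"], "|", entry["rows_in"], "->", entry["rows_out"])
```

The point of the design is that the log is produced by the same operation that changes the data, so an analyst cannot transform a data set without leaving a trace of what was done and where the data came from.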
I believe that self-service BI tools must be able to handle their own data preparation and provide basic data governance today, not two years from now. In regulated and unregulated businesses alike, our tolerance for bad data is decreasing at a rapid rate. I recently visited with one of my customers, a very large healthcare insurance company with a very reliable and scalable mainframe environment for transactional processing and a well-managed EDW for business reporting. While these systems are secure and reliable, the business analysts were still using desktop tools and spreadsheets, making the true security and reliability of the data unknown.
Organizations can quickly become vulnerable to chaos and a lack of accountability when they dismiss the data governance recommended by the IT organization (and the compliance analysts they usually employ) -- all in the name of self-service and agility. The pendulum has swung away from IT governance and toward self-service data analysis and agility. Because it is not swinging back any time soon to a controlled, locked-down environment with policies and procedures, I see no other option than to provide data preparation and data governance within the self-service BI tools.
Smart, self-service data preparation and agile data visualization belong together, provided by the same vendor. A data governance solution promised through the integration of two or three software vendors is a significant risk. APIs change, companies are acquired, and vendors get tired of working with each other. If your organization demands the agility of self-service BI and data discovery, don't inherit big risks by abandoning data governance. Look for a solution that offers the right amount of both.