28.1.16

SAP HANA Case Study - Insurance Business

This entry describes use cases where the in-memory SAP HANA platform can be applied to an insurer's business.


1. Insurance Customer Insight Analysis

Description
Sales managers and sales agents need access to a broad range of customer-related data so they can understand customers in new ways and make optimal decisions.

Current Customer Situation
Difficulty accessing, analyzing, and exploiting business data such as customers, contracts, payments, and claims in order to make the most of the business strategy and cross-sell/up-sell opportunities.

Value proposition
Provide transparency into customer behavior and customer value so the insurer can focus on the right customers and the right activities.

Outcome opportunity
Increase revenue and maximize profitability.
Shorten time to market for new insurance products.

2. Insurance Channel Analysis

Description
Support sales managers by speeding up query-based analysis of sales plans, actual sales results, claim quotas, renewal rates, and commissions across various products, lines, and channel partners.

Current Customer Situation
Insurers lack real-time visibility into the identity and performance of their distribution channels.

Value proposition
Support benchmarking of channel partners/agents and peer-group comparisons.

Result
Quickly develop and implement adjustments to the go-to-market strategy, increasing profitability and channel partner loyalty.
Faster market access through competitive advantage.
Directly link sales and product strategy based on real-time insight.

3. Insurance Fraud Pattern Detection

Description
A significant share of the claims reported across all lines of business (LOBs) involves fraudulent claim requests.
Claims experts need support in identifying fraud patterns using detailed claims data combined with customer information.
Use the detailed information collected during claims processing as the basis for uncovering potential fraud cases.

Current Customer Situation
Ad hoc analysis is not possible because reports must be defined and built up front through IT investment.

Value proposition
Identify fraud patterns through unconstrained ad hoc and complex query analysis.

Result
Identify potential fraud patterns in order to establish up-to-date claims rules that reduce claim payouts and optimize the fraud detection process.
Optimize the combined ratio (a top KPI for insurers) by avoiding payouts on fraudulent claims.

4. Insurance Fraud Investigation and Prevention

Description
A significant share of the claims reported across all lines of business (LOBs) involves fraudulent claim requests.
Provide customers with real-time detection across large volumes of data, along with fully integrated investigation and fraud prevention capabilities.

Current Customer Situation
Lack of business process integration limits the ability to detect and investigate potential claims fraud, leading to numerous improper payouts.

Value proposition
Identify fraud cases and avoid paying false claims that have an outsized impact on business results.

Result
Identify potential fraud patterns in order to establish up-to-date claims rules that reduce claim payouts and optimize the fraud detection process.
Optimize the combined ratio (a top KPI for insurers) by avoiding payouts on fraudulent claims.

5. Insurance Claims Optimization

Description
Claims managers need support in analyzing detailed claims information, including other data sources such as payments, to optimize efficient claims processing and accurate claims financial data.

Current Customer Situation
Insurers lack early indicators of financial or operational performance problems.
Long-running queries prolong or interrupt the analysis process.

Value proposition
Analyze losses in real time, without interruption.

Result
Optimize the claims process to improve customer service.
Reduce costs through an uninterrupted claims process.
Gain better business insight into growing financial risk.

6. Insurance Claims Catastrophe Risk Management

Description
Deploying resources in the aftermath of a catastrophe (CAT) is critical and time sensitive.
Tracking losses in real time helps deploy the right resources to the affected region.

Current limitations
Unable to respond quickly to risk situations and customer service needs.

Value proposition
A faster claims process to help customers more quickly and effectively.

Result
Cost savings through resource deployment, and control over additional living expenses and residual losses.
Reduced fraud exposure.
Increased customer satisfaction.

14.1.16

Why Data Agility is a Key Driver of Big Data Technology Development

As technology advances at breakneck speed, our lives are becoming increasingly digitized. From Twitter feeds to sensor data to medical devices, companies are drowning in big data yet starving for actionable information. Most likely, you've heard a lot of talk about the volume, variety, and velocity of big data and how challenging it is to keep up with that explosion of data.
For many enterprises, their ability to collect data has surpassed their ability to organize it quickly enough for analysis and action. Executives, IS staff, and analysts alike have been frustrated with traditional rigid processes for data processing that require a series of steps before data is ready for analysis. Relational databases and data warehouses have served businesses well for collecting and normalizing relational data from point of sale (POS), ERP, CRM, and other data sources where the data format and structure are known and don't change frequently. However, the relational model and its process for defining schema in advance cannot keep pace with the rapidly evolving variety and format of data.
Sometimes an analyst just wants to start playing with data to understand what's in it and what new insights it can reveal before the data is modeled and added to the data warehouse schema. Sometimes you're not even sure what questions to ask. This process drives up the costs for using traditional relational databases and data warehouses because DBA resources are required to flatten, summarize, and fully structure the data, and these DBA costs can delay access to new data sources. Legacy databases are simply not agile enough to meet the growing needs of most organizations today.
What is Data Agility and Why is it Important?
Hadoop has become a mainstream technology for storing and processing huge amounts of data at a low cost, but now the conversation has pivoted. These days, it's not about how much data you can store and process. Instead, it's about data agility, meaning how fast can you extract value from your mountains of data and how quickly can you translate that information into action? After all, you still need someone to apply structure or schema to the data before it can be analyzed. Just because you can get data into Hadoop easily doesn't mean an analyst can easily get it out.
Executives want their teams to focus on business impact, not on how they should store, process, and analyze their data. How does the ability to process and analyze data impact their operations? How quickly can they adjust and respond to changes in customer preferences, market conditions, competitive actions, and operations? These questions will direct the investment and scope of big data projects in 2015 as enterprises shift their focus from simply capturing and managing data to actively using it.
This concept can be applied not just to your big data infrastructure; it can be applied across all business activities, from risk management to marketing campaigns to supply chain optimization.
When the concept of data agility was first talked about, the discussion centered on an organization's ability to quickly gather business intelligence. However, the concept of data agility can also apply to data warehouse architecture. With traditional data warehouse architectures based on relational database systems, the data schema has to be carefully designed and maintained. If the schema must be changed, it can sometimes take up to a year to make the change to an RDBMS. Even the process of extracting data from a data store and loading it into a data warehouse can take an entire day before it's available to be analyzed.
With Hadoop, storing a new kind of data doesn't mean having to redesign the schema. It's as simple as creating a new folder and moving the new type of files to that folder. By using Hadoop for storing and processing data, teams can develop products in a much shorter timeframe.
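For illustration, here is a minimal sketch of that folder-based ingestion, driving the standard hdfs dfs commands from Python; the paths and file names are hypothetical.

```python
# Minimal sketch: landing a new kind of data in Hadoop without schema changes.
# Assumes a configured Hadoop client on the PATH; paths and file names are hypothetical.
import subprocess

def land_new_feed(local_file: str, hdfs_dir: str) -> None:
    """Create a folder for the new feed and copy the raw file into it."""
    subprocess.run(["hdfs", "dfs", "-mkdir", "-p", hdfs_dir], check=True)
    subprocess.run(["hdfs", "dfs", "-put", "-f", local_file, hdfs_dir], check=True)

# A new clickstream feed simply gets its own folder; no schema redesign is needed.
land_new_feed("clicks-2016-01-14.json", "/data/raw/clickstream/2016/01/14")
```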
The Real Hindrance to Data Agility
Traditional databases require a schema before writing data. Couple that with the time needed to get the data into the database and the process can no longer be considered agile. Worse yet, there are times when DBAs must perform complicated procedures that require dropping foreign keys or exporting data, altering table designs, and even reloading data in a specific order to satisfy the table design. Some big data technologies such as Apache Hive are able to get around the schema-on-write requirement but still require defining a schema before users can ask the very first question.
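As a concrete illustration of that up-front requirement, the sketch below declares a Hive table before the first query can run. It assumes the PyHive client and a reachable HiveServer2 endpoint; the host, table, and columns are hypothetical.

```python
# Sketch: even with schema-on-read, Hive needs a table definition before query one.
# Assumes the PyHive package and a HiveServer2 endpoint; all names are hypothetical.
from pyhive import hive

conn = hive.connect(host="hiveserver.example.com", port=10000, username="analyst")
cur = conn.cursor()

# The schema must be declared up front...
cur.execute("""
    CREATE EXTERNAL TABLE IF NOT EXISTS web_clicks (
        user_id STRING,
        url     STRING,
        ts      BIGINT
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'
    LOCATION '/data/raw/clickstream'
""")

# ...and only then can an analyst ask the very first question.
cur.execute("SELECT url, COUNT(*) AS hits FROM web_clicks GROUP BY url LIMIT 10")
print(cur.fetchall())
```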
You Will Know Data Agility When You See It
New data discovery and data exploration technologies are being developed to provide greater flexibility. Apache Drill is a great example of "the" business enabler for data agility. Inspired predominantly by Google's Dremel, Apache Drill is an open source, low-latency SQL query engine for Hadoop and NoSQL that can query across data sources. It can handle flat fixed schemas and is purpose-built for semi-structured/nested data.
What does this mean to be "the" business enabling technology? Think real-time business intelligence. Drill is opening the door to this inevitable future of shortened cycle times for data processing to support faster responses to opportunities and threats. Ultimately, the faster you can ask a question and get the right answer, the better for your business.
Drill implements schema-on-the-fly. This means that when a new data format arrives, nothing has to be done to be able to process the data with Drill. No DBAs are required to build and maintain schema designs. Commercial off-the-shelf business intelligence tools can communicate with Drill because Drill implements standards. It is ANSI SQL:2003-compliant and ships with JDBC and ODBC drivers. The business doesn't have to adopt new tools to work with all the data from all data sources.
Of course, for any new technology, an opposing view can always be considered. The question that may arise is: What innovations are fueling the need for these new technologies? The dominant change in the industry falls on the utilization of data interchange formats such as JSON. Data that comes from applications that publish data in JSON does not require a DBA to structure the inbound data because it arrives already structured, thus eliminating the personnel and process bottleneck.
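To make the contrast concrete, here is a hedged sketch of querying a raw JSON file with Drill through its REST API, with no table definition at all. It assumes a Drill instance listening on localhost:8047 with the default dfs storage plugin; the file path and field names are hypothetical.

```python
# Sketch: ad hoc SQL over a raw JSON file with Drill, no DDL and no DBA involved.
# Assumes Drill's REST API on localhost:8047 and the default dfs plugin;
# the file path and field names are hypothetical.
import requests

sql = """
    SELECT t.claim_id, t.amount
    FROM dfs.`/data/raw/claims/claims-2016-01.json` t
    WHERE t.amount > 10000
"""

resp = requests.post(
    "http://localhost:8047/query.json",
    json={"queryType": "SQL", "query": sql},
)
resp.raise_for_status()
for row in resp.json().get("rows", []):
    print(row)
```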
Drill fuels data agility by allowing users to perform self-service data ingestion and data source management, whether due to adding a new data source or adapting for a change in the incoming data structure.
Agility in Your Enterprise
Data agility should be an important aspect of all your big data initiatives in the future. Individuals can analyze and explore data directly. Self-service data exploration eliminates the dependency on IT to set up data definitions and structures, and frees up IT staff to perform more valuable and leveraged activities.
By implementing agile technologies such as Hadoop and Apache Drill into your enterprise and existing data management and analytics capabilities, you'll be able to guide your organization's agility towards real-time business impact.

Self-Service BI vs. Data Governance

Over the past 10 years, we have witnessed the proliferation of data discovery tools, particularly products developed by QlikTech and Tableau. The ability to connect to and discover insight from multiple data sources without modeling the data environment and creating complicated ETL processes liberated business users who wanted quick data access and instant analytic enlightenment.
Rita Sallam, research vice president at Gartner, recently noted that "Data preparation is one of the most difficult and time-consuming challenges facing business users of BI and data discovery tools, as well as advanced analytics platforms." Eliminating the need for expensive ETL developers to prepare the data sets was Breakthrough Number One. Equally important, though, was removing the dependency on BI developers to model the visualization and reporting layer based on business user feedback about what they wanted to see. Two traditional bottlenecks were removed at once, creating a seemingly ideal BI and data discovery experience.
But not quite.
Organizations soon began to question the reliability of the insight these tools provided because end users could access and manipulate their own data -- sometimes from unreliable sources. "As a result of the limited governance of self-service BI implementations, we see few examples of those that are materially successful -- other than in satisfying end-user urges for data access," according to Doug Laney, research vice president at Gartner. This is a strong statement, and many business end users who are productively leveraging these tools will surely disagree. Still, Doug is right: most deployments ultimately are not successful.
Data governance is not a "nice to have" -- it is a "must have." Whether the concern is regulatory compliance, fraud prevention, security breaches, privacy, or just old-fashioned authenticity of the data, companies have to insist that BI tools provide at least a minimal amount of governance. Tracing data lineage back to the source and logging how the data was manipulated or transformed is a basic requirement, yet very few tools perform that function. Data preparation is always treated as someone else's business. In businesses that have a fully managed data warehouse, this isn't a big deal until they try to join that high-quality data with lower-quality data. Gartner points to "smart data preparation" as the solution, but that isn't expected until 2017.
I believe that self-service BI tools must be able to handle their own data preparation and provide basic data governance today, not two years from now. In regulated and unregulated businesses alike, our tolerance for bad data is decreasing at a rapid rate. I recently visited with one of my customers, a very large healthcare insurance company with a very reliable and scalable mainframe environment for transactional processing and a well-managed EDW for business reporting. While these systems are secure and reliable, the business analysts were still using desktop tools and spreadsheets, making the true security and reliability of the data unknown.
Organizations can quickly become vulnerable to chaos and a lack of accountability when they dismiss the data governance recommended by the IT organization (and the compliance analysts they usually employ) -- all in the name of self-service and agility. The pendulum has swung away from IT governance and toward self-service data analysis and agility. Because it is not swinging back any time soon to a controlled locked-down environment with policies and procedures, I see no other option than to provide data preparation and data governance within the self-service BI tools.
Smart, self-service data preparation and agile data visualization belong together, provided by the same vendor. A data governance solution promised through the integration of two or three software vendors is a significant risk. APIs change, companies are acquired, and vendors get tired of working with each other. If your organization demands the agility of self-service BI and data discovery, don't inherit big risks by abandoning data governance. Look for a solution that offers the right amount of both.

How Sensor-Generated Data Enhances Your Data Warehouse's Value

We are all familiar with typical data warehouse subject areas such as products, customers, employees, vendors, sales, and financial items, but many of us will soon be involved in integrating other subject areas that result from collecting data from the "Internet of things" or IoT. Although RFID (radio frequency identification) tags for supply chain tracking may have been one of the earliest IoT sources, real-time sensors in smart buildings and homes, phones and tablets, employee badges, security cameras, watches and personal fitness monitors, automobiles, appliances, and even smart clothing will certainly generate vast amounts of additional information. This will include a wide variety of data such as temperatures, physical locations, geocodes, call details, weather conditions, medical biosensor readings, automobile and driver operational data, E-ZPass toll charges, airline flight data, and equipment status.
The addition of all this new data will likely eclipse today's "big data lakes" (or, if uncontrolled, they might be better called big data "swamps" or "dumps") in both size and management complexity. Fortunately, this new data will increase the value of our data warehouses and serve to enhance their overall analytic capabilities.
For example, we will be able to incorporate new variables into our analysis and predictive analytic techniques to determine how specific weather conditions, physical activity, traffic conditions and driver habits, and currency and trade fluctuations might respectively affect consumer behavior, medical conditions and athletic performance, insurance and warranty claims, or stock market behavior.
This will go far beyond obvious hypotheses such as "umbrella sales increase when it starts to rain" to yield new insights that are far less obvious and of far greater value. For example, although data captured by personal fitness devices can lead to personalized exercise and diet regimens, this data might also be mined to produce insights into factors that lead to a variety of health problems such as high blood pressure, diabetes, or even the likelihood of future cognitive disorders.
Concerns
Integration of sensor data will face the same data quality and consistency issues that traditional data integration projects are subject to. For example, temperature data from multiple sensors should use (or be transformed into) the same unit of measure. If some sensors report in Celsius while others report in Fahrenheit (and still others on the Kelvin scale), the results of any analysis will be worthless unless the readings are converted to the same scale.
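As a small illustration, the sketch below normalizes mixed-unit temperature readings to Celsius before analysis; the reading layout and unit codes are assumptions for this example.

```python
# Sketch: normalizing mixed-unit sensor readings to Celsius before analysis.
# The reading layout and unit codes are assumptions for this example.
def to_celsius(value: float, unit: str) -> float:
    if unit == "C":
        return value
    if unit == "F":
        return (value - 32.0) * 5.0 / 9.0
    if unit == "K":
        return value - 273.15
    raise ValueError(f"Unknown temperature unit: {unit}")

readings = [("sensor-1", 71.6, "F"), ("sensor-2", 295.15, "K"), ("sensor-3", 22.0, "C")]
normalized = [(sensor, round(to_celsius(value, unit), 2)) for sensor, value, unit in readings]
print(normalized)  # all three readings come out as 22.0 degrees Celsius
```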
The vast amounts of data generated by the Internet of things will raise several storage issues, including how to physically store it, where to store it (e.g., in-house or in the cloud), and how long to retain it. Although the answers to these questions will likely depend on how the organization collecting the data plans to use it, compliance concerns may require that some data be retained (perhaps in archival storage) long after it is of any value for analytics.
Sensor devices may be subject to hacking, and personal privacy will definitely be an issue. From a data warehousing perspective, we need to consider the security impact of direct feeds of sensor data into our data warehouses. Each entry point could potentially represent a new security vulnerability. Security breaches harm consumers whose identities have been compromised and harm the organizations that collected the compromised data. Just ask Target, Home Depot, or Anthem about the negative consequences of security breaches and consider what the impact would be on your own organization if your data warehouses were compromised.
The Bottom Line
The ability to incorporate sensor-based data into our data warehouse will provide us with new and greatly expanded analysis capabilities. However, we cannot ignore issues such as data quality, storage, and security. Above all, we must take steps to ensure that personal privacy and organization data security are preserved.
About the Author
Michael A. Schiff is a principal consultant for MAS Strategies. He can be reached at mschiff@mas-strategies.com 

Getting Started with Big-Data-as-a-Service

Introduction
After software-as-a-service, platform-as-a-service, and data-as-a-service, the next big trend will be big-data-as-a-service (BDaaS). In the last few years, many vendors (e.g., Amazon's AWS with Elastic MapReduce, EMC's Greenplum, Microsoft's Hadoop on Azure, and Google's Cloud Bigtable) have started focusing on offering cloud-based big data services to help companies and organizations solve their information management challenges.
Some analysts estimate that the portion of business IT spending that is cloud-based, x-as-a-service activity will increase from about 15 percent today to 35 percent by 2021. Given IDC's estimate that the global big data market will be worth $88 billion by that point, we can forecast that the value of the BDaaS market could reach $30 billion -- meaning roughly 4 percent of all IT spending would go into BDaaS in just six years.
In this article, I offer a brief overview of the BDaaS concept and explain how it can be utilized by organizations around the world.
What is BDaaS?
Big data refers to the ever-growing amount of more varied, more complex, and less structured information we are creating, storing, and analyzing. In a business sense, BDaaS refers specifically to applying insights gleaned from this analysis to drive business growth.
Big-data-as-a-service (BDaaS) is delivered as an analytical tool or as processed information provided by an outside supplier to provide insight into (and plans for creating) a competitive advantage. Such insights are provided by analyzing large amounts of data processed from a variety of sources.
A BDaaS offering can be pictured as a stack of layers, from infrastructure at the bottom up to analytics at the top.
The bottommost layer represents infrastructure-as-a-service components such as compute-as-a-service (CaaS) and storage-as-a-service (StaaS) and their management. On the next layer up, service providers offer database-as-a-service (DBaaS) or data aggregation and exposure as part of data-as-a-service (DaaS). The next layer up in this BDaaS architecture is data platform-as-a-service. As user expectations for real time grow, real-time analytics need to be enabled, and this occurs in the topmost layer; users of analytics-software-as-a-service can generate their own reports, visualizations, and dashboards.
Is the BDaaS Architecture Rigid?
Depending on the scope and size of an application, the data aggregation and data processing layers can be merged. It is also not mandatory that all layers be based on tools from the same vendor. For example, storage-as-a-service (StaaS) and database-as-a-service (DBaaS) are very different, and the considerations for choosing a vendor or product for each layer would be different as well. It is not necessary to force-fit a solution from the same vendor across layers unless we are convinced that it suits the business.
Why BDaaS?
Typically, most commercial big data initiatives will require spending money up front on components and infrastructure. When a large company launches a major initiative, this capital outlay is likely to be substantial. On top of initial setup costs, storing and managing large quantities of information require significant ongoing costs as well.
Big-data-as-a-service enables enterprises to provide users with widespread accessibility to big data interfaces and tools, including MapReduce, Hive, Pig, Presto, and Sqoop. At the same time, its self-managed and auto-scaling infrastructure enables enterprises to achieve this access with a very small operational footprint.
BDaaS enables agility, fluidity, and flexibility. For example, enterprises find it relatively easier and quicker to adapt to changes at lower cost. Fluidity is the degree to which your BDaaS can be rapidly and cost-effectively repurposed and reconfigured to respond to (and proactively drive) change in a dynamic world. Regulatory factors can affect where you locate your data layer. For example, evolving data provenance laws in some countries require data to be hosted on "native soil." In such cases, the fastest path to compliance might be to extend your data layer to a public cloud data center within the country rather than establishing your own data center there.
Fluidity also enables greater business continuity, and thus confidence. After all, servers fail. Network connections fail. Data centers go offline. To keep serving users and generating value, a fluid data layer includes replicas of data in different locations that can handle data requests in the event of failures in other parts of the data layer.
When organizations outsource all of the big data-related technology and adopt a BDaaS approach, they are then free to focus on business challenges such as mergers and acquisitions, risk and regulatory compliance, customer churn, and adapting to a rapidly changing market.
Some example scenarios:
  • As a small retailer, we could analyze the relationship between social-media conversations and buying trends to quickly capitalize on emerging sales opportunities.

  • When launching a new online service, we could analyze our site's visitors and track how they move from page to page so we can understand what engages them, what turns them off, and where we have promotional and cross-selling opportunities.
Top 3 Myths about Big Data as a Service
Myth #1: It is an Infrastructure Play
It's not just an infrastructure play; it's also a way of buying and selling the data itself in big data sets, or in the case of governments, giving it away. Take www.data.gov for example. It's a regular treasure trove of data. You don't need to pull all that data in-house because the government is kind enough to host it for you and to let you bring your data analysis tools to bear on it to find the data nuggets that are most valuable to you, your business requirements, and whatever question you are trying to answer.
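As a hedged sketch of that in-place approach, the snippet below searches the data.gov catalog from Python, assuming the site exposes the standard CKAN package_search endpoint.

```python
# Sketch: searching the data.gov catalog in place instead of pulling everything in-house.
# Assumes catalog.data.gov exposes the standard CKAN package_search endpoint.
import requests

resp = requests.get(
    "https://catalog.data.gov/api/3/action/package_search",
    params={"q": "insurance claims", "rows": 5},
)
resp.raise_for_status()
for dataset in resp.json()["result"]["results"]:
    print(dataset["title"])  # inspect candidate datasets before analyzing them further
```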
Myth #2: Architecture is Very Complex
Cloud computing has fundamentally changed the landscape of large-scale computing. Users are now able to quickly instantiate simple virtual machines with little effort and can exploit scalable platforms to serve their applications.
Several aspects of these services keep the architecture relatively simple while still delivering big data as a service in a seamless way. With this architecture, services do not "install" applications; they instantiate them, which is analogous to forking a process. The services also enable transferring data from one service to another. The architecture also allows various components (storage, caching, analysis, etc.) to be packaged together into a single service. By packaging storage and analysis components together, users can create analytic services and interact with them without having to consider the underlying software and hardware.
Myth #3: It is Hard to Implement
Vendors such as Microsoft, Google, Amazon, and EMC have provided complete solutions to support big data as a service and pay-as-you-use models. With such products and solutions, implementation has become viable and relatively less challenging. This kind of service eliminates the implementation and operational headaches caused by big data. Proprietary systems management in these products wraps proven configurations of standard open source technologies to provide a robust, scalable, secure, and reliable service, giving customers a simple, developer-friendly environment. It can be deployed in a public cloud, virtual private cloud, enterprise private cloud, or on dedicated clusters depending on the business use case and data sensitivity.
Final Thoughts
The term big-data-as-a-service may not be elegant or attractive, but the concept is solid. As more organizations realize the benefits of implementing big data strategies, more vendors will emerge to provide such supporting services to them. With the growth in popularity of software-as-a-service, organizations are getting used to working in a virtualized environment via a Web interface and integrating analytics into this process is a natural next step.
We have started hearing from CxOs that this is making big data projects viable for many businesses that previously would not have considered it because they expected a low ROI or because their team didn't have the skills, time, or resources to tackle the technology. This service allows customers to focus on creating the applications that will drive value for their business instead of spending their time learning to design, test, deploy and manage a secure, scalable and available big data "infrastructure stack." Big-data-as-a-service is something we will see and hear a lot more about in the near future.

Four Drivers of Data Warehouse Modernization

No matter the vintage or sophistication of your organization's data warehouse (DW) and the environment around it, it probably needs to be modernized in one or more ways. DW modernization takes many forms. Common scenarios range from software and hardware server upgrades to the periodic addition of new data subjects, sources, tables, and dimensions.
As data types and data velocities continue to diversify, many users are likewise diversifying their software portfolios to include tools and data platforms that are built for new and big data. A few organizations are even decommissioning current DW platforms to replace them with modern ones that are optimized for today's requirements in big data, analytics, real-time, and cost control.
No matter what modernization strategy is in play, all require significant adjustments to the logical and systems architectures of the extended data warehouse environment.
Most of the trends that are driving the need for data warehouse modernization boil down to four broad issues:
Organizations demand business value from big data. In other words, users are not content to merely manage big data and other valuable data from new sources, such as Web applications, machines, devices, social media, and the Internet of things. Because big data and new data tend to be exotic in structure and massive in volume, users need new platforms that scale with all data types if they are to achieve business value.
The age of analytics is here. Many firms are aggressively adopting a wide variety of analytic methods so they can compete on analytics and understand evolving customers, markets, and business processes. There is a movement from "analyst intuition" and statistics to empirical data-science-driven insights. Furthermore, today's consensus says that the primary path to big data's business value is through the use of so-called "advanced" forms of analytics based on technologies for mining, predictions, statistics, and natural language processing (NLP). Each analytic technology has unique data requirements, and DWs must modernize to satisfy all of them.
Real-time data presents new challenges. Technologies and practices for real-time data have been around and successfully used for years. Yet, many organizations are behind in this area, and so it's a priority for their data warehouse modernization projects. Even organizations that have succeeded with real-time data warehousing and similar techniques will now need to refresh their solutions so that real-time operations scale up to exponential data volumes, streams, and greater numbers of concurrent users and applications. Furthermore, real-time technologies must adapt to a wider range of data types, including schema-free and evolving ones.
Open source software (OSS) is now ensconced in data warehousing. Ten years ago, Linux was the only OSS product commonly found in the technology stack for DWs, BI, analytics, and data management. Today, TDWI regularly encounters OSS products for reporting, analytics, data integration, and big data management. This is because OSS has reached a new level of functional maturity while still retaining desirable economics. A growing number of user organizations are eager to leverage both characteristics.

8 Questions to Ask about Hadoop in the Cloud

Big data analytics and cloud computing are the two biggest technology trends today. They are two of the four key pillars in IDC's Third Platform definition. With the growing popularity of public clouds such as AWS and Azure, enterprises of all sizes are seriously looking at running any and all workloads in the cloud to achieve business agility, cost savings, and faster innovation.
As enterprises start evaluating analytics on Hadoop, one of the questions they frequently ask is: "Can we run Hadoop in the cloud without any negative trade-offs?" We strongly believe that in the long term Hadoop will live in the hybrid cloud. However, important considerations need to be addressed to make cloud deployment successful.
At a fundamental level, there are three options for running Hadoop analytics in the cloud.
Option 1: Hadoop-as-a-service in the public cloud. Solutions such as Amazon's EMR (Elastic MapReduce) and Azure HDInsight claim to provide a quick and easy way to run MapReduce and Spark without having to manually install the Hadoop cluster in the cloud (a minimal launch sketch follows these three options).
Option 2: Pre-built Hadoop in the public cloud marketplace. Hadoop distributions (such as Cloudera CDH, IBM BigInsights, MapR, Hortonworks HDP) can be launched and run on public clouds such as AWS, Rackspace, MS Azure, and IBM Softlayer.
Option 3: Build your own Hadoop in the public cloud. Public clouds offer Infrastructure-as-a Service (IaaS) solutions such as AWS EC2 that can be leveraged to build and manage your own Hadoop cluster in the cloud.
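As a minimal sketch of Option 1, the snippet below requests a small EMR cluster with the AWS SDK for Python (boto3); the release label, instance types, counts, and roles shown are placeholders, not recommendations.

```python
# Minimal sketch of Option 1: launching a small Hadoop/Spark cluster on Amazon EMR.
# Assumes boto3 and AWS credentials are configured; all parameters are placeholders.
import boto3

emr = boto3.client("emr", region_name="us-east-1")

response = emr.run_job_flow(
    Name="analytics-poc",
    ReleaseLabel="emr-4.2.0",
    Applications=[{"Name": "Hadoop"}, {"Name": "Spark"}, {"Name": "Hive"}],
    Instances={
        "MasterInstanceType": "m4.xlarge",
        "SlaveInstanceType": "m4.xlarge",
        "InstanceCount": 4,
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print("Cluster requested:", response["JobFlowId"])
```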
All three options can be good for various analytics use cases and strongly complement the on-premises Hadoop deployment on bare metal or private cloud infrastructure. For example, on-premises Hadoop deployment is a good choice where the source data lives on-premises; this option typically requires ETL from various discrete sources and ranges from hundreds of terabytes to several petabytes in capacity. Public-cloud-based Hadoop deployments are a good option when the data is generated in the cloud (e.g., analyzing Twitter data) or on-demand analytics (if it is cost-effective, secure, and easy to regularly migrate the source data from on-premises to the cloud).
We strongly believe that ultimately Hadoop will live in the hybrid cloud. There are plenty of on-premises deployments ramping up to hundreds of nodes as the design considerations are better understood. However, we are also starting to see some early-stage customer success stories involving public-cloud-based Hadoop deployments.
When thinking about Hadoop analytics in the cloud, you must answer some key questions. The remainder of this article focuses on what questions to ask when choosing a public cloud solution for Hadoop analytics.
Question 1: Does the cloud infrastructure guarantee consistent performance?
Traditionally, Hadoop provides architecture guidelines for ensuring consistent performance so that business insights can be achieved faster. In a public-cloud deployment, it is important to understand if the cloud provider can guarantee performance and know the associated costs for such performance. You may be sharing the underlying infrastructure with other enterprises. As a result, you will likely have no control over which physical server the VM is using and what other VMs (yours or other customers) are running on the same physical server. You may run into a noisy neighbor problem if VMs from other customers run rogue on the servers on which your VMs are running and there are no quality of service (QoS) policies in place.
Question 2: Does the cloud option offer high availability similar to the on-premises Hadoop deployment?
Hadoop provides architecture guidelines to ensure high availability against hardware failures. In a cloud-based deployment, there is no "rack awareness" that you have access to, something that can be configured in the namenode for on-premises deployment. In a cloud deployment, it is important to understand how high availability is maintained, especially for protecting against rack failures.
Question 3: Can the cloud offer flexible and economical scaling of computing resources?
Hadoop requires linear scaling of computing resources as the data you want to analyze keeps growing exponentially (doubling every 18 months according to research). It is important to understand the cost implications of scaling the computing power infrastructure and to carefully pick computing-resource capacity. Not all compute nodes are made equal. There is a buffet of different compute nodes. Some are heavy on processors while others give you more RAM. Hadoop is a compute-intensive platform and the clusters tend to be heavily utilized. It is important to pick compute nodes with beefier processors and higher amounts of RAM.
Sometimes you can upgrade your compute nodes to a higher tier, but that's not true of all the compute nodes that are available. Some premium nodes are offered only in certain data centers, so you cannot upgrade to them if your existing compute nodes are provisioned in another data center.
Question 4: Does the cloud offer guaranteed network bandwidth required for Hadoop operations?
Ensuring high availability and data-loss prevention in Hadoop on DAS storage requires making three copies of the data replicated over a dedicated 10GbE network between the nodes. Highly reliable and high-performance, enterprise-grade storage options for Hadoop (e.g., NetApp E-Series storage array) require only two copies of data, thereby reducing the network bandwidth requirements by 33 percent compared to the DAS based deployment.
Hadoop also requires guaranteed network bandwidth for efficiently running MapReduce jobs because the ultimate goal is to finish the MapReduce jobs quickly to achieve business insights. For a cloud-based deployment, you must understand whether a guaranteed network bandwidth option is available and what it will cost. Typically in cloud deployments, the physical network is securely shared between multiple tenants, so it is critical that you understand any QoS policies to ensure network bandwidth availability.
Question 5: Can the cloud offer flexible and cost-effective scaling of storage?
Capacity and performance are the two important storage considerations for scaling Hadoop. From the capacity perspective, traditional Hadoop deployment requires replicating data three times to protect against data loss resulting from physical hardware failures. This means you need to factor in three times the storage capacity requirements in addition to the network bandwidth requirements to create three copies of data.
Find out how the cloud cost-per-GB compares with the on-premises deployment, taking the 3x redundancy factor into consideration. Shared storage options such as the NetApp E-Series [full disclosure: Mr. Joshi works for NetApp] require only two copies of data, and the NetApp FAS w/NFS Connector for Hadoop in-place analytics requires only one copy of the data. Therefore, to save money, you could host a shared storage solution in a co-located facility and leverage the compute nodes from public clouds such as AWS and Azure.
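A back-of-the-envelope comparison like the one below can make the redundancy factor concrete; the data size and per-terabyte prices are placeholders, not vendor quotes.

```python
# Back-of-the-envelope sketch: raw capacity implied by different copy schemes.
# The usable data size and per-TB monthly prices are placeholders, not quotes.
usable_tb = 500
options = {
    "DAS, 3 copies":            {"copies": 3, "price_per_tb_month": 25.0},
    "Shared storage, 2 copies": {"copies": 2, "price_per_tb_month": 40.0},
    "In-place, 1 copy":         {"copies": 1, "price_per_tb_month": 55.0},
}

for name, opt in options.items():
    raw_tb = usable_tb * opt["copies"]
    monthly_cost = raw_tb * opt["price_per_tb_month"]
    print(f"{name}: {raw_tb} TB raw, ~${monthly_cost:,.0f} per month")
```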
From a performance perspective, Hadoop by design assumes the availability of high-bandwidth storage to quickly perform sequential reads and writes of large-block I/O (64K, 128K, and 256K) to complete the jobs faster. Flash media (e.g. SSDs with high IOPS and ultra-low latency) can help quickly complete Hadoop jobs. Performance can be ensured with on-premises deployment by using several servers with DAS storage (SSDs for performance) or by using the high-bandwidth shared storage options such as the NetApp E-Series/EF All-Flash Array that maintain performance even during disk drive failures and RAID rebuild operations. With cloud-based deployments, you must understand how high bandwidth, low latency, and high IOPS will be guaranteed and know the associated costs.
Question 6: Is a data encryption option available in the cloud-based Hadoop solution?
Data encryption is becoming a key corporate security policy and government mandate, more so for industries such as healthcare. Therefore, find out if the cloud-based Hadoop deployment natively supports data encryption and learn the associated performance, scaling, and pricing implications.
Question 7: How easy and economical is it to get data in and out of the cloud for Hadoop analytics?
Cloud providers use different pricing models for ingesting, storing, and moving data out of the cloud. Also, there are pricing implications around the WAN network required to transfer the data to the cloud. Not every feature you need is available in every cloud location, so you may need to provision your Hadoop cluster in certain locations -- close to or far from your on-premises data center. Get a detailed understanding of how much it will cost to move the data in and out of the appropriate cloud location. Some storage vendors offer data fabric solutions allowing you to easily replicate the data between on-premises and co-location facilities, and leverage compute farms from the public cloud. Such options can help you save money because you are not copying the data in and out of the AWS cloud yet are still leveraging the cloud's compute benefits.
Question 8: How easy is it to manage the Hadoop infrastructure in the cloud?
As deployment scales, you may have hundreds to thousands of nodes in the Hadoop cluster with petabytes of data that may also require backups and disaster recovery. It becomes important to understand the management implications of scaling your Hadoop cluster in the cloud.
Final Thoughts
Hadoop-based big data analytics is gaining traction with a majority of early adopters who started their deployment in an on-premises data center. The public cloud has definitely matured today compared to just a few years ago. We are seeing enterprises across different industry segments starting to explore public cloud for Hadoop for a variety of reasons. We hope that this article provided valuable insight into the key things to consider when looking at public cloud for Hadoop. Our belief is that in the long term, Hadoop will live in the hybrid cloud with both the on-premises and cloud deployment options providing value for different use cases.

Data Warehouses and In-Memory Technologies: Myths and Reality

Analytical solutions have transformed organizations across every industry. From aspirational companies that are starting their analytics journey to mature enterprises that gain competitive advantage by leveraging analytics, investments in this space are ever growing. Surely, "The price of light is less than the cost of darkness" -- investments in analytics technology, data, and people are critical in transforming businesses. At the same time, to secure these investments, a well-thought-out business value case accompanied by a comprehensive data and analytics strategy is required to achieve the business value.
Many software and service providers promise quick results and assure customers that they will fulfill ambitious long-term goals. One such promise is that in-memory technology will analyze big data, improve business performance, decrease the time to value of analytics projects, and reduce complexity in layered data architectures. In-memory technology can open doors to a number of business value cases that foster analytics transformation, enabled through big data, real-time reporting, and the like.
However, in most other business value cases, the technology is not an imperative but only an expensive choice. It is neither a nostrum nor a fundamental requirement for analytics solutions. More importantly, these investments demand a stronger information value chain to leverage the faster processing and optimization capabilities, and a clear enterprise data warehouse architecture and management process to sustain the benefits.
I have drawn a few interesting observations from my personal experience. One noteworthy example is a data warehouse implementation on in-memory database technology to provide enterprise reporting and analytics. The initial scope was to provide descriptive reporting and ad hoc analysis for a single business unit, followed by an enterprise analytics road map. The initial investment reflected the organization's commendable and ambitious vision for analytics. Visibility and easier access to information provided indispensable business insights leading to key strategic and operational decisions. However, the solution lacked positioning for long-term success and sustainability. Benefits achieved from the project were limited to faster query response times relative to a traditional data warehouse solution. It is questionable whether the value realized justified the high costs of in-memory technology and associated software licenses.
Adding to the costs, scaling an in-memory analytics platform is expensive and challenging. The situation impaired progress on the enterprise analytics road map as the cost of new projects became extremely high.
 
As a result, the size of the data acted as a major stumbling block for newer business analytics (BA) projects. The organization had been highly selective in choosing analytics and BA projects to effectively leverage the initial investments in the data warehouse platform. Business users were left to manually collect and combine distributed data and were forced to create offline spread marts and departmental data islands. These local databases and spread marts caused data quality challenges and impaired key business decisions. The existing platform lacked positioning as an enterprise data warehouse and instead became just another data mart. The organization never obtained the true value of enterprise data warehouse capabilities with integrated information. As the data warehouse was incomplete, business users were unable to leverage the power of online BI tools.
Among many other factors, the expensive in-memory technology in this case hindered the organization's progress along the analytical maturity curve. It was a "chicken and egg" puzzle of investment and value generation. Without a good ROI in sight, it is understandable that the organization hesitates to further scale its analytics platform and data warehouse.
This article provides two key takeaways for those who want to undertake an analytics transformation with an in-memory technology platform. First, having a 360-degree view of all costs will help avoid misconstruing the benefits case. Second, and perhaps more important, poor positioning of technologies can result in greater hurdles in achieving organizational goals.
Let's drill down into three business needs:
  • Beware of hidden costs behind the benefits case
  • Evaluate the complete information value chain
  • Position the organization for successfully adopting the platform and realizing its benefits
Hidden Costs Behind the Benefits Case
Organizations should be aware of all costs before embracing in-memory technology for data warehouse solutions. The initial cost savings from simplified architecture, decreased development efforts, and reduced support and maintenance costs are largely a myth.
Increased costs: Initial setup costs are typically high (especially for the infrastructure and licenses) and carry associated hidden costs, such as the need for skilled resources for support, to handle additional governance activities, and to manage platform complexities. For instance, a simple database-related issue might require three workers to solve: a data modeler, an in-memory database administrator, and a system administrator. If these skills are segregated into separate teams, analyzing a simple performance issue will translate into a project, in turn affecting the agility and responsiveness of the IT function. Another possible side effect: increased governance processes.
Increased complexity: A typical in-memory configuration has limited capacity for high-performance memory storage, whereas the capacity on disk is fairly large and easily scalable. The recommended memory-to-disk ratio must be followed to ensure proper system performance (a small sizing sketch follows these cost factors). To utilize the scarce in-memory resources cost-effectively, the data warehouse architecture should incorporate hot, warm, and cold data areas. Adding new data to the environment and regular data loads require memory and disk usage analysis. Moreover, data retention and deletion strategies should be implemented up front.
Evolving product: Product improvements and revisions are frequent in this space. Though the vendors are quite helpful in guiding customers, customers pay the price for continuous improvements and constant upgrades by allocating additional resources to keep up with the platform updates. Being forced to stay "current and correct" with the solution costs IT resources and causes business disruptions.
Organizational changes: Unlike traditional analytics platforms, in-memory technology solutions require proper organizational changes to sustain the application and the technology platform. Gray areas of accountability and change management exist between application management and database management. Typically these are divided between two teams. To support the application holistically, business analytics or data warehouse application management will require overlapping knowledge of in-memory technology and database management. The overlapping nature of these skills will require constant and close collaboration to support the application. The magnitude of change management activities increases significantly, causing constant business disruptions.
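As promised above, here is a small sizing sketch for the hot/warm/cold split; the data volumes and the assumed 4:1 disk-to-memory ratio are illustrative assumptions, not vendor guidance.

```python
# Illustrative sketch: sizing memory for a tiered in-memory data warehouse.
# The volumes and the assumed 4:1 disk-to-memory ratio are examples, not vendor guidance.
total_data_gb = 8000
hot_gb = 1200                # data that must stay in memory for frequent queries
disk_to_memory_ratio = 4     # assumed: up to 4 GB of disk-resident data per GB of memory

warm_cold_gb = total_data_gb - hot_gb
required_memory_gb = max(hot_gb, warm_cold_gb / disk_to_memory_ratio)

print(f"Hot data (in memory): {hot_gb} GB")
print(f"Warm/cold data (on disk): {warm_cold_gb} GB")
print(f"Memory to provision: {required_memory_gb:.0f} GB")
```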
Strengthen the Information Value Chain
A strong commitment is required from executives to realize the vision behind analytics investments. The broader the goals, the more complex the value-realization equation.
To generate business value, strong components of the entire information value chain will have to be in place: scalable platform, quality data, intuitive BI tools, trained users, and high-value projects covering descriptive, predictive, or prescriptive business cases. Investing in expensive in-memory technology will not reduce the importance of other components. Instead, it will increase the significance of these other components to realize the ROI on costly investments.
The entire information value chain is only as strong as its weakest link. Adding in-memory technology doesn't make the other links stronger. To realize the true business value, good data, skilled resources, competent business users, the right projects, and effective operations are critical.
Each component plays an equally important role in generating business benefits.
In addition to strengthening the overall information value chain to maximize returns, a technology strategy should be adjusted to properly position new analytics solutions in the overall enterprise analytics framework. This is a critical step in effectively operationalizing new solutions and sustaining the benefits for a long time.
Positioning for Successful Adoption and Benefits Realization
In-memory technologies constitute only a piece of an enterprise data warehouse solution. Whether it is HANA or Hadoop, the new technologies typically augment a data warehouse solution. Organizations that truly want to tap into the power of in-memory technologies for point solutions and use cases, and achieve maximum return on their investments (instead of continuously injecting more money as their database grows), should consider a mixed technology solution. Enterprises should also consider developing a logical and distributed data warehouse solution that blends:
  • Enterprise data-modeling-layer capabilities to provide an enterprise data model and unified data warehouse (considering multiple platforms)
  • An architecture that isolates highly used aggregated analysis (separate transactional reporting models)
  • Data lake repositories (leveraging smart data access concepts) for staging all data from distinct and distributed data sources
The power of in-memory technology remains largely untapped, but it is only a tool, not the ultimate answer to the data and analytics challenges that companies face.
A Final Word
This article has provided guidance on better positioning in-memory technology investments for data warehouses. A proper and informed situation analysis will guard against hype cycles and peer pressure. The costs and business disruptions caused by introducing new technology demand a comprehensive benefits case, including the hidden operational and organizational costs.
In addition, organizations should be prepared to position any technology within an overall data and analytics strategy framework. They should evaluate the change management necessary to strengthen the value chain and sustain new solutions. This change management effort should not be underestimated: it involves more resources, takes more time, and forces reevaluation of architecture and operational aspects that are critical for long-term success.

3 Tech Trends to Watch in 2016

In 2016, these trends are all about taking visual BI and analytics to the next level.
Business users are more excited than ever about modern visual analytics and business intelligence (BI) technologies. By implementing them, they hope to move beyond their years of frustration with spreadsheets and canned enterprise BI reporting, technologies that were themselves once great advances over calculators, adding machines, and pencils with good erasers.
The turn toward visual analytics and data discovery tools and applications is making it easier for nontechnical users to analyze larger volumes of data more frequently as part of their daily activities rather than in the form of occasional and special requests to IT developers and business analysts.
Yet, while adoption of visual analytics tools and applications has been having a dramatic impact, the technology industry as well as user organizations still have further to go before they can truly realize the potential of democratized visual analytics. Some of the factors most affecting the course of advancement are not about the technology itself.
First, organizations should not make the mistake of neglecting user training because these are "self-service" technologies. Users still need training, not only to learn how to get the most out of the tools and applications but also to learn how to work effectively with data and visualizations. Many users lack experience in analyzing data, evaluating the validity of data sources or analysis, or exploring for unknown trends or patterns and understanding their significance. With the number of visualization options expanding, users need guidance in choosing the best ones for their analysis and for sharing insights with others.
Second, organizations must consider what types of activities will dominate users' implementation of the tools or applications.
-- If they are using dashboards or other visualizations to simply provide a better experience with standard BI reporting, then their emphasis should be on consistent presentation and ease in identifying changes in the data, particularly over time.
-- If the context is operational monitoring and alerting, then the visual interface must make it easy for users to spot the relative importance of issues so they can analyze root causes and quickly determine the best actions. Near-real-time data updates or stream analysis may be needed.
-- If users are performing visual discovery and analysis, they need room for experimentation with a variety of data sources; they need to be able to iterate and do test-and-learn inquiries and employ visual functionality for filtering, comparing, and correlating data.
Tech Trends to Watch
As we head into 2016, BI and visual analytics technologies will continue to evolve toward meeting the needs of the democracy of primarily nontechnical users who want to integrate analytics into their daily decisions and actions. Vendors are competing to make it easier and more fulfilling to use their tools and applications. Here are three trends to watch:
Tech Trend #1: Technologies will aim for intuitive user experiences
In the words of Daniel Kahneman, Princeton University psychology professor and author of "Thinking, Fast and Slow," intuitions are "thoughts and preferences that come to mind quickly and without much reflection." Intuitive interfaces apply algorithms to learn and adjust to users' needs and patterns behind the scenes so that "speed of thought" is uninterrupted; users should not have to stop and futz with the technology before moving forward. Such interfaces could enable users to experience a kind of informed intuition. Vendors are engaging in considerable research and development in this area; it will be interesting to see what advances they deliver in the coming year.
Tech Trend #2: Interactive access to Hadoop data will mature
To support analytics processes, users, developers, and data scientists need tools and data processing platforms that enable better interactive querying of the voluminous and varied data in Hadoop files.
Previously, the options for interactive querying were limited and slow. Now, both front-end BI and analytics tools and applications and back-end data processing platforms are advancing to enable multiple styles of interaction. These include better batch processing and execution to support Hive-MapReduce jobs, SQL-on-Hadoop technologies for interacting with Hadoop data directly rather than through a middle-tier database, and the emergence of Apache Spark as an alternative, faster platform and framework. Open source projects continue to generate new technologies, so given the level of demand, we can expect more innovation in this space in 2016.
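For instance, a minimal PySpark sketch like the one below runs interactive SQL directly over files in HDFS with no middle-tier database; the path and field names are hypothetical.

```python
# Sketch: interactive SQL over raw files in Hadoop using Spark, no middle-tier database.
# Assumes PySpark is available; the HDFS path and field names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("interactive-hadoop-sql").getOrCreate()

events = spark.read.json("hdfs:///data/raw/clickstream/2016/01/*")
events.createOrReplaceTempView("events")

top_pages = spark.sql("""
    SELECT url, COUNT(*) AS hits
    FROM events
    GROUP BY url
    ORDER BY hits DESC
    LIMIT 10
""")
top_pages.show()
```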
Tech Trend #3: Embedded BI and analytics will be a major focus
TDWI Research finds that most organizations view it as an important part of their technology strategy to be able to embed dashboards, reports, visualizations, and more within other kinds of applications, portals, and business process management systems. This is particularly valuable in operational contexts where users may not have separate BI and analytics tools and in any case it is more efficient for them to interact with data without switching from one application or interface to another. Organizations should evaluate technologies for how well they achieve more seamless and integrated experiences for users, including through embedding BI and analytics.
Pushing Past Today's Boundaries
These technology developments plus attention to training issues will help organizations take their visual BI and analytics tool and application deployments to the next level. They will help users, developers, and data scientists have more productive experiences with more kinds of data. They will also make for an exciting New Year in BI and analytics.

SAP Lumira is the Exciting Headline but Enterprise BI is the Bread and Butter

The Sapphire Now event this year was all about simplicity, the cloud, and millennials. That's the message that SAP CEO Bill McDermott wanted to drive home in his keynote. BI, meanwhile, took a back seat to HANA and cloud announcements.
By Cindi Howson, BI Scorecard
At SAP, all roads lead to HANA, whether for BI or for transaction processing, on-premises or in the cloud. HANA, an in-memory appliance, is just three years old, and SAP has rapidly seeded the market with developers, fostered its partner network, and introduced new products that leverage the speed of in-memory. Customer Norwegian Cruise Lines spoke of how HANA allowed it to analyze its data better and faster than a previous data warehouse while saving $700 million annually. Most interesting to me is that Norwegian Cruise Lines hadn't used anything from SAP before HANA, a proof point that HANA is not only for big ERP customers.
The NFL, with its fantasy football app, is leveraging HANA, mobile, predictive, and Lumira to support a customer base of millions of fans that is growing in the 25 to 50 percent range annually. Seoul University Hospital reduced its query time from hours to seconds, with a 147 percent return on investment in HANA. More important, CIO Dr. Hwang said HANA allows the hospital to analyze comments that were never before accessible.
Although all the news with HANA seemed positively glowing, news on the leadership and BI front in particular was a bit more fractured. Last month, SAP CTO Vishal Sikka abruptly resigned, amid speculation that he was frustrated that the position of CEO was not in his future.
In the BI space, two key SAP people -- Adam Binnie and Jason Rose -- also recently moved on. Insiders say the timing is coincidental. Outsiders worry about the impact on the BI road map for a product line that is one of the most complex in the industry. McDermott's vision for simplicity has a long way to go in BI.
Jayne Landry, newly appointed General Manager of BI and taking over from Binnie, outlined the BI road map. She conceded that for most BI segments, there are two, sometimes three, products (visual discovery includes Explorer and Lumira; dashboards includes Design Studio and Dashboards aka Xcelsius). The vision to simplify the product line was clear; the execution of how and when to get there was anything but. 
Dashboard users were assured there would eventually be a migration utility to Design Studio, the strategic product, and in the interim were told to check out a product from partner APOS. Landry conceded that killing Desktop Intelligence was a mistake. At issue is how to support existing investments while focusing resources on moving forward. In fairness to SAP, who could have predicted that Apple would kill Flash, the technology on which Dashboards is based?
Long-time customers have been rightfully worried that SAP cares more about newer products HANA and Lumira than about the mature and broadly deployed SAP BusinessObjects. Judging by the headlines and excitement about these newer products, they might be right to worry. However, Landry shared a pie chart describing the company's three areas of analytics: enterprise BI, agile analytics, and advanced analytics.
Category            Main Products                                        Developers   Release Cycle
Enterprise BI       SAP BusinessObjects, Crystal Reports,                600          6 to 12 months
                    Dashboards, Design Studio
Agile Analytics     Lumira, Explorer                                     200          6 to 8 weeks
Advanced Analytics  Infinite Insight (KXEN), Predictive Analysis         100
Enterprise BI then is clearly getting the lion's share of development resources but is indeed on a slower release cycle. Speaker Ty Miller, senior director of BI, likened Lumira to the shiny new Tesla -- innovative engineering, new technology, disruptive -- while the BI Platform is like the tried-and-true Porsche. It's an apt analogy.
It would seem, then, that it's not that SAP cares more about Lumira but rather that there's more frequent news because it's on such a rapid release cycle. To that end, the noteworthy new features in version 17 (due out this month) include the following (BIScorecard.com has a detailed review here):
  • InfoGraphics, an evolution of Stories that combines visualizations with text and images and offers a greater degree of formatting
  • Direct connect to on-premises HANA and BW data from Lumira Cloud
  • Support for Mac, with a beta available in mid-June
Customer Daimler Trucks gave a great demo of their Lumira application for dealers.
Also around the corner is a new release of Mobile, in which WebIntelligence reports can be filtered with a tap, even in offline mode. Users can save those filters to create a customized view of their reports. Offline users can now set an option so that reports are automatically refreshed when connected, ideal for sales people who need to be sure they have the latest content cached.
Stated improvements to the BI platform (but with no specific timeframe) include:
  • Free hand SQL. This is huge, but as I've heard it as a road map item for a couple of years now, I won't hold my breath until I see the beta.
  • Live Office support for newer universes created in 4.x (.UNX); this has been a hole in the product portfolio since version 4 was first released in 2012. Microsoft added support for universes in Power Query last month, a partial solution for long-time Live Office users.
  • Parity in the DHTML and Java client WebI interfaces.
SAP seems to be at a crossroads in the BI and business analytics space: trying to innovate at the pace of smaller, more nimble competitors such as Tableau and Qlik while serving and enhancing the bread-and-butter enterprise BI and ERP customers. HANA, HANA Enterprise Cloud, Business Suite on HANA, and Simple Finance are the bigger-ticket items and next-generation platforms, while the ERP business has been a mainstay of SAP for over 40 years. Straddling both worlds is not easy.