Golden rules for New SAP HANA Developers

I've been developing code in SAP HANA for nearly 3 years now, and at the end of 2013, there is now a huge influx of developers. This is good, because it means that mass adoption is here. But it also means that we see a lot of the same mistakes being made. I did some analysis of the questions asked in HANA development forums and thought I'd pick out the most common mistakes that newbie HANA developers make and give my advice.

The main thing I've noted is that every developer comes to SAP HANA with a set of misconceptions about HANA and a set of experience of some other technology. HANA is a little different to other application platforms and you need to change your thinking. I hope this helps you in your journey.

Never use row-based tables

If you are an Oracle, IBM, or Microsoft person then you are institutionalized into thinking that you put OLTP data in the row-store and OLAP in the column-store. This is not true with SAP HANA! You must create all tables in the column store and that can be used for both transactional and analytic scenarios.

HANA does have a row store and you can create row-oriented tables for some very specific scenarios:

- Transient data like queues, where you insert and delete a lot and the data never persists
- Configuration tables which are never joined, where you select individual entire rows
- When you are advised to by a SAP support personnel

But in general, never used oriented tables, especially for OLTP/Transactional scenarios. HANA is optimized to use the column store for combined transactional and analytical scenarios.

Never create indexes

Again if you come from the traditional RDBMS space, you will see that HANA allows the CREATE INDEX command.

However, when you create a table in HANA (and I'm simplifying), it is in fact creating a set of sorted, compressed and linked indexes. As a result, secondary indexes almost never improve performance. In fact, I've never come across a scenario where an index improved performance of an analytic query, where a large volume of data is aggregated.

There is one scenario when a secondary index can improve performance: when you have a query which selects a very small amount of data from a very large table (or group of joined tables). In this instance, creating a secondary index on all your sort columns can allow HANA to find the data faster. But this is a very specific situation - the simple advice is, never create indexes.

Don't use the SAP HANA Modeler Perspective

HANA has 3 developer perspectives, the SAP HANA Systems View, Modeler, and Developer Perspective. Take the time to read the developer guide and setup the Developer perspective. This will bring you the ability to put all your development artifacts including tables, information views, stored procedures, plus OData and HTML artifacts if you need them. You get change management, version management, the ability to test inactive objects, code completion and a bunch of other things.

Please don't create models directing in the SAP HANA Content Repository any more.

Don't use SQLScript unless you have to

SAP HANA provides a powerful stored procedure language, but its power is a bad thing for new SAP HANA Developers. It allows you to write very inefficient code which doesn't parallelize.

Most of the scenarios I see on the forums that developers are coding could be better done with Information Views like Attribute Views, Analytic Views and Calculation Views. And in SAP HANA SP07, Information Views are faster than SQLScript in almost every scenario. Plus, Information Views are easier for others to understand, remodel and change.

There are scenarios where you need SQLScript, but it shouldn't be viewed as a general-purpose solution to modeling problems.

If you have to use SQLScript, don't use Cursors or Dynamic SQL

If you are a PL/SQL or T-SQL developer than you will be familiar with Cursors and Dynamic SQL. I see a lot of questions in the forums related to performance problems with these. Avoid using them at all costs - this is detailed in the SQLScript Reference Guide.

There are a number of things that push you out of multi-threaded mode in SQLScript: Local Scalar Variables, Loops, Cursors and Dynamic SQL. All of these constructs will cause you performance problems if you use them, so you need to avoid them.

Especially don't use Dynamic SQL to generate column names for INSERT or SELECT statements! Instead, create an Information View for SELECT statements, or write a Python-based loader to load tables.

In many cases you can change a loop into a single SELECT or INSERT statement. In other cases, you can nest SQLScript procedures to improve performance. I'm thinking that this area needs a blog of its own.

Avoid JOINs on Large Data Volumes

Joins on very large tables (>100m rows) can be inefficient. If you have two large fact tables then never join them - performance will always be a problem.

Instead, you can normalize the data in Analytic Views and create a Union with Constant Values in a Calculation View. Werner Steyn describes this nicely in his Advanced Data Modeling guide. You can expect a very large performance increate for complex queries by using this mechanism.

Consider use of a second monitor

Most laptop screens are poorly adjusted for use of Eclipse - this isn't a HANA Studio-specific problem. The best screen is a 27" 1920x1080 screen like this one from HP. Connect your laptop to a second screen at work for the best development experience. If you have a regular laptop screen that is 1366x768 resolution, you will really struggle to develop well.

Update your HANA environment regularly

This is true for all your HANA components - DB, Studio and Client. There are continuous innovations coming every 6 months with major Service Packs, and smaller revisions every 2 weeks. It's well worth patching frequently to get the best developer experience. Take a read of this blog for more details.

Final Words

This isn't designed to be an exhaustive guide, but rather a compilation of the common mistakes that I see developers making when they're starting out with HANA. If you're new to HANA then please take the time to read them and think about what it means to your developer methodology.

Have I missed any obvious ones?

No comments:

Post a Comment