How can we use Anaplan to reduce a sparse module from 360 million cells to only 5 million?
It is a very common scenario: our clients need to load transaction-level data into Anaplan, most often for bottom-up calculations, external system restrictions or low-level allocations and mappings. Large volumes of data significantly increase Anaplan model size and can even affect performance, so we would like to share a couple of design techniques that keep the model efficient.
Staging lists
The requirement to import transaction-level data doesn't mean such a high level of detail has to be stored in the model permanently. In the vast majority of cases the main drivers for high-volume data loads are low-level allocations and mappings between multiple data sources, but once the initial processing is complete the detail is no longer required. In such a scenario, a staging list is the best design solution.
The staging list process:
1. The data is initially loaded (via user action or automated process) into the model
2. The system validates the data and runs the low-level calculations
3. The user reviews the results and validations (an optional backup export can also be enabled at this stage)
4. The user submits the results by running a process (dashboard button)
5. The system aggregates the data, using a predefined unique key, to a level that retains all required attributes and assures optimal size and performance (see the sketch after this list)
6. The source data loaded in step 1 is removed from the model
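As a purely illustrative sketch of steps 5 and 6, the following Python/pandas mock-up shows the aggregation logic. In Anaplan itself these steps would be performed with imports, saved views and delete actions rather than code, and all column names and sample values below are hypothetical.

```python
# Illustrative only: Anaplan performs these steps with model actions and
# processes, not Python. This mock-up shows the logic of steps 5 and 6 -
# aggregating staged transactions by a predefined unique key and then
# clearing the detailed staging data. Column names and values are hypothetical.
import pandas as pd

# Transaction-level rows as loaded into the staging list (step 1)
staged = pd.DataFrame(
    [
        ("P001", "RC10", 2021, 1, 120.0),
        ("P001", "RC10", 2021, 1,  80.0),  # same key -> will be summed
        ("P001", "RC22", 2022, 2,  50.0),
        ("P002", "RC10", 2021, 3, 300.0),
    ],
    columns=["policy", "risk_code", "yoa", "dev_period", "amount"],
)

# Step 5: aggregate to the unique key that keeps all required attributes
unique_key = ["policy", "risk_code", "yoa", "dev_period"]
submitted = staged.groupby(unique_key, as_index=False)["amount"].sum()
print(submitted)

# Step 6: the detailed source rows can now be removed from the model
staged = staged.iloc[0:0]
```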
This design delivers high-quality bottom-up results while eliminating ineffective workspace utilisation and performance losses.
Multi-level numbered lists
Multi-dimensional data cubes are the default way to deliver results in Anaplan, but in some cases they prove inefficient or even impossible to build. This is where numbered list hierarchies become the best solution.
To illustrate the efficiency of this technique, we will use the example of a claims triangle report produced for Solvency II reporting.
In order to build the report, we typically need the following details: Reserving Class, Policy, Risk Code, Year of Account, Development Period. The example volumes are as follows:
– Reserving Classes = 30
– Policies = 20,000
– Risk Codes = 80
– Years of Account (YoA) = 15
– Development Periods = 15
If all of these attribute lists were added to a single module, each measure would have the following size (Reserving Class is excluded from the calculation as it is a direct parent of Policy):
20,000 x 80 x 15 x 15 = 360,000,000
Even this simplified, fairly low-volume example results in a significant report size.
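As a quick back-of-the-envelope check in Python, using the example volumes above:

```python
# Arithmetic check of the dense module size: every measure stores a cell for
# every combination of list items, whether or not any data exists for it.
policies, risk_codes, yoa, dev_periods = 20_000, 80, 15, 15
dense_cells = policies * risk_codes * yoa * dev_periods
print(f"{dense_cells:,}")  # 360,000,000 cells per measure
```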
The key to an efficient design is to understand the data. In this example, not every policy will have values for every YoA or development period, so let's assume an average of 7 years/periods per policy. On top of that, a single policy usually has no more than 5 unique risk codes. The following numbered list hierarchy would be created:
– L1 Reserving Class
– L2 Policies
– L3 Risk Codes (unique combination of RCs applicable for each policy)
– L4 YoA (unique combination of valid Policy, RC and YoA)
– L5 Dev Period (unique combination of valid Policy, RC, YoA and Dev Periods)
Under the assumptions above, the total size of a single measure would be:
30 (Res Cl) + 20,000 * 5 * 7 * 7 = 4,900,030
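To make the counting concrete, here is a small, purely illustrative Python sketch of how the unique combinations behind each list level could be derived from transaction rows. The sample data is hypothetical; in Anaplan itself the items would be created via numbered list import actions keyed on a combined code.

```python
# Illustrative sketch (hypothetical sample data): deriving the unique
# combinations that become items at each level of the numbered list
# hierarchy, so items exist only for combinations that actually occur.

# (reserving_class, policy, risk_code, yoa, dev_period) from transaction rows
rows = [
    ("Motor",  "P001", "RC10", 2021, 1),
    ("Motor",  "P001", "RC10", 2021, 2),
    ("Motor",  "P001", "RC22", 2022, 1),
    ("Marine", "P002", "RC10", 2021, 1),
]

# Each level contains only the combinations present in the data
l1_res_class  = {rc for rc, *_ in rows}
l2_policies   = {(rc, p) for rc, p, *_ in rows}
l3_risk_codes = {(p, r) for _, p, r, *_ in rows}
l4_yoa        = {(p, r, y) for _, p, r, y, _ in rows}
l5_dev_period = {(p, r, y, d) for _, p, r, y, d in rows}

print(len(l1_res_class), len(l2_policies), len(l3_risk_codes),
      len(l4_yoa), len(l5_dev_period))  # 2 2 3 3 4

# With the averages assumed above (5 risk codes, 7 YoA and 7 dev periods per
# policy), the hierarchy holds roughly 30 + 20,000 * 5 * 7 * 7 = 4,900,030
# items, instead of 360,000,000 dense cells.
```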
As we can see, the numbered list approach is significantly more efficient, and the results can easily be exported in a pivot-friendly format or sent to external systems.
These two techniques prove very powerful when processing large volumes of data and can be used individually or in combination to provide the best solution for our clients.