Inventory roles performance improvements

Cumulocity IoT versions: 10.13, 10.14, and later

Motivation

Users with a large number of devices or Managed Objects experienced performance issues when making requests where inventory roles were used to authorize access. The reason was the use of post-filtering: for each document retrieved, the user’s permissions were checked to determine whether the document could be returned. This check was based on the user’s permissions on the Managed Object ID (the MO itself in the case of Inventory, “source” in the case of Measurements, Events, and Alarms, “deviceId” in the case of Operations).

To achieve better performance and a better user experience, new algorithms have been introduced, with a dedicated implementation for each API endpoint.

Changes

The following section provides information on the changes that have been made in the various releases of Cumulocity IoT.

10.13 – Measurements, Events & Alarms

Performance improvements have been added to the Measurements, Events, and Alarms (MEA) APIs in three specific areas:

  • All requests with a specified “source” in the query parameters are fully optimized.

  • Requests for Alarms or Events that specify “source” together with “withSourceAssets”/“withSourceDevices” are partially optimized (see Examples for details). These are typically requests for the alarms of a group and all devices in the group hierarchy.

  • All requests for which the number of documents matching the given criteria is below a certain threshold (2000 by default) are optimized. The overhead depends on the number of matching documents because each document needs to be checked against the user’s inventory role permissions.
    For example:
    – Given 20 different active Alarms in the system and the user has inventory-role access to only 5 of them, the response to a request for all active Alarms will be very fast (the platform needs to check only 20 alarms).
    – Given 500 different active Alarms in the system, and the user has inventory-role access to only 5 of them, the response will be much slower as all 500 need to be checked for access permission.

The focus of the optimizations is to reduce the number of Managed Objects that need to be traversed to check authorization. The algorithms are based on a few heuristics. When the source is specified (first bullet point above), only a single source is checked and no post-filtering is required. When a limited number of documents match (third bullet point above), post-filtering is still used but has been optimized.
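
The heuristics above can be sketched as a small decision function. This is only an illustration of the selection logic described in the text, not the platform's implementation; the function name, the strategy labels, and the treatment of a count exactly equal to the threshold are assumptions.

```python
from typing import Optional

DEFAULT_THRESHOLD = 2000  # default limit quoted in the text


def choose_strategy(source: Optional[str], matching_count: int,
                    threshold: int = DEFAULT_THRESHOLD) -> str:
    """Return a label for the access-check strategy a MEA request would use.

    Hypothetical helper: the labels are illustrative, not platform values.
    """
    if source is not None:
        # Only one managed object has to be authorized,
        # so no per-document post-filtering is needed.
        return "single-source"
    if matching_count < threshold:
        # Few enough matching documents to post-filter them cheaply.
        # (Assumption: a count equal to the threshold is not optimized.)
        return "limited-count"
    # Neither heuristic applies: generic post-filtering is used.
    return "legacy"
```

For example, `choose_strategy(None, 20)` selects the limited-count path, while `choose_strategy(None, 5000)` falls back to the generic algorithm.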

10.14 - Inventory

Performance improvements have been delivered for the Inventory API. To provide these improvements, the data held in the Operational Store has to be restructured. Multi-level parent/child relationships are now persisted, which makes queries over these structures much faster.

For any request, including those with no filter, the new implementation returns all Managed Objects (MOs) which match the filters and meet any of the following conditions:

  • are ancestors of groups/devices on which the inventory role is defined,

  • are owned by the user,

  • have the “c8y_Global” fragment (these are meant to be visible to all users, even without INVENTORY_READ access).

10.16 - Operations

Performance improvements have been added to the Operations API:

  • Analogous to the MEA improvements with “source”, all requests with a specified “deviceId” or “agentId” parameter are always optimized by using a direct database query for the target managed object.

  • Also analogous to the MEA improvements, the new algorithm is used for requests where the number of documents matching the query criteria is less than 2000 (default).

Paging User Experience

With all of these improvements, in every release, users will see a different paging experience. The new approach returns consecutive pages, for instance 1, 2, 3, …, whereas in previous releases page numbers reflected internal values used by the underlying post-filtering algorithm. This meant it was possible, for example, that the next page after 1 was 12001, or the previous page before 1300 was –1299, or indeed that some pages were empty because the limit of scanned documents was reached before any items were found.

In all instances, navigation links via “prev” and “next” will work properly and this should be the only way of iterating through multiple pages using inventory roles and retrieving all documents matching criteria.
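
As a rough illustration of iterating via “next” links, the sketch below follows the link returned with each page instead of computing page numbers. It assumes that each response is a JSON object holding the items under a collection key (for example "alarms") and a "next" URL; the helper name, the injected `fetch_json` function, and the choice to stop at the first empty page are assumptions made for this example. The stop condition in particular fits the consecutive pages of the new approach and would need rethinking if empty pages can occur mid-way.

```python
from typing import Callable, Dict, Iterator, Optional


def iterate_collection(first_url: str,
                       fetch_json: Callable[[str], Dict],
                       items_key: str) -> Iterator[Dict]:
    """Yield every item of a paged collection by following "next" links.

    fetch_json is a caller-supplied function (hypothetical) that performs
    an authenticated GET and returns the decoded JSON body.
    """
    url: Optional[str] = first_url
    while url:
        page = fetch_json(url)
        items = page.get(items_key, [])
        if not items:
            # Assumption: an empty page marks the end of the result set.
            break
        yield from items
        # Never build the next URL from a page number; use the link as given.
        url = page.get("next")
```

Passing the HTTP call in as a function keeps the paging logic independent of any particular client library or authentication scheme.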

Switching Performance Algorithms

This section describes how to enable these new performance improvements.

From 10.13 and subsequent releases there is a new platform property:

acl.algorithm-version=LEGACY # Possible values are: LEGACY, OPTIMIZED

The default values for this property are:

  • 10.13 - LEGACY

  • 10.14 - LEGACY

  • 10.15 - LEGACY

  • 10.16 and later - OPTIMIZED

The platform property itself can only be configured by platform operators, but it can be overridden on a per-tenant basis with a tenant option. To do so, send the following request:

POST /tenant/options
Content-Type: application/json 
{ 
"category": "configuration", 
"key": "acl.algorithm-version", 
"value": "OPTIMIZED" 
}
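
The same request can be prepared from a script. The following is a minimal sketch using only the Python standard library; the function name and the base URL are placeholders, and authentication is deliberately left out because it depends on your tenant setup. The request is built but not sent.

```python
import json
import urllib.request


def build_tenant_option_request(base_url: str, key: str, value: str,
                                category: str = "configuration") -> urllib.request.Request:
    """Build (but do not send) a POST /tenant/options request.

    base_url is a placeholder for your tenant URL; add credentials
    before sending, for example with an Authorization header.
    """
    body = json.dumps({"category": category, "key": key, "value": value})
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/tenant/options",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical tenant URL, used only for illustration.
request = build_tenant_option_request(
    "https://tenant.example.com", "acl.algorithm-version", "OPTIMIZED")
```

Once credentials are attached, the request could be sent with `urllib.request.urlopen(request)`.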

Per API optimization – 10.14

There is an additional property to exclude some APIs from OPTIMIZED. It is designed as an emergency fallback to the old algorithm in case an issue is experienced in the new solution for a given API. The property is:

acl.algorithm.optimized.disabled.apis=MANAGED_OBJECT

It takes a single API or multiple APIs separated by commas. Possible values are:

  • ALARM - for device alarms API

  • AUDIT - for audit records

  • EVENT - for device event API

  • MANAGED_OBJECT - for the inventory API, for example for devices and groups

  • MEASUREMENT - for device measurements

  • OPERATION - for device control operations.

For this property, there is a corresponding tenant option in the “configuration” category to configure the value on a tenant level which can be set as follows:

POST /tenant/options
Content-Type: application/json 
{ 
"category": "configuration", 
"key": "acl.algorithm.optimized.disabled.apis", 
"value": "EVENT,ALARM" 
}
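
Because the value is a free-form comma-separated string, it may be worth checking the names before sending them. The helper below is a hypothetical client-side check against the values listed above; whether the platform itself rejects unknown names is not stated here.

```python
from typing import Iterable

# API names listed in this section.
ALLOWED_APIS = {"ALARM", "AUDIT", "EVENT", "MANAGED_OBJECT", "MEASUREMENT", "OPERATION"}


def disabled_apis_value(apis: Iterable[str]) -> str:
    """Join API names into the comma-separated option value.

    Raises ValueError for names that are not in the documented list.
    """
    names = [name.strip() for name in apis]
    unknown = [name for name in names if name not in ALLOWED_APIS]
    if unknown:
        raise ValueError("unknown API name(s): " + ", ".join(unknown))
    return ",".join(names)
```

For instance, `disabled_apis_value(["EVENT", "ALARM"])` produces the value used in the request above, while a misspelling such as "EVENTS" raises an error before anything is sent.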

Inventory optimization – 10.14

For the Inventory API, in addition to setting acl.algorithm-version to OPTIMIZED, you also need to turn on the “inventory hierarchy” feature. When the inventory hierarchy property is set to true, the hierarchy is recalculated in the background. The new hierarchy enables direct checks for parent-child (or ancestor-descendant) relationships using indexed DB queries, which allows the system to filter by inventory-role group assignment directly in the database.

Important note: The hierarchy recalculation takes time, possibly hours when the inventory is large.

The property, which turns on the inventory-hierarchy feature, is a boolean:

inventory.hierarchy.enabled=false

This parameter can be overridden at the Enterprise tenant level using a tenant option (category: configuration, key: inventory.hierarchy.enabled).

Note: all changes in “configuration” options are automatically reflected in Audit logs.

Limitations

Inventory roles with Fragments

For users who have an inventory role based on a fragment type (misleadingly named “Type”), there is no improvement even when the algorithm version is set to OPTIMIZED. If an inventory role is defined as, for example:

{ 
"API": "Events", 
"Permission": "READ", 
"Type": "MyEventFragment" 
}

then any user with this inventory role will not use the optimized algorithms when requesting Events.
Note: Filtering by fragments prevented the efficient implementation in the new solution.

New Algorithm not matched

When a request matches none of the new algorithms, the system falls back to the previous generic algorithm. For MEA, this happens when both of the following are true:

  • No “source” is specified in the request query parameters, and

  • The number of all documents matching the query parameters is higher than the defined threshold (2000 by default)

For Operations, this happens when both of the following are true:

  • No “deviceId” or “agentId” is specified in the request query parameters, and

  • The number of all documents matching the query parameters is higher than the defined threshold (2000 by default)

How to meet the new algorithm criteria

If you have set the appropriate properties to use the new algorithms but are not seeing the performance improvements you expect, make sure your requests meet at least one of the criteria. Either specify the device for which you need data (“source” for MEA, “deviceId” or “agentId” for Operations), or make the query more specific with parameters such as time range, type, or fragment type, so that the number of matching documents stays below the threshold.
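
As an illustration of both options, the sketch below builds an Events query that can be narrowed by source, time range, and type. The helper name and the sample values are hypothetical; the parameter names follow the example requests in this article.

```python
from typing import Optional
from urllib.parse import urlencode


def events_query(source: Optional[str] = None,
                 date_from: Optional[str] = None,
                 date_to: Optional[str] = None,
                 event_type: Optional[str] = None,
                 page_size: int = 50) -> str:
    """Build a /event/events query string from optional filters."""
    params = {"pageSize": page_size}
    if source is not None:
        # Naming the device lets the single-source optimization apply.
        params["source"] = source
    if date_from is not None:
        params["dateFrom"] = date_from
    if date_to is not None:
        params["dateTo"] = date_to
    if event_type is not None:
        params["type"] = event_type
    # urlencode percent-encodes characters such as ":" in timestamps.
    return "/event/events?" + urlencode(params)
```

Calling `events_query(source="254", page_size=10)` targets a single device, whereas passing only `date_from` and `date_to` relies on the time window being narrow enough to keep the match count under the threshold.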

Examples

Setup

All tests were performed on the tenant in a staging environment with the following setup:

  • 2 top-level groups

    • 2 subgroups each (4 total subgroups)

      • 4 sub-subgroups each (16 total sub-subgroups)

        • 4000 devices in each sub-subgroup

64000 total devices

The test users had access to only part of the resources (1 top-level group or 1 sub-subgroup).

All attached results were taken from the browser dev tools. The optimized algorithm was turned on and off with the tenant option, and the same navigation in the UI was repeated each time.

Measurements, Events & Alarms (MEA)

Single-source algorithm, used for example on the device details page

Alarm for device overview

  • OPTIMIZED

    • Alarms

      Optimized alarm load times

    • Events

      Optimized event load times

  • LEGACY

    • Alarms

      Legacy alarm load times

    • Events

      Legacy event load times

Example requests:

GET /alarm/alarms?dateFrom=1970-01-01 
&dateTo=2022-10-11T21:08:15+02:00 
&pageSize=10 
&query=$orderby=severity+asc,time.date+desc,text+asc 
&resolved=false 
&severity=WARNING 
&source=254 
&withSourceAssets=true 
&withSourceDevices=true

GET /event/events?dateFrom=1970-01-01 
&dateTo=2022-10-11T21:06:28+02:00 
&pageSize=50 
&source=254 
&withSourceAssets=true 
&withSourceDevices=true

Limited-count algorithm, used for example on the “Alarms” page in Cockpit

Alarm overview

  • OPTIMIZED (8 active alarms)

    Optimized alarm overview load times

  • LEGACY (same 8 active alarms)

    Legacy alarm overview load times

Example request:

GET /alarm/alarms?dateFrom=1970-01-01 
&dateTo=2022-10-11T21:15:41+02:00 
&pageSize=10 
&resolved=false 
&severity=CRITICAL

“source” combined with withSourceAssets/withSourceDevices for a big hierarchy (Group_1 has 32K children), used for example on sample dashboards with all alarms and events from a group:

  • OPTIMIZED

    Optimized load times for events and alarms

  • LEGACY

    Legacy load times for events and alarms

Requests are around 2-3 times faster, but the group hierarchy is still traversed once.
This scenario may be improved in the future with the use of inventory Materialized Path.

Example request:

GET /event/events?dateFrom=1970-01-01 
&dateTo=2022-10-11T21:21:05+02:00 
&pageSize=10 
&source=232 
&withSourceAssets=true 
&withSourceDevices=true

Inventory - “All devices” page in Device Management

  • LEGACY

    • user with MANAGED_OBJECT_READ role for Group1 (optimistic situation)

      Legacy all devices load time - optimistic

    • user with MANAGED_OBJECT_READ role for sub-sub_Group_2_1_2 (pessimistic, 2-3 minutes in total because of paging)

      Legacy all devices load time - pessimistic

  • OPTIMIZED (similar for both users)

    Load times of all devices should scale linearly

Example requests:

GET /inventory/managedObjects?q= 
&pageSize=10 
&currentPage=1 
&withChildren=false 
&withTotalPages=true 
&withParents=true

GET /inventory/managedObjects?q= 
&pageSize=1 
&currentPage=1 
&withChildren=false 
&withTotalPages=true