Data Integration Options with Cumulocity IoT
Introduction
The core of Cumulocity IoT is device integration, but with pure device data only a few use cases can be implemented, such as Device Management or Condition Monitoring. More comprehensive use cases rely on additional data that resides in other databases, systems or services. Only by combining device data with other data, such as master data or production data, can the real value of IoT be unleashed!
In this article I will give you an overview of the options you have to integrate different kinds of data with Cumulocity IoT to implement high-value IoT solutions.
For each option I will give an overview of the architecture and the use cases that can be implemented.
Let’s get started!
The power of (IoT) data
When building an IoT solution, the earliest and mandatory step is device connectivity to retrieve device data. There are two main needs, and therefore two different kinds of data, in IoT solutions:
The need to get near real-time insights into what’s going on or when you have to act → live device data (hot storage)
The need to detect patterns and explore correlations in device data to adapt the solution accordingly → long-term device data (cold storage)
While live device data calls for dashboard-like visualization and streaming analytics capabilities, long-term device data needs different capabilities, most likely data exploration tools, pattern recognition and training of machine learning models.
Both device data variants are supported by the capabilities of Cumulocity IoT.
Again, having access to device data is a very good starting point for an IoT solution, but there is most likely more data in your company (or in other companies) that needs to be integrated into your IoT solution.
To name the most commonly used kinds of data:
Master Data - Any kind of materials, customers or suppliers you have in your systems that must be correlated with device data.
Transactional Data - Any kind of processes, production orders or customer orders you have in your systems that must be correlated with device data.
Other Data - Any kind of 3rd Party data, such as environmental data or geolocation data, that should be correlated with device data.
What they have in common is that they reside in different kinds of systems or databases, most likely offering different kinds of APIs that you need to integrate. The way the data flows is also very use case dependent. Sometimes the device data is integrated into the 3rd Party Systems (e.g. Ticketing solutions) and sometimes it is the other way around (e.g. Production Order data which should be visualized in Cumulocity IoT). Sometimes it makes sense to replicate data, and sometimes you just need additional data for visualization or analytics/calculations. All ways are valid and must be supported!
Being as flexible and fast as possible with regard to data integration is key for an IoT solution. You don’t want to spend much time building complex integrations or even workarounds.
Let’s have a look at the options and the different kinds of architectures you have!
Data Integration Options
Device Integration
Let’s start with the obvious one from an IoT standpoint: Device Integration.
Supporting device integration is mandatory for an IoT platform. With Cumulocity IoT you have multiple ways to integrate devices. From a high-level perspective there are three options:
Connecting a device directly using one of the core protocols LWM2M, REST or MQTT. This means the device firmware contains the connectivity and the mapping to the Cumulocity endpoints and domain model, which is called the device agent. Such devices are called “smart” or “connected” devices and can communicate directly over an IP-based protocol such as MQTT or REST on top of TCP. For LWM2M this is not Cumulocity specific: unlike with MQTT or REST, the data model is part of the protocol and can be used across all systems supporting LWM2M, not only Cumulocity IoT.
Connecting a device to a server-side agent. This means the device can handle an IP-based protocol on top of TCP but uses a proprietary data model that is not supported out of the box by Cumulocity IoT and therefore must be mapped to the domain model. The server-side agent can run inside or outside of Cumulocity IoT; in the picture above it is called a Backend Agent (outside) or a Server Side Agent (inside). This option is often used for network providers, for example LPWAN providers.
Connecting a device using any kind of gateway. This means the device either doesn’t support any IP-based protocol (it uses BLE, ZigBee etc.) or sits behind a firewall (production network etc.) with no direct access to Cumulocity IoT running on the edge or in the cloud. The data model most likely must also be mapped to the domain model of Cumulocity IoT.
For that purpose Cumulocity IoT offers thin-edge.io, an edge software that runs on resource-constrained hardware such as gateways and addresses the need to connect other devices and map their data to Cumulocity IoT.
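The mapping step that an agent or gateway performs can be sketched in a few lines of Python. This is only a minimal illustration, not an actual agent: the incoming payload format (`ts`, `temp_c`) and the device ID are made-up assumptions, while the target structure follows the measurement format of the Cumulocity domain model (a `source`, a `type`, a `time` and a fragment holding the series).

```python
from datetime import datetime, timezone


def to_c8y_measurement(device_id: str, raw: dict) -> dict:
    """Map a hypothetical proprietary payload to a Cumulocity-style measurement."""
    # Assumed raw format: {"ts": <unix seconds>, "temp_c": <float>}
    timestamp = datetime.fromtimestamp(raw["ts"], tz=timezone.utc)
    return {
        "source": {"id": device_id},           # ID of the device in the inventory
        "type": "c8y_TemperatureMeasurement",
        "time": timestamp.isoformat(),          # ISO 8601 timestamp in UTC
        "c8y_TemperatureMeasurement": {
            "T": {"value": raw["temp_c"], "unit": "C"}
        },
    }


# Example: one reading of a (hypothetical) device with ID "4711"
measurement = to_c8y_measurement("4711", {"ts": 1700000000, "temp_c": 21.5})
```

The resulting dictionary could then be sent as JSON via REST or MQTT. Where exactly this mapping runs (device firmware, server-side agent or gateway) is what distinguishes the three options above.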
Device Integration is a complex topic, and of course this is just a rough overview of how to integrate devices. I will cover it in another blog post, where I will describe in detail the steps and the process to get your device integrated.
Supported use cases
With device data only you can already implement a lot of common IoT use cases. Here is a list of some examples, with no claim to completeness:
Device Management - Manage your devices including Software, Firmware & Configuration Management
Device Connectivity Management - Manage the availability of your device and act if something suspicious is happening
Condition Monitoring - Retrieve time series data and events to monitor the condition of devices or machines the devices are attached to.
Overall Equipment Effectiveness (when machine data is available) - Monitor and optimize your fleet of machines and production processes.
Data Contextualization
Let’s continue with the simplest data integration after device integration: Data Contextualization.
In this option data resides in a 3rd Party System or database and should be written to the Cumulocity IoT database to extend the device data with master or transactional data. The interface to do this is REST. No logic is running in Cumulocity IoT in this option; instead, the 3rd Party System or any other component calls the REST interface of Cumulocity IoT to add context data. In such use cases the context data is replicated in the Cumulocity IoT database, and it is most likely used when the frontend of the solution is running within Cumulocity IoT.
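As a rough sketch of what such a REST call could look like, the following Python snippet only builds the request that a 3rd Party component might send to attach context data to a device in the inventory. The tenant URL, the device ID and the custom fragment name `acme_ProductionOrder` are illustrative assumptions; authentication and the actual HTTP call are left out on purpose.

```python
import json


def build_context_update(base_url: str, device_id: str, context: dict) -> dict:
    """Describe a REST request that adds context data to a device's managed object."""
    return {
        "method": "PUT",
        "url": f"{base_url.rstrip('/')}/inventory/managedObjects/{device_id}",
        "headers": {
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        # "acme_ProductionOrder" is a hypothetical custom fragment name
        "body": json.dumps({"acme_ProductionOrder": context}),
    }


request = build_context_update(
    "https://tenant.example.com/",            # placeholder tenant URL
    "4711",                                   # placeholder device ID
    {"orderId": "PO-1001", "material": "M-42"},
)
```

Any HTTP client can send the described request. Every change of the order in the source system would need another call like this one to keep the replicated data up to date.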
This option is not commonly used, as replicating contextual data increases the operational costs of the IoT platform and keeping the data in sync is quite complex.
A better pattern would be to use Service Integration.
Supported use cases
The use cases here depend entirely on the data that is ingested into Cumulocity IoT. To summarize a few use cases:
Asset Management - Manage your assets and correlate them to device data
Master Data Management Harmonization - Combine multiple master data and device data to harmonize the access and view on data
Production Data Management - Correlate production data and device data to see how efficient your production is.
Data Push Integration
With the data push integration we enable 3rd Party Systems or message brokers to work on near real-time device data.
With Cumulocity IoT you have 3 options to enable this pattern:
Using Apama Streaming Analytics you can implement a model that listens for device data and pushes the data in any format to an HTTP endpoint.
With the fairly new Notification API 2.0 you can implement a client that subscribes to specific kinds of device data and gets the information pushed reliably whenever matching data arrives. Reliable means that if the client is down for any reason the messages are queued, and when it comes back online it receives the messages it subscribed to before. The protocol used between the client and the Notification API is secure WebSocket (WSS).
Very similar to option 2 and also using the Notification API 2.0, but here the subscribing client is implemented within a microservice which forwards the information to the broker or 3rd Party System. The microservice can use any outgoing protocol to forward the data, such as AMQP, REST or MQTT.
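To make the meaning of "reliable" in options 2 and 3 more tangible, here is a small toy model in Python. It is not the Notification API itself and does not use any Cumulocity client library; the class and method names are invented for this illustration. It only shows the behaviour described above: messages are queued while the client is offline and delivered once it reconnects.

```python
from collections import deque


class ReliableSubscription:
    """Toy model of a subscription that queues messages while the client is offline."""

    def __init__(self):
        self._pending = deque()   # messages not yet delivered
        self._deliver = None      # callback of the connected client, if any

    def publish(self, message):
        self._pending.append(message)
        self._flush()

    def connect(self, deliver):
        self._deliver = deliver
        self._flush()

    def disconnect(self):
        self._deliver = None

    def _flush(self):
        # A message is removed from the queue only after delivery succeeded
        while self._deliver is not None and self._pending:
            self._deliver(self._pending[0])
            self._pending.popleft()


received = []
subscription = ReliableSubscription()
subscription.connect(received.append)
subscription.publish("m1")              # delivered immediately
subscription.disconnect()
subscription.publish("m2")              # queued, client is offline
subscription.publish("m3")              # queued, client is offline
subscription.connect(received.append)   # queued messages are delivered now
```

In option 3 the `deliver` callback would be the place where the microservice forwards each message to the broker or 3rd Party System using the outgoing protocol of its choice.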
Supported use cases
By pushing data we can enable the following use cases:
Alarming - Stay informed via push notifications on your mobile
Incident Management - Create incidents in any incident management system
Data Forwarding - Reliably forward data which should not reside for long in Cumulocity IoT.
Data Replication - Copy data to target systems as a duplicate.
Data Synchronization - Keep the data in sync across systems & services.
High Priority Processes - Any process that listens for an event of device data, e.g. maintenance process, shipping process etc.
Data Pull Integration
The opposite of push is… pull, correct?
Let’s have a look at how we can pull device data out of Cumulocity IoT.
Pulling always needs an external trigger, like a step in a process or a user requesting data. Periodic pulling can also be implemented by having a scheduler pull data each hour or day.
Cumulocity IoT provides a comprehensive REST API not only to ingest data but also to retrieve it. It can be used to pull data out of Cumulocity IoT in two ways:
Having any REST client running somewhere that implements the Cumulocity REST API.
Having a microservice that provides custom REST endpoints for data retrieval by any REST client implementing these custom endpoints.
Option 2 in particular is commonly used if the REST client of the 3rd Party system cannot be adapted easily and has to stick to a given protocol specification. One typical example among many is the ITSS protocol specification. Option 1 is commonly used if a custom client can be implemented and deployed in the 3rd Party system.
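A client for option 1 could look roughly like the following Python sketch. It builds a query URL for the measurement API and pages through the results. The tenant URL and device ID are placeholders, and the HTTP call is injected as a function (`fetch_json`, a name chosen for this example) so that the paging logic can be tried out without a real tenant; here it is replaced by a fake.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def measurement_query_url(base_url, device_id, date_from, date_to, page=1, page_size=100):
    """Build the URL to query measurements of one device in a time range."""
    params = {
        "source": device_id,
        "dateFrom": date_from,
        "dateTo": date_to,
        "pageSize": page_size,
        "currentPage": page,
    }
    return f"{base_url.rstrip('/')}/measurement/measurements?{urlencode(params)}"


def pull_all(fetch_json, base_url, device_id, date_from, date_to, page_size=100):
    """Yield all measurements, requesting page after page until a page is not full."""
    page = 1
    while True:
        url = measurement_query_url(base_url, device_id, date_from, date_to, page, page_size)
        batch = fetch_json(url).get("measurements", [])
        yield from batch
        if len(batch) < page_size:
            break
        page += 1


def fake_fetch(url):
    """Stand-in for an authenticated HTTP GET that returns parsed JSON."""
    page = int(parse_qs(urlparse(url).query)["currentPage"][0])
    pages = {1: [{"id": "1"}, {"id": "2"}], 2: [{"id": "3"}]}
    return {"measurements": pages.get(page, [])}


pulled = list(pull_all(fake_fetch, "https://tenant.example.com", "4711",
                       "2024-01-01T00:00:00Z", "2024-01-02T00:00:00Z", page_size=2))
```

A scheduler could call `pull_all` each hour or day with a moving time window to implement the periodic pulling mentioned above.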
Supported use cases
The use cases for pulling are limited, as the data is only retrieved when something triggers the request:
Data Retrieval - Fetch data on request to be visualized or processed
Data Replication - Copy data in target system as a duplicate
Data Synchronization - Keep the data in sync across systems
Reporting - Fetch historical data for reporting in 3rd Party systems
Data Offloading
We have covered data contextualization, push and pull so far. Data offloading is another frequently used option.
Data Offloading is very similar to data push or data pull but adds much more functionality on top. I will come to this later. While data pull and data push most likely need a client or microservice to be developed and deployed somewhere, data offloading can be achieved with a non-coding approach using the DataHub of Cumulocity IoT.
The main purpose of the DataHub is to, you guessed it, offload data out of Cumulocity IoT to a data lake. There is one big difference to a simple data pull or push: data is normalized and stored in Parquet files. This also enables access to the data with any tool that supports SQL, such as BI Reporting Tools.
The DataHub itself follows a zero-code approach: It comes with a UI which enables you to configure your offloading jobs and your target data lake.
Once configured, your data will be periodically offloaded to the configured data lake.
The Machine Learning Workbench can make use of the data offloaded via DataHub to train ML models.
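To give an idea of what SQL access to normalized data makes possible, here is a small Python sketch. Note that it does not talk to the DataHub or a data lake: it uses an in-memory SQLite database as a stand-in, and the table name and columns (`measurements`, `source`, `time`, `temperature`) are assumptions made up for this example, not the actual schema of offloaded data.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the SQL interface on top of the data lake
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (source TEXT, time TEXT, temperature REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?)",
    [
        ("4711", "2024-01-01T08:00:00Z", 20.0),
        ("4711", "2024-01-01T20:00:00Z", 22.0),
        ("4711", "2024-01-02T08:00:00Z", 25.0),
    ],
)

# Daily average temperature of one device, a typical trend or reporting query
daily_avg = conn.execute(
    "SELECT substr(time, 1, 10) AS day, AVG(temperature) "
    "FROM measurements WHERE source = ? "
    "GROUP BY substr(time, 1, 10) ORDER BY day",
    ("4711",),
).fetchall()
```

A BI tool would run the same kind of query against the offloaded data, without any custom client or microservice in between.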
Supported use cases
Data Offloading - obviously, periodically replicate data to a data lake
Data Replication - Copy data in target system as a duplicate
Data Exploration - By having historical data in a normalized way you can use BI Tools to perform data exploration
Machine Learning - Use the offloaded data to train ML models
Trend Analysis - Detect and visualize trends in offloaded data
Service Integration
Very often data does not need to be pulled, pushed or offloaded; it should stay where it is currently stored, but access to it for visualization or calculation purposes must be provided. I call this Service Integration or Data Stream Integration.
For the UI of an IoT solution you do not access the data of one single database but of many systems. With Data Contextualization I introduced one option to solve this by replicating the data to the device database within Cumulocity IoT. As I stated there, this is not the preferred way. Most likely you just want to access heterogeneous data sources without replicating or storing any data. This can be achieved by using the provided APIs to fetch & correlate data on request or, as I call it, to stream data. So if a user clicks on a UI or a streaming analytics rule is triggered, the data is fetched and directly available for presentation or processing.
This option should be preferred when single services are to be integrated. When you have multiple heterogeneous services with complex APIs I would prefer to go for System Data Integration.
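The "fetch & correlate on request" idea can be sketched as follows. This is just an illustration under assumptions: `get_device` stands for a call to the Cumulocity inventory and `get_weather` for a call to some external weather service; both are passed in as plain functions and replaced by stubs here, and the weather response format is invented.

```python
from typing import Callable


def device_view(device_id: str,
                get_device: Callable[[str], dict],
                get_weather: Callable[[float, float], dict]) -> dict:
    """Correlate device data with external data at request time, storing nothing."""
    device = get_device(device_id)
    position = device["c8y_Position"]    # position fragment of the device
    weather = get_weather(position["lat"], position["lng"])
    return {"id": device_id, "name": device["name"], "weather": weather}


# Stubs standing in for the real inventory and weather service calls
def stub_device(device_id):
    return {"name": "Pump 1", "c8y_Position": {"lat": 51.2, "lng": 6.8}}


def stub_weather(lat, lng):
    return {"lat": lat, "lng": lng, "temperature": 18.5}


view = device_view("4711", stub_device, stub_weather)
```

Such a function would typically live in a microservice and run whenever a user opens the corresponding view, so the weather data is always current and never replicated into the Cumulocity IoT database.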
Supported use cases
Enhancing functionality - Add new functionality to Cumulocity IoT (e.g. Location Services, Weather Services etc.)
Service Integration - Integrate any other service like PKI to issue certificates and forward them to devices
System Integration
Last but not least, the most powerful and flexible way to integrate data. I call it System Data Integration.
The focus is on complex systems like ERPs, CRMs and Cloud SaaS services such as Incident Management. What they all have in common is that they provide APIs to access and fetch data, similar to Cumulocity IoT. You could go for the Service Integration option and implement a microservice for each API to correlate data. However, you then need to implement and maintain these microservices, which gets complicated when an API changes or when you adapt your IoT solution. There is a pretty nice solution for that: webMethods.io Integration, which is an iPaaS (Integration Platform as a Service).
It acts as a central cloud component and comes with tons of pre-built connectors to the most common Cloud Systems and Cloud Services, as well as a connector for Cumulocity IoT.
Via a graphical editor you can define your workflows and transform & map data very easily. When you’re done, you just deploy the workflow and your integration is complete.
The Cumulocity IoT integration can be used either as a trigger, using the real-time API of Cumulocity IoT (listening for an incoming event), or as a connector, using the REST API of Cumulocity IoT (fetching or ingesting data).
Supported use cases
Most of the options mentioned above can be implemented using webMethods.io Integration, except those that rely on dedicated services, such as Data Offloading using the DataHub and, of course, Device Integration.
iPaaS - Integrate & transform data
Hybrid Integration - Enable on-prem and cloud integration
Data Forwarding - Reliably forward data which should not reside for long in Cumulocity IoT.
Data Ingest - Ingest any kind of data
Data Retrieval - Fetch data on request to be visualized or processed
Data Synchronization - Keep the data in sync across systems
High Priority Processes - Any process that listens for an event of device data, e.g. maintenance process, shipping process etc.
Dynamic & flexible integration of systems - No-code approach
Solution Accelerators - Accelerate solution development by re-using pre-built connectors.
Conclusion
I hope I could shed some light on the data integration options you have with Cumulocity IoT. Data integration is often quite complex, and it really depends on multiple factors which option best matches your requirements.
System Integration is the most powerful option you currently have, but it also comes with an additional product in your solution architecture. If you want to start small and use it for only a few small integrations, it is like using a sledgehammer to crack a nut. If you are migrating a big & complex IoT solution to Cumulocity IoT, using this option has many benefits.
I’m interested in your feedback and the use cases you’ve implemented. What options did you use in your IoT projects? Did I miss something that needs to be added?
What kind of data & systems do you usually integrate? Feel free to reply in the comments or write me a PM.
Next Steps & References
If you are interested in exploring data integration, you can register for a Cumulocity IoT or webMethods.io Integration trial.
For further reading you can follow the Cumulocity documentation or the webMethods.io Integration documentation.