Grafana Dashboard (Real Time)

By processing data with Kafka's own mechanisms, the existing work by Confluent on mapping data to a JDBC target can be reused. Small modifications will have to be made to that connector to create timestamp-enabled tables, or a separate process has to be in place to create the necessary tables. Kafka Streams can be used to aggregate the data and put it in an output format suitable for TimescaleDB. The streams should use the REDCap and ManagementPortal clients to fetch additional information that is only available there.
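As a rough illustration of the table-creation step, the sketch below builds the DDL such a process could issue before the JDBC sink starts writing. The table and column names are assumptions; `create_hypertable` is the TimescaleDB function that turns a plain PostgreSQL table into a time-partitioned hypertable.

```java
import java.util.List;

/**
 * Sketch of the DDL a table-creation step could issue for TimescaleDB.
 * Table and column names are illustrative, not the actual RADAR-base schema.
 */
public class TimescaleTableSetup {
    static List<String> setupStatements(String table) {
        return List.of(
            // A plain PostgreSQL table with an explicit time column
            "CREATE TABLE IF NOT EXISTS " + table + " ("
                + "time TIMESTAMPTZ NOT NULL, "
                + "user_id TEXT NOT NULL, "
                + "value DOUBLE PRECISION)",
            // TimescaleDB-specific: partition the table on the time column
            "SELECT create_hypertable('" + table + "', 'time', if_not_exists => TRUE)");
    }
}
```

In a modified connector, these statements would run over the same JDBC connection the sink already holds, before the first records for a topic are inserted.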

An example implementation of a Kafka Streams application can be found in RADAR-backend.
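To illustrate the aggregation such a stream would perform, the plain-Java sketch below computes per-user averages over tumbling time windows. It is only a model of the logic: a real implementation would use the Kafka Streams DSL (`groupByKey`, `windowedBy`, `aggregate`), and the user key and one-minute window size are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of windowed averaging, the kind of aggregation a Kafka Streams
 * job would apply before the results are written to TimescaleDB.
 */
public class WindowedAverage {
    static final long WINDOW_MS = 60_000; // 1-minute tumbling windows (assumed)

    // (userId, windowStart) -> {running sum, count}
    final Map<String, double[]> aggregates = new HashMap<>();

    void add(String userId, long timestampMs, double value) {
        // Align the record's timestamp to the start of its window
        long windowStart = (timestampMs / WINDOW_MS) * WINDOW_MS;
        double[] agg = aggregates.computeIfAbsent(
            userId + "|" + windowStart, k -> new double[2]);
        agg[0] += value; // running sum
        agg[1] += 1;     // record count
    }

    double average(String userId, long windowStart) {
        double[] agg = aggregates.get(userId + "|" + windowStart);
        return agg[0] / agg[1];
    }
}
```

In Kafka Streams the same state would live in a windowed state store, so the aggregation survives restarts and can be spread over multiple nodes.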

The implementation time will be determined by familiarity with Java, Kafka and TimescaleDB. Kafka Streams applications will be able to scale to multiple nodes. New Avro schemas are needed for all output data so that it can be processed by the JDBC connector; however, those schemas don't necessarily have to be published to RADAR-schemas.
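For a sense of what such an output schema could look like, the sketch below embeds a hypothetical Avro record for windowed aggregates. The record and field names are assumptions, not an existing RADAR-schemas definition; the point is that a flat record with primitive fields maps cleanly onto the JDBC sink's column mapping.

```java
/**
 * Hypothetical Avro schema for aggregated output records. Flat records
 * with primitive fields translate directly to TimescaleDB columns.
 */
public class AggregateOutputSchema {
    static final String SCHEMA_JSON = """
        {
          "type": "record",
          "name": "AggregatedMeasurement",
          "namespace": "org.radarcns.aggregate",
          "fields": [
            {"name": "userId",    "type": "string"},
            {"name": "sourceId",  "type": "string"},
            {"name": "timeStart", "type": "double"},
            {"name": "timeEnd",   "type": "double"},
            {"name": "average",   "type": "double"},
            {"name": "count",     "type": "long"}
          ]
        }""";
}
```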

The advantage of this approach is that it is completely integrated with Kafka, and the mapping logic can be separated from the input and output medium. Also, the database is updated as data comes in, keeping the Grafana view much more closely in sync. Finally, it is less affected by unrelated issues that occur in HDFS or S3. Disadvantages include that if the database gets lost or corrupted, historical data no longer in Kafka cannot easily be reimported. Also, data that has been collected in other projects cannot be imported in this way. Nor can new analyses be rerun on historical data, so if the metrics change often, they cannot be recomputed on past data. The performance characteristics of the approach are partially unknown. From experience with Kafka Streams, high-frequency analyses on high-frequency data will take a lot of CPU time, whereas low-frequency analyses take a much smaller performance toll.


Example cohort-level views of RADAR-base data.