HIV Case-Based Surveillance
0.2.0 - CI Build

HIV Case-Based Surveillance, published by Jembi Health Systems. This is not an authorized publication; it is the continuous build for version 0.2.0. This version is based on the current content of https://github.com/openhie/HIV-CBS and changes regularly. See the Directory of published versions.

Testing

Introduction

Testing data for accuracy and correctness is essential for any business that relies on quality data for decision making. This is even more critical in healthcare, where data is commonly used to monitor a client's health status and to inform decisions regarding diagnosis, treatment plans, and so on.

For this reason, using an established automated testing framework can be highly beneficial to ensure data quality.

In the context of DISI's reference platform architecture, a Central Data Repository (CDR) Testing Framework has been developed as the testing tool to support quality assurance end-to-end. The CDR testing framework is a custom package, developed by Jembi Health Systems, that sits on top of the Cucumber and Gherkin automation engine.

For more information regarding the setup, deployment & use of the CDR testing framework, please consult the Developer & Tester Guide.

The following sections offer a brief overview and background on what an automated testing framework can look like to support data quality in health systems.

Testing Approach

Tests can be carried out in the following ways.

  1. Automation
    • End-to-end testing
      1. To ensure that data moves along the pipeline from the point of service (PoS) system to the analytics platform in a timely fashion, and that both target systems (Central Data Repository & Client Registry) hold data that is accurate and correct. The key objective of this test is to ensure the availability of data and data quality.
      2. Methodology: By using the CDR Automated Testing Framework in conjunction with the predefined input and expected outcome datasets, testers are able to verify that the system is processing data correctly, flattening data for analytical purposes, and displaying information that is accurate and correct for each client and their encounters. In particular, the testing framework will ensure that the data displayed in each case report meets the requirements documented in each report's requirements specification.

        • The CDR testing framework operates on a transactional basis, meaning that bulk submissions to the CDR are not possible.
  2. Manual Testing
    • Methodology: With the EMR available and assuming that it is stable, the tester will be able to transact with DISI's reference platform architecture and perform tests against each component, comparing outcomes against the specification documentation and further measuring data quality. Should the EMR be unavailable or unstable, Postman will be used as the tool to support the overall testing process. This area of testing will require the tester to explore each architectural component directly to understand its core capabilities and limitations.
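The automated end-to-end check described above can be sketched in Python as follows. `submit_to_cdr`, `query_analytics`, the record shapes, and the retry settings are all hypothetical stand-ins for illustration, not the framework's real API:

```python
import time

def submit_to_cdr(bundle: dict) -> None:
    """Stand-in for submitting one transaction to the CDR (e.g. via HTTP)."""
    raise NotImplementedError  # hypothetical: replace with a real submission

def query_analytics(patient_id: str):
    """Stand-in for querying the flattened record from the analytics platform."""
    raise NotImplementedError  # hypothetical: replace with a real query

def end_to_end_check(bundle: dict, patient_id: str, expected: dict,
                     submit=submit_to_cdr, query=query_analytics,
                     retries: int = 5, delay: float = 2.0) -> bool:
    """Submit one transaction, then poll until the flattened record
    appears in the analytics platform and matches the expected values."""
    submit(bundle)
    for _ in range(retries):
        record = query(patient_id)
        if record is not None:
            # Data arrived: verify accuracy and correctness field by field.
            return all(record.get(k) == v for k, v in expected.items())
        time.sleep(delay)  # data not flattened yet; wait and retry
    return False  # data never reached the analytics platform
```

The same shape covers both objectives named above: availability (the polling loop) and data quality (the field-by-field comparison).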

Overview

The Central Data Repository (CDR) Testing Framework is an automation tool designed to verify report data accuracy and to test the data pipeline end-to-end. The CDR testing framework is built on top of Cucumber, a general-purpose test automation framework, and is packaged with custom-developed modules that query the input and expected outcome datasets to assist with measuring data quality in the analytics platform. The CDR testing framework implements a modular design that enables analysts, testers, and developers to easily build new report modules and efficiently execute on-demand and regression testing against the data pipeline.
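To make the modular idea concrete, here is a minimal Python sketch of how report modules could register their checks with a shared engine. The decorator, module names, and check signatures are illustrative assumptions, not the framework's actual interfaces:

```python
# Hypothetical registry: each report module contributes check functions,
# and the engine runs whichever modules are requested on demand.
REPORT_MODULES: dict = {}

def report_module(name: str):
    """Decorator that registers a check function under a report module name."""
    def register(check):
        REPORT_MODULES.setdefault(name, []).append(check)
        return check
    return register

@report_module("line-listing")
def check_row_count(dataset: dict) -> bool:
    # Illustrative check: the flattened output has the expected number of rows.
    return len(dataset["rows"]) == dataset["expected_row_count"]

def run_modules(names, dataset: dict) -> dict:
    """Run every check in the selected report modules, e.g. for a regression run."""
    return {name: all(check(dataset) for check in REPORT_MODULES[name])
            for name in names}
```

With this shape, a new report module only needs to define and decorate its checks; the engine handles execution.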

The High-level Design Architecture illustration offers an overview of what the CDR testing framework modules should look like.

The Component Architecture illustration is an example of what the components could look like to support end-to-end testing.

High-level Design Architecture


Component Architecture

Input & Expected Outcome Dataset

For the CDR Testing Framework to be considered successful in terms of end-to-end automation testing, it must not only submit input test data to the CDR using Postman, but also query the analytics platform to verify that the submitted input data was successfully flattened and stored. Furthermore, the framework must check every element of the patient record to ensure that the stored value matches the documented expected outcome data for that patient.

For the purpose of streamlined data management activities, the input and expected outcome datasets can be centrally hosted as Google Sheets. The CDR Testing Framework should then fetch data from both datasets and use it during data assertions.
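As one possible sketch of that fetch step, a shared Google Sheet tab can be read as CSV and parsed into row dictionaries. The export-URL approach below is a simplification (a Google Service account with the Sheets API is the more robust route), and the sheet ID and column names are hypothetical:

```python
import csv
import io
import urllib.request

def parse_dataset(csv_text: str) -> list:
    """Parse CSV text (header row + data rows) into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def fetch_dataset(sheet_id: str, gid: str = "0") -> list:
    """Fetch one tab of a shared Google Sheet as CSV and parse it."""
    url = (f"https://docs.google.com/spreadsheets/d/{sheet_id}"
           f"/export?format=csv&gid={gid}")
    with urllib.request.urlopen(url) as resp:
        return parse_dataset(resp.read().decode("utf-8"))
```

The parsed rows can then be handed directly to the assertion steps, one dictionary per expected patient record.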

Input Data

This is the set of data that will be submitted to the CDR to mimic events at a given facility. The input dataset must be defined using static data so that the expected outcome data values align with what was submitted to the CDR.

Expected Outcome Data

This is the set of data that governs the quality and correctness of data at rest in the analytics platform. In other words, the expected outcome dataset contains only the patient records that must be reported on, with data values that correspond to the input dataset and to any conditional logic in the report specification. The expected outcome dataset is a static, final outcome that the CDR testing framework expects to find in the analytics platform. If the testing framework detects a value in the analytics platform that does not match the value specified for the same data element in the expected outcome dataset, it must fail that test case and immediately halt any further testing.
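The fail-and-halt rule above can be sketched as follows. The record shapes, field names, and exception type are illustrative, assuming each record is keyed by a `patient_id` column:

```python
class OutcomeMismatch(AssertionError):
    """Raised when an analytics value differs from the expected outcome."""

def assert_outcomes(expected_rows: list, analytics_by_id: dict) -> None:
    """Compare every data element of every expected record; fail the test
    case and halt further testing at the first mismatched value."""
    for expected in expected_rows:
        actual = analytics_by_id.get(expected["patient_id"], {})
        for field, value in expected.items():
            if actual.get(field) != value:
                raise OutcomeMismatch(
                    f"{expected['patient_id']}.{field}: "
                    f"expected {value!r}, got {actual.get(field)!r}")
```

Raising on the first mismatch gives the tester a precise point of failure (patient, field, expected vs. actual value) rather than a summary count.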

An expected outcome dataset may have data defined for the following types of reports.

  1. Line Listing Tabular Reports
  2. Aggregated Tabular Reports
  3. Charts

Summary of Benefits

  • Able to run tests rapidly using accurate input data
  • Able to update the input and expected outcome data in the test cases with a single click of a button - data managed solely in the input and expected outcome documents
  • Uses behavior-driven development (BDD) - non-technical stakeholders can understand the test cases
  • Can be configured to use any middleware component or even transact directly with the analytics system (API)
  • Input and expected outcome datasets are Google Sheet documents in Google Drive - Implements a Google Service account
  • Able to see points of failure at the system level or per patient record
  • Able to see points of failure per patient record resulting from incorrect field-level values
  • Supports a modular design approach - easily build custom reports using any preferred techniques and simply reference the automation framework engine to do all the work
  • Tests the pipeline end-to-end
  • Contributes to performance testing over the pipeline
  • Can be included into the project's continuous integration (CI) processes so that report data quality is ensured with each build before merging into master.