Challenges in Building Health Data Platform — Electronic Health Record

Amit Kumar
Jan 1, 2022

Building an Electronic Health Record (EHR) platform for user health data at large scale is extremely challenging. Today, most health data lies in unstructured formats such as images, PDFs and paper. I designed and built a health data platform from scratch, and by digitising health data it has added a lot of value to the existing health system. From the start, we designed for a scale at which, if required, the platform could onboard every person on the planet. During the design and development phases we ran into several challenges. I’m listing a few of the ones I experienced. Here you go:

Health Data Types and Formats: User health data generally doesn’t come in a unified format. In a country like India, health data may be written in regional languages so that everyone who can read has easy access to the online health system. Health data can be of many types, such as:

  • Doctor and Practitioner’s research notes, surgical notes, consultation and observation reports
  • Pharmaceutical needs
  • Lab diagnostic reports
  • User health survey
  • Sensor data from wearable devices such as smartwatches
  • Health data monitored by other smart devices
  • Insurance health data

Each of these types of health data comes in its own format.
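A common way to cope with this variety is to wrap every incoming item in one envelope, whatever its format. The envelope records who the data belongs to, what kind of data it is, its format and language, and the untouched raw payload for later extraction. The sketch below shows a minimal Python version. `SourceType`, `HealthRecordEnvelope` and all field names are hypothetical choices for illustration.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class SourceType(Enum):
    # Hypothetical categories mirroring the list of health data types above
    CONSULTATION_NOTE = "consultation_note"
    PRESCRIPTION = "prescription"
    LAB_REPORT = "lab_report"
    HEALTH_SURVEY = "health_survey"
    WEARABLE_SENSOR = "wearable_sensor"
    SMART_DEVICE = "smart_device"
    INSURANCE = "insurance"


@dataclass
class HealthRecordEnvelope:
    user_id: str
    source_type: SourceType
    content_type: str        # e.g. "application/pdf", "image/jpeg", "application/json"
    payload: bytes           # raw content exactly as received, never modified
    language: str = "en"     # regional-language content is common in India
    record_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    received_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


env = HealthRecordEnvelope(
    user_id="user-A",
    source_type=SourceType.LAB_REPORT,
    content_type="application/pdf",
    payload=b"%PDF-1.7 ...",
    language="hi",
)
print(env.source_type.value, env.content_type, env.language)
```

Keeping the raw payload alongside the metadata means extraction can be rerun later without going back to the provider.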

Health Data Sources: Health data is scattered across many sources, such as:

  • Practitioners, Doctors and Research Labs
  • Patients and Users
  • Pharmacy Stores
  • Health Clinics and Training institutes
  • Hospitals
  • Health Devices like Smartwatches

Receiving health data: User health data arrives through many channels. A user’s health data may come from the user and also directly from the data source. For example, a doctor’s consultation notes and reports can be submitted both by the user and by the doctor, so we may end up with two copies of the same health data in the system. Health data can be represented as images, PDFs, video or audio, or in formats such as XML, CSV, JSON or plain text. Data in media formats (image, PDF, audio, video) brings its own challenges, and extracting health data from media is one of the hardest. For instance, pharmacy purchase data may be received from the user as an image or PDF, while the same data also arrives from the pharmacy through another channel, either as a physical copy or in digital format. This can lead to duplicate data in the system, and health data extracted from a media file must be verified before it is stored. As mentioned above, user health data can come from sources such as:

  • Doctors and practitioners can provide user health data through any health-related app. They can also upload health reports directly into the system, and each doctor may use a different template for these reports.
  • Patients and users can also provide health data through health-related apps, in the same way doctors and practitioners do. Users can also provide health data by answering health surveys, such as questionnaires or health quizzes.

Similarly, pharmacies, clinics, hospitals and research labs can provide health data in the different formats described earlier.
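The same channel can deliver PDFs, images, audio or structured text, so ingestion usually starts by detecting the real format instead of trusting a file extension. The sketch below checks well-known file signatures (magic bytes) and routes each payload to a pipeline. The pipeline names and the routing table are hypothetical, not part of any specific library.

```python
import json


def detect_format(data: bytes) -> str:
    """Guess the payload format from its leading bytes (file signature)."""
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    if data[4:8] == b"ftyp":           # MP4 / ISO base media container
        return "mp4"
    try:
        text = data.decode("utf-8").strip()
    except UnicodeDecodeError:
        return "unknown"
    if text.startswith(("{", "[")):
        try:
            json.loads(text)
            return "json"
        except json.JSONDecodeError:
            pass
    if text.startswith("<"):
        return "xml"
    first_line = text.splitlines()[0] if text else ""
    if "," in first_line:
        return "csv"
    return "text"


# Hypothetical routing: media needs OCR or transcription first,
# structured text can be parsed directly.
HANDLERS = {
    "pdf": "ocr_pipeline", "png": "ocr_pipeline", "jpeg": "ocr_pipeline",
    "wav": "speech_to_text", "mp4": "speech_to_text",
    "json": "structured_parser", "xml": "structured_parser", "csv": "structured_parser",
    "text": "nlp_extractor",
}


def route(data: bytes) -> str:
    # Anything unrecognised goes to a human instead of being silently dropped
    return HANDLERS.get(detect_format(data), "manual_review")


print(route(b"%PDF-1.4 ..."), route(b'{"bp": "120/80"}'))
```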

Health data from electronic health devices is received through an event-driven service model.
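In an event-driven model, each device reading arrives as an event on a queue or topic, and a consumer processes it independently of the device that sent it. The sketch below uses Python’s in-process `queue.Queue` as a stand-in for a real message broker such as Kafka or a cloud pub/sub service. The event fields and the validation in `handle_event` are hypothetical.

```python
import queue
import threading

events = queue.Queue()   # stands in for a broker topic
stored = []              # stands in for the health data store


def handle_event(event):
    # Hypothetical validation: discard readings with no device or user id
    if not event.get("device_id") or not event.get("user_id"):
        return
    stored.append(event)


def consumer():
    while True:
        event = events.get()
        try:
            if event is None:    # sentinel tells the consumer to stop
                return
            handle_event(event)
        finally:
            events.task_done()


worker = threading.Thread(target=consumer)
worker.start()

# A smartwatch publishes heart-rate readings as events
events.put({"device_id": "watch-1", "user_id": "user-A", "type": "heart_rate", "value": 72})
events.put({"device_id": "watch-1", "user_id": "", "type": "heart_rate", "value": 75})
events.put(None)
worker.join()
print(len(stored))
```

Because devices only publish events, the platform can scale consumers independently and absorb bursts of readings without the devices waiting on storage.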

So, when building a health data platform at an advanced level, a large volume of health data will flow into the system through many channels.

Health data modelling, storage and query: Modelling health data is a huge challenge because it has no specific, unified format. Handling a huge volume of health data that arrives in different formats and through different channels requires careful data modelling. A data model is generally derived from the read and write query patterns. We also need to decide which database can handle health data writes at this scale. There are many options, such as NoSQL and relational SQL databases, and the best choice depends on which trade-off between consistency, availability and partition tolerance (CAP) fits our needs.
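Deriving the model from access patterns usually means choosing keys so that the most frequent read becomes a single range scan. For per-user queries, a wide-column or key-value store might use the user id as the partition key. The sort key could then be a composite of record type and timestamp. The sketch below imitates that layout in memory with `bisect`. The key format is an assumption for illustration, not a prescription for any particular database.

```python
import bisect
from collections import defaultdict


class Partition:
    """All records of one user, kept sorted by a composite sort key."""

    def __init__(self):
        self.keys = []      # sorted sort keys
        self.records = []   # records aligned index-by-index with self.keys

    def put(self, key, record):
        i = bisect.bisect_right(self.keys, key)
        self.keys.insert(i, key)
        self.records.insert(i, record)

    def scan(self, start, end):
        """Return records with start <= key < end (one contiguous range)."""
        lo = bisect.bisect_left(self.keys, start)
        hi = bisect.bisect_left(self.keys, end)
        return self.records[lo:hi]


# Partition key: user_id. Sort key: "<record_type>#<ISO-8601 timestamp>".
store = defaultdict(Partition)


def put(user_id, record_type, iso_ts, record):
    store[user_id].put(f"{record_type}#{iso_ts}", record)


put("user-A", "consultation", "2020-11-02T10:00:00Z", {"doctor": "D1"})
put("user-A", "consultation", "2021-03-10T09:00:00Z", {"doctor": "D2"})
put("user-A", "lab_report", "2021-05-01T08:30:00Z", {"test": "HbA1c"})

# "All consultation reports of user A during 2021"
result = store["user-A"].scan("consultation#2021-01-01", "consultation#2022-01-01")
print(result)
```

ISO-8601 timestamps sort lexically in time order, which is what lets a plain string comparison act as a date-range filter here.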

OLTP and OLAP query support: There are two types of queries on health data:

  • OLTP queries: OLTP queries target a specific user’s health data. For example: get all of user A’s doctor consultation reports from the last year.
  • OLAP queries: OLAP queries serve research and analytics. They run like batch jobs, for example: get the number of users who purchased a specific medicine in the last year.
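The difference between the two query shapes is easy to see on a toy table. The sketch uses Python’s built-in `sqlite3` purely for illustration, and the `records` schema is hypothetical. In practice, OLTP traffic and OLAP jobs would usually run on separate systems, such as an operational database and a data warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (user_id TEXT, record_type TEXT, item TEXT, recorded_at TEXT)"
)
# Index shaped for the per-user OLTP lookup
conn.execute("CREATE INDEX idx_user_type_time ON records (user_id, record_type, recorded_at)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?, ?)",
    [
        ("A", "consultation", "Dr. D1 notes", "2021-03-10"),
        ("A", "consultation", "Dr. D2 notes", "2020-06-01"),
        ("A", "pharmacy_purchase", "Metformin", "2021-04-02"),
        ("B", "pharmacy_purchase", "Metformin", "2021-08-15"),
        ("C", "pharmacy_purchase", "Paracetamol", "2021-09-20"),
    ],
)

# OLTP: a narrow lookup on one user's data, served by the index
oltp = conn.execute(
    """SELECT item, recorded_at FROM records
       WHERE user_id = ? AND record_type = 'consultation' AND recorded_at >= ?""",
    ("A", "2021-01-01"),
).fetchall()

# OLAP: an aggregate that scans across all users
olap = conn.execute(
    """SELECT COUNT(DISTINCT user_id) FROM records
       WHERE record_type = 'pharmacy_purchase' AND item = ? AND recorded_at >= ?""",
    ("Metformin", "2021-01-01"),
).fetchone()[0]

print(oltp)
print(olap)
```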

Consent management on user health data: Health data is extremely confidential, and not everyone is allowed to query a user’s health data. In fact, we need a user’s consent even to store their health data. We also need the user’s or patient’s consent to work on it, whether that means processing it for analytics or sharing it with external entities. Every health record stored in the system must carry proper user/patient consent, and consent must apply at the document level. For example, if patient A has 10 health records, A can grant consent for some of them and withhold it for others. Consent can be granted and revoked at any point in time, and some consents may be granted only for a limited time window.
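Document-level consent that can be revoked or limited in time can be modelled as a set of grants. Each grant is scoped to one record, one grantee and one purpose, with an optional expiry. The sketch below is a minimal in-memory version. `ConsentRegistry`, the purpose strings and the grantee ids are hypothetical. A real system would persist grants and audit every check.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Grant:
    record_id: str
    grantee: str                          # e.g. a doctor or research organisation
    purpose: str                          # e.g. "treatment", "research"
    expires_at: Optional[datetime] = None # None means no expiry
    revoked: bool = False


class ConsentRegistry:
    def __init__(self):
        self._grants = []

    def grant(self, record_id, grantee, purpose, valid_for=None, now=None):
        now = now or datetime.now(timezone.utc)
        expires = now + valid_for if valid_for else None
        self._grants.append(Grant(record_id, grantee, purpose, expires))

    def revoke(self, record_id, grantee, purpose):
        for g in self._grants:
            if (g.record_id, g.grantee, g.purpose) == (record_id, grantee, purpose):
                g.revoked = True

    def is_allowed(self, record_id, grantee, purpose, now=None) -> bool:
        now = now or datetime.now(timezone.utc)
        return any(
            g.record_id == record_id and g.grantee == grantee and g.purpose == purpose
            and not g.revoked
            and (g.expires_at is None or now < g.expires_at)
            for g in self._grants
        )


t0 = datetime(2022, 1, 1, tzinfo=timezone.utc)
reg = ConsentRegistry()
# Patient A shares only record r3, with Dr. D1, for treatment, for 30 days
reg.grant("A-r3", "dr-D1", "treatment", valid_for=timedelta(days=30), now=t0)
print(reg.is_allowed("A-r3", "dr-D1", "treatment", now=t0 + timedelta(days=1)))
```

Checks default to "deny": anything without a matching, unexpired, unrevoked grant is refused.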

PII security: We should not store a user’s PII alongside their health data. If PII must be stored with the health data, it has to be encrypted. There are different ways to encrypt data, but we must make sure PII is encrypted safely and that the scheme does not open a path to a security breach. And if encrypted PII is stored alongside the health data, it’s equally important to filter the PII out of query responses.
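One way to keep PII out of the health data store is to split each incoming record into a PII part and a clinical part. The PII goes into a separate vault, and the two parts are linked by a keyed pseudonym. The sketch uses an HMAC from Python’s `hmac` module for the pseudonym. Note that this is a swapped-in pseudonymisation technique, not encryption. The vault itself would still need encryption at rest, with keys held in a key-management service. The `PII_FIELDS` set and the hard-coded key are assumptions for illustration only.

```python
import hashlib
import hmac

# Hypothetical set of fields treated as PII
PII_FIELDS = {"name", "phone", "address", "email", "aadhaar"}

# Assumption: in production this key comes from a KMS/HSM, never from source code
PSEUDONYM_KEY = b"replace-with-a-managed-secret"

pii_vault = {}      # pseudonym -> PII fields (stored and encrypted separately)
health_store = []   # clinical data only, keyed by pseudonym


def pseudonym(user_id: str) -> str:
    # Keyed hash: stable per user, not reversible without the key
    return hmac.new(PSEUDONYM_KEY, user_id.encode(), hashlib.sha256).hexdigest()


def ingest(user_id: str, record: dict) -> None:
    pid = pseudonym(user_id)
    pii = {k: v for k, v in record.items() if k in PII_FIELDS}
    clinical = {k: v for k, v in record.items() if k not in PII_FIELDS}
    if pii:
        pii_vault.setdefault(pid, {}).update(pii)
    health_store.append({"subject": pid, **clinical})


def strip_pii(response: dict) -> dict:
    """Defence in depth: drop any PII field that slipped into a query response."""
    return {k: v for k, v in response.items() if k not in PII_FIELDS}


ingest("user-A", {"name": "Patient A", "phone": "+91-0000000000", "hba1c": 6.1})
print(health_store[0])
print(strip_pii({"hba1c": 6.1, "phone": "+91-0000000000"}))
```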

Universal health data compliance: Health data should comply with FHIR (Fast Healthcare Interoperability Resources), the widely adopted HL7 standard for exchanging health data. Supporting FHIR-compliant health data means it can be used across continents for research studies and knowledge sharing.
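As an illustration of FHIR-shaped data, a smartwatch heart-rate reading can be expressed as a FHIR R4 `Observation` resource. The sketch builds one as plain JSON. The LOINC code `8867-4` (heart rate), the `vital-signs` category and the UCUM unit `/min` follow the standard FHIR vital-signs example. The patient id and timestamp are made up. Full conformance would still need checking with a FHIR validator.

```python
import json


def heart_rate_observation(patient_id: str, bpm: float, when: str) -> dict:
    """Build a minimal FHIR R4 Observation for a heart-rate reading."""
    return {
        "resourceType": "Observation",
        "status": "final",
        "category": [{
            "coding": [{
                "system": "http://terminology.hl7.org/CodeSystem/observation-category",
                "code": "vital-signs",
            }]
        }],
        "code": {
            "coding": [{"system": "http://loinc.org", "code": "8867-4", "display": "Heart rate"}]
        },
        "subject": {"reference": f"Patient/{patient_id}"},
        "effectiveDateTime": when,
        "valueQuantity": {
            "value": bpm,
            "unit": "beats/minute",
            "system": "http://unitsofmeasure.org",
            "code": "/min",
        },
    }


obs = heart_rate_observation("example-123", 72, "2022-01-01T08:00:00+05:30")
print(json.dumps(obs, indent=2))
```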

Building ML & AI intelligent systems on top of the health data: Just storing health data wouldn’t be sufficient. We need to build intelligent systems on top of it that help improve users’ and patients’ health over time.

HIPAA, Audit Compliance and DEPA: A health data platform should be HIPAA compliant. Beyond HIPAA, national and international health authorities and governments impose many rules, regulations and policies on health data. Some countries take stringent measures and require health data storage and servers to stay within their borders. Every write to and read from the EHR should be properly tracked. Health data is immutable: once stored, no update or deletion should be allowed. DEPA (Data Empowerment and Protection Architecture) is a new approach that gives people the power to decide how their data can be used.
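Tracking every read and write while keeping stored data immutable is often done with an append-only audit log. In a hash-chained log, each entry includes the hash of the previous one, so any later edit or deletion breaks the chain and becomes detectable. The sketch below uses Python’s `hashlib`. The entry fields are hypothetical, and hash chaining is one possible technique rather than something HIPAA itself prescribes.

```python
import hashlib
import json
from datetime import datetime, timezone


def _digest(body: dict) -> str:
    # Canonical JSON so the same entry always hashes the same way
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class AuditLog:
    GENESIS = "0" * 64

    def __init__(self):
        self._entries = []

    def append(self, actor: str, action: str, record_id: str) -> dict:
        prev = self._entries[-1]["hash"] if self._entries else self.GENESIS
        body = {
            "actor": actor,
            "action": action,        # e.g. "read" or "write"
            "record_id": record_id,
            "at": datetime.now(timezone.utc).isoformat(),
            "prev": prev,            # links this entry to the one before it
        }
        entry = {**body, "hash": _digest(body)}
        self._entries.append(entry)
        return entry

    def verify(self) -> bool:
        prev = self.GENESIS
        for e in self._entries:
            body = {k: v for k, v in e.items() if k != "hash"}
            if body["prev"] != prev or _digest(body) != e["hash"]:
                return False
            prev = e["hash"]
        return True


log = AuditLog()
log.append("dr-D1", "read", "A-r3")
log.append("lab-L7", "write", "A-r11")
print(log.verify())
log._entries[0]["action"] = "write"   # simulate tampering with history
print(log.verify())
```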

Duplicate health data flow in the system: When we receive health data from a data provider through some channel, we generally don’t get a specific, unique record id from the data source. Instead, we have to generate a unique event id for every piece of health data that enters the health data platform. Detecting duplicates is then very challenging. The same data may enter the EHR multiple times, and there is no exact rule for identifying it. Some duplication can be controlled with rules. For example, pharmacy purchase data from a specific provider can be checked for duplicates using the bill id or invoice id. Controlling duplication across all health data, however, remains a challenge.
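The rule-based part can be expressed as a function that derives a deduplication key. When a provider supplies a natural identifier such as an invoice id, the key is built from it. Otherwise the sketch falls back to a hash of the content with channel metadata removed, which only catches exact duplicates. The field names (`type`, `provider_id`, `invoice_id`, `channel`) are assumptions. For a user-uploaded bill image, the invoice id would first have to be extracted from the image.

```python
import hashlib
import json

# Metadata that differs per channel and must not affect duplicate detection
CHANNEL_FIELDS = {"channel", "event_id", "received_at"}


def dedup_key(record: dict) -> str:
    # Rule: pharmacy purchases carry a provider-issued invoice/bill id
    if record.get("type") == "pharmacy_purchase" and record.get("invoice_id"):
        return f"pharmacy:{record['provider_id']}:{record['invoice_id']}"
    # Fallback: hash of the canonicalised content (exact duplicates only)
    content = {k: v for k, v in record.items() if k not in CHANNEL_FIELDS}
    canonical = json.dumps(content, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode()).hexdigest()


seen = set()


def is_duplicate(record: dict) -> bool:
    key = dedup_key(record)
    if key in seen:
        return True
    seen.add(key)
    return False


from_user = {"type": "pharmacy_purchase", "provider_id": "P1", "invoice_id": "INV-9",
             "channel": "user_upload", "items": ["Metformin"]}
from_pharmacy = {"type": "pharmacy_purchase", "provider_id": "P1", "invoice_id": "INV-9",
                 "channel": "pharmacy_api", "items": ["Metformin 500mg"]}
print(is_duplicate(from_user), is_duplicate(from_pharmacy))
```

The two copies differ in their item text, yet the invoice rule still matches them. The content-hash fallback would have missed this pair, which is exactly why the fully general case stays hard.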

Extracting health data from unstructured data: Most of the data will arrive in media formats such as image, PDF, audio and video, and extracting the right information from these formats is another challenge. One approach is for every data provider to follow a specific template for each format. But there are many providers, and reaching an agreement with every one of them is a challenge in itself.
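If a provider does agree on a template, extracting fields from the OCR output can be as simple as a set of per-provider patterns. The sketch assumes an OCR step (not shown) has already converted the image or PDF to text. The provider id, template and field patterns are hypothetical.

```python
import re

# Hypothetical per-provider templates: field name -> regex with one capture group
TEMPLATES = {
    "pharmacy-P1": {
        "invoice_id": r"Invoice\s*No[.:]?\s*([A-Z0-9-]+)",
        "date": r"Date:?\s*(\d{2}/\d{2}/\d{4})",
        "total": r"Total:?\s*Rs\.?\s*([\d.]+)",
    }
}


def extract(provider_id: str, ocr_text: str) -> dict:
    template = TEMPLATES.get(provider_id)
    if template is None:
        raise KeyError(f"no template for provider {provider_id!r}")
    fields = {}
    for name, pattern in template.items():
        m = re.search(pattern, ocr_text, flags=re.IGNORECASE)
        fields[name] = m.group(1) if m else None   # None flags the field for manual review
    return fields


text = "ABC Pharmacy\nInvoice No: INV-9\nDate: 01/01/2022\nMetformin 500mg x10\nTotal: Rs. 120.50"
print(extract("pharmacy-P1", text))
```

Templates are brittle when a provider changes its layout, which is one reason the extracted fields should stay linked to the original source document.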

Original source data backup: Original source data has to be stored for the lifetime of the record. We may need it for many reasons, such as audit compliance and government policy. I see another very important reason too. Since we are dealing with unstructured data, we may need to align the extracted information with the original source data. These aligned pairs can then be used to train models that extract information from unstructured data.
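Keeping originals and linking every extracted field back to them is straightforward with content addressing. The raw bytes are stored under their SHA-256 digest, and that digest is recorded on the extracted data. The resulting (original, extracted fields) pairs then serve as labelled examples for training extraction models. The sketch below is in-memory with hypothetical names. In practice the blob store would be durable, write-once object storage.

```python
import hashlib

blob_store = {}     # sha256 hex digest -> original bytes (write-once)
extractions = []    # extracted fields, each linked to its source blob


def store_original(data: bytes) -> str:
    digest = hashlib.sha256(data).hexdigest()
    blob_store.setdefault(digest, data)   # identical uploads are stored only once
    return digest


def record_extraction(source_digest: str, fields: dict, method: str) -> None:
    extractions.append({"source": source_digest, "fields": fields, "method": method})


def training_pairs():
    """Yield (original bytes, extracted fields) for model training or audit."""
    for e in extractions:
        yield blob_store[e["source"]], e["fields"]


digest = store_original(b"%PDF-1.7 ...scanned pharmacy bill...")
record_extraction(digest, {"invoice_id": "INV-9"}, method="template:pharmacy-P1")
print(len(list(training_pairs())))
```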

In my next article, I’ll explain exactly how we tackled these challenges.

Amit Kumar

Technical Project Lead in the health domain at Reliance Jio.