Database Credentialed Access

Establishment of a Chinese critical care database from electronic healthcare records in a tertiary care medical center

Senjun Jin, Lin Chen, Kun Chen, Zhongheng Zhang

Published: Jan. 19, 2023. Version: 1.0


When using this resource, please cite:
Jin, S., Chen, L., Chen, K., & Zhang, Z. (2023). Establishment of a Chinese critical care database from electronic healthcare records in a tertiary care medical center (version 1.0). PhysioNet. https://doi.org/10.13026/901c-yv54.

Additionally, please cite the original publication:

Jin, S., Chen, L., Chen, K., Hu, C., Hu, S., & Zhang, Z. (2023). Establishment of a Chinese critical care database from electronic healthcare records in a tertiary care medical center. Sci Data. https://doi.org/10.1038/s41597-023-01952-3.

Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

Abstract

Critical care, or intensive care, is the medical specialty that provides emergency medical care to patients with life-threatening complications and injuries. It is characterized by the generation of large amounts of high-granularity data in routine practice, including hourly vital signs, ventilator waveforms, medical orders, and laboratory results. Currently, these data are archived in hospital information systems primarily to support routine clinical practice. However, data scientists have recognized that in-depth mining of such data may provide insights into the pathophysiology of underlying diseases and into healthcare practices. Clinical questions related to risk factors, predictive analytics, cost-effectiveness, and causal inference can be addressed with a critical care database. Several openly accessible critical care databases have been established, and they have generated hundreds of publications in scientific journals. However, such work is still in its infancy in China. China has a large patient population, which leads to the accumulation of large healthcare databases in hospitals. Establishing and sharing such a database can help to promote open data science and the collaborative discovery of new scientific knowledge. In this data descriptor, we report the establishment of an openly accessible critical care database generated from a hospital information system. The database comprises 8180 unique hospital admissions for 7638 individual patients from January 2012 to May 2022. The database contains 11 plain text relational tables that can be linked by Hospital_ID.


Background

Critically ill patients managed in the intensive care unit (ICU) are usually monitored closely for organ dysfunction and are treated intensively with a variety of supportive modalities [1,2]. Vital signs and laboratory tests are measured, and medical treatments adjusted, more frequently than for patients treated in the general ward. Such intensive daily management produces a huge amount of information, including medical orders, imaging studies, laboratory findings and waveform signals. The data generation mechanisms may reflect key factors related to the healthcare system, the pathophysiology of the underlying disease, and patients' preferences and cultures [3]. Thus, in-depth mining of such large databases, for example through risk factor analysis, predictive analytics and causal inference [4,5], can provide more insight into clinical research questions. Translating the knowledge obtained from data mining into clinical practice may improve clinical outcomes [6,7].

Most published scientific reports in the critical care research community do not make their original raw data freely accessible, partly because of confidentiality concerns. This unwillingness to share data makes it difficult to reproduce the reported results. Furthermore, exploration of such a large database by a single research group could be biased and limited. Thus, strenuous efforts have been made to encourage the scientific community to share raw data, which is also supported by the open data campaign [8,9]. Several openly accessible critical care databases have been established, mainly reflecting the healthcare systems of western countries [10-12]. China is a large country with a huge patient population, and its hospital information systems are distinct from those of western countries. However, hospital information systems in Chinese hospitals are mainly used for clinical practice and are far less developed for research purposes. Data sharing is still in its infancy in the Chinese critical care community, which significantly impairs the transparency of scientific work and international collaboration. To the best of our knowledge, two critical care databases have been established in China, focusing on pediatric critically ill patients and on patients with infections [13,14]. Here, we report the establishment of a large critical care database comprising high-granularity data generated from the information system of a tertiary care university hospital. Details of the database are reported to encourage new research through secondary analysis of the database.


Methods

Study setting and population

The study was conducted in Zhejiang Provincial People's Hospital, Zhejiang, China from January 2012 to May 2022. All patients admitted to the ICU of the hospital were eligible. There were two ICUs in the hospital: one was the comprehensive central ICU and the other was the emergency ICU (EICU). There was no exclusion criterion in enrolling subjects because we believed that patients who were excluded by a particular study might be eligible for another study. Thus, we included all records in the information system related to ICU stays. The study was approved by the ethics committee of Zhejiang Provincial People's Hospital (approval number: QT2022185). Informed consent was waived as determined by the institutional review board, due to the retrospective design of the study. The study was conducted in accordance with the Declaration of Helsinki.

Database structure and development

The database is distributed as comma-separated values (CSV) files that can be imported into any relational database system. We recommend the R package tidyverse for managing the relational data because it streamlines the workflow from data management to statistical analysis and the training of machine learning models [15]. Each file contains a single table, which is explained in the subsequent sections. For large files, we recommend the data.table package to process the tabular data.

Each individual subject is identified by a serial number (patient_SN) composed of digits and letters, such as “3c74cf74c36241b7082ec35e458279dc”. Each hospital stay is denoted by a Hospital_ID, with examples such as "ZY|360812" and "IP|20190500469". Individual ICU stays can be identified from the HospitalTransfer table, which contains intrahospital transfer events for the subjects. All tables use Hospital_ID to identify an individual hospital stay, and the HospitalTransfer table can be used to determine the ICU stays linked to the same patient and/or hospitalization.
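The linkage described above can be sketched in a few lines. The example below uses Python's standard library rather than the R packages recommended in this section, and it runs on tiny in-memory stand-ins for EMR.csv and HospitalTransfer.csv. The row values (including the Sex coding and the shortened identifiers) are made up for illustration; only the column names are taken from the table descriptions in this document.

```python
import csv
import io
from collections import defaultdict


def index_by_hospital_id(rows, key="Hospital_ID"):
    """Group an iterable of row dicts by their Hospital_ID value."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key]].append(row)
    return grouped


# In-memory stand-ins for EMR.csv and HospitalTransfer.csv (hypothetical values).
emr_csv = io.StringIO(
    "patient_SN,Hospital_ID,Sex\n"
    "p1,IP|001,Male\n"
    "p1,IP|002,Male\n"
)
transfer_csv = io.StringIO(
    "patient_SN,Hospital_ID,TransferIn_dateTime,TransferOut_dateTime\n"
    "p1,IP|001,0.0,2.5\n"
    "p1,IP|001,2.5,9.0\n"
    "p1,IP|002,0.0,4.0\n"
)

emr = index_by_hospital_id(csv.DictReader(emr_csv))
transfers = index_by_hospital_id(csv.DictReader(transfer_csv))

# Attach every transfer event to the admission it belongs to.
linked = {
    hid: {"emr": emr[hid][0], "transfers": transfers.get(hid, [])}
    for hid in emr
}
```

With the real files, the `io.StringIO` objects would be replaced by files opened with `open(path, newline="", encoding=...)`. The appropriate encoding is not stated in this document and would need to be checked against the downloaded files.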

Deidentification

All tables are deidentified according to the Health Insurance Portability and Accountability Act (HIPAA). All protected information is removed, including addresses, date of birth, date of hospital admission, date of discharge, date of medical order, personal numbers (e.g. phone, social security and hospital numbers) and exact age on admission. When creating the dataset, patients were randomly assigned unique identifiers (patient_SN and Hospital_ID) and the original hospital identifiers were not retained. As a result, the identifiers in the database cannot be linked back to the original, identifiable data. All doctor, nurse and pharmacist identifiers have also been removed to protect the privacy of contributing providers. Dates in the free text were replaced with asterisks. Other information that may help to identify individual patients, such as names, IDs, addresses and phone numbers, was removed or replaced with asterisks.


Data Description

The database comprises 8180 unique hospital admissions for 7638 individual patients from January 2012 to May 2022. Table 1 shows the baseline demographics of the hospital admissions. Of these admissions, 2965 were for female patients and 5215 for male patients. The median length of hospital stay was 17 days (Q1 to Q3: 10 to 28). Male patients had a slightly longer hospital stay.

Table 1. Demographics and discharge status of the 8180 hospital admissions in the database.

Variables                              Total (n = 8180)   Female (n = 2965)   Male (n = 5215)   p
Age, n (%)                                                                                      < 0.001
  (0,18]                               35 (0)             14 (0)              21 (0)
  (18,45]                              1012 (12)          339 (11)            673 (13)
  (45,65]                              2609 (32)          859 (29)            1750 (34)
  (65,75]                              1952 (24)          709 (24)            1243 (24)
  (75,90]                              2044 (25)          836 (28)            1208 (23)
  (90,150]                             528 (6)            208 (7)             320 (6)
Days Hospital Stay, Median (Q1, Q3)    17 (10, 28)        16 (10, 26)         18 (10, 28)       < 0.001
Status On Discharge, n (%)                                                                      0.901
  Cured                                5666 (73)          2050 (73)           3616 (73)
  Dead                                 438 (6)            157 (6)             281 (6)
  Not cured                            1202 (16)          437 (16)            765 (15)
  Unknown                              444 (6)            153 (5)             291 (6)

The number of hospital admissions for ICU patients increased markedly after 2018 because the number of beds in both the comprehensive ICU and the emergency ICU was expanded in that year (Figure 1). The distribution of hospital length of stay (LOS) is shown in Figure 2, restricted to patients with LOS < 60 days.

Classes of data

The data are organized into tables. There are a total of 10 tables comprising patient demographic data, medical orders, laboratory findings, imaging studies, microbiology and hospital transfer events (Table 2). Each table is described in more detail below to promote reuse of the database.

Table 2. A general description of the tables in the database.

Tables                    MD5_hashes                         Size (bytes)   Description
Diagnosis.csv             3b7ca8b430b16d9ebbd1317cb06cc87b   25582236       Diagnosis
DrugSens.csv              2f79976765464593b8eed552221c6359   136436217      Sensitivity to antibiotics for cultured bacteria
EMR.csv                   5c6048462b1dc6d44a47687fb34bbc65   13730602       Electronic medical records for each hospital admission
ExamReport.csv            f771ad05ec45b2b65105744b2c29907f   52649246       Examination report including CT, ultrasound and MRI
HospitalTransfer.csv      168fef171980a7cacbe2ed79d1dd63ba   1441407        Intrahospital transfer events
Lab.csv                   4939ebb2155bbcfaff37da8e78f8cc4a   1828953993     Laboratory findings
Medication.csv            4e9d16531c6cfb2a7d9aafc21f465e16   277782777      Medication events
MedOrder.csv              297d4e9f5e7e9ba8c3dca5005b8d7684   207348204      Medical order
MicrobiologyCulture.csv   af996f30325ec74eefbf7a25d6680473   39362619       Microbiology culture
VitalSign.csv             5224b7450833a2ac441d16b459502f32   607364251      Vital signs
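The MD5 hashes in Table 2 can be used to confirm that a downloaded file is complete and unmodified. The following is a minimal sketch using Python's standard `hashlib` module. The helper names (`md5_of_stream`, `md5_of_file`, `matches_expected`) are illustrative and not part of any released tooling, and the file is read in chunks because some tables are large.

```python
import hashlib
import io


def md5_of_stream(handle, chunk_size=1 << 20):
    """Return the hex MD5 digest of a binary file-like object, read in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: handle.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path):
    """Return the hex MD5 digest of the file at `path`."""
    with open(path, "rb") as handle:
        return md5_of_stream(handle)


# Expected digests copied from Table 2.
EXPECTED_MD5 = {
    "Diagnosis.csv": "3b7ca8b430b16d9ebbd1317cb06cc87b",
    "DrugSens.csv": "2f79976765464593b8eed552221c6359",
    "EMR.csv": "5c6048462b1dc6d44a47687fb34bbc65",
    "ExamReport.csv": "f771ad05ec45b2b65105744b2c29907f",
    "HospitalTransfer.csv": "168fef171980a7cacbe2ed79d1dd63ba",
    "Lab.csv": "4939ebb2155bbcfaff37da8e78f8cc4a",
    "Medication.csv": "4e9d16531c6cfb2a7d9aafc21f465e16",
    "MedOrder.csv": "297d4e9f5e7e9ba8c3dca5005b8d7684",
    "MicrobiologyCulture.csv": "af996f30325ec74eefbf7a25d6680473",
    "VitalSign.csv": "5224b7450833a2ac441d16b459502f32",
}


def matches_expected(name, actual_md5):
    """Compare a computed digest with the Table 2 value for `name`."""
    return EXPECTED_MD5[name] == actual_md5.lower()


# Demonstration on in-memory bytes; a real check would call md5_of_file(path).
demo_digest = md5_of_stream(io.BytesIO(b"patient_SN,Hospital_ID\n"))
```

For a downloaded file, `matches_expected("Lab.csv", md5_of_file("Lab.csv"))` should return `True` if the local copy matches the released version 1.0 file.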

Electronic medical record

The EMR.csv table contains data related to each hospital admission (Table 3). The patient_SN is a unique ID for each individual patient and Hospital_ID is a unique ID for each hospital admission. If a patient was discharged or died within 24 hours, the data were recorded in a separate table in the source system, so there are separate columns describing the chief complaint and admission status for those short hospital stays. We provide both English and Chinese descriptions of the chief complaint. The history of present illness recorded in the Med_history column is longer, and the original Chinese descriptions are kept so that natural language processing algorithms can be applied.

Table 3. Explanation for variables in the EMR table

Variables                   Explanation
patient_SN                  Patient serial number: unique to each individual subject
Hospital_ID                 Unique to each hospital admission
Sex                         Sex
ChiefComplain_24hr          Chief complaint for patients who were discharged within 24 hours after hospital admission
AdmissionStatus_24hr        Admission status for patients who were discharged within 24 hours after hospital admission
ChiefComplain_24hr_dead     Chief complaint for patients who died within 24 hours after hospital admission
AdmissionStatus_24hr_dead   Admission status for patients who died within 24 hours after hospital admission
ChiefComplain               Chief complaint in Chinese
Med_history                 Medical history in text
PastHistory                 Past history/comorbidities
StatusOnDischarge           Status on discharge
DiagnosisOnDeath            Diagnosis on death
StatusOnDischarge_Desc      Status on discharge described in text
DischargeTime               Discharge time in days relative to hospital admission (time zero)
DaysHospitalStay            Days of hospital stay
ChiefComplain_Eng           Chief complaint in English
Age_cut                     Age in category

Diagnosis table

The Diagnosis table contains information related to the diagnoses for a hospital stay (Table 4). The Diagnosis_Desc column provides a free text description of the diagnosis. ICD10_code is the standard ICD-10 code. This information can be processed with the icd package in R (https://github.com/cran/icd). The functionality of the package includes, but is not limited to, finding patient comorbidities based on ICD-10 codes, calculating Charlson and Van Walraven scores, and a comprehensive test suite that increases confidence in the accurate processing of ICD codes.

Table 4. Explanation for variables in the Diagnosis table

Variables            Explanation
patient_SN           Patient serial number: unique to each individual subject
Hospital_ID          Unique to each hospital admission
Diagnosis_Desc       Description of diagnosis in free text
ICD10_code           ICD-10 code
ICD10_name           ICD-10 name for the diagnosis
Diagnosis_DateTime   Time of making the diagnosis in days relative to hospital admission (time zero)
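As a lightweight alternative to a dedicated ICD library, admissions can be grouped by three-character ICD-10 category directly from the ICD10_code column. The sketch below assumes that codes follow the usual ICD-10 layout (a letter, two digits, then optional extension characters with or without a dot). The exact formatting in Diagnosis.csv is not specified in this document, so this should be verified on the data. The sample rows are made up.

```python
from collections import defaultdict


def icd10_category(code):
    """Return the three-character ICD-10 category, e.g. 'A41.9' -> 'A41'."""
    cleaned = code.strip().upper().replace(".", "")
    return cleaned[:3]


def categories_by_admission(rows):
    """Map each Hospital_ID to the set of ICD-10 categories recorded for it."""
    result = defaultdict(set)
    for row in rows:
        code = row.get("ICD10_code") or ""
        if code.strip():  # skip rows without a coded diagnosis
            result[row["Hospital_ID"]].add(icd10_category(code))
    return result


# Hypothetical rows shaped like Diagnosis.csv (only the columns used here).
rows = [
    {"Hospital_ID": "IP|001", "ICD10_code": "A41.9"},
    {"Hospital_ID": "IP|001", "ICD10_code": "j18.900"},
    {"Hospital_ID": "IP|002", "ICD10_code": ""},
    {"Hospital_ID": "IP|002", "ICD10_code": "I10"},
]

by_admission = categories_by_admission(rows)

# Admissions with a sepsis category (A40: streptococcal sepsis, A41: other sepsis).
sepsis_admissions = sorted(
    hid for hid, cats in by_admission.items() if cats & {"A40", "A41"}
)
```

Prefix matching on the category is deliberately coarse. Comorbidity scores such as Charlson require the full code mappings provided by packages like icd.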

Hospital transfer table

The HospitalTransfer table contains information related to intrahospital transfer events (Table 5). The time and department of each transfer event are given in respective columns. To protect patients’ privacy, all date time information is recorded as days relative to hospital admission. Since the EICU is in the emergency department, the department names denoted by “Emergency medical department” or “Emergency Department” refer to the EICU.

Table 5. Explanation for variables in the HospitalTransfer table

Variables               Explanation
patient_SN              Patient serial number: unique to each individual subject
Hospital_ID             Unique to each hospital admission
TransferIn_dateTime     Time of transfer in, in days relative to hospital admission
TransferOut_dateTime    Time of transfer out, in days relative to hospital admission
TransferTo_Dept_Eng     Department transferred to
TransferFrom_Dept_Eng   Department transferred from
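One way to extract EICU stays from this table is sketched below. It relies on two assumptions that go beyond what is stated here: that each row describes a stay in TransferTo_Dept_Eng lasting from TransferIn_dateTime to TransferOut_dateTime, and that both time columns parse as numbers of days. The two department labels are the ones given in the text above. The label used for the comprehensive central ICU is not stated in this document and would have to be looked up in the data. The sample rows are made up.

```python
# Department labels that the text above says refer to the EICU.
EICU_NAMES = {"Emergency medical department", "Emergency Department"}


def eicu_stays(rows):
    """Return (Hospital_ID, start_day, stop_day, length_days) for EICU rows.

    Assumes each row is a stay in TransferTo_Dept_Eng between the transfer-in
    and transfer-out times, both expressed in days since hospital admission.
    """
    stays = []
    for row in rows:
        if row["TransferTo_Dept_Eng"] not in EICU_NAMES:
            continue
        try:
            start = float(row["TransferIn_dateTime"])
            stop = float(row["TransferOut_dateTime"])
        except (TypeError, ValueError):
            continue  # skip rows with missing or non-numeric times
        stays.append((row["Hospital_ID"], start, stop, stop - start))
    return stays


# Hypothetical rows shaped like HospitalTransfer.csv.
sample = [
    {"Hospital_ID": "IP|001", "TransferTo_Dept_Eng": "Emergency Department",
     "TransferIn_dateTime": "0.5", "TransferOut_dateTime": "3.0"},
    {"Hospital_ID": "IP|001", "TransferTo_Dept_Eng": "Cardiology",
     "TransferIn_dateTime": "3.0", "TransferOut_dateTime": "9.0"},
    {"Hospital_ID": "IP|002", "TransferTo_Dept_Eng": "Emergency medical department",
     "TransferIn_dateTime": "0.0", "TransferOut_dateTime": ""},
]

stays = eicu_stays(sample)
```

Rows with an empty transfer-out time are dropped in this sketch. Depending on the research question, it may be preferable to keep them and impute the end of the stay from DischargeTime in the EMR table.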

Lab table

The Lab table contains data related to the laboratory findings (Table 6). There are 11,082,482 laboratory records in the dataset, covering 214 types of laboratory items. Seventeen types of samples were tested: whole blood, plasma, urine, serum, arterial blood, stool, venous blood, catheter orifice, ascites, bile, dialysate, CK blood sample (kaolin-activated TEG channel), cerebrospinal fluid, bone marrow, deep venous catheter, sputum and gastric juice. The sample collection time is also recorded in days relative to the hospital admission time.

Table 6. Explanation for variables in the Lab table

Variables               Explanation
patient_SN              Patient serial number: unique to each individual subject
Hospital_ID             Unique to each hospital admission
Lab_category            Category of lab item
Lab_time                Time of lab in days relative to hospital admission
Lab_results             Results of the lab finding
Unit_measure            Unit of measurement
LabSampleCollect_time   Sample collection time in days relative to hospital admission
Lab_itemName_Eng        Name of lab item
Lab_Sample_Eng          Sample name
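Because all times are expressed in days since hospital admission, a time window such as "the first 24 hours" corresponds to collection times between 0 and 1. The sketch below selects the numeric results of one lab item within such a window. It assumes that LabSampleCollect_time and Lab_results are stored as text that may or may not parse as numbers, and the item name "Creatinine" is a placeholder, since the actual labels in Lab_itemName_Eng are not listed in this document.

```python
def results_in_window(rows, item_name, start_day=0.0, stop_day=1.0):
    """Return numeric results of `item_name` collected in [start_day, stop_day].

    Times are in days relative to hospital admission; rows whose time or
    result cannot be parsed as a number are skipped.
    """
    values = []
    for row in rows:
        if row["Lab_itemName_Eng"] != item_name:
            continue
        try:
            collected = float(row["LabSampleCollect_time"])
            result = float(row["Lab_results"])
        except (TypeError, ValueError):
            continue
        if start_day <= collected <= stop_day:
            values.append(result)
    return values


# Hypothetical rows shaped like Lab.csv (only the columns used here).
sample = [
    {"Lab_itemName_Eng": "Creatinine", "LabSampleCollect_time": "0.25", "Lab_results": "88"},
    {"Lab_itemName_Eng": "Creatinine", "LabSampleCollect_time": "0.75", "Lab_results": "positive"},
    {"Lab_itemName_Eng": "Creatinine", "LabSampleCollect_time": "2.5", "Lab_results": "120"},
    {"Lab_itemName_Eng": "Sodium", "LabSampleCollect_time": "0.5", "Lab_results": "140"},
]

first_day_creatinine = results_in_window(sample, "Creatinine")
```

Lab.csv is the largest file in the release, so in practice the rows would be streamed from disk (for example with `csv.DictReader`) rather than loaded into a list, and Unit_measure should be checked before pooling values.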

Microbiology culture table

The MicrobiologyCulture table contains information related to microbiology culture results (Table 7). Conventional information regarding sample, culture finding, culture time and description of microbiology culture are provided in the table.

Table 7. Explanation for variables in the MicrobiologyCulture table

Variables                          Explanation
patient_SN                         Patient serial number: unique to each individual subject
Hospital_ID                        Unique to each hospital admission
MicrobiologyCulture_finding        Microbiology culture finding
MicrobiologyCulture_time           Microbiology culture time in days relative to hospital admission
MicrobiologyCulture_sample_Eng     Microbiology culture sample
MicrobiologyCulture_Category_Eng   Microbiology culture category
MicrobiologyCulture_DESC_Eng       Description of microbiology culture

Drug sensitivity table

The DrugSens table contains information related to the drug sensitivity of cultured bacteria (Table 8). Conventional information including sample, microorganism, culture time and drug name is available in the table. The negative and positive values in the DrugSens_result column refer to the results of the ultra-broad-spectrum β-lactamase test or the D-test.

Table 8. Explanation for variables in the DrugSens table

Variables                   Explanation
patient_SN                  Patient serial number: unique to each individual subject
Hospital_ID                 Unique to each hospital admission
Drug_Code                   Code of the drug for sensitivity analysis
DrugSens_result             Results of the drug sensitivity test
MIC                         Minimum inhibitory concentration
DrugSens_time               Time of the results in days relative to hospital admission (time zero)
Drug_name_Eng               Name of the tested drug
DrugSens_Microbiology_Eng   Microorganism tested
DrugSens_Category_Eng       Category of the test
DrugSens_sample_Eng         Sample name

Examination report table

The ExamReport table contains information related to a variety of medical examinations, including computed tomography (CT), X-ray and ultrasound (Table 9). The images are not available in the current dataset; instead, we include the free text descriptions and conclusions for these examinations.

Table 9. Explanation for variables in the ExamReport table

Variables             Explanation
patient_SN            Patient serial number: unique to each individual subject
Hospital_ID           Unique to each hospital admission
ExamReport_Category   Category of examination
ExamReport_DESC       Description of the examination in free text
ExamReport_finding    Result finding
ExamReport_time       Time of the examination results in days relative to hospital admission (time zero)
ExamReport_item_Eng   Name of the examination

Medical order table

The MedOrder table contains information related to the medical order prescribed by clinicians (Table 10). The table provides both regular and stat medical orders (MedOrder_Type). The contents of the medical order can be found in the MedOrder_DESC column.

Table 10. Explanation for variables in the MedOrder table

Variables                 Explanation
patient_SN                Patient serial number: unique to each individual subject
Hospital_ID               Unique to each hospital admission
MedOrder_Type             Type of medical order: regular or stat
MedOrder_DESC             Description of medical order in free text
MedOrder_Start_DateTime   Start time of medication in days relative to hospital admission
MedOrder_Stop_DateTime    Stop time of medication in days relative to hospital admission

Medication table

The medication table provides data on the medication orders prescribed by clinicians (Table 11). This table is designed specifically for medication orders, containing columns for drug dose, frequency, unit of drug dose and route of administration.

Table 11. Explanation for variables in the Medication table

Variables       Explanation
patient_SN      Patient serial number: unique to each individual subject
Hospital_ID     Unique to each hospital admission
Med_category    Category of medication
SingleDose      Single dose
Med_Freq        Frequency of administration
Med_unit        Unit of measurement
Med_startTime   Start time of medication in days relative to hospital admission
Med_stopTime    Stop time of medication in days relative to hospital admission
Med_route_Eng   Route of administration
Med_DESC_Eng    Medication name in text

VitalSign table

The VitalSign table provides vital sign data for each hospital admission (Table 12). The VitalSign_DESC column provides categories of vital signs including diastolic blood pressure, temperature, heart rate and respiratory rate.

Table 12. Explanation for variables in the VitalSign table

Variables         Explanation
patient_SN        Patient serial number: unique to each individual subject
Hospital_ID       Unique to each hospital admission
VitalSign_DESC    Vital sign description
VitalSign_value   Vital sign value
VitalSign_unit    Vital sign unit of measurement
VitalSign_time    Vital sign measurement time in days relative to hospital admission
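The VitalSign table is stored in long format, with one measurement per row, so a common first step is to summarize each vital sign per admission. The sketch below computes the median of every vital sign within each hospital admission using only the standard library. The sample labels in VitalSign_DESC (such as "heart rate") are placeholders and may not match the exact strings in the file. Non-numeric values are skipped.

```python
from collections import defaultdict
from statistics import median


def median_vitals(rows):
    """Return {(Hospital_ID, VitalSign_DESC): median value} for numeric rows."""
    grouped = defaultdict(list)
    for row in rows:
        try:
            value = float(row["VitalSign_value"])
        except (TypeError, ValueError):
            continue  # free-text or missing values are ignored
        grouped[(row["Hospital_ID"], row["VitalSign_DESC"])].append(value)
    return {key: median(values) for key, values in grouped.items()}


# Hypothetical rows shaped like VitalSign.csv (only the columns used here).
sample = [
    {"Hospital_ID": "IP|001", "VitalSign_DESC": "heart rate", "VitalSign_value": "80"},
    {"Hospital_ID": "IP|001", "VitalSign_DESC": "heart rate", "VitalSign_value": "100"},
    {"Hospital_ID": "IP|001", "VitalSign_DESC": "heart rate", "VitalSign_value": "90"},
    {"Hospital_ID": "IP|001", "VitalSign_DESC": "temperature", "VitalSign_value": "not measured"},
    {"Hospital_ID": "IP|002", "VitalSign_DESC": "heart rate", "VitalSign_value": "72"},
]

summary = median_vitals(sample)
```

The median is used here because it is less sensitive than the mean to the recording errors mentioned under Limitations. VitalSign_unit should be checked before values are pooled.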


Usage Notes

The dataset can be used for a variety of studies related to critical care medicine, including predictive analytics, external validation of models, identification of risk factors and epidemiological studies. Code to generate the dataset is available on GitHub [17]. We will continue to expand the code to support the research community, and we welcome contributions of useful data extraction code from other researchers. For further details, please read our associated data descriptor paper [18].

Limitations

Because the dataset is created from electronic healthcare records, it contains some missing data and errors, reflecting the real clinical setting. Studies of temporal trends cannot be performed because the real calendar dates were removed.


Release Notes

This is version 1.0 (initial release)


Ethics

The study was approved by the ethics committee of Zhejiang Provincial People's Hospital (approval number: QT2022185). Informed consent was waived as determined by the institutional review board, due to the retrospective design of the study. The study was conducted in accordance with the Declaration of Helsinki.


Acknowledgements

S.J. received funding from Youth Talents Project of Health Commission of Zhejiang Province (Project number: 2019323925). Z.Z. received funding from Yilu “Gexin” - Fluid Therapy Research Fund Project (YLGX-ZZ-2020005), Health Science and Technology Plan of Zhejiang Province (2021KY745).


Conflicts of Interest

There is no competing interest to declare.


References

  1. Elias, K. M., Moromizato, T., Gibbons, F. K. & Christopher, K. B. Derivation and validation of the acute organ failure score to predict outcome in critically ill patients: a cohort study. Crit Care Med 43, 856–864 (2015).
  2. Yehya, N. & Wong, H. R. Adaptation of a Biomarker-Based Sepsis Mortality Risk Stratification Tool for Pediatric Acute Respiratory Distress Syndrome. Crit Care Med 46, e9–e16 (2018).
  3. Chu, C. D. et al. Trends in Chronic Kidney Disease Care in the US by Race and Ethnicity, 2012-2019. JAMA Netw Open 4, e2127014 (2021).
  4. Höfler, M. Causal inference based on counterfactuals. BMC Med Res Methodol 5, 28 (2005).
  5. Zhang, Z., Chen, L., Xu, P. & Hong, Y. Predictive analytics with ensemble modeling in laparoscopic surgery: A technical note. Laparoscopic, Endoscopic and Robotic Surgery (2022) doi:10.1016/j.lers.2021.12.003.
  6. Valik, J. K. et al. Validation of automated sepsis surveillance based on the Sepsis-3 clinical criteria against physician record review in a general hospital population: observational study using electronic health records data. BMJ Qual Saf 29, 735–745 (2020).
  7. Zhang, Z. et al. Analytics with artificial intelligence to advance the treatment of acute respiratory distress syndrome. J Evid Based Med 13, 301–312 (2020).
  8. Forero, D. A., Curioso, W. H. & Patrinos, G. P. The importance of adherence to international standards for depositing open data in public repositories. BMC Res Notes 14, 405 (2021).
  9. Shahin, M. H. et al. Open Data Revolution in Clinical Research: Opportunities and Challenges. Clin Transl Sci 13, 665–674 (2020).
  10. Pollard, T. J. et al. The eICU Collaborative Research Database, a freely available multi-center database for critical care research. Sci Data 5, 180178 (2018).
  11. Johnson, A. E. W. et al. MIMIC-III, a freely accessible critical care database. Sci Data 3, 160035 (2016).
  12. Thoral, P. J. et al. Sharing ICU Patient Data Responsibly Under the Society of Critical Care Medicine/European Society of Intensive Care Medicine Joint Data Science Collaboration: The Amsterdam University Medical Centers Database (AmsterdamUMCdb) Example. Crit Care Med 49, e563–e577 (2021).
  13. Zeng, X. et al. PIC, a paediatric-specific intensive care database. Sci Data 7, 14 (2020).
  14. Xu, P. et al. Critical Care Database Comprising Patients With Infection. Front Public Health 10, 852410 (2022).
  15. Wickham, H. et al. Welcome to the Tidyverse. Journal of Open Source Software 4, 1686 (2019).
  16. Goldberger, A. L. et al. PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation 101, E215-220 (2000).
  17. Code used to generate the CSV files of the Chinese critical care database on Github. https://github.com/zh-zhang1984/ZhejiangProvinceICU/blob/main/ZhejiangProvinceICU.md [Accessed: 18 January 2023]
  18. Jin, S., Chen, L., Chen, K., Hu, C., Hu, S. & Zhang, Z. Establishment of a Chinese critical care database from electronic healthcare records in a tertiary care medical center. Sci Data (2023). https://doi.org/10.1038/s41597-023-01952-3.

Access

Access Policy:
Only credentialed users who sign the DUA can access the files.

License (for files):
PhysioNet Credentialed Health Data License 1.5.0

Data Use Agreement:
PhysioNet Credentialed Health Data Use Agreement 1.5.0

Required training:
CITI Data or Specimens Only Research
