Supporting clinical research with an intensive-care database
Crowdsourcing clinical data from some 40,000 patients could vastly improve research and critical-care decisions
Image courtesy of the MIT Laboratory for Computational Physiology
Thousands of medications and interventions are in regular use in the ICU, with an uncountable number of interactions among them. Further, patient demographics and genetics play a role in what treatments are optimal. Patients have their vital signs and serum blood values monitored frequently, sometimes continuously, and as a result the ICU environment is undeniably complex. The role of the care provider is to synthesize this deluge of data into a useful treatment course, and it is not an easy one.
A paucity of well-curated clinical data is often cited as a key challenge in conducting research. There is a need for a new approach to knowledge generation, one that is efficient and can progress much faster than research has in the past.
Researchers at the MIT Laboratory for Computational Physiology have taken an alternative approach to clinical research by freely releasing ICU data to researchers globally, with a goal of crowdsourcing the knowledge generation process. Thanks to this work, researchers are able to test hypotheses using real data acquired from patients admitted to the Beth Israel Deaconess Medical Center in Boston, Massachusetts.
“One of the biggest challenges in health care research is accessing the data,” says Alistair Johnson, a postdoc at MIT’s Institute for Medical Engineering and Science. “You need ethics approval, assistance from the hospital IT department, technical knowledge, and clinical knowledge. We’ve taken care of all that.”
The database, Medical Information Mart for Intensive Care (MIMIC), houses data on over 40,000 patients admitted to ICUs at the Beth Israel Deaconess Medical Center since 2000. The data were de-identified to conform with the Health Insurance Portability and Accountability Act, and interested researchers must sign a data use agreement, promising not to use the data for any unlawful purpose, among other guarantees. The data collected include vital signs, medications, laboratory measurements, observations and notes charted by care providers, fluid balance, procedure codes, diagnostic codes, imaging reports, hospital length of stay, survival data, and more.
In addition to making the data available, the laboratory has created an open code repository to allow researchers to collaboratively develop and reuse analytical code. Tom Pollard, a co-author on the paper, notes that when data and code are shared together, studies become completely reproducible, and says that collaboration is the key to advancing knowledge. “Sharing this code and data helps to advance the work of researchers around the world, from early-career students to highly experienced academics. Our belief is that together we can achieve much more than would be possible in closed groups.”