October 18, 2019

BioHIT TALK | Using text data to improve causal analyses of electronic health records | Dr Aaron Kaufman

Visiting speaker Dr Aaron Kaufman will give a talk on “Using text data to improve causal analyses of electronic health records” in the Lecture Hall of the Institute of Informatics and Telecommunications of NCSR Demokritos on Thursday, 7 November, at 14:00.

Text is a ubiquitous component of medical data, containing valuable information about patient characteristics and care, yet it is almost wholly ignored in clinical research. Using a large database of patient records and treatment histories accompanied by extensive notes by attending physicians and nurses, we show how text data improve research at all stages, from conception and design to analysis and interpretation. In particular, we consider a study using matching for causal inference. By incorporating text in the matching stage, we improve covariate balance; by using text to supplement a multiple imputation procedure, we improve the fidelity of our imputed values; and by conditioning on text, we can estimate easily interpretable conditional treatment effects. We introduce software to implement our procedures, which fit easily into existing workflows. Using these techniques, we hope to expand the scope of secondary analysis of clinical data to domains where quantitative data are of poor quality or nonexistent but text is available, for example in developing countries.

Speaker Biography
Dr Kaufman is an Assistant Professor of Computational Social Science at New York University Abu Dhabi. He graduated from the University of California, Berkeley in 2013, received his AM from the Department of Statistics at Harvard University in 2016, and received his PhD from the Department of Government at Harvard University in May 2019. His work focuses on applying machine learning and other advanced computational tools to solve measurement problems in the social sciences within a causal inference framework. He collaborates extensively with MIT’s Laboratory for Computational Physiology, where he applies the tools he builds for social science problems to the secondary analysis of electronic health records; the first of this work recently appeared in Nature Digital Medicine. His other work has appeared in the top journals in political science, including the American Political Science Review, the American Journal of Political Science, and Political Analysis.