BioASQ is a series of international challenges promoting advances in biomedical semantic indexing and question answering. To this end, it organizes different shared tasks annually and develops corresponding benchmark datasets that represent the real information needs of experts in the biomedical domain. The dataset for the biomedical semantic indexing task includes millions of papers from the biomedical scientific literature, manually annotated with thousands of relevant topics from the Medical Subject Headings (MeSH) thesaurus. Beyond annotations with domain concepts, such as a particular disease or drug, the dataset also provides annotations for publication characteristics, such as whether a document is a research paper, meta-analysis, or clinical trial, which can be related to the overall quality and reliability of the document. In this study, we analyze the benchmark dataset for biomedical semantic indexing, investigate the particular challenges that state-of-the-art methods face in the automated semantic indexing of the biomedical literature, and explore machine-learning techniques to address them.
Relevant literature:
The road from manual to automatic semantic indexing of biomedical literature: a 10 years journey
Filtering failure: the impact of automated indexing in Medline on retrieval of human studies for knowledge synthesis
Deep learning to refine the identification of high-quality clinical research articles from the biomedical literature: Performance evaluation