Many studies in biomedical research collect clinical information on patients in order to associate it with experimental results obtained from their tissue samples. Which clinical information is collected, and how the experiments are designed, tends to depend solely on clinicians' experience of which items of this information are important or appear to be linked to disease progression and survival. There is therefore a need for methods that can associate this information with outcome in an unsupervised manner.
We describe a method that can potentially infer specific relations within datasets without any prior knowledge about the data. Our method uses Inductive Logic Programming (ILP) to infer rules from clinical data. To overcome the problems of inconsistent and incomplete data, we develop a bootstrapped method in which we randomly sample small subsets of our dataset and use ILP to infer relapse rules from each of them. We test our method on a clinical dataset collected from five breast cancer studies. Our results are promising, as they conform to existing knowledge on breast cancer.
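The bootstrapped sampling scheme can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on loud assumptions. The ILP step is replaced by a hypothetical stand-in learner, `learn_rules`, that only proposes single-condition rules of the form "relapse if attribute = value", whereas a real ILP system would induce first-order clauses. The subset size, number of rounds, precision threshold and minimum frequency are illustrative parameters and are not taken from the study. A rule is reported only if it recurs in at least a fraction `min_freq` of the sampled subsets.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

Record = Dict[str, object]
Rule = Tuple[str, object]  # (attribute, value) meaning "target holds if attribute == value"


def learn_rules(sample: List[Record], target: str,
                min_precision: float = 0.8, min_support: int = 2) -> List[Rule]:
    """HYPOTHETICAL stand-in for an ILP learner (not ILP itself).

    Keeps single-literal rules "target if attr == value" whose precision on
    this sample is at least min_precision and that cover at least
    min_support positive records. Missing attributes are simply skipped,
    so incomplete records do not break the learner.
    """
    rules: List[Rule] = []
    attrs = {a for r in sample for a in r if a != target}
    for attr in sorted(attrs):
        values = {r[attr] for r in sample if attr in r}
        for value in values:
            covered = [r for r in sample if attr in r and r[attr] == value]
            positives = sum(1 for r in covered if r[target])
            if positives >= min_support and positives / len(covered) >= min_precision:
                rules.append((attr, value))
    return rules


def bootstrap_rules(data: List[Record], target: str, n_rounds: int = 200,
                    sample_size: int = 20, min_freq: float = 0.5,
                    seed: int = 0) -> Dict[Rule, float]:
    """Repeatedly draw small random subsets (without replacement), learn
    rules on each, and return every rule found in at least min_freq of the
    rounds, mapped to the fraction of rounds in which it was found."""
    rng = random.Random(seed)
    counts: Counter = Counter()
    for _ in range(n_rounds):
        sample = rng.sample(data, min(sample_size, len(data)))
        counts.update(learn_rules(sample, target))
    return {rule: c / n_rounds for rule, c in counts.items()
            if c / n_rounds >= min_freq}


if __name__ == "__main__":
    # Synthetic toy records for illustration only (not clinical data):
    # attr_a determines the outcome, attr_b is unrelated to it.
    toy = [{"attr_a": int(i % 2 == 0), "attr_b": i % 3, "relapse": i % 2 == 0}
           for i in range(40)]
    for rule, freq in bootstrap_rules(toy, "relapse").items():
        print(rule, freq)
```

Allowing some imprecision on each small subset (`min_precision` below 1) is one way to tolerate inconsistent records, and counting how often a rule recurs across subsets filters out rules that hold only by chance on a single sample.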