An Evaluation of Different Clustering Methods and Distance Measures Used for Grouping Metabolic Pathways

Conference Proceedings (fully refereed)
Kim, S., Peña, M., Moll, M., Giannakopoulos, G., Bennett, G., Kavraki, L.
Large-scale annotated metabolic databases, such as KEGG and MetaCyc, provide a wealth of information to researchers designing novel biosynthetic pathways. However, many metabolic pathfinding tools that assist in identifying possible solution pathways fail to facilitate the grouping and interpretation of these pathway results. Clustering possible solution pathways can help users of pathfinding tools quickly identify major patterns and unique pathways without having to sift through individual results one by one. In this paper, we assess the ability of three clustering methods (hierarchical, k-means, and k-medoids), combined with three pairwise distance measures (Levenshtein, Jaccard, and n-gram), to group lysine, isoleucine, and 3-hydroxypropanoic acid (3-HP) biosynthesis pathways as an expert would. The quality of the resulting clusters was quantitatively evaluated against expected pathway groupings taken from the literature. Hierarchical clustering with Levenshtein distance seemed to best match the external pathway labels across the three sets of biosynthesis pathways. The lysine biosynthesis pathways, which had the most distinct separation of pathways, yielded better-quality clusters than the isoleucine and 3-HP pathways, suggesting that grouping pathways with more complex underlying topologies may require more tailored clustering methods.
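To make the three pairwise distance measures concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes each pathway is represented as an ordered list of compound identifiers, and it assumes the n-gram distance is a Jaccard distance over sets of contiguous n-grams; the abstract does not specify either choice. The toy pathways and the naive average-linkage clustering helper are hypothetical and serve only as an illustration.

```python
from itertools import combinations


def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute, each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution or match
        prev = curr
    return prev[-1]


def jaccard_distance(a, b):
    """1 - |intersection| / |union| over the sets of elements; order is ignored."""
    sa, sb = set(a), set(b)
    union = sa | sb
    if not union:
        return 0.0
    return 1.0 - len(sa & sb) / len(union)


def ngrams(seq, n=2):
    """Contiguous n-grams of a sequence, as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def ngram_distance(a, b, n=2):
    """Assumed definition: Jaccard distance over the sets of contiguous n-grams."""
    return jaccard_distance(ngrams(a, n), ngrams(b, n))


def agglomerative(items, dist, k):
    """Naive average-linkage agglomerative clustering, merging until k clusters remain."""
    clusters = [[i] for i in range(len(items))]

    def linkage(c1, c2):
        total = sum(dist(items[i], items[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        p, q = min(combinations(range(len(clusters)), 2),
                   key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]))
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters


# Hypothetical toy pathways: letters stand in for compound identifiers.
pathways = [
    ["A", "B", "C", "D"],
    ["A", "B", "X", "D"],
    ["A", "P", "Q", "R", "D"],
    ["A", "P", "Q", "S", "D"],
]

if __name__ == "__main__":
    print(levenshtein(pathways[0], pathways[1]))
    print(jaccard_distance(pathways[0], pathways[1]))
    print(ngram_distance(pathways[0], pathways[1]))
    print(agglomerative(pathways, levenshtein, k=2))
```

Levenshtein and n-gram distances are sensitive to the order of compounds along a pathway, whereas Jaccard distance over compound sets is not, which is one plausible reason the measures can produce different groupings on the same pathways.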
Software and Knowledge Engineering Laboratory (SKEL)
Conference Short Name: BICoB 2016
Conference Full Name: 8th International Conference on Bioinformatics and Computational Biology
Conference Country: United States
Conference City: Las Vegas, Nevada
Conference Date(s): Mon, 04/04/2016 - Wed, 06/04/2016

© 2018 - Institute of Informatics and Telecommunications | National Centre for Scientific Research "Demokritos"