Gene Ontology Prediction Using Compression Based Distances and Alignment Scores on Both Amino Acid Sequence and Secondary Structure

Filiz A., Çataltepe Z.

23rd International Symposium on Computer and Information Sciences (ISCIS), İstanbul, Turkey, 27 - 29 October 2008, pp.599-600

  • Publication Type: Conference Paper / Full Text
  • City: İstanbul
  • Country: Turkey
  • Page Numbers: pp.599-600
  • Istanbul Technical University Affiliated: Yes


Normalized Compression Distance (NCD) is a compression-based pairwise distance measure. NCD has been shown to perform well in several domains, including music, biological sequence, and text classification. In this study, we use NCD together with Smith-Waterman (SW) alignment scores of protein sequences for gene ontology prediction. We find that using secondary structure in addition to the amino acid sequence improves prediction performance when either NCD or SW alignment scores are used alone. The best contribution ratio of secondary structure is 0.25 for SW alignment scores and 0.50 for NCD scores. We also investigate using NCD and SW together on both the amino acid sequence and the secondary structure. This combination gives better prediction than NCD alone, but worse prediction than SW alone.
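
As a rough illustration of the two ideas in the abstract, the sketch below computes NCD with the commonly used definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length, and then mixes a sequence-based score with a secondary-structure-based score. Several things here are assumptions and are not taken from the paper: zlib as the compressor, the function names, the example strings, and in particular the reading of "contribution ratio" as the weight in a convex combination.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (compressor choice is an assumption)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized Compression Distance between two strings."""
    bx, by = x.encode("ascii"), y.encode("ascii")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


def combined_score(seq_score: float, ss_score: float, ss_ratio: float) -> float:
    """Convex combination of a sequence score and a secondary-structure score.

    ss_ratio is the weight given to secondary structure. Interpreting the
    abstract's "contribution ratio" this way is an assumption.
    """
    if not 0.0 <= ss_ratio <= 1.0:
        raise ValueError("ss_ratio must be in [0, 1]")
    return (1.0 - ss_ratio) * seq_score + ss_ratio * ss_score


if __name__ == "__main__":
    # Hypothetical toy inputs: amino acid sequences and matching
    # secondary-structure strings (H = helix, E = strand, C = coil).
    seq_a, seq_b = "MKTAYIAKQRQISFVKSHFSRQ", "MKTAYLAKQRNISFVKAHFSRQ"
    ss_a, ss_b = "CCHHHHHHHHHHCCEEEEECCC", "CCHHHHHHHHHCCCEEEEECCC"

    d_seq = ncd(seq_a, seq_b)
    d_ss = ncd(ss_a, ss_b)
    # 0.50 is the ratio the abstract reports as best for NCD scores.
    print(combined_score(d_seq, d_ss, ss_ratio=0.50))
```

Real compressors make NCD only approximately a metric: the value for identical inputs is small but usually not exactly zero, and very short strings are dominated by compressor overhead. The same weighting function could be applied to SW alignment scores, which are similarities rather than distances, so they would need consistent handling before being mixed with NCD.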