Computing Similarity Between RNA Structures


The primary structure of a ribonucleic acid (RNA) molecule is a sequence of nucleotides (bases) over the alphabet {A, C, G, U}. The secondary or tertiary structure of an RNA is a set of base-pairs, each of which is a bond between two bases. In secondary structures, these bonds have traditionally been assumed to be non-crossing. Tertiary structures are not subject to this restriction.
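The definitions above can be made concrete with a small sketch. It is only an illustration, not the representation used in our software: the names RNAStructure, crossing_pairs, and is_non_crossing are hypothetical, positions are assumed to be 0-based, and each base-pair is assumed to be stored as (i, j) with i < j. Two base-pairs (i, j) and (k, l) are treated as crossing when i < k < j < l.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple

# Hypothetical representation: a base-pair is (i, j) with i < j, 0-based.
BasePair = Tuple[int, int]
ALPHABET = frozenset("ACGU")


@dataclass(frozen=True)
class RNAStructure:
    """A sequence over {A, C, G, U} together with a set of base-pairs."""
    sequence: str
    pairs: FrozenSet[BasePair]

    def __post_init__(self) -> None:
        # Reject letters outside the RNA alphabet.
        if not set(self.sequence) <= ALPHABET:
            raise ValueError("sequence must use only A, C, G, U")
        # Reject pairs that are out of range or not ordered as i < j.
        for i, j in self.pairs:
            if not 0 <= i < j < len(self.sequence):
                raise ValueError(f"invalid base-pair {(i, j)}")


def crossing_pairs(pairs: Iterable[BasePair]) -> List[Tuple[BasePair, BasePair]]:
    """Return every two base-pairs (i, j), (k, l) with i < k < j < l."""
    ordered = sorted(pairs)
    found = []
    for a, (i, j) in enumerate(ordered):
        for k, l in ordered[a + 1:]:
            if i < k < j < l:
                found.append(((i, j), (k, l)))
    return found


def is_non_crossing(pairs: Iterable[BasePair]) -> bool:
    """True if the base-pairs satisfy the secondary-structure assumption."""
    return not crossing_pairs(pairs)


# Nested base-pairs: allowed in a secondary structure.
nested = RNAStructure("GGGAAACCC", frozenset({(0, 8), (1, 7), (2, 6)}))
# Crossing base-pairs: only allowed in a tertiary structure.
crossed = RNAStructure("GGAACCUU", frozenset({(0, 5), (2, 7)}))

print(is_non_crossing(nested.pairs))   # True
print(is_non_crossing(crossed.pairs))  # False
```

The quadratic scan over all pairs of base-pairs is chosen for clarity rather than speed; it is enough to show which structures meet the non-crossing assumption and which do not.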

It is useful to measure the similarity of two RNA molecules, on the presumption that RNAs with similar molecular structures also have similar biological functions. Consequently, comparing RNA molecules is useful for the classification and taxonomy of bacteria, viruses, and other RNA structures.

In defining similarity between two RNA molecules, we take into account their primary, secondary, and tertiary structures. It has been shown that, in general, the problem of computing the similarity of two RNAs using their tertiary structures is NP-hard. We have, however, developed heuristics that allow us to make use of this tertiary information.

Our approach has advantages over other RNA comparison algorithms, which compare RNA molecules using their primary structures while trying to incorporate secondary structure data. The weakness of those algorithms is that they do not treat a base-pair as a whole entity. We feel that our approach is closer in spirit to the comparative analysis method currently used in the manual analysis of RNA structures.

Site last modified October 5, 2000
For further information, please email kzhang@csd.uwo.ca