Bioinformatics: The Effects of Sequence Length and Percent Identity on Alignments Done With CLUSTALW

Biology | Computer Sciences | Life Sciences | Physical Sciences and Mathematics


J. Andrew Holey; David Mitchell, Biology and Computer Science


This study compares the effects of sequence length and percent identity on the alignment of protein and DNA sequences by the CLUSTALW program. To test these effects, eight protein alignment sets were taken from the BAliBASE databank, and the sequences for the eight corresponding DNA alignments were taken from GenBank. The results show that (1) for both DNA and proteins, the percent identity of the sequences in an alignment has a greater effect than the length of those sequences; (2) proteins are more sensitive to changes in percent identity than DNA sequences are; and (3) when the sequences have a low percent identity, DNA sequences respond less to changes in their gap penalties than proteins do.
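
Because percent identity is the central variable in the abstract above, a small illustration of how it can be computed for two already-aligned sequences may be useful. This is only a sketch under stated assumptions, not the method used in the study: the function name `percent_identity` is hypothetical, gaps are assumed to be written as `-`, and columns containing a gap in either sequence are excluded from the denominator (other conventions divide by the full alignment length or by the shorter sequence length).

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two aligned sequences of equal length.

    Assumption (not from the study): columns where either sequence has a
    gap are skipped, so the denominator is the number of gap-free columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    matches = 0   # gap-free columns with identical residues
    compared = 0  # all gap-free columns
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1

    # No gap-free columns means there is nothing to compare.
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy DNA example: five gap-free columns, four of them identical.
    print(percent_identity("ACGT-A", "ACCTTA"))
```

The same function works for protein alignments, since it only compares characters column by column. The choice of denominator changes the reported value, so it should be stated whenever percent identities are compared across alignment sets.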