Improving Start Codon Prediction Accuracy in Prokaryotic Organisms Using Naïve Bayesian Classification

Document Type


Publication Date



Computer Sciences | Physical Sciences and Mathematics


Imad Rahal, Computer Science


With an overwhelming amount of genetic data now becoming publicly available, there is a growing need for more effective gene location prediction methods that produce reliable results. Although prediction of the stop codon location for genes in prokaryotic organisms is largely considered a solved problem, accurate prediction of the exact start codon location continues to lag behind because start codons are ambiguous in the genetic code. This thesis details a new approach to predicting more precise gene locations, for both the start and stop codon, in prokaryotic organisms. The approach takes a set of gene location prediction results from other prediction programs and finds the gene locations that those programs predict consistently. It then uses these "consistent genes" as a training set for Naïve Bayesian classification to improve accuracy in the "ambiguous genes," those for which the predicted locations vary or are inconsistent among the prediction programs. The result is improved accuracy in the location predictions compared with the original set of prediction results.
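The two-stage idea in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation, and every detail is an assumption made for illustration: genes are keyed by their stop position, a gene counts as "consistent" only when every program reports it with the same start, the features are a short tuple of upstream nucleotides, and the function names (`split_consistent`, `train_naive_bayes`, `log_posterior`, `pick_start`) are hypothetical. How negative training examples are chosen is left to the caller.

```python
import math
from collections import Counter, defaultdict


def split_consistent(predictions):
    """Split genes into consistent and ambiguous sets.

    predictions: {program_name: {stop_position: start_position}}
    Keying genes by stop position is an assumption of this sketch.
    """
    starts_by_stop = defaultdict(list)
    for calls in predictions.values():
        for stop, start in calls.items():
            starts_by_stop[stop].append(start)

    n_programs = len(predictions)
    consistent, ambiguous = {}, {}
    for stop, starts in starts_by_stop.items():
        # Consistent: every program found the gene and agreed on the start.
        if len(starts) == n_programs and len(set(starts)) == 1:
            consistent[stop] = starts[0]
        else:
            ambiguous[stop] = sorted(set(starts))
    return consistent, ambiguous


def train_naive_bayes(examples, alphabet_size=4):
    """Count-based categorical Naive Bayes.

    examples: list of (feature_tuple, label), label True for a real start.
    alphabet_size=4 assumes each feature is a single nucleotide.
    """
    label_counts = Counter()
    feature_counts = defaultdict(Counter)  # (label, position) -> value counts
    for features, label in examples:
        label_counts[label] += 1
        for i, value in enumerate(features):
            feature_counts[(label, i)][value] += 1
    return label_counts, feature_counts, alphabet_size


def log_posterior(model, features, label):
    """Unnormalized log P(label | features) with add-one smoothing."""
    label_counts, feature_counts, k = model
    total = sum(label_counts.values())
    logp = math.log(label_counts[label] / total)
    for i, value in enumerate(features):
        seen = feature_counts[(label, i)][value]
        logp += math.log((seen + 1) / (label_counts[label] + k))
    return logp


def pick_start(model, candidates):
    """Return the candidate start with the highest log-odds of being real.

    candidates: {start_position: feature_tuple}
    """
    def log_odds(features):
        return (log_posterior(model, features, True)
                - log_posterior(model, features, False))

    return max(candidates, key=lambda start: log_odds(candidates[start]))
```

In this sketch the classifier is trained only on examples drawn from the consistent genes, and `pick_start` is then applied to the competing start positions of each ambiguous gene. The training data must contain both labels, since the smoothing covers unseen feature values but not an unseen class.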