Using naïve Bayesian classification as a meta-predictor to improve start codon prediction accuracy in prokaryotic organisms

Bioinformatics | Computer Sciences | Databases and Information Systems | Genomics | Numerical Analysis and Scientific Computing


Modern gene location prediction techniques can achieve near-perfect accuracy for prokaryotic organisms, but this reported accuracy generally applies only to stop codon locations. Start codon locations are harder to predict accurately, and different approaches often produce conflicting predictions for the same gene. In this paper, we describe a new approach that resolves these conflicts and improves start codon prediction accuracy. Our approach uses a set of gene location prediction results from other popular prediction approaches to find consistently predicted gene locations. It then uses these consistent genes as a training set for a naïve Bayesian classifier, which is applied to the ambiguous genes: those for which the original predictions disagree on the start codon location. The methods detailed here apply to prokaryotic organisms; we use E. coli and the EcoGene Verified Set database as a case study.
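
The workflow described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. Several details are assumptions made only for illustration: that genes are matched across tools by a shared identifier (for example, their stop codon position), that "consistent" means every tool reports the same start, that non-selected candidate starts in consistent genes serve as negative training examples, and that each candidate start is described by two hypothetical categorical features (the start codon triplet and a flag for an upstream ribosome-binding-site-like motif). The names `split_by_agreement`, `NaiveBayes`, `resolve`, and the toy data are likewise invented here.

```python
import math
from collections import Counter, defaultdict


def split_by_agreement(predictions):
    """predictions: {tool: {gene_id: start}}. Returns (consistent, ambiguous)."""
    gene_ids = set.intersection(*(set(p) for p in predictions.values()))
    consistent, ambiguous = {}, {}
    for gene in sorted(gene_ids):
        starts = {p[gene] for p in predictions.values()}
        if len(starts) == 1:
            consistent[gene] = starts.pop()      # every tool agrees
        else:
            ambiguous[gene] = sorted(starts)     # tools disagree
    return consistent, ambiguous


class NaiveBayes:
    """Categorical naive Bayes with Laplace smoothing (alpha)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)  # (label, index) -> value counts
        self.feature_values = defaultdict(set)      # index -> values seen in training

    def fit(self, rows, labels):
        for row, label in zip(rows, labels):
            self.class_counts[label] += 1
            for i, value in enumerate(row):
                self.feature_counts[(label, i)][value] += 1
                self.feature_values[i].add(value)
        return self

    def log_joint(self, row, label):
        """log P(label) + sum_i log P(x_i | label); both classes must be trained."""
        n_label = self.class_counts[label]
        score = math.log(n_label / sum(self.class_counts.values()))
        for i, value in enumerate(row):
            count = self.feature_counts[(label, i)][value]
            n_values = len(self.feature_values[i])
            score += math.log((count + self.alpha) / (n_label + self.alpha * n_values))
        return score


def resolve(predictions, candidates):
    """Pick, for each ambiguous gene, the tool-proposed start the classifier prefers.

    candidates: {gene_id: {start: feature_tuple}} (hypothetical feature layout).
    """
    consistent, ambiguous = split_by_agreement(predictions)
    rows, labels = [], []
    for gene, true_start in consistent.items():
        for start, features in candidates[gene].items():
            rows.append(features)
            labels.append(start == true_start)   # agreed start is the positive example
    model = NaiveBayes().fit(rows, labels)

    def log_odds(features):
        return model.log_joint(features, True) - model.log_joint(features, False)

    return {gene: max(starts, key=lambda s: log_odds(candidates[gene][s]))
            for gene, starts in ambiguous.items()}


# Toy data (invented): two tools agree on g1 and g2 but disagree on g3.
predictions = {
    "toolA": {"g1": 100, "g2": 250, "g3": 400},
    "toolB": {"g1": 100, "g2": 250, "g3": 430},
}
candidates = {
    "g1": {100: ("ATG", True), 130: ("TTG", False)},
    "g2": {250: ("ATG", True), 220: ("GTG", False)},
    "g3": {400: ("ATG", True), 430: ("GTG", False)},
}
consistent, ambiguous = split_by_agreement(predictions)
resolved = resolve(predictions, candidates)
print(resolved)  # {'g3': 400}
```

In this toy run the classifier learns from g1 and g2 that agreed starts tend to be ATG with an upstream motif, so it prefers position 400 for g3. A real application would need features and negative examples chosen with far more care than this sketch suggests.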