A Hidden Markov Model for Finding Promoters in DNA Sequences
We present a Hidden Markov Model (HMM) for finding binding sites, such as promoters, in a DNA sequence. The approach allows variable-length spacers between the consensus sequences. The model was built from 150 known promoters of the {\em Escherichia coli} genome, and its parameters are reestimated with the Expectation-Maximization (EM) algorithm. To test the model, we used 30 regions of {\em E.~coli}, each known to contain a promoter. By randomly cutting these regions, we produced 20 sets of 30 sequences. On average, the model located 78$\%$ of the consensus sequences in a set at the correct or nearly correct (within 6 bp) position. The program is available through the WWW and can serve as a tool for finding a promoter in any prokaryotic DNA sequence.
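To make the idea of variable-length spacers concrete, the following is a minimal sketch and not the model described above. It assumes the textbook $-35$ and $-10$ hexamers (TTGACA and TATAAT), a made-up match probability, a uniform distribution over spacers of 15 to 21 bp, and a uniform background; all names and parameter values are hypothetical, and no EM reestimation is performed. It simply searches for the placement of the two boxes with the highest log-odds score, which is the answer a Viterbi decoding would give for a left-to-right model under these same simplifying assumptions.

```python
import math

# All values below are illustrative assumptions, not parameters from the paper.
BOX_35 = "TTGACA"         # textbook -35 consensus hexamer
BOX_10 = "TATAAT"         # textbook -10 consensus hexamer
MATCH_P = 0.7             # assumed probability of emitting the consensus base
BACKGROUND_P = 0.25       # uniform background over A, C, G, T
SPACERS = range(15, 22)   # assumed allowed spacer lengths: 15..21 bp


def box_log_odds(window, consensus):
    """Log-odds of a window under the box model versus the background."""
    mismatch_p = (1.0 - MATCH_P) / 3.0
    score = 0.0
    for base, cons in zip(window, consensus):
        p = MATCH_P if base == cons else mismatch_p
        score += math.log(p / BACKGROUND_P)
    return score


def find_promoter(seq):
    """Return (score, start_35, spacer, start_10) for the best placement.

    Spacer bases are scored under the background, so they add zero
    log-odds; with a uniform spacer-length prior only the two boxes
    contribute. Returns None if the sequence is too short.
    """
    seq = seq.upper()
    n35, n10 = len(BOX_35), len(BOX_10)
    best = None
    for i in range(len(seq) - n35 + 1):
        s35 = box_log_odds(seq[i:i + n35], BOX_35)
        for spacer in SPACERS:
            j = i + n35 + spacer
            if j + n10 > len(seq):
                break  # longer spacers would run past the end as well
            total = s35 + box_log_odds(seq[j:j + n10], BOX_10)
            if best is None or total > best[0]:
                best = (total, i, spacer, j)
    return best


# Toy sequence: both boxes planted with a 17 bp spacer between them.
toy = "GCGC" + "TTGACA" + "G" * 17 + "TATAAT" + "CGCG"
score, start_35, spacer, start_10 = find_promoter(toy)
print(start_35, spacer, start_10)  # prints: 4 17 27
```

In a trained HMM the per-position emission probabilities and the spacer-length distribution would be estimated from known promoters rather than fixed by hand as they are here.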
1998