Identifying clusters of protein binding sites in a nucleotide sequence under analysis. A computerized system determines likelihood parameters for a plurality of known protein binding sites. The likelihood parameter for each protein binding site represents a likelihood that the protein binding site will occur in a nucleotide sequence under analysis relative to a likelihood that the protein binding site will occur in a random nucleotide sequence of a substantially equivalent composition. Selected protein binding sites are grouped as a function of their respective likelihood parameters to determine a likelihood score, which is compared to a predetermined threshold. The selected protein binding sites in the nucleotide sequence are identified as one or more clusters if the likelihood score exceeds the predetermined threshold.
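The scoring and thresholding described above can be illustrated with a minimal sketch. It is not the claimed implementation. It assumes that each known binding site is modeled by a position weight matrix (a list of per-position base probabilities with no zero entries), that the likelihood parameter is a log-likelihood ratio against the base composition of the sequence under analysis, and that sites are grouped by summing the ratios of hits falling within a fixed span. The names `background_freqs`, `site_log_ratio`, `find_clusters`, `site_cutoff`, `cluster_threshold`, and `span`, as well as the example matrix and sequence, are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def background_freqs(seq):
    """Base composition of the sequence under analysis (assumes only A, C, G, T)."""
    counts = Counter(seq)
    total = sum(counts[b] for b in "ACGT")
    return {b: counts[b] / total for b in "ACGT"}


def site_log_ratio(window, pwm, bg):
    """Log of P(window | site model) / P(window | random sequence of same composition)."""
    return sum(math.log(col[base] / bg[base]) for base, col in zip(window, pwm))


def find_clusters(seq, pwms, site_cutoff, cluster_threshold, span):
    """Return (start, score, hits) for groups of sites whose summed score exceeds the threshold.

    Assumption: the likelihood score of a group is the plain sum of the
    log-likelihood ratios of all selected hits starting within `span`
    bases of the first hit. Overlapping groups are reported separately.
    """
    bg = background_freqs(seq)
    hits = []
    for name, pwm in pwms.items():
        width = len(pwm)
        for i in range(len(seq) - width + 1):
            score = site_log_ratio(seq[i:i + width], pwm, bg)
            if score > site_cutoff:  # select only sites more likely than chance
                hits.append((i, name, score))
    hits.sort()

    clusters = []
    for k, (start, _, _) in enumerate(hits):
        group = [h for h in hits[k:] if h[0] - start < span]
        total = sum(h[2] for h in group)
        if total > cluster_threshold:
            clusters.append((start, total, group))
    return clusters


# Made-up example data: a TATA-like matrix and a short GC-rich sequence.
TATA = [
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
]
SEQ = "GCGCTATAGCTATAGCGCGCGCGC"

if __name__ == "__main__":
    for start, score, group in find_clusters(SEQ, {"TATA": TATA}, 0.0, 10.0, 20):
        print(start, round(score, 2), [(pos, name) for pos, name, _ in group])
```

In this sketch a single site match is not enough to exceed the example threshold; only two matches lying within the span are reported as a cluster. A sum of log ratios is one simple way to combine sites and stands in for whatever grouping function an actual system would use.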
 This invention was made in part with Government support under grants R01-HG01391 and DE-FG02-94ER61910, awarded by the National Institutes of Health and the Department of Energy, respectively. The Government has certain rights in this invention.