A first protein sequence associated with an organism is identified, wherein the first protein sequence comprises a plurality of ordered residues. A plurality of sub-sequences is generated based on the first protein sequence, wherein each sub-sequence comprises a plurality of contiguous residues and a starting residue number of each sub-sequence differs from a starting residue number of another sub-sequence by one position in the first protein sequence. A first unique sub-sequence comprising a first set of contiguous residues is identified from the plurality of sub-sequences based on a dataset of protein sequences, wherein the first unique sub-sequence is specific to the organism. The first unique sub-sequence is stored.
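By way of illustration only, the sub-sequence generation and uniqueness check summarized above can be sketched as follows. The sketch is not the claimed implementation and rests on stated assumptions: sequences are plain strings of one-letter residue codes, all sub-sequences share one fixed length, "specific to the organism" is approximated by exact absence from every sequence in a background dataset, and the function names `sliding_subsequences` and `unique_subsequences` are hypothetical.

```python
from typing import Iterable, List, Set


def sliding_subsequences(sequence: str, length: int) -> List[str]:
    """Return every contiguous window of `length` residues.

    Consecutive windows start one residue apart, as described in the abstract.
    A fixed window length is an assumption of this sketch.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]


def unique_subsequences(target: str, background: Iterable[str], length: int) -> List[str]:
    """Return windows of `target` that occur in no background sequence.

    Uniqueness is approximated by exact string matching (an assumption).
    Results keep their order of first appearance in `target`.
    """
    # Collect every window seen anywhere in the background dataset.
    background_windows: Set[str] = set()
    for other in background:
        background_windows.update(sliding_subsequences(other, length))

    seen: Set[str] = set()
    result: List[str] = []
    for window in sliding_subsequences(target, length):
        if window not in background_windows and window not in seen:
            seen.add(window)
            result.append(window)
    return result


if __name__ == "__main__":
    # Toy sequences chosen for illustration; they are not real protein data.
    print(sliding_subsequences("MKTAY", 3))
    print(unique_subsequences("MKTAY", ["AKTAYG"], 3))
```

In this sketch the background windows are held in a set so that each membership test takes constant time on average; a dataset of realistic size would likely need an on-disk or otherwise indexed store, which is outside what the sketch covers.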
STATEMENT REGARDING FEDERALLY FUNDED RESEARCH
This invention was made in the course of or under prime Contract No. DE-AC52-07NA27344 between the U.S. Department of Energy and Lawrence Livermore National Security, LLC. This Record of Invention is prepared for the Office of the Assistant General Counsel for Patents, U.S. Department of Energy.