Systems and computer-implemented processes for computation and analysis of significant themes in a corpus of documents. The computation and analysis of significant themes can be executed on a processor and involves generating a lexical unit document association (LUDA) vector for each lexical unit that has been provided and quantifying similarities between each unique pair of lexical units. The LUDA vector characterizes a measure of association between its corresponding lexical unit and documents in the corpus. The lexical units can then be grouped into clusters such that each cluster contains a set of lexical units that are most similar as determined by the LUDA vectors and a predetermined clustering threshold.
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
 This invention was made with Government support under Contract DE-ACO576RL01830 awarded by the U.S. Department of Energy. The Government has certain rights in the invention.