Methods and systems for automatically generating lists of stop words for information retrieval and analysis. Generation of the stop words can include providing a corpus of documents and a plurality of keywords. From the corpus of documents, a term list of all terms is constructed and both a keyword adjacency frequency and a keyword frequency are determined. If a ratio of the keyword adjacency frequency to the keyword frequency for a particular term on the term list is less than a predetermined value, then that term is excluded from the term list. The resulting term list is truncated based on predetermined criteria to form a stop word list.
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
This invention was made with Government support under Contract DE-AC0576RL01830 awarded by the U.S. Department of Energy. The Government has certain rights in the invention.