Automatic generation of stop word lists for information retrieval and analysis

United States Patent

January 8, 2013
Pacific Northwest National Laboratory
Methods and systems for automatically generating lists of stop words for information retrieval and analysis. Generation of the stop words can include providing a corpus of documents and a plurality of keywords. From the corpus of documents, a term list of all terms is constructed and both a keyword adjacency frequency and a keyword frequency are determined. If a ratio of the keyword adjacency frequency to the keyword frequency for a particular term on the term list is less than a predetermined value, then that term is excluded from the term list. The resulting term list is truncated based on predetermined criteria to form a stop word list.
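The steps in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The abstract does not define the two frequencies precisely, so the sketch assumes that keyword frequency counts occurrences of a term inside a keyword and that keyword adjacency frequency counts occurrences immediately before or after a keyword. The function names, the `min_ratio` and `max_words` parameters, and the truncation rule (keep the terms with the highest adjacency frequency) are all hypothetical choices standing in for the "predetermined value" and "predetermined criteria".

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def generate_stop_words(documents, keywords, min_ratio=1.0, max_words=10):
    """Sketch of the abstract's procedure; parameter names are hypothetical."""
    keyword_tokens = [tokenize(k) for k in keywords]
    term_list = set()
    keyword_freq = Counter()    # assumed: occurrences of a term inside a keyword
    adjacency_freq = Counter()  # assumed: occurrences directly next to a keyword

    for doc in documents:
        tokens = tokenize(doc)
        term_list.update(tokens)

        # Locate every keyword occurrence and mark the tokens it covers.
        inside = [False] * len(tokens)
        spans = []
        for kw in keyword_tokens:
            n = len(kw)
            if n == 0:
                continue
            for i in range(len(tokens) - n + 1):
                if tokens[i:i + n] == kw:
                    spans.append((i, i + n))
                    for j in range(i, i + n):
                        inside[j] = True

        for j, covered in enumerate(inside):
            if covered:
                keyword_freq[tokens[j]] += 1

        # Count the token just before and just after each keyword occurrence.
        # Assumption: a neighbour that itself lies inside a keyword is skipped.
        for start, end in spans:
            for j in (start - 1, end):
                if 0 <= j < len(tokens) and not inside[j]:
                    adjacency_freq[tokens[j]] += 1

    def ratio(term):
        adj, kf = adjacency_freq[term], keyword_freq[term]
        if kf > 0:
            return adj / kf
        # Assumption: a term never seen inside a keyword passes the test only
        # if it was seen next to one at least once.
        return float("inf") if adj > 0 else 0.0

    # Exclude terms whose ratio falls below the predetermined value.
    kept = [t for t in term_list if ratio(t) >= min_ratio]

    # Assumed truncation criterion: highest adjacency frequency first.
    kept.sort(key=lambda t: (-adjacency_freq[t], t))
    return kept[:max_words]
```

For example, given the single document "the analysis of linear constraints and the study of linear systems" and the keywords "linear constraints" and "linear systems", the terms "of" and "and" survive the ratio test because they appear next to keywords but never inside one, while "linear" is excluded.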
Rose; Stuart J (Richland, WA)
Battelle Memorial Institute (Richland, WA)
12/555,962
September 9, 2009
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

This invention was made with Government support under Contract DE-AC0576RL01830 awarded by the U.S. Department of Energy. The Government has certain rights in the invention.