Skip to Content
Find More Like This
Return to Search

Technique for information retrieval using enhanced latent semantic analysis generating rank approximation matrix by factorizing the weighted morpheme-by-document matrix

United States Patent

8,290,961
October 16, 2012
View the Complete Patent at the US Patent & Trademark Office
Sandia National Laboratories - Visit the Intellectual Property Management and Licensing Website
A technique for information retrieval includes parsing a corpus to identify a number of wordform instances within each document of the corpus. A weighted morpheme-by-document matrix is generated based at least in part on the number of wordform instances within each document of the corpus and based at least in part on a weighting function. The weighted morpheme-by-document matrix separately enumerates instances of stems and affixes. Additionally or alternatively, a term-by-term alignment matrix may be generated based at least in part on the number of wordform instances within each document of the corpus. At least one lower rank approximation matrix is generated by factorizing the weighted morpheme-by-document matrix and/or the term-by-term alignment matrix.
Chew; Peter A. (Albuquerque, NM), Bader; Brett W. (Albuquerque, NM)
Sandia Corporation (Albuquerque, NM)
12/ 352,621
January 13, 2009
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH This invention was developed with Government support under Contract No. DE-AC04-94AL85000 between Sandia Corporation and the U.S. Department of Energy. The U.S. Government has certain rights in this invention.