The method and system of the invention involve processing each new document (20) coming into the system into a document vector (16), and creating a document vector with reduced dimensionality (17) for comparison with the data model (15) without recomputing the data model (15). These operations are carried out by a first computer (11) while a second computer (12) updates the data model (18), which can comprise an initial large group of documents (19). The approach is premised on computing an initial data model (13, 14, 15) to provide a reference point for determining document vectors from documents processed from the data stream (20).
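The flow described above, in which a new document becomes a full document vector and then a reduced-dimensionality vector that is compared against a fixed data model, can be illustrated with a minimal sketch. The text here does not specify the weighting scheme or the reduction technique, so everything below is an assumption made for illustration: the vocabulary, the raw term counts, the linear projection matrix, the cosine comparison, and all function and variable names are hypothetical.

```python
import math
from collections import Counter


def document_vector(text, vocabulary):
    """Build a full-length term-count vector over a fixed vocabulary.

    Raw counts are an assumed weighting; the source does not name one.
    """
    counts = Counter(text.lower().split())
    return [float(counts[term]) for term in vocabulary]


def reduce_vector(doc_vec, projection):
    """Project a full-length vector onto the model's k axes.

    The projection (one row per axis) is only read, never modified,
    so the data model is not recomputed for the new document.
    """
    return [sum(w * x for w, x in zip(axis, doc_vec)) for axis in projection]


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


# Hypothetical initial data model, computed once from an initial document set.
VOCABULARY = ["reactor", "neutron", "market", "stock"]
PROJECTION = [
    [0.7, 0.7, 0.0, 0.0],  # axis 0: loads on the first two terms
    [0.0, 0.0, 0.7, 0.7],  # axis 1: loads on the last two terms
]

# A new document arriving from the stream.
new_doc = "Neutron flux in the reactor core"
full = document_vector(new_doc, VOCABULARY)   # full document vector
reduced = reduce_vector(full, PROJECTION)     # reduced-dimensionality vector

# Compare against a hypothetical reference vector in the reduced space.
similarity = cosine(reduced, [1.0, 0.0])
```

In this sketch, a separate process could replace `VOCABULARY` and `PROJECTION` with an updated model while the functions above keep serving incoming documents, which loosely mirrors the division of work between the first and second computers.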
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH
This invention was made with U.S. Government support under Contract No. DE-AC05-00OR22725 awarded to UT-Battelle, LLC, by the U.S. Dept. of Energy. The Government has certain rights in the invention.