Agent-Based Software for Gathering and Summarizing Textual and Internet Information

Oak Ridge National Laboratory


Technology Marketing Summary
ORNL’s Piranha addresses a challenge most users face: sifting through large amounts of
data to find accurate and relevant information. This requires software that can quickly
filter, relate, and show documents and relationships. Piranha is
JavaScript search, analysis, storage, and retrieval software for uncertain, vague, or complex
information retrieval from multiple sources such as the Internet. With Piranha, researchers
have pioneered an agent approach to text analysis that uses a large number of agents
distributed over very large computer clusters. Piranha is faster than conventional software
and can cluster massive amounts of textual information relatively quickly because of the
scalability of the agent architecture.


Description
While computers can analyze massive amounts of data, the sheer volume of data makes
the most promising approaches impractical. Piranha allows advanced textual analysis
to be accomplished with unprecedented accuracy on very large and dynamic data. For
data already acquired, this design allows discovery of new opportunities or new areas of
concern. Piranha has been vetted in the scientific community as well as in a number of
real-world applications.
Benefits
  • More effective at collecting and summarizing large amounts of information from multiple sources
  • Clustering technique compares and stores similar information and provides a visual display

Applications and Industries
Piranha’s Capabilities:
  • Finding Similar Documents: After selecting a document of interest, users can quickly find other similar documents.
  • Sampling Documents: A set of documents usually contains common themes or topics. Representative documents from these themes can be found quickly and presented to an analyst.
  • Classifying Documents: A set of representative documents can be used by an analyst to define a topic of interest, and then related documents can be added to that set.
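
As an illustration of the "Finding Similar Documents" capability, the following is a minimal sketch that ranks documents by cosine similarity of simple term-frequency vectors. It is not Piranha's implementation: the function names (term_vector, cosine_similarity, most_similar), the tokenization rule, and the choice of raw term counts are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lowercase the text and count word occurrences (a plain term-frequency vector)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_text, documents, top_n=3):
    """Rank documents (a dict of id -> text) by similarity to the query text.

    Hypothetical helper for illustration; returns (doc_id, score) pairs,
    highest score first.
    """
    query = term_vector(query_text)
    scored = [(cosine_similarity(query, term_vector(text)), doc_id)
              for doc_id, text in documents.items()]
    scored.sort(reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:top_n]]
```

Calling most_similar with the text of a selected document and a dictionary of candidate documents returns the closest candidates first. A production system would likely weight terms (for example by inverse document frequency) rather than use raw counts.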

Potential Applications:
  • Text mining
  • Information “sense-making”
  • Document organization
  • Classification
More Information
Patents
Thomas E. Potok and Joel W. Reed, Agent-Based Method for Distributed Clustering of Textual Information, U.S. Patent 7,805,446, issued September 28, 2010.
Thomas E. Potok, Mark T. Elmore, Joel W. Reed, Nagiza F. Samatova and Jim N. Treadwell, System for Gathering and Summarizing Internet Information, U.S. Patent 7,072,883, issued July 4, 2006.
Thomas E. Potok, Mark T. Elmore, Joel W. Reed, Jim N. Treadwell, and Nagiza F. Samatova, Method for Gathering and Summarizing Internet Information, U.S. Patent 7,315,858, issued January 1, 2008.
Y. Jiao and T. Potok, Dynamic Dimensionality Reduction for Data Stream Analysis, U.S. Patent Application 12/072,723, filed February 28, 2008.
B. Beckerman, R. Patton, and T. Potok, Method for Learning Phrase Patterns from Textual Documents, U.S. Patent Application 61/310,351, filed March 4, 2010.
R. Patton and T. Potok, Detecting Temporal Precursor Words in Text Documents Using Wavelet Analysis, U.S. Patent Application 61/331,941, filed May 6, 2010.

Lead Inventor
Thomas E. Potok
Computational Sciences and Engineering Division
Oak Ridge National Laboratory

Patents and Patent Applications
ID Number, Title and Abstract, Primary Lab, Date
Patent 7,805,446
Agent-based method for distributed clustering of textual information
A computer method and system for storing, retrieving and displaying information has a multiplexing agent (20) that calculates a new document vector (25) for a new document (21) to be added to the system and transmits the new document vector (25) to master cluster agents (22) and cluster agents (23) for evaluation. These agents (22, 23) perform the evaluation and return values upstream to the multiplexing agent (20) based on the similarity of the document to documents stored under their control. The multiplexing agent (20) then sends the document (21) and the document vector (25) to the master cluster agent (22), which then forwards it to a cluster agent (23) or creates a new cluster agent (23) to manage the document (21). The system also searches for stored documents according to a search query having at least one term and identifying the documents found in the search, and displays the documents in a clustering display (80) of similarity so as to indicate similarity of the documents to each other.
Oak Ridge National Laboratory, 09/28/2010, Issued
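
The routing step described in the abstract above can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented method: the master cluster agent layer is collapsed into the multiplexing agent, similarity is plain cosine similarity on word counts, and the class names, the threshold parameter, and its default value are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class ClusterAgent:
    """Holds the documents of one cluster and scores new vectors against them."""

    def __init__(self):
        self.members = []  # list of (doc_id, vector) pairs

    def evaluate(self, vector):
        # Report the best similarity to any document under this agent's control.
        return max((cosine(vector, v) for _, v in self.members), default=0.0)

    def add(self, doc_id, vector):
        self.members.append((doc_id, vector))


class MultiplexingAgent:
    """Computes a document vector, polls the cluster agents, and routes the document."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold  # hypothetical similarity cutoff
        self.agents = []

    def add_document(self, doc_id, text):
        vector = Counter(text.lower().split())
        scores = [agent.evaluate(vector) for agent in self.agents]
        if scores and max(scores) >= self.threshold:
            # Hand the document to the agent that reported the highest similarity.
            best = self.agents[scores.index(max(scores))]
        else:
            # No existing cluster is similar enough: create a new cluster agent.
            best = ClusterAgent()
            self.agents.append(best)
        best.add(doc_id, vector)
        return self.agents.index(best)
```

In this sketch each agent is evaluated in a loop on one machine. In a distributed setting the evaluate calls are independent of one another, which is what would allow them to run in parallel across many agents.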
Patent 7,072,883
System for gathering and summarizing internet information
A computer method of gathering and summarizing large amounts of information comprises collecting information from a plurality of information sources (14, 51) according to respective maps (52) of the information sources (14), converting the collected information from a storage format to XML-language documents (26, 53) and storing the XML-language documents in a storage medium, searching for documents (55) according to a search query (13) having at least one term and identifying the documents (26) found in the search, and displaying the documents as nodes (33) of a tree structure (32) having links (34) and nodes (33) so as to indicate similarity of the documents to each other.
Oak Ridge National Laboratory, 07/04/2006, Issued
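
Two of the steps named in the abstract above, converting collected information to XML documents and searching them for a query term, can be sketched with Python's standard library. The element names, the flat dictionary input, and the functions record_to_xml and search are assumptions made for this example; the patent's source maps and tree display are not modeled here.

```python
import xml.etree.ElementTree as ET


def record_to_xml(record):
    """Convert a flat dict describing one collected item into an XML string.

    Keys must be valid XML element names; values are stored as element text.
    """
    doc = ET.Element("document")
    for field, value in record.items():
        child = ET.SubElement(doc, field)
        child.text = str(value)
    return ET.tostring(doc, encoding="unicode")


def search(xml_documents, term):
    """Return the XML documents whose text contains the term (case-insensitive)."""
    term = term.lower()
    hits = []
    for xml_text in xml_documents:
        root = ET.fromstring(xml_text)
        text = " ".join(root.itertext()).lower()
        if term in text:
            hits.append(xml_text)
    return hits
```

Storing every source in one XML shape means the search step does not need to know which site or storage format an item came from.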
Patent 7,315,858
Method for gathering and summarizing internet information
A computer method of gathering and summarizing large amounts of information comprises collecting information from a plurality of information sources (14, 51) according to respective maps (52) of the information sources (14), converting the collected information from a storage format to XML-language documents (26, 53) and storing the XML-language documents in a storage medium, searching for documents (55) according to a search query (13) having at least one term and identifying the documents (26) found in the search, and displaying the documents as nodes (33) of a tree structure (32) having links (34) and nodes (33) so as to indicate similarity of the documents to each other.
Oak Ridge National Laboratory, 01/01/2008, Issued
Application 20090119343
Dynamic reduction of dimensions of a document vector in a document search and retrieval system
The method and system of the invention involves processing each new document (20) coming into the system into a document vector (16), and creating a document vector with reduced dimensionality (17) for comparison with the data model (15) without recomputing the data model (15). These operations are carried out by a first computer (11) while a second computer (12) updates the data model (18), which can be comprised of an initial large group of documents (19) and is premised on the computing an initial data model (13, 14, 15) to provide a reference point for determining document vectors from documents processed from the data stream (20).
Oak Ridge National Laboratory, 02/28/2008, Filed
Patent 8,473,314
Method and system for determining precursors of health abnormalities from processing medical records
Medical reports are converted to document vectors in computing apparatus and sampled by applying a maximum variation sampling function including a fitness function to the document vectors to reduce a number of medical records being processed and to increase the diversity of the medical records being processed. Linguistic phrases are extracted from the medical records and converted to s-grams. A Haar wavelet function is applied to the s-grams over the preselected time interval; and the coefficient results of the Haar wavelet function are examined for patterns representing the likelihood of health abnormalities. This confirms certain s-grams as precursors of the health abnormality and a parameter can be calculated in relation to the occurrence of such a health abnormality.
Oak Ridge National Laboratory, 06/25/2013, Issued
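
To illustrate the Haar wavelet step mentioned in the abstract above, here is a minimal sketch of a Haar decomposition applied to a series of per-interval phrase counts. It uses the unnormalized averaging form of the transform for readability. The function names and the idea of feeding it weekly counts are assumptions for this example; how the patented method builds s-grams or interprets the coefficients is not shown.

```python
def haar_step(signal):
    """One Haar level: pairwise averages and pairwise half-differences."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    averages = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    details = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return averages, details


def haar_decompose(signal):
    """Full decomposition of a sequence whose length is a power of two.

    Returns the overall mean and a list of detail coefficients per level,
    finest level first.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    levels = []
    current = list(signal)
    while len(current) > 1:
        current, details = haar_step(current)
        levels.append(details)
    return current[0], levels
```

For a count series that is flat and then rises at the end, the detail coefficients are zero over the flat stretch and nonzero where the rise occurs, which is the kind of localized change a wavelet analysis makes visible.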
Technology Status
Technology ID: 1031, 1368, 1759, 2235, 2377
Development Stage: Development
Availability: Available
Published: 09/25/2012
Last Updated: 09/25/2012

Contact ORNL About This Technology

David L. Sims <simsdl@ornl.gov>