An adaptive web crawling system generates a first utility measurement based on web page snippets associated with individual search result items by crawling from a collection of web page crawling seeds and according to a specific user web crawling criteria. The system generates a second utility measurement based on features extracted from the full webpages downloaded according to the guidance of the first utility measurement results. A web page utility prediction function is introduced to forecast the second utility measurement based on the first utility measurement. The system adapts its priorities for web crawling based on the web page utility prediction function.
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH AND DEVELOPMENT
 This invention was made with United States government support under Contract No. DE-AC05-00OR22725 awarded by the United States Department of Energy. The United States government has certain rights in the invention.