A system, method and computer program product for supporting system initiated checkpoints in parallel computing systems. The system and method generates selective control signals to perform checkpointing of system related data in presence of messaging activity associated with a user application running at the node. The checkpointing is initiated by the system such that checkpoint data of a plurality of network nodes may be obtained even in the presence of user applications running on highly parallel computers that include ongoing user messaging activity.
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
The U.S. Government has a paid-up license in this invention and the right in limited circumstances to require the patent owner to license others on reasonable terms as provided for by the terms of Contract. No. B554331 awarded by the Department of Energy.