A parallel computer comprises a plurality of compute nodes organized into at least one operational group for collective parallel operations. Each compute node is assigned a unique rank and is coupled for data communications through a global combining network. One compute node is assigned to be a logical root. A send buffer and a receive buffer is configured. Each element of a contribution of the logical root in the send buffer is contributed. One or more zeros corresponding to a size of the element are injected. An allreduce operation with a bitwise OR using the element and the injected zeros is performed. And the result for the allreduce operation is determined and stored in each receive buffer.
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
This invention was made with Government support under Contract No. B519700 awarded by the Department of Energy. The Government has certain rights in this invention.