A compute unit configured to execute multiple threads in parallel is presented. The compute unit includes one or more single instruction multiple data (SIMD) units and a fetch and decode logic. The SIMD units have differing numbers of arithmetic logic units (ALUs), such that each SIMD unit can execute a different number of threads. The fetch and decode logic is in communication with each of the SIMD units, and is configured to assign the threads to the SIMD units for execution based on such differing numbers of ALUs.
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
 This invention was made with government support under Prime Contract Number DE-AC52-07NA27344, Subcontract Number B600716 awarded by the Department of Energy (DOE). The government has certain rights in the invention.