The last ten years have seen the rise of a new parallel computing paradigm with diverse hardware architectures and software interfaces. One common architecture, known as non-uniform memory access (NUMA), structures parallel computers so that cores can access certain parts of memory faster than others. In our work, we sought to model a specific NUMA machine and use that model to inform optimizations of the Conjugate Gradient method. We used the model to develop a segmented design that places the data each core needs in the memory that core can access fastest. Our segmented solution proved effective relative to the control, with a maximum speed-up of 11.1x.


Approved by: Michael Heroux, Lynn Zeigler, Dean Langley, Thomas Kirkman, Tony Cunningham

Included in

Physics Commons