Using Pseudo-Random Numbers Repeatably in a Fine-Grain Multithreaded Simulation

Dmitry Savin

August 29, 2017

Motivation

Particle transport Monte Carlo simulations are a key tool for High Energy Physics experiments, including the LHC experiments at CERN. All Monte Carlo (MC) simulations depend critically on the Pseudo-Random Number Generators (PRNGs) used to sample many distributions. These PRNGs must offer very long periods, fast execution, the ability to create a large number of streams, and very good correlation properties, both between streams and between sampled values. A key requirement for simulation is that a simulated event can be reproduced at any time, for re-examination or debugging.

In Geant, the history of the passage of a particle through matter forms a track. Each track is divided into steps defined by scatterings, geometry boundaries or user limits. Because of the transition from event-level parallelism in Geant4 to dynamic multithreading in GeantV, the tracks of one event, or even parts of the same track, are processed by different threads. To ensure reproducibility, the state of the pseudo-random engine must therefore be associated with the track itself, and the state of a secondary track has to be a deterministic function of its parent track. The sequence of the numbers of older siblings of each ancestor of a track uniquely represents its pedigree. The hashed pedigree of the track can be used to initialize the state of the random number generator.
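The idea can be illustrated with a minimal Python sketch. It is not the prototype's code: the helper names (`pedigree_seed`, `child_pedigree`, `sample`), the use of BLAKE2b as the hash and of Python's `random.Random` as the engine are all assumptions made for illustration. The sketch also assumes that a track's own older-sibling count is appended to the pedigree, so that siblings receive distinct states.

```python
import hashlib
import random


def pedigree_seed(pedigree):
    """Hash a pedigree (tuple of older-sibling counts) to a 64-bit seed.

    Fixed-width encoding keeps (1, 2) and (12,) distinct.
    BLAKE2b is an illustrative choice, not the hash used in the prototype.
    """
    data = b"".join(i.to_bytes(4, "little") for i in pedigree)
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")


def child_pedigree(parent, older_siblings):
    """Pedigree of a secondary: the parent's pedigree plus its own sibling index."""
    return parent + (older_siblings,)


def sample(pedigree, n=3):
    """Draw n numbers from an engine seeded only by the track's pedigree."""
    rng = random.Random(pedigree_seed(pedigree))
    return [rng.random() for _ in range(n)]


primary = (0,)                                # first primary of the event
second_child = child_pedigree(primary, 1)     # has one older sibling
grandchild = child_pedigree(second_child, 0)  # first secondary of that child
```

Because the seed depends only on the pedigree, `sample(grandchild)` returns the same numbers no matter which thread processes the track or when.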

Goals

Implementation

Standalone prototype

The standalone prototype was implemented in accordance with the proposal. There is an implementation compatible with any HepRandomEngine from the CLHEP library included in Geant4, and a specific version for the Philox and Threefry generators from the Random123 library that reduces the abstraction overhead. Each implementation has a unit test that traverses the track tree in breadth-first and in depth-first order and checks that the outcomes are equal.
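A test of this kind might look like the following Python sketch. It is only an analogue of the unit tests described above, not their code: the toy `process` function, the depth limit `MAX_DEPTH`, and the use of `repr(pedigree)` as a stand-in for the hashed pedigree are assumptions. Each track draws a value and its number of secondaries from an engine seeded by its own pedigree.

```python
import random
from collections import deque

MAX_DEPTH = 4  # hypothetical cut-off so the toy shower terminates


def process(pedigree):
    """Process one track; return its sampled value and its secondaries."""
    # Seeding from repr(pedigree) stands in for seeding from the hashed pedigree.
    rng = random.Random(repr(pedigree))
    value = rng.random()
    n_children = rng.randint(0, 3) if len(pedigree) < MAX_DEPTH else 0
    children = [pedigree + (i,) for i in range(n_children)]
    return value, children


def traverse(breadth_first):
    """Walk the whole tree and record the value sampled for every track."""
    pending = deque([(0,)])
    outcome = {}
    while pending:
        # A queue gives breadth-first order, a stack gives depth-first order.
        pedigree = pending.popleft() if breadth_first else pending.pop()
        value, children = process(pedigree)
        outcome[pedigree] = value
        pending.extend(children)
    return outcome


bfs_outcome = traverse(breadth_first=True)
dfs_outcome = traverse(breadth_first=False)
```

The two dictionaries compare equal because no track's random numbers depend on when it was processed.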

Geant4-based prototype


Figure 1: Pedigree hash calculation and usage in the Geant4-based prototype.


A supplementary 64-bit state was added to G4Track. The stored number represents the hashed pedigree of the track and is used to seed the random number generator at the beginning of the processing of each track. The hash is a Merkle-Damgård construction that uses a standard hash followed by Boost's hash_combine as the compression function and a random number as the initialization vector. A Geant4 macro command was added to toggle the random number generation behavior at runtime. For testing, p-values were calculated for pairs of histograms obtained with different track processing orders.
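The chaining can be sketched in Python as follows. This is an illustration under stated assumptions rather than the prototype's code: `hash_combine` follows the classic Boost formula `seed ^= hash(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2)` reduced to 64 bits, the "standard hash" is assumed to be the identity on integers, and the initialization vector is a fixed constant here instead of a random number. The names `std_hash`, `child_hash` and `pedigree_hash` are hypothetical.

```python
MASK64 = (1 << 64) - 1
GOLDEN = 0x9E3779B9  # constant used by the classic boost::hash_combine


def std_hash(value):
    """Stand-in for the standard hash (assumed: identity on integers)."""
    return value & MASK64


def hash_combine(seed, value):
    """Compression step: mix one sibling index into the 64-bit state."""
    mixed = std_hash(value) + GOLDEN + (seed << 6) + (seed >> 2)
    return (seed ^ mixed) & MASK64  # emulate 64-bit unsigned wrap-around


def child_hash(parent_hash, older_siblings):
    """A secondary's state depends only on its parent's state and its index."""
    return hash_combine(parent_hash, older_siblings)


def pedigree_hash(iv, pedigree):
    """Merkle-Damgård style chaining over the whole pedigree."""
    state = iv
    for older_siblings in pedigree:
        state = hash_combine(state, older_siblings)
    return state


IV = 0x0123456789ABCDEF  # illustrative constant; the prototype uses a random number
track_hash = pedigree_hash(IV, (0, 2, 1))
```

Because the hash is built up incrementally, a track only needs to carry the 64-bit value, not its whole pedigree: a secondary's hash follows from `child_hash(parent_hash, index)`.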

Benchmarks



Figure 2: User run time with (Y axis) and without (X axis) reseeding.


Figure 2 shows the run times of a standard Geant4 example with the standard random number generation and with reseeding based on pedigrees, for different random number generators. The overhead depends on the complexity of the seeding function of the random number generator. It is small for the new counter-based generators, because their complexity is shifted to the output function.
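The following toy generator illustrates why reseeding a counter-based generator is cheap. It is neither Philox nor Threefry: the class `ToyCounterRng` is hypothetical, and BLAKE2b is used only as a convenient stand-in for the mixing function. Seeding just stores a key and resets a counter, while all the mixing work happens each time a number is produced.

```python
import hashlib


class ToyCounterRng:
    """Toy counter-based generator: output = mix(key, counter)."""

    def __init__(self, key):
        self.reseed(key)

    def reseed(self, key):
        # Seeding is O(1): store the key and reset the counter.
        self.key = key & 0xFFFFFFFFFFFFFFFF
        self.counter = 0

    def next_u64(self):
        # All the mixing work is done in the output function.
        block = self.key.to_bytes(8, "little") + self.counter.to_bytes(8, "little")
        self.counter += 1
        return int.from_bytes(hashlib.blake2b(block, digest_size=8).digest(), "little")

    def uniform(self):
        """Uniform double in [0, 1) built from the top 53 bits."""
        return (self.next_u64() >> 11) / float(1 << 53)


rng = ToyCounterRng(key=42)           # the key would be the track's hashed pedigree
first = [rng.next_u64() for _ in range(3)]
rng.reseed(42)                        # reseeding for the same track
second = [rng.next_u64() for _ in range(3)]
```

By contrast, a generator with a large internal state has to fill that whole state on every reseed, which would be consistent with the dependence on the seeding function noted above.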

Conclusion

We implemented a Geant4 prototype that can perform fully reproducible simulations in a multithreaded environment, regardless of the order in which tracks are propagated. We profiled the overhead of achieving this reproducibility for different random number engines and showed that it is quite low for selected engines.

It is therefore worthwhile to apply a similar algorithm for reproducible pseudo-random number generation in GeantV.

Acknowledgements

Development was sponsored by Google as part of Google Summer of Code 2017, under the supervision of John Apostolakis and Sandro Wenzel.

Links

Standalone prototype
Geant4-based prototype
Preprint about the Geant4-based prototype (in progress)
GSoC proposal