Using Pseudo-Random Numbers Repeatably
in a Fine-Grain Multithreaded Simulation
Dmitry Savin
August 28, 2017
Abstract
Because of the transition from event-level parallelism in Geant4to dynamical multithreading in GeantV the order in whichtracks are processed becomes non-deterministic. Thus to maintainreproducibility one needs to associate the random generator state withthe track itself and the worker thread currently processing the track.To be reproducible from the beginning of the event, this state canonly depend on the pedigree of the track. Calculation of the hashedpedigree of a track using the hashed pedigree of the parent track makesthe hashing algorithm a Merkle-Damgård construct with well-studiedproperties.
We implement this construction using a 64-bit hash and standardhash with boost_combine as the compression function operating onthe number of the track among siblings as the input message blocks.We use the hashed pedigree as the seed for a CLHEP random numberengine at the beginning of processing of each track in a Geant4simulation. We show the reproducibility of the results under differenttrack stacking order and their agreement with the results with thedefault random number generation. We show that the performanceoverhead is negligible for most of the random number generatorsexcept for those that have a complex internal state.
GeantV[1]is a project aimed at increasing the performance of HEPsimulations. It is achieved by utilizing massive parallelism of modernhardware and accelerators, such as GPGPUs and Intel Xeon Phicoprocessors. While different threads may operate independently, to makeuse of vectorization within thread one must have data layout suitable forSIMD (Single Instruction Multiple Data) operations. Thus the tracks withsimilar properties (particle type, energy, enclosing volume) mustbe regrouped into ”baskets”. Then the baskets are processed byworker threads, which readiness depends on running conditions andcannot be deterministic. So the tracks in one event or even parts ofthe same track are processed by different threads. Thus to assurereproducibility the pseudo-random engine state must be associatedwith the track itself; and state of the secondary track has to be adeterministic function of the parent track. This is similar to themechanism of pedigrees[3,4]for repeatable parallel random-numbergenerators proposed for Intel Cilk platform, which gave the idea for thiswork.
1.2 Pedigrees
Pedigrees are deterministic labels for the executed instructions in adthreaded program execution that partition the instructions into validstrands in such way, that
each strand includes instructions with the same pedigree,
the pedigree of any instruction does not depend on how theprogram is scheduled on multiple processors.
These requirements guarantee that pedigrees deterministically identify strandsirrespective of scheduling. A random number can be generated by hashingthe pedigree, which can be used to initialize a substream of the randomnumber generator. The pedigree can be represented as a sequence of theinstruction rank followed by the ranks (number of other daughterinstruction generated earlier) of its ancestors.
In Geant case is translates into the rank of a track equal to the numberof secondary tracks generated by the parent before it. We implementpedigree calculation in Geant4 as a prototype to be further ported toGeantV.
2 Geant4-based prototype
Figure 1: Pedigree hash calculation and usage in the Geant4-based
prototype.
To check the results of setting the random number generator on aper-track basis we add additional state to G4Track. For compatibility withdifferent random number generators and minimal overhead the additionalstate is a single number that is used as the seed for the random numbergenerator at the beginning of tracking. The stored number represents thehashed pedigree of the track. The hashed pedigree is used to seed therandom number generator at the beginning of processing each track. Sincein Geant4 the track is processed by the thread from start to the end, it isunnecessary to store a copy of the random number generator state in thetrack.
2.1 Hash calculation
Because when the secondary track is created not full pedigree of its parentbut only the hash is available, the hashing algorithm is similar toMerkle-Damgård construct[2,5]with more relaxed requirementscompared to cryptographic uses. At each secondary track creation acompression function is used to calculate the hashed pedigree of thesecondary track from the hashed pedigree of its parent and the number ofalready created siblings. To assure cryptographic quality one could use256-bit hash and SHA-256 (wich is a Merkle-Damgård construct itself) asthe compression function, but since the simulated physics and seeding of therandom number generator add more complexity, we implemented a simplerscheme. The hash is a 64-bit integer, and the compression function isstandard hash followed by has_combine from boost. The initialvector is generated by the random engine at the beginning of theevent.
2.2 Counter-based Pseudo-Random Number Generators
To reduce the overhead from frequent generator seeding one can use agenerator with a simple transition function and compact state but a morecomplex output function. Counter-based Pseudo-Random NumberGenerators are the ultimate example of such generators with thetransition function is a simple increment and the output is a blockcipher. To take advantage of an already tested solution we provide aHepRandomEngine wrapper for Philox1x64 and Threefry1x64 generatorsfrom Random123 library. They both have a 64-bit counter an thus generate64-bit output sufficient to generate a double precision floating pointvalue in [0;1) range - the most frequent use case in Geant4. Alsothey both use an additional key, which allows easy spawning ofindependent streams. Other variations of Philox and Threefry(as well as AES-based) generators provided in this library seemexcessive because of bigger output size, but may be more efficientwhen simultaneously generating many random numbers for vectorcomputation.
2.3 Testing
2.3.1 Reproducibility
To test the reproducibility we add a runtime capability to turn on and offsetting seeds at the start of tracking via a Geant4 macro command. Wemodify a basic example to process tracks either in the default (first in lastout) order, or in a non-deterministic depending on the standard randomdevice. To compare the results we calculate p-value between theoutput histograms (which represent the number of tracks in eachgeneration) from different runs. The full test macros are presented inAppendixA.
To check the sensitivity of the test we calculate the p-values with thedefault tracking order for runs with the same engine and different seeds setvia a macro command. We find that for runs with the same seeds thep-value equals 1, and for runs with different seeds the p-value p <1 (lessthan 10-10for this histogram because of correlated bins and highstatistics), thus the test is sensitive to the random numbers used during thesimulation. The p-value for runs with random tracking order p <1,thus the test is sensitive to the tracking order. The p-value forruns with reseeding at the beginning of tracking are equal 1 withany tracking order, thus the pedigrees do assure reproducibilityindependent of tracking order. The full test results are presented inAppendixB.
2.3.2 Physical results
Figure 2: Neutron induced gamma spectrum from a graphite sample with
different random engines.
To assess the physical results we compare the output of Hadr06standard example with different random number generators. Themacro used was derived from graphite.mac by multiplying thestatistics by ten. Figure2.3.2shows the spectrum of gamma raysemitted from a graphite sample. The most intensive line at 4.4 MeVcorresponds to the first excited level of12C. The p-values for pairs ofhistograms produced with different engines are in the range 0.1 - 0.9. Itindicates that the results with different generators are in generalagreement, including the new counter-based ones. For full results seeAppendixB.
2.3.3 Performance
Figure 3: User run time with (Y axis) and without (X axis) reseeding.
Figure2.3.3shows run times of a standard geant4 example with thestandard random number generation and with reseeding based onpedigrees for different random number generators. To calculatethe mean values and standard deviation the example was run 16times with each generator. Times for each run are presented inAppendix2.3.3. The overhead depends on the complexity of the seedingfunction of the random number generator. It is the small for the newcounter-based generators, because the complexity is shifted to the outputfunction.
3 Conclusion
We implemented reproducible pseudo-random number generation in aGeant4-based prototype. We showed the reproducibility of the resultsindependent of the track processing order. Also we profiled the overhead fordifferent random number engines. We showed that it is rather low forappropriate random number engines.
Therefore it is worth to apply a similar algorithm for reproduciblepseudo-random number generation in multithreaded GeantV simulations.
4 Acknowledgements
Development sponsored by Google in Google Summer of Code 2017 undersupervision of John Apostolakis and Sandro Wenzel.
References
[1]John Apostolakis, René Brun, Federico Carminati, AndreiGheata, and Sandro Wenzel. A concurrent vector-based steeringframework for particle transport. Journal of Physics: ConferenceSeries, 523(1):012004, 2014.
[2]Ivan Bjerre Damgård. A Design Principle for Hash Functions,pages 416–427. Springer New York, New York, NY, 1990.
[3]Charles E. Leiserson, Tao B. Schardl,and Jim Sukha. Deterministic parallel random-number generationfor dynamic-multithreading platforms. In Proceedings of the 17thACM SIGPLAN Symposium on Principles and Practice of ParallelProgramming, PPoPP ’12, pages 193–204, New York, NY, USA,2012. ACM.
[4]Charles E.Leiserson, Tao B. Schardl, and Jim Sukha. Deterministic parallelrandom-number generation for dynamic-multithreading platforms.SIGPLAN Not., 47(8):193–204, February 2012.
[5]Ralph C. Merkle. A Certified Digital Signature, pages 218–238.Springer New York, New York, NY, 1990.
A Test macros
# Macro file runTr.mac for the modified example B2b # # To be run preferably in batch, without graphics: # % exampleB2b runTr.mac # # Runs the example with different stacking and seeding options # to demonstrate the result independence of the tracking order # if track hash based seeding is used. # The executed runSomeTimes.mac runs simulation with different initial seeds # to validate the test and to avoid initial condition dependence. /tracking/verbose 0 /run/numberOfThreads 1 /B2/run/setGenHistLimit 30 /run/initialize /gun/particle e- /gun/energy 1 GeV /hits/verbose 0 /globalField/verbose 0 /event/verbose 0 /run/verbose 1 # default mode, used in geant4 as of version 10.04 beta /tracking/rng default /B2/stack/setStackingType default /B2/track/restoreTrackSeeds false /control/execute runSomeTimes.mac # leaves seeding the same but shuffles the tracking order # to make sure it works, the result should be irreproducible /tracking/rng default /B2/stack/setStackingType random /B2/track/restoreTrackSeeds false /control/execute runSomeTimes.mac # activates toolkit-level seed hashing does not shuffle tracks # the results are reproducible /tracking/rng generic /B2/stack/setStackingType default /B2/track/restoreTrackSeeds false /control/execute runSomeTimes.mac # activates toolkit-level seed hashing does not shuffle tracks # the results are reproducible and the same as without shuffling /tracking/rng generic /B2/stack/setStackingType random /B2/track/restoreTrackSeeds false /control/execute runSomeTimes.mac