Using Pseudo-Random Numbers Repeatably in a Fine-Grain Multithreaded Simulation

Because of the transition from event-level parallelism in Geant4 to dynamical multithreading in GeantV the order in which tracks are processed becomes non-deterministic. Thus to maintain reproducibility one needs to associate the random generator state with the track itself and the worker thread currently processing the track. To be reproducible from the beginning of the event, this state can only depend on the pedigree of the track. Calculation of the hashed pedigree of a track using the hashed pedigree of the parent track makes the hashing algorithm a Merkle-Damgård construct with well-studied properties.

We implement this construction using a 64-bit hash and standard hash with boost_combine as the compression function operating on the number of the track among siblings as the input message blocks. We use the hashed pedigree as the seed for a CLHEP random number engine at the beginning of processing of each track in a Geant4 simulation. We show the reproducibility of the results under different track stacking order and their agreement with the results with the default random number generation. We show that the performance overhead is negligible for most of the random number generators except for those that have a complex internal state.

1 Introduction

1.1 Fine-grained parallelism and multi-threading

GeantV [1] is a project aimed at increasing the performance of HEP simulations. It is achieved by utilizing massive parallelism of modern hardware and accelerators, such as GPGPUs and Intel Xeon Phi coprocessors. While different threads may operate independently, to make use of vectorization within thread one must have data layout suitable for SIMD (Single Instruction Multiple Data) operations. Thus the tracks with similar properties (particle type, energy, enclosing volume) must be regrouped into ”baskets”. Then the baskets are processed by worker threads, which readiness depends on running conditions and cannot be deterministic. So the tracks in one event or even parts of the same track are processed by different threads. Thus to assure reproducibility the pseudo-random engine state must be associated with the track itself; and state of the secondary track has to be a deterministic function of the parent track. This is similar to the mechanism of pedigrees [3, 4] for repeatable parallel random-number generators proposed for Intel Cilk platform, which gave the idea for this work.

1.2 Pedigrees

Pedigrees are deterministic labels for the executed instructions in a dthreaded program execution that partition the instructions into valid strands in such way, that

These requirements guarantee that pedigrees deterministically identify strands irrespective of scheduling. A random number can be generated by hashing the pedigree, which can be used to initialize a substream of the random number generator. The pedigree can be represented as a sequence of the instruction rank followed by the ranks (number of other daughter instruction generated earlier) of its ancestors.

In Geant case is translates into the rank of a track equal to the number of secondary tracks generated by the parent before it. We implement pedigree calculation in Geant4 as a prototype to be further ported to GeantV.

2 Geant4-based prototype

To check the results of setting the random number generator on a per-track basis we add additional state to G4Track. For compatibility with different random number generators and minimal overhead the additional state is a single number that is used as the seed for the random number generator at the beginning of tracking. The stored number represents the hashed pedigree of the track. The hashed pedigree is used to seed the random number generator at the beginning of processing each track. Since in Geant4 the track is processed by the thread from start to the end, it is unnecessary to store a copy of the random number generator state in the track.

2.1 Hash calculation

Because when the secondary track is created not full pedigree of its parent but only the hash is available, the hashing algorithm is similar to Merkle-Damgård construct [2, 5] with more relaxed requirements compared to cryptographic uses. At each secondary track creation a compression function is used to calculate the hashed pedigree of the secondary track from the hashed pedigree of its parent and the number of already created siblings. To assure cryptographic quality one could use 256-bit hash and SHA-256 (wich is a Merkle-Damgård construct itself) as the compression function, but since the simulated physics and seeding of the random number generator add more complexity, we implemented a simpler scheme. The hash is a 64-bit integer, and the compression function is standard hash followed by has_combine from boost. The initial vector is generated by the random engine at the beginning of the event.

2.2 Counter-based Pseudo-Random Number Generators

To reduce the overhead from frequent generator seeding one can use a generator with a simple transition function and compact state but a more complex output function. Counter-based Pseudo-Random Number Generators are the ultimate example of such generators with the transition function is a simple increment and the output is a block cipher. To take advantage of an already tested solution we provide a HepRandomEngine wrapper for Philox1x64 and Threefry1x64 generators from Random123 library. They both have a 64-bit counter an thus generate 64-bit output sufficient to generate a double precision floating point value in [0;1) range - the most frequent use case in Geant4. Also they both use an additional key, which allows easy spawning of independent streams. Other variations of Philox and Threefry (as well as AES-based) generators provided in this library seem excessive because of bigger output size, but may be more efficient when simultaneously generating many random numbers for vector computation.

2.3 Testing

2.3.1 Reproducibility

To test the reproducibility we add a runtime capability to turn on and off setting seeds at the start of tracking via a Geant4 macro command. We modify a basic example to process tracks either in the default (first in last out) order, or in a non-deterministic depending on the standard random device. To compare the results we calculate p-value between the output histograms (which represent the number of tracks in each generation) from different runs. The full test macros are presented in Appendix A.

To check the sensitivity of the test we calculate the p-values with the default tracking order for runs with the same engine and different seeds set via a macro command. We find that for runs with the same seeds the p-value equals 1, and for runs with different seeds the p-value p < 1 (less than 10^-10 for this histogram because of correlated bins and high statistics), thus the test is sensitive to the random numbers used during the simulation. The p-value for runs with random tracking order p < 1, thus the test is sensitive to the tracking order. The p-value for runs with reseeding at the beginning of tracking are equal 1 with any tracking order, thus the pedigrees do assure reproducibility independent of tracking order. The full test results are presented in Appendix B.

2.3.2 Physical results

To assess the physical results we compare the output of Hadr06 standard example with different random number generators. The macro used was derived from graphite.mac by multiplying the statistics by ten. Figure 2.3.2 shows the spectrum of gamma rays emitted from a graphite sample. The most intensive line at 4.4 MeV corresponds to the first excited level of ¹²C. The p-values for pairs of histograms produced with different engines are in the range 0.1 - 0.9. It indicates that the results with different generators are in general agreement, including the new counter-based ones. For full results see Appendix B.

2.3.3 Performance

Figure 2.3.3 shows run times of a standard geant4 example with the standard random number generation and with reseeding based on pedigrees for different random number generators. To calculate the mean values and standard deviation the example was run 16 times with each generator. Times for each run are presented in Appendix 2.3.3. The overhead depends on the complexity of the seeding function of the random number generator. It is the small for the new counter-based generators, because the complexity is shifted to the output function.

3 Conclusion

We implemented reproducible pseudo-random number generation in a Geant4-based prototype. We showed the reproducibility of the results independent of the track processing order. Also we profiled the overhead for different random number engines. We showed that it is rather low for appropriate random number engines.

Therefore it is worth to apply a similar algorithm for reproducible pseudo-random number generation in multithreaded GeantV simulations.

4 Acknowledgements

Development sponsored by Google in Google Summer of Code 2017 under supervision of John Apostolakis and Sandro Wenzel.

References

[1] John Apostolakis, René Brun, Federico Carminati, Andrei Gheata, and Sandro Wenzel. A concurrent vector-based steering framework for particle transport. Journal of Physics: Conference Series, 523(1):012004, 2014.

[2] Ivan Bjerre Damgård. A Design Principle for Hash Functions, pages 416–427. Springer New York, New York, NY, 1990.

[3] Charles E. Leiserson, Tao B. Schardl, and Jim Sukha. Deterministic parallel random-number generation for dynamic-multithreading platforms. In Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP ’12, pages 193–204, New York, NY, USA, 2012. ACM.

[4] Charles E. Leiserson, Tao B. Schardl, and Jim Sukha. Deterministic parallel random-number generation for dynamic-multithreading platforms. SIGPLAN Not., 47(8):193–204, February 2012.

[5] Ralph C. Merkle. A Certified Digital Signature, pages 218–238. Springer New York, New York, NY, 1990.

A Test macros

# Macro file runTr.mac for the modified example B2b
#
# To be run preferably in batch, without graphics:
# % exampleB2b runTr.mac
#
# Runs the example with different stacking and seeding options
# to demonstrate the result independence of the tracking order
# if track hash based seeding is used.
# The executed runSomeTimes.mac runs simulation with different initial seeds
# to validate the test and to avoid initial condition dependence.

/tracking/verbose 0

/run/numberOfThreads 1
/B2/run/setGenHistLimit 30
/run/initialize

/gun/particle e-
/gun/energy 1 GeV

/hits/verbose 0
/globalField/verbose 0
/event/verbose 0
/run/verbose 1

# default mode, used in geant4 as of version 10.04 beta
/tracking/rng default
/B2/stack/setStackingType default
/B2/track/restoreTrackSeeds false
/control/execute runSomeTimes.mac

# leaves seeding the same but shuffles the tracking order
# to make sure it works, the result should be irreproducible
/tracking/rng default
/B2/stack/setStackingType random
/B2/track/restoreTrackSeeds false
/control/execute runSomeTimes.mac

# activates toolkit-level seed hashing does not shuffle tracks
# the results are reproducible
/tracking/rng generic
/B2/stack/setStackingType default
/B2/track/restoreTrackSeeds false
/control/execute runSomeTimes.mac

# activates toolkit-level seed hashing does not shuffle tracks
# the results are reproducible and the same as without shuffling
/tracking/rng generic
/B2/stack/setStackingType random
/B2/track/restoreTrackSeeds false
/control/execute runSomeTimes.mac

#runSomeTimes.mac
#run several times to check variability

/run/printProgress 10

/random/setSeeds 0 0
/run/beamOn 100

/random/setSeeds 0 0
/run/beamOn 100

/random/setSeeds 8989 1234567
/run/beamOn 100

/random/setSeeds 9876 12345
/run/beamOn 100

/random/setSeeds 98079843 2134512
/run/beamOn 100

/random/setSeeds 98079843 2134512
/run/beamOn 100

/random/setSeeds 13417512123 231111457008
/run/beamOn 100

/random/setSeeds 13417512123 231111457008
/run/beamOn 100

/random/setSeeds 13417512123 231111457008
/run/beamOn 100

B Test results

B.1 Reproducibility

   ------------------------------------------------------------
   Welcome to ROOT 6.10/04                http://root.cern.ch                                   c 1995-2017, The ROOT Team     Built for linuxx8664gcc                                        From tag v6-10-04, 28 July 2017                                Try ’.help’, ’.demo’, ’.license’, ’.credits’, ’.quit’/’.q’     ------------------------------------------------------------

Processing script.C...
Mode #0
       1        1 2.7e-138 2.6e-186 7.9e-148 7.9e-148 1.8e-196 1.8e-196 1.8e-196
       1        1 2.7e-138 2.6e-186 7.9e-148 7.9e-148 1.8e-196 1.8e-196 1.8e-196
2.7e-138 2.7e-138        1   1e-154  1.9e-53  1.9e-53  2.7e-40  2.7e-40  2.7e-40
2.6e-186 2.6e-186   1e-154        1  8.9e-68  8.9e-68 1.2e-233 1.2e-233 1.2e-233
7.9e-148 7.9e-148  1.9e-53  8.9e-68        1        1  1.9e-57  1.9e-57  1.9e-57
7.9e-148 7.9e-148  1.9e-53  8.9e-68        1        1  1.9e-57  1.9e-57  1.9e-57
1.8e-196 1.8e-196  2.7e-40 1.2e-233  1.9e-57  1.9e-57        1        1        1
1.8e-196 1.8e-196  2.7e-40 1.2e-233  1.9e-57  1.9e-57        1        1        1
1.8e-196 1.8e-196  2.7e-40 1.2e-233  1.9e-57  1.9e-57        1        1        1

Mode #1
       1  1.7e-94  6.3e-76  5.3e-80  5.7e-68  4.7e-74  7.1e-57  1.4e-99 1.5e-148
1.7e-94        1  2.5e-57  2.5e-65   2e-103 1.3e-187 4.6e-152  2.5e-87 1.9e-137
6.3e-76  2.5e-57        1  9.8e-26  2.5e-26  8.7e-81  3.4e-54  5.8e-36  2.7e-57
5.3e-80  2.5e-65  9.8e-26        1  1.8e-20 8.2e-132  5.4e-59  2.2e-53  1.5e-87
5.7e-68   2e-103  2.5e-26  1.8e-20        1  3.4e-84  3.6e-52  6.7e-34  3.2e-77
4.7e-74 1.3e-187  8.7e-81 8.2e-132  3.4e-84        1  8.9e-55    2e-83  4.6e-93
7.1e-57 4.6e-152  3.4e-54  5.4e-59  3.6e-52  8.9e-55        1  1.1e-54  7.3e-52
1.4e-99  2.5e-87  5.8e-36  2.2e-53  6.7e-34    2e-83  1.1e-54        1  3.4e-20
1.5e-148 1.9e-137  2.7e-57  1.5e-87  3.2e-77  4.6e-93  7.3e-52  3.4e-20        1

Mode #2
       1        1  2.9e-35  4.5e-62  1.6e-67  1.6e-67  1.2e-38  1.2e-38  1.2e-38
       1        1  2.9e-35  4.5e-62  1.6e-67  1.6e-67  1.2e-38  1.2e-38  1.2e-38
2.9e-35  2.9e-35        1 1.2e-116  1.8e-31  1.8e-31  6.4e-33  6.4e-33  6.4e-33
4.5e-62  4.5e-62 1.2e-116        1 1.8e-106 1.8e-106  4.7e-93  4.7e-93  4.7e-93
1.6e-67  1.6e-67  1.8e-31 1.8e-106        1        1  2.2e-15  2.2e-15  2.2e-15
1.6e-67  1.6e-67  1.8e-31 1.8e-106        1        1  2.2e-15  2.2e-15  2.2e-15
1.2e-38  1.2e-38  6.4e-33  4.7e-93  2.2e-15  2.2e-15        1        1        1
1.2e-38  1.2e-38  6.4e-33  4.7e-93  2.2e-15  2.2e-15        1        1        1
1.2e-38  1.2e-38  6.4e-33  4.7e-93  2.2e-15  2.2e-15        1        1        1

Mode #3
       1        1  2.9e-35  4.5e-62  1.6e-67  1.6e-67  1.2e-38  1.2e-38  1.2e-38
       1        1  2.9e-35  4.5e-62  1.6e-67  1.6e-67  1.2e-38  1.2e-38  1.2e-38
2.9e-35  2.9e-35        1 1.1e-116  1.8e-31  1.8e-31  6.4e-33  6.4e-33  6.4e-33
4.5e-62  4.5e-62 1.1e-116        1 1.7e-106 1.7e-106  4.6e-93  4.6e-93  4.6e-93
1.6e-67  1.6e-67  1.8e-31 1.7e-106        1        1  2.2e-15  2.2e-15  2.2e-15
1.6e-67  1.6e-67  1.8e-31 1.7e-106        1        1  2.2e-15  2.2e-15  2.2e-15
1.2e-38  1.2e-38  6.4e-33  4.6e-93  2.2e-15  2.2e-15        1        1        1
1.2e-38  1.2e-38  6.4e-33  4.6e-93  2.2e-15  2.2e-15        1        1        1
1.2e-38  1.2e-38  6.4e-33  4.6e-93  2.2e-15  2.2e-15        1        1        1

Run #0
       1  2.3e-51 1.2e-122 1.2e-122
2.3e-51        1 8.9e-100 8.9e-100
1.2e-122 8.9e-100        1        1
1.2e-122 8.9e-100        1        1

Run #1
       1  7.9e-16 1.2e-122 1.2e-122
7.9e-16        1 3.3e-103 3.3e-103
1.2e-122 3.3e-103        1        1
1.2e-122 3.3e-103        1        1

Run #2
       1  2.7e-68  1.3e-45  1.3e-45
2.7e-68        1  1.7e-21  1.7e-21
1.3e-45  1.7e-21        1        1
1.3e-45  1.7e-21        1        1

Run #3
       1   4e-102  6.5e-26    7e-26
  4e-102        1  1.3e-90  1.3e-90
6.5e-26  1.3e-90        1        1
   7e-26  1.3e-90        1        1

Run #4
       1  8.2e-64 3.6e-108 3.6e-108
8.2e-64        1  3.5e-36  3.5e-36
3.6e-108  3.5e-36        1        1
3.6e-108  3.5e-36        1        1

Run #5
       1  6.3e-90 3.6e-108 3.6e-108
6.3e-90        1 4.2e-135 4.2e-135
3.6e-108 4.2e-135        1        1
3.6e-108 4.2e-135        1        1

Run #6
       1  1.3e-17  5.5e-95  5.5e-95
1.3e-17        1  1.4e-54  1.4e-54
5.5e-95  1.4e-54        1        1
5.5e-95  1.4e-54        1        1

Run #7
       1  1.6e-75  5.5e-95  5.5e-95
1.6e-75        1  7.4e-53  7.4e-53
5.5e-95  7.4e-53        1        1
5.5e-95  7.4e-53        1        1

Run #8
       1  3.9e-60  5.5e-95  5.5e-95
3.9e-60        1 1.6e-101 1.6e-101
5.5e-95 1.6e-101        1        1
5.5e-95 1.6e-101        1        1

1	0.22	0.83	0.79	0.12	0.57
0.22	1	0.19	0.42	0.89	0.8
0.83	0.19	1	0.68	0.5	0.79
0.79	0.42	0.68	1	0.29	0.79
0.12	0.89	0.5	0.29	1	0.76
0.57	0.8	0.79	0.79	0.76	1

1	0.44	0.07	0.39	0.12	0.5
0.44	1	0.52	0.97	0.79	0.93
0.07	0.52	1	0.33	0.2	0.56
0.39	0.97	0.33	1	0.85	0.83
0.12	0.79	0.2	0.85	1	0.42
0.5	0.93	0.56	0.83	0.42	1

1	0.91	0.6	0.33	0.26	0.35
0.91	1	0.02	0.85	0.82	0.5
0.6	0.02	1	0.1	0.26	0.76
0.33	0.85	0.1	1	0.84	0.57
0.26	0.82	0.26	0.84	1	0.68
0.35	0.5	0.76	0.57	0.68	1