Implementation Notes

This section provides implementation details for the current version of the random library.

General Implementation Notes

This is release 1.4.1 of the Swarm libraries. It contains version 0.8 of the random-number library and version 0.81 of the random library documentation.

Look at the Random Library Reference for the objects defined in this library.

`Fat' vs. `Thin' doubles. Note that distributions which use floating point variates from their generators by default draw `fat' doubles (-getDoubleSample) which use two calls to the basic (32-bit) unsigned int sample method (-getUnsignedSample) in order to fill the 53-bit mantissa of a double. If you don't need this much precision, or want to speed up the distributions, you can make the distributions use `thin' samples (-getThinDoubleSample) instead. See the note at the top of random/distributions.h for how to change this behavior. If you do, be sure to remake Swarm (make, make install).

Implementation notes for Generators

Version 0.8: Changes since version 0.75. 

  1. The code was rearranged to conform to create-phase protocol (CREATING-SETTING-USING) ordering.

  2. Some for-loop indices were changed to unsigned integers to eliminate compiler warnings.

  3. A few objects (C2LCGXgen, C4LCGXgen, SWBgen, TGFSRgen) were given their own -drop methods to drop internally allocated arrays properly.

  4. A new -reset method was added to all generators. This method resets the state of the generator to what it was at creation, or at the point when -setStateFromSeed(s) was last used. Counters are also reset.

Version 0.75: Changes since version 0.7. 

  1. The method '-getDoubleSample' was redefined to use only double variables in its implementation (instead of long doubles).

  2. The macros used for starting seed generation were changed to avoid a situation where many new generators would be created the same starting seed (if '--varyseed' was not specified.) See the Generator Usage Guide and the Reference Guide for details.

Version 0.7: Improvements over version 0.6. 

  • A host of new generators, located on the web or in the literature, have been added since the last version of Random. There is now a total of 36 different generators defined! Some of these have immense periods, some are very fast, and some have much better statistical properties than the old generators.

  • A new *type* of generator, the `split' generator, has been introduced in the form of L'Ecuyer's C2LCGXgen and C4LCGXgen generators.

  • A `split' generator is a long-period generator for which we are able to split the period into arbitrary sub-periods, which we can access quickly. We then configure the generator as having a number (A) of `virtual generators', each of which can address a number (2^v) of sub-segments of length 2^w. These parameters (A,v,w) are user selectable when the generator is created. (As an example, for C4LCGXgen the default values are A=128, v=31, w=41.) The advantage is that the subsegments act as statistically independent streams of random numbers.

  • In addition to the -getUnsignedSample method, generators now also supply floating point output in the range [0.0,1.0), in the form of these methods:

       -(float)       getFloatSample;		// using 1 unsigned value
       -(double)      getThinDoubleSample;		// using 1 unsigned value
       -(double)      getDoubleSample;		// using 2 unsigned values
       -(long double) getLongDoubleSample;		// using 2 unsigned values

    Note that the last method is not portable across architectures, since the length of a long double varies between machines.

  • Generators may now be started with a single seed, *or* with a vector of seeds whose length is generator dependent. (PMMLCG requires 1 integer for a seed, while MT19937 needs 624 of them.)

  • Generators now remember what seed values they were started with. They also count how many variates they have delivered (i.e., how many calls to -getUnsignedSample they have serviced.)

  • There are a few arbitrary seed values, DEFAULTSEED, DEFAULTSEED1, DEFAULTSEED2, DEFAULTSEED3, DEFAULTSEED4 defined. There is also the value FIRSTSEED, which returns the value that the default generator `randomGenerator' was started with.

  • The macro NEXTSEED will generate a deterministic sequence of seed values, using and inline LCG and starting with FIRSTSEED. There is the macro RANDOMSEED, which will be different every time it is invoked because it depends on program time. And there is value STARTSEED, which will by default equal NEXTSEED, but will instead be equal to RANDOMSEED if you start your program with the --varyseed or -s command line parameter.

  • The generators have gained a new creation method, '+createWithDefaults: aZone', which creates the generator and initializes it with STARTSEED. Split generators get default values for A,v,w.

Version 0.7: Changes since version 0.6. 

  • The generator classes have changed names to where they all end in '-gen'. A simple search-and-replace in your code will get you up and running again.(Or perhaps you'll want to try one of the new generators?)

  • A bug in SWBgen was corrected. Code for ACG and SCG was also changed.

  • The -verifySelf method is gone. Instead see the test program located in /random/testR0. (Available in a separate tarball at the SFI ftp site.)

  • The `getState:' method has been named `putStateInto: (void *) buffer', and the `setState:' method is now `setStateFrom: (void *) buffer'. A quick search-and-replace fixes things in your code.

  • Note: these methods have also changed somewhat, as has the size of the data being saved. As a result, v. 0.7 generators will refuse to `setStateFrom' data saved by v. 0.6 objects.

  • There should be fewer changes like this in the next release.

Testing Generators.  Since v. 0.6 we have done some rudimentary statistical testing of the implemented generators, using Marsaglia's Diehard tests and the ENT tests. The results of these tests are summarized in Generator quality table (now found in the Swarm User Guide), where test results as well as period length, state size and execution times are listed. You can use these data to select a generator that suits your simulation. Some brief comments:

  1. the tests show that old generators SCG and LCG are of poor quality and should be avoided.

  2. the lagged-Fibonacci based generators (ACG, SWB, PSWB) all fail Diehard's `Birthday spacings test', for reasons having to do with their lattice structure. These generators are only conditionally recommended.

  3. The rest of the 32-bit generators (i.e. generators that fill all 32 bits of an unsigned int) pass all tests, and are recommended at this time. (Note that while a test may show that a generator is bad, passing a number of tests does not prove that a generator is good!)

  4. The 31-bit generators all fail the same set of tests. Some of these cannot be passed by a generator whose output has a `stuck' bit. Until I clear up with Prof. Marsaglia how to interpret these results, I believe all the 31-bit generators are in the `recommended' category.

  5. However, a cautionary note: while the PMMLCG generators pass the tests, they have a very short period ( less than 2^31 ) and should only be used for `toy' simulations. You don't want your generator(s) to `go around' and start repeating themselves !

  6. For what it's worth, Professor L'Ecuyer recommends his own C4LCGX and C2MRG3 generators as well as Matsumoto's TT800 (the monster MT19937 hadn't been released yet), and Prof. Marsaglia recommends his own Multiply-With-Carry generators (MWCA, MWCB, C3MWC, RWC2, RWC8="Mother").

Implementation notes for Distributions

Version 0.8: Changes over version 0.75. No functional changes were made. Code was rearranged to conform to create-phase protocol (CREATING-SETTING-USING) ordering.

Version 0.75: Changes over version 0.7. No functional changes were made.

Version 0.7: Improvements over version 0.6. 

  • One new distribution class, BernoulliDist, has been added. It returns binary values (yes/true/1) with a given probability (while the old RandomBitDist has a fixed 50% probability, a fair coin toss.)

  • Distributions now have a new create method, '+createWithDefaults: aZone'. This method creates the distribution object, and also a new generator object for its exclusive use. Each distribution class has a different default generator class assigned. These generators are initialized with STARTSEED, which by default equals the fixed value DEFAULTSEED, but will be equal to the varying RANDOMSEED if you start your program with the command line parameter --varyseed or -s.

  • All distributions have code to interact with the new `split' generators.

  • UniformIntegerDist and UniformUnsignedDist now allow you to set parameter minValue equal to maxValue. In this case that value is returned every time.

  • UniformDoubleDist also allows this, even if the set [x,x) is mathematically suspect ...

  • NormalDist and LogNormalDist now allow you to specify zero Variance, in which case the values returned are the Mean and exp(Mean) respectively.

Version 0.7: Changes since version 0.6. 

  • The distribution classes have changed names to where they all end in `Dist'. A simple search-and-replace in your code will get you back up and running.

  • The strong distinction between `frozen' and `un-frozen' distribution objects in v. 0.6 has been softened considerably. You may now set and reset the default parameters as often as you wish, and you may make calls for variates with given parameters even if different default parameters have been set.

  • The generation of uniform(0,1) floating point values has been moved from the distribution objects into the generator objects. Thus, if all you need is a uniform(0,1) double, you have no need of a distribution but can get what you desire from a generator.

  • Note that the generators fill the mantissa of a double from two 32-bit unsigned values in a different manner from v. 0.6 distributions, so output will be a bit different in the new version.

  • A bug in LogNormalDist has been fixed.

  • The `getState:' method has been named `putStateInto: (void *) buffer', and the `setState:' method is now `setStateFrom: (void *) buffer. A quick search-and-replace fixes things in your code.

  • But note: these methods have also changed somewhat, as has the size of the data being saved. As a result, v. 0.7 distributions will refuse to `setStateFrom' data saved by v. 0.6 objects.

Utility Objects Provided. The following objects have been defined in random/random.m. They may be accessed and used from anywhere in your program.

id <SimpleRandomGenerator>   randomGenerator;
id <UniformIntegerDist>      uniformIntRand;
id <UniformUnsignedDist>     uniformUnsRand;
id <UniformDoubleDist>       uniformDblRand;

The 3 distribution objects all draw their random numbers from the MT19937 generator, which has a period of 2^19937 (10^6001) and is quite fast.

Programming yet to do

Like many Open Source projects, this random-number library is a work in process. Further developments still on the to-do list are detailed below.

The following changes and additions are contemplated for the next release of Random for Swarm (though I don't promise they'll all make it in -- nor when that next release will be):

  1. ADD a few more generators. It's good to have many different types of generators, so you can test your model's results with different generators and ensure that the results aren't artifacts of the generator used. And people seem to insist on inventing new, better ones with longer periods!

  2. ADD more distributions. NOTE: if you have any strong opinions about what distributions or generators need to be added, please e-mail me!

  3. TEST the generators, using Marsaglia's Diehard battery, or L'Ecuyer's tests when/if those become available. This will allow us (a) to make a choice between the implemented generators on the basis of their statistical quality, (b) to decide what old and bad generators to remove, and (c) to detect any bugs in my implementation.

  4. TEST the distributions, to make sure they actually put out numbers according to the probability distribution and parameters used.

  5. ELIMINATE 'bad' old generators on the basis of statistical tests

  6. RETAIN PMMLCG as the only short-period generator, for convenience

  7. ADD an 'empirical' distribution, whose f is defined by a set of user- supplied data

  8. IMPLEMENT a version of getState/setState that is portable across machine architectures, so that simulations may be moved to or duplicated on other machines. (The problem: integers and doubles are stored in different byte orders on different systems.)

  9. IMPLEMENT a proper -drop method for generators that allocate their state vectors dynamically, freeing the state vector memory to avoid `memory leakage'

  10. REVIEW all objects for ways to make the crucial methods run faster

  11. ADD code to make all objects meter their own usage and send the author monthly e-mails in a stealthy manner, so he can monitor usage and perhaps start collecting a usage fee for his efforts ... ;-) (Suggested by Rick Riolo. Thanks, Rick!)

Can you think of anything else? Drop me a note! -- Sven Thommesen