The class RandomBitSteam
is supposed to model a binary random variable
(with 50% probability of being "true" and 50% of being "false", and
successive samples are independent).
There are three ways of creating a new RandomBitSteam
object:
RandomBitStream RBS1; // default ctor, seeded with 1 RandomBitStream RBS2(n); // seed pseudo-random generator with n, positive unsigned long RandomBitStream RBS3(0); // seed pseudo-random generator from current time
Note that seeding from current time will produce different results each
time the program is run; this is likely desirable unless you're trying to
debug a randomized algorithm. Conversely, if you create more than one
RandomBitStream
object with the same seed, they will each produce
exactly the same sequence of bits.
Once you have created a RandomBitStream
you may perform the following
operations on it:
*RBS // get the current value of RBS (as a boolean). ++RBS // advance to next value of RBS. RBS++ // advance to next value of RBS *INEFFICIENTLY*. SampleBool(RBS) // advance RBS and then return current value; same as ``*++RBS`` SampleLong(RBS) // produce a uniform random (signed) long; may advance RBS by more than the number of bits in a ``long`` prob(P, RBS) // use bits from RBS to produce a boolean with prob P of // being true; on average uses less than 2 samples from RBS per call. out << RBS // print some information about RBS. RBS.myIndex() // number of times RBS has been advanced from initial seeding, same as the number of random bits generated.
Note that a RandomBitStream
supports input iterator syntax.
Note prob
should work correctly even with probabilities extremely close to 0
(e.g. < 1/10^10).
You may assign or create copies of RandomBitStream
objects; the copies
acquire the complete state of the original, so will go on to produce exactly
the same sequence of bits as the original will produce.
The idea is very simple: use the pseudo random number generator of GMP to generate
a random integer, and then give out the bits of this integer one at at time;
when the last bit has been given out, get a new random integer from the GMP generator.
The random integer is kept in the data member myBuffer
, and myBitIndex
is
used to read the bits one at a time.
Among the data members, there are two mild surprises: mySeed
and
myCounter
. I put these in to help any poor blighter who has to debug a
randomized algorithm, and who may want to "fast forward" the RandomBitSteam
to the right place.
The data member myState
holds all the state information used by the GMP
generator. Its presence makes the ctors, dtor and assignment messier than
they would have been otherwise. The data member myMPZ
is used as a very
temporary buffer to hold the value produced by the GMP generator; the value
is copied directly into myBuffer
Most of the member functions are quite simple.
The advancing and reading member functions (i.e. operator++
and operator*
)
are inline for efficiency, as is the SampleBool
function. The condition for
refilling myBuffer
is when the index goes beyond the number of bits held in
myBuffer
.
myFillBuffer
is a little messy because the value generated by GMP is placed
in an mpz_t
value. Note that myBitIndex
is set to zero; it seemed most
sensible to do this here.
The function SampleLong
may discard some random bits. It hardly seems worth
the effort of trying to be economical and avoiding the waste.
The function prob
is nifty; if you think about it for a moment, it is
obviously right (and economical on random bits). It would be niftier still
if the probability were specified as an unsigned long
-- on a 64-bit
machine this ought to be sufficient for almost all purposes. If the requested
probability is of the form N/2^k for some odd integer N, then the average
number of bits consumed by prob
is 2-2^(1-k), which always lies between 1 and 2.
If the requested probability is 0 or 1, no bits are consumed.
Should the syntax for seeding from current time be more visibly different from the other ctor calls?
Is run-time speed so important here? How many algorithms really consume millions of random bits? Surely the computations on the random bits/values should exceed the cost of generating them, shouldn't it?
Is it really worth having inline functions here? operator++
is long
enough that it is hard to justify keeping it inline.
Is it really worth using a bitset<>
for the buffer? Do I genuinely gain
anything worthwhile over simply keeping the GMP value and using mpz_tstbit
?
Should SampleBool
advance *before* getting the value, or vice versa?
Should the RandomBitSteam
object buffer up more values from GMP
to avoid calling mpz_urandomb
too many times? I'm not sure how expensive
mpz_urandomb
is? Does it really matter?
Is the information printed by myOutputSelf
adequate? Time will tell.