Skip to content Skip to sidebar Skip to footer

How Should I Move From Prng Based Generation To Hash-based Procedural Generation?

I want to replace an existing random number based data generator (in Python) with a hash based one so that it no longer needs to generate everything in sequence, as inspired by thi

Solution 1:

First, a big caveat: DO NOT ROLL YOUR OWN CRYPTO. If you're trying to do this for security purposes, DON'T.

Next, check out this question which lists several ways to do what you want, i.e. transform a random uniform variable into a normal one: Converting a Uniform Distribution to a Normal Distribution

Solution 2:

Unless you're doing this for your own amusement or as a learning exercise, my very strong advice is don't do this.

PRNGs have the same general structure, even if the details are wildly different. They map a seed value s into an initial state S via some function f: S←f(s); they then iterate states via some transformation h: Si+1←h(Si); and finally they map the state to an output U via some function g: Ui←g(Si). (For simple PRNGs, f() or g() are often identity functions. For more sophisticated generators such as Mersenne Twister, more is involved.)

The state transition function h() is designed to distribute new states uniformly across the state space. In other words, it's already a hash function, but with the added benefit that for any widely accepted generator it has been heavily vetted by experts to have good statistical behavior.

Mersenne Twister, Python's default PRNG, has been mathematically proven to have k-tuples be jointly uniformly distributed for all k ≤ 623. I'm guessing that whatever hash function you choose can't make such claims. Additionally, the collapsing function g() should preserve uniformity in the outcomes. You've proposed that you "can use the integer version of the hash to create a flat number range, just by taking the modulus." In general this will introduce modulo bias, so you won't end up with a uniformly distributed result.

If you stick with the built-in PRNG, there's no reason not to use the built-in Gaussian generator. If you want to do it for your own amusement there are lots of resources that will tell you how to map uniforms to Gaussians. Well-known methods include the Box-Muller method, Marsaglia's polar method, and the ziggurat method.


UPDATE

Given the additional information you've provided in your question, I think the answer you want is contained in this section of Python's documentation for random:

The functions supplied by this module are actually bound methods of a hidden instance of the random.Random class. You can instantiate your own instances of Random to get generators that don’t share state. This is especially useful for multi-threaded programs, creating a different instance of Random for each thread, and using the jumpahead() method to make it likely that the generated sequences seen by each thread don’t overlap.

Sounds like you want separate instances of Random for each person, seeded independently of each other or with synchronized but widely separated states as described in the random.jumpahead() documentation. This is one of the approaches that simulation modelers have used since the early 1950's so they can maintain repeatability between configurations to make direct comparisons of two or more systems in a fair fashion. Check out the discussion of "synchronization" on the second page of this article, or starting on page 8 of this book chapter, or pick up any of the dozens of simulation textbooks available in most university libraries and read the sections on "common random numbers." (I'm not pointing you towards Wikipedia because it provides almost no details on this topic.)

Here's an explicit example showing creating multiple instances of Random:

import random as rnd

print("two PRNG instances with identical seeding produce identical results:")
r1 = rnd.Random(12345)
r2 = rnd.Random(12345)
for _ inrange(5):
    print([r1.normalvariate(0, 1), r2.normalvariate(0, 1)])

print("\ndifferent seeding yields distinct but reproducible results:")
r1 = rnd.Random(12345)
r2 = rnd.Random(67890)
for _ inrange(3):
    print([r1.normalvariate(0, 1), r2.normalvariate(0, 1)])
print("\nresetting, different order of operations")
r1 = rnd.Random(12345)
r2 = rnd.Random(67890)
print("r1: ", [r1.normalvariate(0, 1) for _ inrange(3)])
print("r2: ", [r2.normalvariate(0, 1) for _ inrange(3)])

Solution 3:

I have gone ahead and created a simple hash-based replacement for some of the functions in the random.Random class:

from __future__ import division
import xxhash
from numpy import sqrt, log, sin, cos, pi

defgaussian(u1, u2):
    z1 = sqrt(-2*log(u1))*cos(2*pi*u2)
    z2 = sqrt(-2*log(u1))*sin(2*pi*u2)
    return z1,z2

classpghash:
    def__init__(self, tuple, seed=0, sep=','):
        self.hex = xxhash.xxh64(sep.join(tuple), seed=seed).hexdigest()

    defpgvalue(self):
        returnint(self.hex, 16)

    defpghalves(self):
        return self.hex[:8], self.hex[8:]

    defpgvalues(self):
        returnint(self.hex[:8], 16), int(self.hex[8:], 16)

    defrandom(self):
        return self.value() / 2**64defrandint(self, min, max):
        returnint(self.random() * max + min)

    defgauss(self, mu, sigma):
        xx = self.pgvalues()
        uu = [xx[0]/2**32, xx[1]/2**32]
        return gaussian(uu[0],uu[1])[0]

Next step is to go through my code and replace all the calls to random.Random methods with pghash objects.

I have made this into a module, which I hope to upload to pypi at some point: https://github.com/UKHomeOffice/python-pghash

Post a Comment for "How Should I Move From Prng Based Generation To Hash-based Procedural Generation?"