Imagine a server-client game application.
While running, the server is constantly generating numbers for various random events and sending them over the network to inform the clients of the result.
I realized it would save alot of bandwidth if I made my rng deterministic and would only transmit a seed value at the start of the game and the number of random calls that have been made on that random instance in the past so that my clients can reproduce a similar random instance and the next generated value will be the same for all clients. Is that a reliable option in terms of security and synchronization and apart from predictability?
Entropy
Sending random numbers and sending seeds is the same thing: sending entropy.
Consider that in the case of sending random numbers, you can interpret that each random number sent is a seed for a random number generator used by the client, and this random number generator has a cycle of only one number.
Consider also that if you use a random number generator that generates 2^63-1 values with a seed of 32 bits of entropy, then the sequence has very little entropy... which means they are easy to crack.
Have a look at "Cracking Random Number Generators" from all that jazz.
My question refers to an imaginary server-client application and was meant in general. So you can assume a worst case scenario
I do not know about worst case scenario. Yet, I will take a case that I think is fairly high: 2 bit per millisecond.
So, that is 2000 bits per second, or if you prefer it is about 33 bits (roughly an int32) per frame in a 60 fps game.
That mounts to slightly below 7MB per hour. It does not sound that crazy at this scale, does it?
Numbers are the same for all clients.
Generate a random file, and allow clients to download it. Use something based on TCP so that they all get the same bits. You server would have to generate one file per hour, and there should be some metadata so the clients can synchronize.
Yes, HTTP would work. Note: We are not talking bandwidth yet.
However, when it comes to generate the random numbers, the standard random number generators will start to show patterns.
See Random.org article on statistical analysis.
The following is from "Cracking Random Number Generatos" (part 3):
A much more popular PRNG algorithm is the Mersenne Twister. This is used by Ruby's rand() function, Pythons random module, and PHP's mt_rand() function.
The Mersenne Twister was invented in 1997. The most common implementation of it uses an internal state of 624 32 bit words (integers).
So, with only 624 * 32 bits = 19968 bits (a little less than 20 KB) you can work out the sequence, if you are generating numbers this way, sending more numbers gives no more entropy to the client.
In fact, if you need strong guarantees on the random numbers (for example, if you are running an online casino), consider random.org services. They can generate over 2000 bits per second, so I would recommend their service if you need this many random numbers.
Note: For this rate of output, something like Microsoft Crypto API will work ok.
Security
you can assume a worst case scenario
You want an algorithm that generates random numbers that has the property that an attacker won't be able to predict the next numbers. That is, you want a Cryptographically secure pseudorandom number generator. If we are talking about Java, our first stop is SecureRandom. With SecureRandom you do not have control over the input entropy, the system will gather entropy from whatever source the underlying operating system provides.
From SecureRandom (Java Platform SE 8 ) - Oracle:
Note: Depending on the implementation, the generateSeed and nextBytes methods may block as entropy is being gathered, for example, if they need to read from /dev/random on various Unix-like operating systems.
Otherwise, you will need a third party solution (or cook one yourself). I suggest looking for an implementation of Fortuna which will give you more control on how you provide entropy.
What do you mean by "stepping between sender and receiver"?
I think Bálint was talking about Man in the Middle attacks.
The bad news is that a third party could try to pretend to be server and give you false numbers. Use a secure channel (SSL, HTTPS, etc...).
The good news is that everybody gets the same numbers. So there is no need for session or tokens, as the client does not need to identify itself. Which also means that session hijacking or impersonation are non issues when it comes to this random number generator.
However, a side effect is that it could be possible (with a bit of reverse engineering) to figure out how to create a tool that can download the data files and run the random number generator ahead of time, and perhaps do predictions of what will happen in the game. Which is another reason to generate small files often (multiple in a day) instead of generating a large file for the day or the week.
Edit: I guess you could use authentication to protect the data files. However that just means that the guys doing the tool need a valid account. I suggest to not bother. Assume they will make the tool.
Synchronization
Every client must do the same rolls. That means that you need to design the game in such way that rolls are not conditional.
The reason is that if a client does a roll but another client does not, they are now out of sync.
This also means that you should have a plan for exactly how the random number generator will be used.
Consider using two random number generators. One for the big stuff that needs to stay in sync, and one for any other stuff that does not have to be as strong as the first one.
Then, there is timing. You wan two things to make this work:
- Publish the data files and the time at which they should start using it.
- Have a side channel to synchronize clocks.
Note: Another thing you can do, is limit the number of times a generator is used. If you game knows how many numbers it has to generate, that is. Which I recommend you to know.
The server should have available the next data file and at what time the clients should use it. Then the clients will download it and prepare a random number generator ahead of time and when the time comes they switch to the new one... and start downloading the next.
However, we need them to switch to the new one at the same time. Which is why we need to synchronize clocks. And synchronize often. A query to an ntp often.
What if you used a generator a few extra times? Go back and redo them with the new one. I would suggest to introduce a natural exit point when the change is happening (having natural exit points is good anyway).
Bandwidth
If you want to run SecureRandom, it must be in the server because you cannot seed it. In this case, you will be serving those 7MB files.
you can assume a worst case scenario
21 million users?
20 million * 7 MB / hour = 311.1 GB/s.
About 102 PB per month. Note: Those are PetaBytes - TeraBytes are suddently small.
I woudl have said Get a global CDN solution and let them handle the traffic. But this crazy.
Let us put some of the work on the client. If we use Fortuna random number generators, we can provide about 12KB of useful entropy and have it virtually never run out (Fortuna reseeds itself).
If we stay with the idea of providing entropy once per hour,
20 million * 12 KB / hour = 533.3 MB/s.
About 176 TB per month.
It could work with only 2KB of entropy (I was compensating for the entropy input being infrequent). In that case we would be talking of only about 30 TB per month.
Get Amazon CloudFront.