(Those not interested in puzzle solving or computer science might want to skip this one.)

Let’s say you’re trying to write an algorithm that generates a distinct 1:1 mapping between two sets of n numbers. This mapping is built dynamically, selected randomly, and stored between events.

One solution is to generate all the random mappings at one time, 1 to n. This is effective and easy, but wastes space, especially if few of these numbers are ever looked up.

Another solution is, when you first encounter an unmapped number, randomly select a number 1 to n, check that the random number you chose is not being already used for a mapping for another number, and if not, store the mapping.

This works pretty well, up to a point. Let’s assume n=10. On your first unmapped number, any number 1 to 10 will work. On your second unmapped number, randomly you’ll still pick 1-10, but you’ll have a 10% chance of picking the number you picked for the first number. When that happens, you pick again 1 to 10. If that number is available, store it, and move on.

This works as long as you don’t need to generate a mapping for all of n (just some numbers in n.) When you try to pick the nth number using the random algorithm, there’s only a 1/nth chance of selecting the last available mapped number. This means in theory you have to pick n times for the nth number — in practice, I have a hunch it’s much worse than that.

At some point, it makes sense to fall back to the first algorithm – generate a list of known original numbers and their mappings, take from that the yet-unmapped original numbers, and the pool of available to-be-mapped numbers, randomly order the to-be-mapped numbers, and apply them to the rest of the pool. This sounds like better approach than hoping the nth mapping random mapping will pick the last remaining available number.

But where is this cutoff between randomly selecting a single element, versus iterating over all the previous selections and completing all the outstanding mappings? Do I make the call on the percentage of n completed? (For example, after n/2 elements, iterate over the rest.) That seems wasteful if I’m only going to use (n/2)+1 variables, and the effeciency of the random algorithm might not decay until much later.

Or, instead of looking at basing the cutoff on the percentage of n, base it off how many random retries any mapping requires. If there are too many collisions, no matter where it lies in n, quit the random approach and iterate over the rest. (For example, for any number in n, if it took more than n/2 tries to pick an available number, assume the random algorithm has failed you, and iterate.) But that’s a harder cut off to determine….

It makes sense that after n/2 the per-random selection will less efficient for each new number. I hypothesize that the cost of the iterative algorithm at that point will be less than letting the random-selection one continue as the number of available options decreases. But what about stopping it after x retries? This might only happen on the last few elements, if x was high enough… It could also happen before n/2 and in that case you ate up a bunch of space you didn’t need.

Which approach do you think is better, and where would you make the cutoff?