Today problem is a probability
problem.
This problem was asked by Facebook:
Given a stream of elements too large to store in memory, pick a random element from the stream with uniform probability.
There are many variations of such problems, and before solving it, I want to show some basic examples that I met.
Most trivial one is picking one random element from an array.
Every programming language has a function to generate a pseudo-random
number (int
or float
) within the given range.
If you think of an array A as of N
numbers, it’s clear how to pick random one: generate number
i from 0 to N, and a[i] is the
answer.
func oneRandom(a []int) int {
:= rand.Intn(len(a))
i return aa[i]
}
Sometimes you need to pick N random elements from an array.
In this case, you can use the same approach and pick N indexes from 0 to len(A).
func nRandom(a []int, n int) []int {
:= make([]int, n)
result for i := 0; i < n; i++ {
[i] = oneRandom(a)
result}
return result
}
However, if you need to pick N random elements, they also have to be different.
In this case, you can pick N different indexes and make sure that they are different:
func nDifferentRandom1(a []int, n int) []int {
:= map[int]bool{}
m for len(m) != n {
:= rand.Intn(len(a))
j [j] = true
m}
:= make([]int, n)
result for i := range m {
[i] = a[i]
result}
return result
}
But it’s not very efficient, because you can spend much time trying to pick the last index when you need to pick 8 elements from an array of length 10.
In this case, you can use the well-known zero-allocation algorithm to do so. The idea is to move elements you picked to the beginning of an array, and choose from others next:
func nDifferentRandom2(a []int, n int) []int{
for i := 0; i < n; i++ {
:= a[i:] // define a slice to pick from
chooseFrom := rand.Intn(len(chooseFrom)) // pick a random index from i to N
choosenIndex [i], a[choosenIndex] = a[choosenIndex], a[i] // swap choosenElement with i
a}
// after all, we have choosen elements at the
// begining of an array, and probability is always same.
return a[:n]
}
Let’s return to problem #15 and try to solve it using previous examples.
What do we have:
We understand that to pick a random element from the stream with uniform probability, we need to process hole stream somehow (without storing to memory).
Well, we always can iterate over a stream (but just once).
If we knew the length of a stream, we could use the approach from the first example, that would be perfect.
In first example each element of an array have a linked number - it’s index. Moreover, we used a function that returns number from 0 to n with uniform probability within that range to pick one.
So, in this case, we can assign a random value to each element with the same probability, and choose between them based on it linked number.
Let’s do it: for each element, we generate a float number between [0..1]. Also, we remember maximum value that we generated, an element from the stream associated with it - random element.
func solution(in <-chan int) <-chan int {
:= make(chan int, 1)
res go func() {
var result int
var lastprob float64
for a := range in {
:= rand.Float64()
prob if prob > lastprob {
= a
result = prob
lastprob }
}
<- result
res }()
return res
}