In this post, I will dive into a hard question that involves combinatorics, modular operations of the division among two large numbers. The original question is located at 2954. Count the Number of Infection Sequences, and it is actually the last question of LeetCode Weekly Contest 374.

Problem Analysis

An important observation is that

Given an infected child at position i, the infection sequence of the children left to i has nothing to do with the infection sequence of the children right to i.

In other words, given a list of infected positions sick, it essentially divides all of the children into several independent groups and we can first consider how the infection sequence would look like inside each group and then consider how the final whole sequence can be when the infection sequence inside groups are determined.

There are two types of groups, and let us first consider how many possible infection sequence in each group (assume the group size is k).

Groups that between two infected children. In such cases, at any time, we have two possible choices (infect the left most or right most uninfected child) until the last 1 left. Therefore, the total number of possible infection sequences is $2^{k-1}$. Note that this number can be really large, and python allows us to compute the modular of this number efficiently using pow(2,k-1,modv) where modv = 10 ** 9 + 7 in our case.
Groups at the beginning or end (i.e., adjacent to only one infected child). In this case, at any time, we have only 1 choice to infect. Therefore, there is only 1 possible infection sequence.

The python code for this part is

modv = 10 ** 9 + 7
m = len(sick)
# records all sizes of groups
size = [sick[0]] if sick[0] > 0 else [] # group at beginning
ans = 1

for i in range(1,m):
    l = sick[i] - sick[i-1] - 1
    if l > 0:
        size.append(sick[i] - sick[i-1] - 1)
        ans = (ans * pow(2, l -1, modv)) % modv

# group at end
if sick[-1] < n-1:
    size.append(n-sick[-1] -1)

The Combinatorial Problem

Now, we know the size of each group and how many possible ways to infect the children within each group. The next question is

Given one certain infection sequence of each group, how many ways we can formulate the final sequence. In other words, if the total size is $t$, the question is asking how many possible permutation of t children, such that the order of element inside each group is retained.

Assume the size of all groups are $s_1,s_2,…,s_n$, then we can solve the above using two different ways (the final answer is the same).

Merge One by One

Let’s first compute the answer when there is only two groups. In this case, it is quite straight forward. Because the group size are $s_1$ and $s_2$, the final sequence length would be $s_1 + s_2$, and we can first choose $s_1$ positions from them and fill in the infection sequence of $s_1$ with the same order, then we can fill the remaining positions with the infection sequence of $s_2$. Therefore, the answer is

$$C_{s_1+s_2}^{s_1} = \frac{(s_1+s_2)!}{s_1 ! s_2 !}$$

Similarly, we can further merge with the above results with new group $s_3$. The total length would be $s_1 + s_2 + s_3$, and we can first choose $s_1+s_2$ positions out of them and fill in the previous result sequence, then we fill the remaining positions with the infection sequence of $s_3$. The number of ways is now

$$C_{s_1+s_2+s_3}^{s_1 + s_2} = \frac{(s_1+s_2 + s_3)!}{(s_1 + s_2) ! s_3 !}$$

We can keep doing this and then product everything together and cancel some terms in the middle, and we have

$$
\frac{(s_1+s_2)!}{s_1 ! s_2 !} * \frac{(s_1+s_2 + s_3)!}{(s_1 + s_2) ! s_3 !} * …* \frac{(s_1+s_2 + … + s_n)!}{(s_1 + …+ s_{n-1}) ! s_n !} \\
= \frac{(s_1+s_2 + … + s_n)!}{s_1 ! s_2 ! … s_n!}
$$

Consider all Permutation

The total number of elements is $s_1 + … + s_n$, thus the total number of permutation is $(s_1+s_2 + … + s_n)!$.

However, we can easily see that, not all sequence satisfied the given element order in each group. Specifically, for the first group, among all the sequences, only $\frac{1}{s_1 !}$ of them satisfies the given order for group one. Similarly, for group 2, among all the remaining sequences, only $\frac{1}{s_2 !}$ satisfies the conditions.

Therefore, at the end, the number of sequences that satisfy all the order within each group is

$$
\frac{(s_1+s_2 + … + s_n)!}{s_1 ! s_2 ! … s_n!}
$$

Note that the above analysis is not a rigorous proof, but I believe you can get the idea.

Now the problem is

How can we compute $\frac{(s_1+s_2 + … + s_n)!}{s_1 ! s_2 ! … s_n!} \% (10^9 +7)$ without having integer overflow!

Although python support large integers and there is no overflow. However, give that $s_1 + s_2 + … + s_n$ can up to $10^5$, the direct computation would be very slow because of handling very large integers, which leads to TLE.

Division Modular and Modular Multiplicative Inverse

We know that
$$(a+b) \% m = (a \% m + b \% m) \% m$$
and
$$(a \cdot b) \% m = (a \% m \cdot b \% m) \% m$$

However
$$(a/b) \% m \ne (a \% m / b \% m) \% m$$

So the question is

how can we compute $\frac{a}{b} \% m$ when it is impossible to compute $\frac{a}{b}$ first?

Modular Multiplicative Inverse

Let us consider the following equation

$$bx + my = 1$$

if we can find $x,y$ such that the above formula holds, then we will have

$$\frac{a}{b} \% m = (ax) \% m$$

The proof is as follows.
$$
\frac{a}{b} \% m = z \Rightarrow \frac{a}{b} = km + z \Rightarrow a=kmb+bz \Rightarrow ax = kbxm + bx z \\
ax \% m = (bxz) \% m = ( (bx \% m) \cdot (z \% m ) ) \% m = (1 \cdot z) \% m = z
$$

Note that bx%m = 1 because we assume that equation bx+my = 1 holds. In addition, $z < m$ because it is the residual after modular m.

Now the question is

How can we compute the coefficient x and y ?

Extended Euclidean Algorithm

Euclidean algorithm is often used to compute the Greatest Common Divisor (GCD) of two numbers, it works as follows.

def get_gcd(a,b):
    if b == 0:
        return a
    return get_gcd(b, a % b)

Now, in addition to find the GCD of the two numbers $a,b$, we are also interested in the coefficients $x,y$ such that we have
$$
ax+by = gcd(a,b)
$$

The problem can be solved using extended Euclidean algorithm, it works as follows.

Base Case
When b == 0, we know the GCD is a, we return the coefficients as x=1,y=0.
Recursive
When we go up one level through the recursion tree, assume $x’,y’$ is the coefficients returned from the base case. We now have
$$
bx’ + (a \% b)y’ = bx’ + (a - (a//b) * b)y’ = ay’ + b(x’ - (a//b)y’) = gcd(a,b)
$$
From above formula, we find the new coefficient
$$
x = y’, y = x’ - (a//b)y’
$$

The recursive python implementation is shown as follows.

def egcd(a,b):
    if b == 0:
        return a, 1, 0
    d,x,y = egcd(b, a % b)
    return d, y, x - (a//b) * y

In our case, the b here is our modular number $10^9 +7$, which is prime. So for any $a$, we have $gcd(a,b) = 1$. Therefore, we can simply call egcd(a, modv) to get the modular multiplicative number of a.

Solution of the Combinatorial Problem

Now the way of compute
$$
\frac{(s_1+s_2 + … + s_n)!}{s_1 ! s_2 ! … s_n!} \% (10^9 +7)
$$
should be quite clear.

Let $S = s_1 + s_2 + … +s_n$ and modv = 10 ** 9 +7.
Compute factorial(i) % modv for all i from 1 to S.
Initialize ans to be factorial(S) % modv
For each group size s, we first compute the modular inverse of factorial(s) % modv as x using function egcd (note that factorial(s) % modv has same modular inverse as factorial(s)), then we update ans as (ans * x) % modv.
return the final answer.

Python Solution

We still need to multiple the answers from two sub problems (ways inside/outside group) to get the final answer, details are shown below.

def numberOfSequence(self, n: int, sick: List[int]) -> int:
    modv = 10 ** 9 + 7
    m = len(sick)
    size = [sick[0]] if sick[0] > 0 else []
    ans = 1

    for i in range(1,m):
        l = sick[i] - sick[i-1] - 1
        if l > 0:
            size.append(sick[i] - sick[i-1] - 1)
            ans = (ans * pow(2, l -1, modv)) % modv

    if sick[-1] < n-1:
        size.append(n-sick[-1] -1)

    # now it is a combinatorial problem
    total = sum(size)
    fac_mod = [0,1]
    p = 1
    for i in range(2, total +1):
        p = (p*i) % modv
        fac_mod.append(p)

    def egcd(a,b):
        if b == 0:
            return a, 1, 0
        d,x,y = egcd(b, a % b)
        return d, y, x - (a//b) * y

    def module_inverse(a):
        d,x,y = egcd(a,modv)
        return x

    ans = (ans  * fac_mod[total]) % modv
    for s in size:
        ans = (ans * module_inverse(fac_mod[s])) % modv

    return ans

Problem Analysis
The Combinatorial Problem
1. Merge One by One
2. Consider all Permutation
Division Modular and Modular Multiplicative Inverse
1. Modular Multiplicative Inverse
2. Extended Euclidean Algorithm
Solution of the Combinatorial Problem
Python Solution