In this post, I will dive into a hard question that involves combinatorics, modular operations of the division among two large numbers. The original question is located at 2954. Count the Number of Infection Sequences, and it is actually the last question of LeetCode Weekly Contest 374.
Problem Analysis
An important observation is that
Given an infected child at position i, the infection sequence of the children left to i has nothing to do with the infection sequence of the children right to i.
In other words, given a list of infected positions sick
, it essentially divides all of the children into several independent groups and we can first consider how the infection sequence would look like inside each group and then consider how the final whole sequence can be when the infection sequence inside groups are determined.
There are two types of groups, and let us first consider how many possible infection sequence in each group (assume the group size is k).
- Groups that between two infected children. In such cases, at any time, we have two possible choices (infect the left most or right most uninfected child) until the last 1 left. Therefore, the total number of possible infection sequences is $2^{k-1}$. Note that this number can be really large, and python allows us to compute the modular of this number efficiently using
pow(2,k-1,modv)
wheremodv = 10 ** 9 + 7
in our case. - Groups at the beginning or end (i.e., adjacent to only one infected child). In this case, at any time, we have only 1 choice to infect. Therefore, there is only 1 possible infection sequence.
The python code for this part is
1 | modv = 10 ** 9 + 7 |
The Combinatorial Problem
Now, we know the size of each group and how many possible ways to infect the children within each group. The next question is
Given one certain infection sequence of each group, how many ways we can formulate the final sequence. In other words, if the total size is $t$, the question is asking how many possible permutation of t children, such that the order of element inside each group is retained.
Assume the size of all groups are $s_1,s_2,…,s_n$, then we can solve the above using two different ways (the final answer is the same).
Merge One by One
Let’s first compute the answer when there is only two groups. In this case, it is quite straight forward. Because the group size are $s_1$ and $s_2$, the final sequence length would be $s_1 + s_2$, and we can first choose $s_1$ positions from them and fill in the infection sequence of $s_1$ with the same order, then we can fill the remaining positions with the infection sequence of $s_2$. Therefore, the answer is
$$C_{s_1+s_2}^{s_1} = \frac{(s_1+s_2)!}{s_1 ! s_2 !}$$
Similarly, we can further merge with the above results with new group $s_3$. The total length would be $s_1 + s_2 + s_3$, and we can first choose $s_1+s_2$ positions out of them and fill in the previous result sequence, then we fill the remaining positions with the infection sequence of $s_3$. The number of ways is now
$$C_{s_1+s_2+s_3}^{s_1 + s_2} = \frac{(s_1+s_2 + s_3)!}{(s_1 + s_2) ! s_3 !}$$
We can keep doing this and then product everything together and cancel some terms in the middle, and we have
$$
\frac{(s_1+s_2)!}{s_1 ! s_2 !} * \frac{(s_1+s_2 + s_3)!}{(s_1 + s_2) ! s_3 !} * …* \frac{(s_1+s_2 + … + s_n)!}{(s_1 + …+ s_{n-1}) ! s_n !} \\
= \frac{(s_1+s_2 + … + s_n)!}{s_1 ! s_2 ! … s_n!}
$$
Consider all Permutation
The total number of elements is $s_1 + … + s_n$, thus the total number of permutation is $(s_1+s_2 + … + s_n)!$.
However, we can easily see that, not all sequence satisfied the given element order in each group. Specifically, for the first group, among all the sequences, only $\frac{1}{s_1 !}$ of them satisfies the given order for group one. Similarly, for group 2, among all the remaining sequences, only $\frac{1}{s_2 !}$ satisfies the conditions.
Therefore, at the end, the number of sequences that satisfy all the order within each group is
$$
\frac{(s_1+s_2 + … + s_n)!}{s_1 ! s_2 ! … s_n!}
$$
Note that the above analysis is not a rigorous proof, but I believe you can get the idea.
Now the problem is
How can we compute $\frac{(s_1+s_2 + … + s_n)!}{s_1 ! s_2 ! … s_n!} \% (10^9 +7)$ without having integer overflow!
Although python support large integers and there is no overflow. However, give that $s_1 + s_2 + … + s_n$ can up to $10^5$, the direct computation would be very slow because of handling very large integers, which leads to TLE.
Division Modular and Modular Multiplicative Inverse
We know that
$$(a+b) \% m = (a \% m + b \% m) \% m$$
and
$$(a \cdot b) \% m = (a \% m \cdot b \% m) \% m$$
However
$$(a/b) \% m \ne (a \% m / b \% m) \% m$$
So the question is
how can we compute $\frac{a}{b} \% m$ when it is impossible to compute $\frac{a}{b}$ first?
Modular Multiplicative Inverse
Let us consider the following equation
$$bx + my = 1$$
if we can find $x,y$ such that the above formula holds, then we will have
$$\frac{a}{b} \% m = (ax) \% m$$
The proof is as follows.
$$
\frac{a}{b} \% m = z \Rightarrow \frac{a}{b} = km + z \Rightarrow a=kmb+bz \Rightarrow ax = kbxm + bx z \\
ax \% m = (bxz) \% m = ( (bx \% m) \cdot (z \% m ) ) \% m = (1 \cdot z) \% m = z
$$
Note that bx%m = 1
because we assume that equation bx+my = 1
holds. In addition, $z < m$ because it is the residual after modular m.
Now the question is
How can we compute the coefficient x and y ?
Extended Euclidean Algorithm
Euclidean algorithm is often used to compute the Greatest Common Divisor (GCD) of two numbers, it works as follows.
1 | def get_gcd(a,b): |
Now, in addition to find the GCD of the two numbers $a,b$, we are also interested in the coefficients $x,y$ such that we have
$$
ax+by = gcd(a,b)
$$
The problem can be solved using extended Euclidean algorithm, it works as follows.
- Base Case
Whenb == 0
, we know the GCD is a, we return the coefficients asx=1,y=0
. - Recursive
When we go up one level through the recursion tree, assume $x’,y’$ is the coefficients returned from the base case. We now have
$$
bx’ + (a \% b)y’ = bx’ + (a - (a//b) * b)y’ = ay’ + b(x’ - (a//b)y’) = gcd(a,b)
$$
From above formula, we find the new coefficient
$$
x = y’, y = x’ - (a//b)y’
$$
The recursive python implementation is shown as follows.
1 | def egcd(a,b): |
In our case, the b here is our modular number $10^9 +7$, which is prime. So for any $a$, we have $gcd(a,b) = 1$. Therefore, we can simply call
egcd(a, modv)
to get the modular multiplicative number of a.
Solution of the Combinatorial Problem
Now the way of compute
$$
\frac{(s_1+s_2 + … + s_n)!}{s_1 ! s_2 ! … s_n!} \% (10^9 +7)
$$
should be quite clear.
- Let $S = s_1 + s_2 + … +s_n$ and
modv = 10 ** 9 +7
. - Compute
factorial(i) % modv
for all i from 1 to S. - Initialize
ans
to befactorial(S) % modv
- For each group size s, we first compute the modular inverse of
factorial(s) % modv
asx
using functionegcd
(note thatfactorial(s) % modv
has same modular inverse asfactorial(s)
), then we updateans
as(ans * x) % modv
. - return the final answer.
Python Solution
We still need to multiple the answers from two sub problems (ways inside/outside group) to get the final answer, details are shown below.
1 | def numberOfSequence(self, n: int, sick: List[int]) -> int: |