This is the migrated version of the original Tumblr blog post I published last year.

Abridged Problem Statement

You are given a deck of \(m\) cards, one of which is a joker. You will shuffle the deck \(n\) times, with each of the \(m!\) permutations being equiprobable. Let \(x\) be the number of times the joker appears on the top of the deck. Find the expected value of \(x^k\) modulo \(998 \ 244 \ 353\).

Constraints: \(1 \le n, m < 998 \ 244 \ 353, 1 \le k \le 5 \ 000\).

Formulating the Generating Function

Let \(p_i\) be the probability that the joker appears exactly \(i\) times on the top of the deck. Of course, in a single shuffle, the probability that the joker ends up on the top of the deck is \(1/m\), and the \(n\) shuffles are independent. Therefore,

\[p_i = \left( \frac{1}{m} \right)^i \left( 1 - \frac{1}{m} \right)^{n - i} \binom{n}{i}.\]
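For instance, with \(n = 2\) shuffles and \(m = 2\) cards,

\[p_0 = \left(\frac{1}{2}\right)^2 = \frac{1}{4}, \quad p_1 = 2 \cdot \left(\frac{1}{2}\right)^2 = \frac{1}{2}, \quad p_2 = \left(\frac{1}{2}\right)^2 = \frac{1}{4},\]

which indeed sum to \(1\).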

Thus, the expected value that we want to find is just

\[E = \sum_{i = 0}^{n} p_i i^k.\]

An interesting thing happens if we view the \(p_i\) as the coefficients of a generating function, which we will call \(P(x)\). In other words, \(p_i = [x^i]P(x)\). Let us first define \(h = 1/m\) for ease of notation. Therefore,

\[P(x) = \sum_{i = 0}^{n} p_i x^i = \sum_{i = 0}^{n} \left[ h^i (1-h)^{n-i} \binom{n}{i} x^i \right].\]

However, note that \(E\) weights each \(p_i\) by a factor of \(i^k\), while \(P(x)\) only encodes the \(p_i\) themselves as coefficients. How do we connect these two equations together?

Let’s sidetrack for a moment. Let’s say we have a polynomial \(A(x) = \sum a_i x^i\). How do we find \(B(x) = \sum i a_i x^i\)? Remember that \(\text{d}/\text{d}x (a_i x^i) = i a_i x^{i-1}\). Thus,

\[\begin{aligned} B(x) &= \sum i a_i x^i \\ &= x \sum i a_i x^{i-1} \\ &= x \cdot \frac{\text{d}}{\text{d}x} A(x). \end{aligned}\]
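For example, if \(A(x) = 1 + 2x + 3x^2\), then

\[x \cdot \frac{\text{d}}{\text{d}x} A(x) = x \left(2 + 6x\right) = 2x + 6x^2 = \sum i a_i x^i,\]

exactly as desired.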

Following the same steps, let us define

\[G_j(x) = \sum_{i = 0}^{n} \left[p_i \cdot i^j \cdot x^i \right].\]

From the definition of \(P(x)\), we know that \(G_0(x) = P(x)\). Also, from the relation between \(A(x)\) and \(B(x)\), we have the recurrence relation

\[G_j(x) = x \cdot \frac{\text{d}}{\text{d}x} G_{j-1}(x).\]

Now, how do we calculate the answer, \(E\), from these formulas? Notice that

\[\begin{aligned} E &= \sum_{i = 0}^{n} p_i i^k \\ &= \sum_{i = 0}^{n} \left[ p_i \cdot i^k \cdot 1^i \right] \\ &= G_k(1), \end{aligned}\]

which means that our answer is just \(G_k(1)\).

Computing \(G_k(1)\)

Let us first try to compute \(G_k(x)\).

For a polynomial of degree \(n\), differentiation and multiplication by \(x\) are straightforward operations that can be done in \(\mathcal{O}(n)\) time if we store the polynomial in coefficient representation. However, since \(n\) can be as large as \(998 \ 244 \ 352\) in this problem, we need to find another way to represent the polynomial so that these operations can be done efficiently.
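To make the issue concrete, here is a minimal sketch (the function name and the use of plain `long long` are my own, purely for illustration) of what the operation looks like in coefficient representation. It is a single \(\mathcal{O}(n)\) pass, but it requires storing all \(n + 1\) coefficients, which is hopeless when \(n\) is close to \(998 \ 244 \ 352\).

```cpp
#include <vector>

// Sketch only: given a[i] = [x^i] A(x), return the coefficients of x * dA/dx.
// Each coefficient i * a[i] stays at degree i, so this is one linear pass,
// but it needs all n + 1 coefficients in memory.
std::vector<long long> diff_times_x(const std::vector<long long>& a) {
    std::vector<long long> b(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        b[i] = static_cast<long long>(i) * a[i];  // [x^i] of x * A'(x) is i * a_i
    return b;
}
```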

Let’s go back to \(P(x)\). Let us try to simplify it:

\[\begin{aligned} P(x) &= \sum_{i = 0}^{n} \left[ h^i (1-h)^{n-i} \binom{n}{i} x^i \right] \\ &= \sum_{i = 0}^{n} \left[ (hx)^i (1-h)^{n-i} \binom{n}{i} \right] \\ &= (hx + 1 - h)^n \end{aligned}\]

by the binomial theorem. Thus, we have found \(G_0(x)\).

From here, we can find \(G_1(x)\):

\[\begin{aligned} G_1(x) &= x \cdot \frac{\text{d}}{\text{d}x} G_0(x) \\ &= x \cdot \frac{\text{d}}{\text{d}x} \left(hx+1-h\right)^n \\ &= xnh \left(hx+1-h\right)^{n-1} \end{aligned}\]

And similarly, we can find \(G_2(x)\):

\[\begin{aligned} G_2 \left(x\right) &= x\cdot \frac{\text{d}}{\text{d}x} G_1\left(x\right) \\ &= x\cdot \frac{\text{d}}{\text{d}x} \left[xnh\left(hx+1-h\right)^{n-1}\right] \\ &= xnh\cdot \frac{\text{d}}{\text{d}x} \left[x\cdot\left(hx+1-h\right)^{n-1}\right] \\ &= xnh\cdot\left(x\left(n-1\right)h\cdot\left(hx+1-h\right)^{n-2}+\left(hx+1-h\right)^{n-1}\right). \end{aligned}\]

From here, we can observe that \(G_i(x)\) for all \(i\) can be expressed as a linear combination of \(x^a (hx + 1 - h)^b\) for nonnegative integers \(a\) and \(b\). Therefore, we can try to define

\[L(p, q) = x^p (hx + 1 - h)^q\]

(we drop the argument \(x\) from the notation for brevity). Let us now apply the operation “differentiate and multiply by \(x\)” to \(L(p, q)\). We get

\[\begin{aligned} x\cdot \frac{\text{d}}{\text{d}x} L\left(p,\ q\right) &= x\cdot \frac{\text{d}}{\text{d}x} \left[x^p\left(hx+1-h\right)^q\right] \\ &=x\cdot\left(x^phq\left(hx+1-h\right)^{q-1}+px^{p-1}\left(hx+1-h\right)^q\right) \\ &=hq\cdot x^{p+1}\left(hx+1-h\right)^{q-1}+p\cdot x^p\left(hx+1-h\right)^q \\ &=hq\cdot L\left(p+1,\ q-1\right)+p\cdot L\left(p,\ q\right). \end{aligned}\]
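As a quick check, in this notation \(G_1(x) = nh \cdot L(1, n-1)\), and applying the operation once more gives

\[G_2(x) = nh \cdot \left[ h(n-1) \cdot L(2, n-2) + 1 \cdot L(1, n-1) \right] = n(n-1)h^2 \cdot L(2, n-2) + nh \cdot L(1, n-1),\]

which matches the expression for \(G_2(x)\) derived earlier.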

Of course, there are some corner cases to consider:

\[\begin{aligned} x\cdot \frac{\text{d}}{\text{d}x} L\left(0,\ q\right) &= hq\cdot\ L\left(1,\ q-1\right); \\ x\cdot \frac{\text{d}}{\text{d}x} L\left(p,0\right) &= p\cdot\ L\left(p,\ 0\right). \end{aligned}\]

Verifying these facts is left as an exercise to the reader.

That means we can simply apply this operation \(k\) times to \(G_0(x)\), plug in \(x = 1\), and get the answer!

One last fact to note is that when we plug in \(x = 1\),

\[x^p (hx + 1 - h)^q = 1^p (h + 1 - h)^q = 1^p \cdot 1^q = 1\]

which means that we only need to sum the coefficients of all \(L(p, q)\) resulting from the \(k\) operations.

Complexity Analysis

At a glance, it seems that the complexity should be \(\mathcal{O}(2^k)\), since each \(L(p, q)\) term breaks into two terms every time we apply the “differentiate and multiply by \(x\)” operation. However, it is actually \(\mathcal{O}(k^2)\).

To see why this is true, consider what happens when we apply the operation to \(L(p, q)\): it branches into \(L(p+1, q-1)\) and \(L(p, q)\), so the value of \(p + q\) never changes. Since we start from \(G_0(x) = L(0, n)\), every term that ever appears satisfies \(p + q = n\). Moreover, \(p\) increases by at most one per operation, so after \(t\) operations only the values \(p = 0, 1, \dots, t\) can occur, which means there are at most \(\mathcal{O}(k)\) terms at any point in time. Thus, during the implementation, all we have to do is store, for each term, the value of \(p\) and its coefficient.

A notable fact is that one can treat the computation as a dynamic programming algorithm: \(\text{dp}[t][v] = c\), where \(t =\) (number of times the “differentiate and multiply by \(x\)” operation has been applied), \(v =\) (value of \(p\) in the current term), and \(c =\) (coefficient of that term). The remaining implementation details are left to the reader as an exercise; a rough sketch is given below.
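Here is a minimal sketch of that dynamic programming approach in C++. The input order \(n\), \(m\), \(k\) is an assumption on my part, and only the current row of the DP table is kept in memory.

```cpp
#include <cstdio>
#include <vector>

const long long MOD = 998244353;

// Fast exponentiation modulo MOD, used to compute h = 1/m as m^(MOD-2).
long long pw(long long b, long long e) {
    long long r = 1;
    for (b %= MOD; e > 0; e >>= 1, b = b * b % MOD)
        if (e & 1) r = r * b % MOD;
    return r;
}

int main() {
    long long n, m, k;
    scanf("%lld %lld %lld", &n, &m, &k);  // assumed input order: n, m, k
    long long h = pw(m, MOD - 2);         // h = 1/m modulo MOD

    // dp[p] = coefficient of L(p, n - p) after the current number of operations.
    // We start from G_0(x) = L(0, n) with coefficient 1.
    std::vector<long long> dp(k + 1, 0);
    dp[0] = 1;

    for (long long t = 0; t < k; ++t) {
        // One "differentiate and multiply by x" step:
        //   c * L(p, q)  ->  c*h*q * L(p+1, q-1) + c*p * L(p, q),  with q = n - p.
        // Going from large p to small p lets us update the row in place.
        for (long long p = t; p >= 0; --p) {
            long long c = dp[p];
            if (c == 0) continue;
            long long q = (n - p) % MOD;  // q = n - p >= 0 whenever the coefficient is nonzero
            dp[p + 1] = (dp[p + 1] + c * h % MOD * q) % MOD;
            dp[p] = c * p % MOD;
        }
    }

    // L(p, q) evaluated at x = 1 is 1, so the answer is the sum of all coefficients.
    long long ans = 0;
    for (long long p = 0; p <= k; ++p) ans = (ans + dp[p]) % MOD;
    printf("%lld\n", ans);
    return 0;
}
```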

As a final note, after solving the problem, I read the comments on the Codeforces tutorial blog post and found that there were many different generating function solutions to this problem with various time complexities. They are all worth reading — as always, all roads lead to Rome. This blog post presents but one approach to the solution; hopefully it ignites a spark of insight in the reader.