Why \( O(n \sqrt n \log n) \) when you can \( O(n^2) \)?

After participating in the 4th Turing Cup Advanced Group several years ago (and getting miserably destroyed by the problems), I decided to join the 6th Turing Cup last week (the Advanced Group again, of course), expecting to perform a bit better with my improvement over the years. Surprise surprise, the problems were even harder this time! I only managed to get problem A plus one subtask in B. (That was not surprising, though – only one person solved B and no one solved C or D.) You can find the standings here.

The editorial mentioned an \(O(n \sqrt n \log n)\) solution for A, but I solved it in \(O(n^2)\). I found the approach quite interesting, albeit a bit overkill. I decided to make a post to share it, after a somewhat long vacuum in this blog.

Abridged Problem Statement
An \(O(n^3 \log n)\) Solution
An \(O(n^2 \log n)\) Solution: Fast Substring Comparison
Peculiar Constraints and The Editorial’s Quick Optimization
The Final Push to \(O(n^2)\)
Final Remarks

Abridged Problem Statement

The problem can be found and submitted to here.

6th Turing Cup Advanced Group A: LIS for String
Given a string \(s\) of length \(n\) consisting of lowercase Latin letters. Consider partitioning the string into a sequence of substrings \(T\) such that each character in the string is in exactly one substring after the partition. The LIS of \(T\) is the longest subsequence of \(T\) such that each substring is lexicographically strictly larger than the substring before it. Find the maximum length of the LIS that can be obtained over all possible partitions.
Constraints: \(1 \le n \le 20\,000\).
Time limit: 2 seconds (this will be important later).

Apparently, as said in this Codeforces comment, this problem has also appeared before: AtCoder Beginner Contest 240 Ex: Sequence of Substrings, but with \(n \le 25\,000\) and a time limit of 5 seconds.

An \(O(n^3 \log n)\) Solution

The 6th Turing Cup is an OI-style contest, so the problems have subtasks. The subtasks for this particular problem are as follows.

(5 points) \(n \le 15\)
(10 points) \(n \le 100\)
(15 points) \(n \le 300\)
(30 points) \(n \le 1\,000\)
(40 points) No additional constraints.

As people were quickly getting AC for this problem (the first AC time was less than 7 minutes!), and because brain.exe was still starting after waking up very early (the contest starts at 7:30 AM here), I immediately tried to find the full solution and didn’t think much about the subtasks. If you’re interested, you can find the official solutions for the subtasks here.

My first idea was as follows. Recall that when solving LIS using a segment tree, we maintain a DP with a segment tree, where initially each value is \(0\). The value at index \(i\), i.e. \(\mathrm{dp}(i)\), is the maximum length of the LIS ending at position \(i\) (when the element at position \(i\) must be taken).

Let us first assume that all elements are distinct. We process all elements in ascending order. If the current element is at position \(j\), we update \(\mathrm{dp}(j) = \max_{i < j} \mathrm{dp}(i) + 1\) by doing a prefix maximum query on the segment tree. If there are duplicate elements, we have to take care to make the segment tree updates “pending”, so we don’t accidentally compute the longest nondecreasing subsequence.

We adapt the solution above for this problem. Let \(\mathrm{dp}(i)\) now be the maximum length of the LIS when the last substring we chose ends at position \(i\). We now sort all substrings lexicographically. If the current substring is \(s[l, r]\), we update \(\mathrm{dp}(r) = \max_{i < l} \mathrm{dp}(i) + 1\). This part should be quite intuitive.

There are two bottlenecks to making this solution \(O(n^2)\):

Subproblem 1. Sorting all substrings naively takes \(O(n^3 \log n^2) = O(n^3 \log n)\) time, as comparing two substrings takes \(O(n)\) time.
Subproblem 2. Because we do updates and queries on the segment tree \(O(n^2)\) times, we take \(O(n^2 \log n)\) time in total.

This makes the total complexity \(O(n^3 \log n)\), which only would get 30 points. How do we optimize this further?

An \(O(n^2 \log n)\) Solution: Fast Substring Comparison

You may or may not consider this a well-known trick.

We can compare two substrings of a string lexicographically by using hashing. Remember that when comparing the lexicographic order of two strings, we essentially compare the first differing character (aside from the case when one of the strings is a prefix of the other). Thus, we can binary search to find until which position the two strings match.

After computing the polynomial hash in \(O(n)\), we can check whether two substrings are equal in \(O(1)\), so a comparison runs in \(O(\log n)\) time.

bool compare(int l1, int r1, int l2, int r2) {
    // binary search for longest common prefix
    int l = 0;
    int r = min(r1 - l1, r2 - l2);
    int ans = -1;
    while (l <= r) {
        int mid = (l + r) / 2;
        if (getHash(l1, l1 + mid) == getHash(l2, l2 + mid)) {
            ans = mid;
            l = mid + 1;
        } else {
            r = mid - 1;
        }
    }
    
    if (ans == r) {
        // one is the prefix of the other, the shorter one is smaller
        return (r1 - l1) < (r2 - l2);
    }
    // compare next character
    return S[l1 + ans + 1] < S[l2 + ans + 1];
}

Now, we have solved Subproblem 1 in \(O(n^2 \log n)\), making the time complexity \(O(n^2 \log n)\). This is enough for 60 points, but we still have to push further!

Peculiar Constraints and The Editorial’s Quick Optimization

Notice that because \(n = 2 \times 10^4\), there are approximately \(n^2 / 2 = 2 \times 10^8\) substrings of \(s\). The time limit is also 2 seconds! Coincidence?

Because of this, I immediately thought that I could squeeze a solution that is linear in the number of substrings. This may have made me quite fixated on finding an \(O(n^2)\) solution instead of an \(O(n \sqrt n \log n)\) one like the one in the editorial.

The editorial solution was quite simple. There is always a way to construct the LIS such that if you “add” strings to the LIS from left to right, when you add a substring of length \(l\), and \(l\) is greater than all of the strings that come before it, its longest common prefix with the previous substring has length \(l - 1\). (In other words, we can always “throw away” excess characters, and it is optimal to do so.)

Thus, if we use a substring of length \(l\) in our solution, we will also use substrings of length \(l - 1\), \(l - 2\), and so on. This means the longest substring we need to use has length \(O(\sqrt n)\), so the number of substrings we need to consider is just \(O(n \sqrt n)\)! Augmenting the \(O(n^2 \log n)\) solution with this observation results in a time complexity of \(O(n \sqrt n \log n)\).

Cursed complexity, of course, but the observation was simple and elegant. As I said before, though, I didn’t notice this small observation and proceeded to tread the path to \(O(n^2)\)….

The Final Push to \(O(n^2)\)

Counting Sort-ing the Substrings

(What is the present participle of ‘counting sort’…?)

In reality, I didn’t code the \(O(n^2 \log n)\) solution that was just discussed above. I just noted that it was possible to get 60 points and kept searching further for the purely quadratic solution.

Instead of using comparison-based sorting, as the alphabet size \(\vert\Sigma\vert = 26\) is small, we can use counting sort! However, it would be inefficient to store the \(O(n^2)\) pairs of starting and ending indices directly in the program, so we have to process the DP “as we go”. The recursive algorithm I came up with is as follows.

Initially, we have a list of indices \([1, 2, \dots, n]\) as the starting points of the substrings.
We sort them based on the substring of length \(1\) corresponding to the starting points. (At this point, it’s just sorting the characters.) For example, the string ababab from the sample input produces the sorted list \([1, 3, 5, 2, 4, 6]\), where the subarray \([1, 3, 5]\) corresponds to a and \([2, 4, 6]\) corresponds to b.
For each letter from a to z, we process the DP updating (explained above) of all elements of the subarray corresponding to it, then recurse for the substring of length \(2\) with this subarray before continuing to the next letter.

The invariant in the recursion is that when we are doing the recursive call for length \(i\), the substring of length \(i - 1\) corresponding to the starting points are all equal, so we are essentially only sorting the characters \(i - 1\) indices away from the starting points.

This method basically simulates traversing a trie without actually building the trie, so the order of traversal is lexicographically nondecreasing.

Below is my implementation. I avoided costly standard library data structures such as std::vectors, so I used C-style arrays instead.

int ord[maxN], idf[maxN], temp_dp[maxN];
int t_ord[Alph + 1][maxN]; // t_ord[*][0] stores size

void recurse(int l, int r, int depth) {
    for (int i = 0; i <= Alph; i++) t_ord[i][0] = 0;

    for (int i = l; i <= r; i++) {
        int id = ((ord[i]+depth > n) ? 0 : (s[ord[i]+depth]-'a'+1));
        t_ord[id][++t_ord[id][0]] = ord[i];
    }

    int ptr = l;
    for (int i = 0; i <= Alph; i++) {
        for (int j = 1; j <= t_ord[i][0]; j++) {
            ord[ptr] = t_ord[i][j];
            idf[ptr] = i;
            ptr++;
        }
    }

    // Recurse
    int lst = l - 1;
    for (int i = l; i <= r; i++) {
        if (i == r || idf[i] != idf[i+1]) {
            if (idf[lst+1] > 0) {
                // Process DP
                for (int j = lst + 1; j <= i; j++) {
                    temp_dp[j] = segtree.query(1, ord[j] - 1) + 1;
                }
                for (int j = lst + 1; j <= i; j++) {
                    segtree.update(ord[j] + depth, temp_dp[j]);
                }

                recurse(lst + 1, i, depth + 1);
            }
            lst = i;
        }
    }
}

At first glance, this seems to be \(O(n^2)\), so I left it there as for now.

\(O(1)\) RMQ How-To

Now, Subproblem 2 is (most likely) solved. Subproblem 1 remains.

Of course, we have not found a way to solve a general RMQ in \(O(1)\) per update and query – if we had, we’d have used it for all RMQ problems by now. What special properties are there in this problem’s RMQ?

There are \(O(n^2)\) updates and queries.
There are \(n\) indices.
The values only range from \(0\) to \(n\).

The last two points are important.

Let’s transform the RMQ as follows. We store an array \(M\) of length \(n\), where \(M[i]\) stores the value of the prefix maximum (i.e. the queries can be done directly by accessing the value of \(M[i]\)).

Notice that \(M\) is nondecreasing. Thus, when we update a particular index \(j\), we can iterate to the right until needed as follows.

int M[maxN];
void updateMax(int idx, int val) {
    if (M[idx] < val) {
        int r = idx;
        while (r < N && M[r+1] < val) r++;
        for (int i = idx; i <= r; i++) M[i] = val;
    }
}

Because each of the \(O(n)\) indices only have \(O(n)\) different values, we will update at most \(O(n^2)\) values in total. Thus, this RMQ runs in amortized \(O(n^2)\). This kind of amortization reminds me of a recent problem in a Codeforces contest (spoilers, of course).

Subproblem 1 is now solved! I submitted my code… and the contest system crashed… what perfect timing. While waiting for the system to be up again and the verdict to appear, I re-read problems B, C, and D to figure out which one I should focus on next. A few minutes later, I found out that my submission got TLE!

I wondered if it was the recursion that made it get TLE, so I simulated the recursion using a stack (implemented with a C-style array, of course) and a while loop. It still got TLE….

When Counting Sort is Too Slow

The problem with the current solution to Subproblem 2 is that each call of \(\mathrm{recurse}(l, r, depth)\) takes \(O(\max(\vert\Sigma\vert, r - l))\) time. Thus, in the deep layers of recursion, we may need to iterate a total of \(O(\vert\Sigma\vert n)\) time per layer, for a theoretical upper bound of \(O(\vert\Sigma\vert n^2)\). I don’t know if there is a way to construct a string \(s\) that achieves this upper bound, though.

Now, we need to remove the counting sort.

What is the counting sort needed for? Well, it is needed to get the recursive invariant correct, so it sorts the substrings as we go deeper into the recursion. But why don’t we sort the substrings first (in a hopefully faster time), and then do the recursive calls in purely \(O(n^2)\)?

Notice that the indices we are storing in that list are actually the suffixes of \(s\). After we are finished with our recursion, the suffixes will be sorted lexicographically. Why don’t we sort them from the beginning, so we don’t have to sort them while doing our recursion?

Well, how do we sort the suffixes of a string? We can create the suffix array of the string in \(O(n \log n)\) time!

The final stack-simulated recursion becomes as follows. Note that because we use a real stack, we push the recursive calls in reverse order to the stack, i.e. from z to a.

vector<int> sa = suffixArray(s);

ss = 1; // index of top of stack
stk_l[0] = 1; stk_r[0] = n; stk_d[0] = 0; // (l, r, depth)
while (ss) {
    ss--;
    int l = stk_l[ss], r = stk_r[ss], depth = stk_d[ss];

    if (depth > 0) {
        for (int i = l; i <= r; i++) temp_dp[i] = mx[sa[i] - 1] + 1;
        for (int i = l; i <= r; i++) updateMax(sa[i] + depth - 1, temp_dp[i]);
    }

    int nl = l;
    while (nl <= r && sa[nl]+depth > n) nl++;

    int lst = r + 1;
    for (int i = r; i >= nl; i--) {
        if (i == nl || s[sa[i]+depth] != s[sa[i-1]+depth]) {
            stk_l[ss] = i;
            stk_r[ss] = lst - 1;
            stk_d[ss] = depth + 1;
            ss++;
            lst = i;
        }
    }
}

I once again submitted this version, and it finally got AC!

To make the submission public, I also submitted it to the ABC problem. The submission can be viewed here. Even for \(n \le 25\,000\), it ran in less than 1.5 seconds.

This problem apparently has a difficulty rating of 3184 on AtCoder, which makes it the hardest AtCoder problem I have solved to date!

Final Remarks

After getting 20 points on B, I decided to spend my remaining time trying to get 55 points on D. I think I overcomplicated my solution and didn’t get to finish coding it in time. I do think I could’ve managed my time better (probably getting a nonzero score in C and D).

Still, I didn’t regret participating in the contest as I managed to solve a hard problem with a probably unintended solution. I enjoyed walking through the various steps needed to push the complexity to \(O(n^2)\), and I hope you did too while joining me in my explanation!

Zanite's Nook

6th Turing Cup: LIS for String