Saturday, May 18, 2013

Google Codejam 2013 Round 1C

Problem A. Consonants

In brief, given a string \(S\) return the total number of substrings containing at least \(N\) consecutive consonants. The vowels are define as a, e, i, o, and u, and our consonants are the remaining 21 letters.

Let's begin by solving the small input of this problem, because the size of the string \(L\) can be at most 100. We just simple iterate over all the possibles substrings of the given string. For each substring check if contains at least one occurrence of \(N\) consecutive consonants, if that's the case we count that substring in our final answer. Things get a little bit more interesting when we are facing the large input, the size of the given string \(L\) can be at most \(10^{6}\). Clearly, the previous algorithm is not going to be fast enough. So we need a different approach to attack this problem.

First let's get some intuition with the following example:

S =  "quxrtz"
L =  6
N = 2

Imagine that we are iterating our string, until we find the substring "xr" which has \(N = 2\) consecutive consonants:

Now we should ask ourselves the following question, how many substrings  contains the string "xr"?

Listing all the substring of the given string S, we have in total 9 of them:

{"q", "u", "x", "r", "t", "z", "qu", "ux", "xr", "rt", "tz", "qux", "uxr",
"xrt", "rtz", "quxr", "uxrt", "xrtz", "quxrt", "uxrtz", "quxrtz"}

Where this answer come from? let's imagine for a second that we are building all the substrings that contains the string "xr". To do that we could either add characters to the front or to the end of "xr".

In the case of the left side we have 3 different possibilities, we can add "u", "qu" or "" (the empty string) to the front of "xr". In the same manner, in the case of the right side we also have 3 different possibilities, we can add "t", "tz" or "" (the empty string) to the end of the string "xr". By the rule of product we have \(3 \cdot  3 = 9\) substring including the string "xr".

Formalizing more the previous statement, let's call \(i\) the index of the last consonant that we found (0-based), and, let's call \(C\) the count of consecutive consonants so far. If the condition \(C \geq N\) holds, our count of substring that include this \(N\) consonants is given by the expression \((L - i) \cdot (i - N + 1)\).

There is just one problem remaining, what happen then if we apply the same method to the next N consecutive consonants:

We end up double counting some of the substrings (in red color the substrings that we have counted twice):

{"q", "u", "x", "r", "t", "z", "qu", "ux", "xr", "rt", "tz", "qux", "uxr",
"xrt", "rtz", "quxr", "uxrt", "xrtz", "quxrt", "uxrtz", "quxrtz"}

The solution to this issue is simple, instead of taking the whole left part of our string, we just take the part that we haven't count yet, we can do that by maintaining a pointer to the last \((i - N + 1)\) seen so far.

Adding this details left, we end up with the following expression \((L - i) \cdot (i - N + 1 - P)\) where \(P\) is the pointer to the last \((i - N + 1)\).

Thursday, May 09, 2013

Google Codejam 2013 Round 1A

It's been a while since my last post, I want to apologize to the readers of my blog, I was kind of busy studying Chinese.

There is not to much left to say about this match, there is an excellent official contest analysis. Nevertheless, I want to share some ideas that may help somebody to better understand the problems.

A. Bullseye

Let's first begin with a brute force solution of the small-input, according to the problem statement Maria begins first by drawing a circle of radius \(r + 1\) around the first white circle of radius \(r\), if there is enough paint she continue with a ring of radius \(r + 3\) around a circle of radius \(r + 2\) and so on...

So the area of the \(k\) ring correspond to the expression:
\( A_{k} = (r + 2k - 1)^{2} - (r +2k - 2)^{2} \pi \) \(cm^{2}\)

For example to paint the black rings showed in the previous picture we need:

\(S = ((r + 1) ^2 - r^2) +  ((r +3)^{2} - (r + 2)^{2})\) millilitres of black paint.

Let's don't forget that according to the statement 1 \(ml\) of paint is required to cover an area of \(\pi\) \(cm^{2}\).

Because for the small-input the constraints are low, we can just simply sum up the areas of the rings until we run out of black paint:

Until this point everything seems like a fairy tale, unfortunate the constraints for the large-input are kind of evil :). To understand the idea behind the solution we may need some understanding of arithmetic progressions.

An arithmetic progression is a sequence of numbers such that the difference between the consecutive terms is constant, let's call this constant value \(d\), so the \(n\)-term of certain sequence \(a\) is given by the formula:

\( a_{n} = a_{1} + (n-1)d \)

In order to get some intuition, let's first take a sequence of 3 rings, by our previous area formula we get something like this:

\(a = ((r + 1) ^2 - r^2) , ((r +3)^{2} - (r + 2)^{2}) , ((r +5)^{2} - (r + 4)^{2})\)

So far our expression don't look really arithmetic :) let's apply some algebra and see where it take us:

\(a = (r^{2} + 2r + 1 - r^{2}) ,  (r^{2} + 6r + 9 - (r^{2} + 4r + 4))
\\ ,  (r^{2} + 10r + 25 - (r^{2} + 8r + 16))\)

\(a = (2r + 1) ,  ( 2r + 5)  , (2r + 9)\) 

Given this expression is easy to see that we have arithmetic progression, with a constant factor \(d = 4\):

\(a = (2r + 1) ,  ( 2r + 5)  , (2r + 9), \cdots, (2r + 4k - 3) \)
\( a_{k} =  2r + 4k - 3 \)

Another way of getting the s \(k\)-th term of our arithmetic sequence, is applying some algebra to the given area formula:

\( a_{k} =  (2k + r -1)^{2} - (2k + r -2)^{2} \)
\( a_{k} =  (2k + r -1)^{2} - ((2k + r - 1) - 1)^{2} \)
\( a_{k} =  (2k + r -1)^{2} - (2k + r - 1)^{2} + 2(2k + r - 1) - 1 \)
\( a_{k} =  2(2k + r - 1) - 1 \)
\( a_{k} =  2r + 4k - 3 \)

Now our next goal is the sum of the area of the first \(k\) black rings, such that  \( S_{k} \leq t\) where \(t\) is the total amount of black paint available.

To calculate the sum of an arithmetic progression we use the following formula \(S_{k} = \frac{k(a_{1} + a_{k})}{2} \), here is the proof.

By substituting \(a_{1} = 2r + 1\) and \(a_{k} = 2r + 4k - 3\) we get:

\(S_{k} = \frac{k(2r + 1 + 2r + 4k - 3)}{2} \)
\(S_{k} = \frac{n(4r + 4k - 2)}{2} \)
\(S_{k} = k(2r + 2k - 1)\) 

Up to this point we already have all the math that we need, but there still one more problem, we can't check value after value until some \(k\) holds, we are going to need binary search to look for the \(k\) that maximize our result. Because the constraints are a little bit to high, we should be careful with overflow. A work around is use Python, so you don't have to worry about overflow.