The blue path

Thursday, June 06, 2013

Codeforces Round #185

A. Whose sentence is it?

http://codeforces.com/contest/312/problem/A

You got the chat record of two of your friends Freda and Rainbow, by mere curiosity you want to classified the sentences by the following categories: (1) Written by Freda, (2) Written by Rainbow or (3) You don't know who is the writer.

To do that you know that Freda always says "lala." (without quotes) at the end of the sentence. Rainbow in the other hand, always says "miao." (without quotes) at the beginning of the sentence.

This is an implementation problem. There are just 3 cases to consider:

(1) If the sentence begins with prefix "miao." and don't end with the suffix "lala.", then it was written by Rainbow.
(2) If the sentence ends with suffix "lala." and don't begin with the prefix "miao.", then it was written by Freda.
(3) Otherwise, you don't know who is the writer of the sentence.

B. Archer

http://codeforces.com/contest/312/problem/B

There is a match of archery between SmallR and Zanoes, they try to shoot a certain target in turns, and SmallR shoots first. The probability that SmallR hits the target is of $\frac{a}{b}$ while for Zanoes is $\frac{c}{d}$. Given 4 integers $a$, $b$, $c$, $d$, return the probability of SmallR will win the match.

Let's first define some notation:

$P(A) = \frac{a}{b}$ probability of SmallR hit the target
$P(B) = \frac{c}{d}$ probability of Zanoes hit the target
$1 - P(A)$ probability of SmallR don't hit the target
$1 - P(B)$ probability of Zanoes don't hit the target
$P(W)$ probability of SmallR winning the archery match

Before writing any math, let's get some intuition about what is happening on each turn. There are 2 players shooting the target, that's give us 4 possibles outcomes:

(1) SmallR hits the target and Zanoes hits the target
(2) SmallR hits the target and Zanoes miss the target
(3) SmallR miss the target and Zanoes hit the target
(4) SmallR miss the target and Zanoes miss the target

Now because SmallR shoots first the probability of him winning in the first turn is equal to $P(A) = \frac{a}{b}$. If neither players win the match, or in other words, if SmallR and Zanoes both miss in the first turn, then, the probability that SmallR win in the second turn is $(1 - P(A)) \cdot (1 - P(B)) \cdot P(A)$.

Generalizing from the previous idea, we have that the probability that SmallR win in the $k$-th turn is: $ ((1 - P(A))(1 - P(B)))^{k-1} \cdot P(A)$

This means that our answer, the probability of SmallR winning is given by the expression:
$P(W) = \sum_{k=1}^{\infty } ((1 - P(A))(1 - P(B)))^{k-1} \cdot P(A)$ If we look carefully this turn out to be an Infinite geometric series, so we can reduce the infinite series to the expression: $\\ \sum_{k=0}^\infty ar^k = \frac{a}{1-r} \\ a = P(A) \\ r = (1 - P(A))(1 - P(B)) \\ \\ P(W) = \frac{P(A)}{1 - (1 - P(A))(1 - P(B))}$

Saturday, May 18, 2013

Google Codejam 2013 Round 1C

Problem A. Consonants

http://code.google.com/codejam/contest/2437488/dashboard#s=p0

In brief, given a string $S$ return the total number of substrings containing at least $N$ consecutive consonants. The vowels are define as a, e, i, o, and u, and our consonants are the remaining 21 letters.

Let's begin by solving the small input of this problem, because the size of the string $L$ can be at most 100. We just simple iterate over all the possibles substrings of the given string. For each substring check if contains at least one occurrence of $N$ consecutive consonants, if that's the case we count that substring in our final answer. Things get a little bit more interesting when we are facing the large input, the size of the given string $L$ can be at most $10^{6}$. Clearly, the previous algorithm is not going to be fast enough. So we need a different approach to attack this problem.

First let's get some intuition with the following example:

S = "quxrtz"
L = 6
N = 2

Imagine that we are iterating our string, until we find the substring "xr" which has $N = 2$ consecutive consonants:

Now we should ask ourselves the following question, how many substrings contains the string "xr"?

Listing all the substring of the given string S, we have in total 9 of them:

{"q", "u", "x", "r", "t", "z", "qu", "ux", "xr", "rt", "tz", "qux", "uxr",
"xrt", "rtz", "quxr", "uxrt", "xrtz", "quxrt", "uxrtz", "quxrtz"}

Where this answer come from? let's imagine for a second that we are building all the substrings that contains the string "xr". To do that we could either add characters to the front or to the end of "xr".

In the case of the left side we have 3 different possibilities, we can add "u", "qu" or "" (the empty string) to the front of "xr". In the same manner, in the case of the right side we also have 3 different possibilities, we can add "t", "tz" or "" (the empty string) to the end of the string "xr". By the rule of product we have $3 \cdot 3 = 9$ substring including the string "xr".

Formalizing more the previous statement, let's call $i$ the index of the last consonant that we found (0-based), and, let's call $C$ the count of consecutive consonants so far. If the condition $C \geq N$ holds, our count of substring that include this $N$ consonants is given by the expression $(L - i) \cdot (i - N + 1)$.

There is just one problem remaining, what happen then if we apply the same method to the next N consecutive consonants:

We end up double counting some of the substrings (in red color the substrings that we have counted twice):

{"q", "u", "x", "r", "t", "z", "qu", "ux", "xr", "rt", "tz", "qux", "uxr",
"xrt", "rtz", "quxr", "uxrt", "xrtz", "quxrt", "uxrtz", "quxrtz"}

The solution to this issue is simple, instead of taking the whole left part of our string, we just take the part that we haven't count yet, we can do that by maintaining a pointer to the last $(i - N + 1)$ seen so far.

Adding this details left, we end up with the following expression $(L - i) \cdot (i - N + 1 - P)$ where $P$ is the pointer to the last $(i - N + 1)$.

Thursday, May 09, 2013

Google Codejam 2013 Round 1A

It's been a while since my last post, I want to apologize to the readers of my blog, I was kind of busy studying Chinese.

There is not to much left to say about this match, there is an excellent official contest analysis. Nevertheless, I want to share some ideas that may help somebody to better understand the problems.

A. Bullseye

http://code.google.com/codejam/contest/2418487/dashboard#s=p0&a=2

Let's first begin with a brute force solution of the small-input, according to the problem statement Maria begins first by drawing a circle of radius $r + 1$ around the first white circle of radius $r$, if there is enough paint she continue with a ring of radius $r + 3$ around a circle of radius $r + 2$ and so on...

So the area of the $k$ ring correspond to the expression:
$ A_{k} = (r + 2k - 1)^{2} - (r +2k - 2)^{2} \pi $ $cm^{2}$

For example to paint the black rings showed in the previous picture we need:

$S = ((r + 1) ^2 - r^2) + ((r +3)^{2} - (r + 2)^{2})$ millilitres of black paint.

Let's don't forget that according to the statement 1 $ml$ of paint is required to cover an area of $\pi$ $cm^{2}$.

Because for the small-input the constraints are low, we can just simply sum up the areas of the rings until we run out of black paint:

Until this point everything seems like a fairy tale, unfortunate the constraints for the large-input are kind of evil :). To understand the idea behind the solution we may need some understanding of arithmetic progressions.

An arithmetic progression is a sequence of numbers such that the difference between the consecutive terms is constant, let's call this constant value $d$, so the $n$-term of certain sequence $a$ is given by the formula:

$ a_{n} = a_{1} + (n-1)d $

In order to get some intuition, let's first take a sequence of 3 rings, by our previous area formula we get something like this:

$a = ((r + 1) ^2 - r^2) , ((r +3)^{2} - (r + 2)^{2}) , ((r +5)^{2} - (r + 4)^{2})$

So far our expression don't look really arithmetic :) let's apply some algebra and see where it take us:

$a = (r^{2} + 2r + 1 - r^{2}) , (r^{2} + 6r + 9 - (r^{2} + 4r + 4))
\\ , (r^{2} + 10r + 25 - (r^{2} + 8r + 16))$

$a = (2r + 1) , ( 2r + 5) , (2r + 9)$

Given this expression is easy to see that we have arithmetic progression, with a constant factor $d = 4$:

$a = (2r + 1) , ( 2r + 5) , (2r + 9), \cdots, (2r + 4k - 3) $
$ a_{k} = 2r + 4k - 3 $

Another way of getting the s $k$-th term of our arithmetic sequence, is applying some algebra to the given area formula:

$ a_{k} = (2k + r -1)^{2} - (2k + r -2)^{2} $
$ a_{k} = (2k + r -1)^{2} - ((2k + r - 1) - 1)^{2} $
$ a_{k} = (2k + r -1)^{2} - (2k + r - 1)^{2} + 2(2k + r - 1) - 1 $
$ a_{k} = 2(2k + r - 1) - 1 $
$ a_{k} = 2r + 4k - 3 $

Now our next goal is the sum of the area of the first $k$ black rings, such that $ S_{k} \leq t$ where $t$ is the total amount of black paint available.

To calculate the sum of an arithmetic progression we use the following formula $S_{k} = \frac{k(a_{1} + a_{k})}{2} $, here is the proof.

By substituting $a_{1} = 2r + 1$ and $a_{k} = 2r + 4k - 3$ we get:

$S_{k} = \frac{k(2r + 1 + 2r + 4k - 3)}{2} $
$S_{k} = \frac{n(4r + 4k - 2)}{2} $
$S_{k} = k(2r + 2k - 1)$

Up to this point we already have all the math that we need, but there still one more problem, we can't check value after value until some $k$ holds, we are going to need binary search to look for the $k$ that maximize our result. Because the constraints are a little bit to high, we should be careful with overflow. A work around is use Python, so you don't have to worry about overflow.

Wednesday, February 06, 2013

UVA 12526 - Cellphone Typing

http://uva.onlinejudge.org/external/125/12526.html

You are part of a research team that is developing a new technology to save time when typing text messages in mobile devices. This new technology aids users to skip key-strokes by the following conditions:

The first letter always has to be input manually (can't be skipped).
If a given prefix $c_{1}, c_{2}, \cdots, c_{n}$ has been input into the system, and the user tried to insert a letter c such that for all the words that starts with $c_{1}, c_{2}, \cdots, c_{n}$ also starts with $c_{1}, c_{2}, \cdots, c_{n}\mathbf{c}$ then this key-stroke can be skipped. Otherwise the system waits for the users input.

Your task is, given the dictionary of words, calculate the average of key-strokes to input those words in the new system.

This problem can be solved using the Trie data structure, if you are not familiar with Tries I personally recommend the topcoder tutorial.

Back to the business :) In order to get a better intuition of the task let's first build a Trie with the following words `hello', `hell', `heaven' and `bye':

As we can see in the graph bellow the numbers in red are the vertex prefix counter, in simple terms, we can say that every time a word is insert in our Trie a certain path is follow along the tree, and for each of the vertex in that path we keep a counter that records all the visits.

For example the letter H has a prefix count of 3 because there are 3 words that begin with H, `hello', `hell' and `heaven'. We could also say that the prefix count of H is 3 because there are 3 paths that crossed by H.

Understanding this let's focus in our goal the average number of key-strokes per word. Looking the picture below is not hard to see that we can only skip key-strokes when two consecutive nodes has the same prefix count.

Let's try to convince ourselves about the previous statement, assume that we are moving along a path in our Trie, along this path we visit the nodes in the following order $1, 2, 3, \cdots, n, n + 1$ if a node $n$ and $n+1$ had the same prefix count implies that all the paths that past through $n+1$ also past through $n$ so we can say then that all the words that cross by the node $n$ also cross by the node $n+1$.

Finally, To get our result we just need to create a method that count the number of key-strokes. This method should not count the strokes of consecutive vertex with the same prefix count. The overall time complexity of this algorithm is $O(|L|)$ where $|L|$ is the length of the concatenate dictionary.

Wednesday, November 21, 2012

UVA 12532 - Interval Product

http://uva.onlinejudge.org/external/125/12532.html

You are given a sequence of $N (1 \leqslant N \leqslant 10^{5})$ integers $X_{1}, X_{2}, \cdots, X_{n}$. You have to perform $K (1 \leqslant K \leqslant 10^{5})$ operations over this sequence, which can be:

(1) Change an arbitrary value of the sequence.

(2) Given a range $[i, j]$ return if the product $X_{i}, X_{i+1}, ..., X_{j-1}, X_{j}$ is positive, negative or zero.

As usual we are going to start asking ourselves if it's possible to solve this problem under the 2 seconds time constraint using the brute force approach? The answer is "maybe" after the UVA administrator install the new Quantum Computers servers :), because this is not the case a Brute Force solution is going to timeout.

To solve this problem I am going to describe two approaches, the first one using Segment Tree and the second one using Binary Indexed Trees (BIT):

In case that the reader is not familiar with this data structures I personally recommend the following TopCoder tutorials:

(1) Solution using Segment Tree

The main problem of the Brute Force approach is that we need $O(j - i + 1)$ time to answer each of the second operation queries. Using Segment Trees we can reduce this time to $O(log N)$ where $N$ is the size of our sequence.

The main idea is to construct a Segment Tree of the sub-intervals products of our sequence, for example, given the sequence $(-2, 6, 0, -1)$ we got the following tree:

There is just a little issue with this, the values of the given sequence are in the range $-100 \leqslant X_{i} \leqslant 100$. This implies that we are in risk to get an overflow/underflow for some sequences, an easy one is $(100,100,100,100,100)$, if we execute the second operation over range $[0, 4]$ we got the value $100^{5}$ that clearly surpass the 32 bit integer capacity.

Let's remember that this problem care just for the sign (e.g. negative, positive) between certain interval $[i,j]$. So to avoid the overflow problem we simply need to substitute all the negative numbers with -1 and all the positives values by 1. After applying the previous operations we got a tree that look like this:

This solution will give us an overall time complexity of $O(K logN)$ which is enough to pass the 2 seconds time limit.

(2) Solution using Binary Indexed Trees

The solution with BIT is a little bit less intuitive than the previous exposed, but we can easily came with it by asking ourselves the correct questions.

1. When the product of an interval $[i,j]$ becomes zero?

When there is one or more zeros in the interval.

2. When the product of an interval $[i,j]$ becomes negative?

When there is not zeros in the interval, and the total number of negative number between $[i,j]$ is odd.

3. When the product of an interval $[i,j]$ becomes positive?

When none of the other conditions holds then product of the interval $[i,j]$ is positive.

After considering these three questions it seems important to keep track of the frequency of zeros and negative numbers for any interval $[i,j]$ of our sequence. The easiest way to accomplish this task is maintaining two arrays with the cumulative frequencies of both types, for example:

So if we want to know the numbers of zeros between certain interval $[i,j]$ we just need to apply the old trick of $freq[j] - freq[i - 1]$ (assuming that $freq[0]$ is 0).

$\\ freq[4] - freq[1] = 1 \\ freq[4] = 2 \\ freq[1] = 1 \\ 2 - 1 = 1 $

Until this point everything seems just fine, however there is another problem we haven't consider yet, what happen when we update one of the values of our sequence (operation #1) ? if that's the case we also need to update our cumulative frequency array. It's not hard to see that even when we can perform operation #2 in $O(1)$ time, the operation #1 is going to take in the worst case scenario $O(N)$ time, that due to the vast amount of queries we can't afford for this problem.

Considering this problems, the necessity of using BIT seems more evident. We just need to carefully maintain the cumulative frequency updated after each operation is performed. The overall time complexity of this solution is $O(KlogN)$.

Friday, November 02, 2012

UVA 11076 - Add Again

http://uva.onlinejudge.org/external/110/11076.html

You are given an sequence of $n (1\leqslant n\leqslant12)$ digits, your task is to return the summation of all the unique permutations of those digits.

For example given the digits ${1, 2, 3}$ the sum of all the permutations is $1332$:

$123 + 213 + 132 + 231 + 312 + 321 = 1332$

In order to understand the solution of the problem the reader should be familiar with the idea of permutations with repetitions.

The first question we should ask ourselves, it's possible to solve this problem using the brute force approach under the 3 seconds time limit constraint? The answer is no, the reason is simple, we are given at most 12 digits and 20,000 test cases. In the worst case scenario we have $20,000 \cdot 12! = 9,580,032,000,000$, that is a pretty big number.

After realized that brute force is not feasible we should take a more clever approach. To simplify things a little bit I am going to divide the solution in the following two cases:

(1) All the digits in the sequence are unique.

A really important observation is that all the digits appear the same number of time for any position. Let's consider the initial example where the digits are ${1, 2, 3}$:

All the permutations gave us the following set:

$\left \{ 123, 213, 132, 231, 312, 321 \right \}$

If we put attention we can see that all the digits appear the same number of times (2 times) in each of the positions hundreds, tens and ones. The following example mark with red the hundreds positions:

$\left \{ {\color{Red}1}23, {\color{Red}1}32 \right \}, \left \{ {\color{Red}2}13, {\color{Red}2}31 \right \},\left \{ {\color{Red}3}12, {\color{Red}3}21 \right \}$

Another important detail to notice is once you fixed some arbitrary digit in certain position (e.g. ones, tens, hundreds, ...), the number of times that digit is going to occupied that position is $(n - 1)!$, where $n$ is the length of the target sequence. Now let's denote $S_{A_{k}}$ as the sum of all the positions that the $A_{k}$ digit can occupied in all the unique permutations of the sequence, for each of the digits we got the following:

$\\ S_{A_{1}} = 1(100) + 1(100) + 1(10) + 1(10) + 1(1) + 1(1) \\ S_{A_{1}} = 1(100 + 100 + 10 + 10 + 1 + 1) \\ S_{A_{1}} = 1(2!)(100 + 10 + 1) \\ S_{A_{1}} = 1(2!)(111)$ $\\ S_{A_{2}} = 2(100) + 2(100) + 2(10) + 2(10) + 2(1) + 2(1) \\ S_{A_{2}} = 2(100 + 100 + 10 + 10 + 1 + 1) \\ S_{A_{2}} = 2(2!)(100 + 10 + 1) \\ S_{A_{2}} = 2(2!)(111)$ $\\ S_{A_{3}} = 3(100) + 3(100) + 3(10) + 3(10) + 3(1) + 3(1) \\ S_{A_{3}} = 3(100 + 100 + 10 + 10 + 1 + 1) \\ S_{A_{3}} = 3(2!)(100 + 10 + 1) \\ S_{A_{3}} = 3(2!)(111)$ To get our result we just need to add up the partials sum of each element in the sequence: $\\ S = S_{A_{1}} + S_{A_{2}} + S_{A_{3}} = 1332$ Generalizing the results obtained in the previous steps we have: $\\ S_{A_{k}} = A_{k} (n - 1)! (\frac{10^n - 1}{9}) \\ S = S_{A_{1}} + S_{A_{2}} + ... + S_{A_{n}}$ Do not get confused by the formula $\frac{10^{n} - 1}{9}$ this formula gave us the elements in the sequence $\{0,1,11,111, ...\}$ (you can check this sequence in the OEIS website) .

(2) There is one or more digits repeated in the sequence.

The only difference between this case and the previous one is that now we need to eliminate the repetitions. Let's denote $f(A_{k})$ as the frequency of the digit $A_{k}$ in the sequence $A$. We got the following expression:
$\\ S_{A_{k}} = A_{k} \frac{(n - 1)!}{f(A_{1})! \cdot f(A_{2})! \cdot ... \cdot (f(A_{k}) - 1)! \cdot ... \cdot f(A_{n})! } (\frac{10^n - 1}{9}) \\ S = S_{A_{1}} + S_{A_{2}} + ... + S_{A_{j}}$ Notice that because we can have repetitions the total sum $S$, is restricted just to the unique elements of the sequence $A$, this is why we introduce a variable $j$ where $1 \leqslant j \leqslant n $. We can also observe that the cases where each digit appear only one time, all the frequencies are going to gave us a product of 1, this means that the first formula is just one special case of the second one, knowing this we can forget about the first one and generalize both cases with the current formula.

If we implement the ideas set out above we are going to end up with an algorithm with overall time complexity of $O(n)$.

Tuesday, October 30, 2012

UVA 11548 - Blackboard Bonanza

http://uva.onlinejudge.org/external/115/11548.html

This problem statement tried to mislead you with some weird game between two characters Alice and Bob. Let's not put to much attention in that and focus in the the real task:

Given $n (2 \leqslant n \leqslant 70) $ strings, you need to return the length of the best vertical alignment between an arbitrary pair of strings $S_{i}$, $S_{j}$ where $i \neq j$. What exactly does that mean? Let's take a look to an example to clear out some obscures points of this statement:

The following two pictures shows two possible alignments for the strings "ABCD" and "ADC", among them the second one is the best alignment which include the letters A and C for a total length of 2.

The constraints of this problem are not to high, to solve it we can simply iterate over all the possible pair of strings, and for each of them calculate all the possible alignments and keep the length of the best one.

The hardest part perhaps is the procedure to calculate the best alignment for a given pair of strings. In order to do that, we just need to take one of the string as point of reference, let's call this string $S$ and the another string $P$, at each step we start to compare each character of $S$ with each character of $P$, once we are done we repeat the process but now, beginning in the next letter of the string $S$. During this process if one the string reach it's last letter we restart the counter of matches, and start to count again from the current position, which is equivalent to begin another alignment.

This may sound a little bit confusing let's take a look to an example:

The previous picture simulates the idea of the algorithm as we can see using this method we consider all the possible alignments between "ABCD" and "ADC".

The overall time complexity of this solution is $O(N^{4})$.