Functional Programming Competition 2018
part of the Formal Methods and Functional Programming (FMFP) course at ETH Zürich
Every week a special Haskell programming exercise is published here and in Code Expert. Each exercise is carefully selected by the Master of Competition (MC)—Dmitriy Traytel—to separate the wheat from the chaff. After 7 weeks, the intersection of the wheat with the FMFP students will receive delicious prizes in an in-lecture ceremony.
Participants submit their solutions via Code Expert. Submissions by mail are accepted only in two exceptional cases: (1) the participant is not enrolled at ETH and therefore cannot use Code Expert, and (2) the solution uses a Haskell library/pragma that is not available on Code Expert. In case (2), precise instructions on how to run the submission (including version numbers) are a necessary requirement for the solution to be evaluated.
Every week, the MC's favorite submissions are discussed in a blog post. The submissions are evaluated with respect to different criteria (but the function's correctness is always a must-have), which are announced along with the exercise. The best 10 solutions receive from 1 to 10 points (more is better). The points and the names of the participants are made public in a Top 10 of the week and aggregated in a semester ranking.
Important: If you don't want your name to appear on a publicly visible webpage (but still want to participate), write to the MC and suggest a pseudonym.
Finally, the MC employs a zero tolerance policy against plagiarism. Please compete fairly and individually. Submissions that have been developed jointly by two contestants have to be clearly marked as such. Violations of those rules may result in arbitrary point deductions.
Outline:
 Semester ranking
 The Task and Results of Week 1
 The Task and Results of Week 2
 The Task and Results of Week 3
 The Task and Results of Week 4
 The Task and Results of Week 5
 The Task and Results of Week 6
 The Task and Results of Week 7
Final ranking  

#  name  points 
1.  Matthias Brun 💥  53.0 
2.  Moritz Knüsel 🎩  50.0 
3.  Gabriel Giamarchi 🖥  49.0 
4.  Simon Yuan 📐  36.0 
5.  Börge Scheel  35.0 
6.  Loris Reiff 🏆  34.0 
7.  Miro Haller 🧟🔝  26.0 
8.  Roger Baumgartner  24.0 
9.  Mike Boss  23.0 
10.  Chris Buob  14.0 
11.  Silvan Niederer  13.0 
12.  Patrick Ziegler  11.5 
13.  Kevin de Keyser  10.0 
Tynan Richards  10.0  
15.  Miklós Horváth  9.0 
16.  Antonio Russo  6.0 
17.  Noah Berner  5.0 
Zur Shmaria  5.0  
Jan Wiegner  5.0  
20.  Mathias Blarer  3.5 
21.  Giuseppe Arcuti  3.0 
Lukas Baege  3.0  
Andrin Bertschi  3.0  
Lukas Heimes  3.0  
Felix Huber  3.0  
Timon Jenne  3.0  
Nikola Kovacevic  3.0  
Andreas Kuster  3.0  
Sandar Felicity Lim  3.0  
Manuel Nowack  3.0  
Felix Sarnthein-Lotichius  3.0  
Roberto Starc  3.0  
Elwin Stephan  3.0  
Valentin Stoppiello  3.0  
35.  Jorel Elmiger  1.0 
Stefan Zemlijc  1.0 
Task of week 1
Write a function oddSum
that takes five Int
arguments as input and computes the sum of the maximum, the minimum, and the median of the arguments.
For example:
oddSum 1 2 3 4 5 = 1 + 3 + 5 = 9
oddSum 5 4 1 3 2 = 1 + 3 + 5 = 9
oddSum 1 1 5 3 3 = 1 + 3 + 5 = 9
The shortest (in terms of number of tokens^{1}) solution wins! Library imports and type signatures are excluded from the token count. As a measure of reference: the MC has an optimized (but not too crazily optimized) 27-token solution.
^{1. This competition continues a long-standing tradition of in-lecture functional programming competitions established at TU München. The definition of tokens is borrowed from there.}
Results of week 1
Here are the best submissions:
Top 10 of week 1  

#  name  #tokens  points 
1.  Matthias Brun  17  10 
2.  Loris Reiff  19  9 
3.  Simon Yuan  26  8 
4.  Roger Baumgartner  27  7 
Moritz Knüsel  27  7  
Börge Scheel  27  7  
7.  Gabriel Giamarchi  29  4 
8.  Giuseppe Arcuti  31  3 
Lukas Baege  31  3  
Noah Berner  31  3  
Andrin Bertschi  31  3  
Mike Boss  31  3  
Chris Buob  31  3  
Miro Haller  31  3  
Lukas Heimes  31  3  
Timon Jenne  31  3  
Nikola Kovacevic  31  3  
Andreas Kuster  31  3  
Sandar Felicity Lim  31  3  
Manuel Nowack  31  3  
Tynan Richards  31  3  
Felix Sarnthein-Lotichius  31  3  
Roberto Starc  31  3  
Elwin Stephan  31  3  
Valentin Stoppiello  31  3  
Jan Wiegner  31  3 
The MC is happy about 104 student participants in this first week of the competition (+ 1 tutor + 1 professor + the MC). Several participants have submitted more than one solution without any comment (such as "grade me"), leaving the MC the choice to pick the worst one in terms of tokens. However, the MC was so happy about the fact that every participant submitted at least one correct solution (determined by the MC by a mixture of testing and manual inspection) that he decided to take the best submission in terms of tokens into account.
Before we start looking at the best solutions, let's see what one can do with what we've seen in the lecture so far. The following solution is the MC's attempt to use only material from the lecture; it is also one of the submissions by our number 3, Simon Yuan (and a few others, who wasted tokens by adding some needless parentheses or the like). With 58 tokens it is far from being Simon's or the MC's best solution and even far from the top 10.
oddSum a b c d e
  | a > b = oddSum b a c d e
  | b > c = oddSum a c b d e
  | c > d = oddSum a b d c e
  | d > e = oddSum a b c e d
  | otherwise = a + c + e
To understand how this solution works, one has to realize that the guards are matched sequentially: i.e., the program will only check c>d
after having established a≤b
and b≤c
. With this insight it is pretty obvious what this code does: the last trivial guard otherwise
is only reached once a b c d e
is sorted.
Now we have a baseline to compete against. It is fairly obvious that sorting should be part of any successful solution. Can we sort the five elements in Haskell without implementing our own sorting algorithm? Of course: we can just use the right library function. To use this function we must import
the library Data.List
, which contains many useful functions about the Haskell list type (more on this next week in the lecture; what you see below should be mostly intuitive usage of finite sequences of Int
elements). But, hey, imports cost 0 tokens!
Once the list is sorted, we have to think about how to extract the minimal, maximal, and median elements. Many solutions, including the following one by an anonymous professor, decided for index-based addressing, written list !! index
in Haskell.
oddSum a b c d e = let xs = sort [a,b,c,d,e] in xs !! 0 + xs !! 2 + xs !! 4
This gives 34 tokens (or 33 when using where
) and brings us closer to the top 10.
To get into the top 10, this time it was enough to access the first and last elements of the sorted list using library functions (head
and last
). Note that head xs
are two tokens, while xs!!0
are three. Alternatively, but less efficiently, one can extract the minimum
or the maximum
of the (already sorted!) list. Alas, all 31-token solutions, including one submitted by a tutor who likes applicative functors (more on this much later in the lecture), are variations of the following two.
import Data.List
oddSum  a b c d e = maximum xs + minimum xs + xs !! 2 where xs = sort [a,b,c,d,e]
oddSum' a b c d e = head xs + last xs + xs !! 2 where xs = sort [a,b,c,d,e]
Side remark 1: Some participants mistook tokens for characters, and tried to minimize the number of characters used in their program. Readability suffered from this quite a bit, e.g., oddSum a b c d e=xs!!0+xs!!2+xs!!4 where xs=sort[a,b,c,d,e]
. Useful advice: leave the reader of your program some air to breathe. It is better to kill needless parentheses than spaces (also in terms of tokens).
All solutions further up in the ranking use techniques other than indexing to extract the three elements from the sorted list. Gabriel Giamarchi uses the following trick (omitting the Data.List
import):
oddSum a b c d e = sum ((subsequences (sort [a,b,c,d,e]))!!21)
Before we talk about the semantics, let's see whether, syntactically, this solution does its best in terms of tokens. No! Prefix function application always binds stronger than any infix operator. That is, f a b c d !! g x y z
is always parsed as (f a b c d) !! (g x y z)
.
Have I already mentioned what to do with needless parentheses? Don't get me wrong: Gabriel did a great job by using subsequences
in the first place. But those two parentheses around subsequences ...
cost him three competition points, which in the end might decide between Lindt and Migros Budget.
So how does this solution work? The library function subsequences
computes the list of subsequences (i.e., a list whose elements are lists themselves) of the input list, where a list xs
is a subsequence of another list ys
if one can obtain xs
by removing zero or more elements from ys
(but without reordering the elements). For example: subsequences [1,2,3] = [[],[1],[2],[1,2],[3],[1,3],[2,3],[1,2,3]]
. Now, when applied to a list with five elements, the result of subsequences
will contain the subsequence consisting of the first, middle, and last element in the 22nd position (i.e., accessible using the 0-based index 21). This holds regardless of the list elements' types or values and is exploited here. Finally, the library function sum
sums up the elements of this subsequence, which is exactly what is asked for. BTW: you might ask: is computing subsequences the most efficient way to do this? The MC answers: certainly not, but who cares? It is all about tokens.
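The index-21 claim is easy to check directly; here it is as a tiny standalone sketch (the name picked is just for illustration):

```haskell
import Data.List (subsequences)

-- 21 is 10101 in binary: for a five-element list, 'subsequences' places the
-- sublist of the elements at positions 0, 2 and 4 (i.e., min, median and max
-- of a sorted list) exactly at index 21
picked :: [Int]
picked = subsequences [1, 2, 3, 4, 5] !! 21  -- [1,3,5]

main :: IO ()
main = print picked
```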
The best solutions of Moritz Knüsel and Simon Yuan are variants of this idea that are more economical with tokens. Here is, for example, Simon's solution, which came in third with 26 tokens:
oddSum a b c d e = sum $ subsequences (sort [a,b,c,d,e]) !! 21
Using the infix operator $
is a useful trick to save tokens (and sometimes also to improve readability). It is nothing else than function application (f $ x = f x
), however, the low precedence of this infix operator helps to avoid parentheses: instead of f (g x)
we can write f $ g x
(or f $ g !! i
instead of f (g !! i)
).
Let's look at two other 27-token solutions. The first one was found by the two students Börge Scheel and Roger Baumgartner. The second is the MC's best solution.
oddSum a b c d e = sum $ map head $ chunksOf 2 $ sort [a, b, c, d, e]
oddSum a b c d e = sum $ scanr (-) 0 $ sort [a, b, c, d, e]
The first solution works by sorting and then using the library function chunksOf
, which splits a list into shorter ones (of length at most 2 here as indicated by the passed argument). We have chunksOf 2 [1,2,3,4,5] = [[1,2],[3,4],[5]]
. The next applied function is the important map
function, which we will see again in the course. It applies its first argument (a function—yes, in Haskell functions can be passed around as arguments; map is called a higher-order function because it takes a function as an argument; more on higher-order functions in two weeks in the lecture) to every element of the list passed as its second argument. For example: map head [[1,2],[3,4],[5]] = [head [1,2], head [3,4], head [5]] = [1,3,5].
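The first solution can be replayed as a self-contained sketch. Note that chunksOf lives in the split package (Data.List.Split); an equivalent local definition is used here to keep the sketch standalone:

```haskell
import Data.List (sort)

-- local stand-in for Data.List.Split.chunksOf
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf n xs = take n xs : chunksOf n (drop n xs)

-- sort, split into pairs, keep each pair's first element, sum
oddSum :: Int -> Int -> Int -> Int -> Int -> Int
oddSum a b c d e = sum $ map head $ chunksOf 2 $ sort [a, b, c, d, e]

main :: IO ()
main = print (oddSum 5 4 1 3 2)  -- 9
```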
The second solution uses an even more complicated higher-order function than map
: scanr
. In our example, it works as follows: scanr (-) 0 [1,2,3,4,5]
becomes
[1 - (2 - (3 - (4 - (5 - 0)))) , 2 - (3 - (4 - (5 - 0))) , 3 - (4 - (5 - 0)) , 4 - (5 - 0) , 5 - 0 , 0]
or written differently
[1 - 2 + 3 - 4 + 5 - 0 , 2 - 3 + 4 - 5 + 0 , 3 - 4 + 5 - 0 , 4 - 5 + 0 , 5 - 0 , 0]
Reading this column-wise, we observe that after summing this list the elements in the even positions (2 and 4) cancel out, whereas the elements in the odd positions (1, 3, and 5) survive this telescoping.
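The telescoping can be verified concretely (the names scanned and total are for illustration only):

```haskell
-- the intermediate list produced by scanr (-) 0 on the sorted example
scanned :: [Int]
scanned = scanr (-) 0 [1, 2, 3, 4, 5]  -- [3,-2,4,-1,5,0]

-- summing telescopes: the elements at positions 2 and 4 cancel out,
-- while 1, 3 and 5 survive
total :: Int
total = sum scanned

main :: IO ()
main = print (scanned, total)
```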
Side remark 2: When writing this explanation, the MC noticed that it is possible to use the very similar function scanr1 (-) instead of scanr (-) 0 to save one token. But unfortunately, we are past the deadline now.
All these tricks are nice, but they are still about 10 tokens away from the two best solutions. Their authors came up with a way to avoid writing the list [a,b,c,d,e]
or even the five arguments to oddSum
. Here is Matthias Brun's winning entry with all the imports and a nice comment describing how the solution works:
import Data.List (sort, subsequences)
import Text.Printf (printf)
import Data.Composition ((.::.))

{- This solution works by computing the power set (subsequences) of the sorted
   list containing the arguments. The 22nd element (i.e. index 21) of the power
   set (as generated by 'subsequences') is always the one containing the
   minimum, median and maximum (for a sorted list of 5 values), so we take its
   sum. The 'read .::. printf "[%d,%d,%d,%d,%d]"' part simply reads in the
   arguments and transforms them into a list in a way that uses few tokens. -}
oddSum = sum . (!! 21) . subsequences . sort . read .::. printf "[%d,%d,%d,%d,%d]"
The MC is amazed. This is point-free functional programming at its best. (The point-free style is often criticized for obfuscating the program. One particular problem is that it might be hard to see how many arguments a function takes. But it is the best way to save tokens. And in this case, the solution is fairly easy to read.) To the above explanation, the MC only wants to add some comments on the operators .
and the funny .::.
, which for some reason makes the MC think of Space Invaders. Both denote function composition. We have:
(f . g) x1 = f (g x1)
(f .::. g) x1 x2 x3 x4 x5 = f (g x1 x2 x3 x4 x5)
Hence, the above solution is equivalent to:
oddSum a b c d e = sum ((!! 21) (subsequences (sort (read (printf "[%d,%d,%d,%d,%d]" a b c d e)))))
The sum ((!! 21) (subsequences (sort ...)))
part looks familiar. We've seen the function read
, which converts String
s to values of different types (or complains with a parse error) in the first exercise class. The type of printf
is too complex to explain at this point; however, when used, the function behaves just like its homonymous C ancestor (i.e., the number of arguments it takes depends on the formatting string).
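A small standalone sketch of the printf/read round trip (the names asString and asList are for illustration only):

```haskell
import Text.Printf (printf)

-- printf instantiated at a String result type just formats its arguments
asString :: String
asString = printf "[%d,%d,%d,%d,%d]" (1 :: Int) (2 :: Int) (3 :: Int) (4 :: Int) (5 :: Int)

-- read then parses the formatted text back into a list of numbers
asList :: [Int]
asList = read asString

main :: IO ()
main = print asList
```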
The second best solution by Loris Reiff is quite similar (also using printf
and .::.
), except that it uses a slightly less token-efficient way to extract the first, middle, and last argument from a five-element list.
The MC could not resist the temptation to look at some incorrect submissions: a glimpse into the abyss in a sense. There, one meets participants who tried to sacrifice everything for tokens: even correctness. Some omitted the sorting or did it halfheartedly:
oddSum a b c d e = head xs + last xs + xs !! 2 where xs = [a, b, c, d, e]
oddSum a b c d e = c + min a (min b (min c (min d e))) + max a (max b (max c (max d e)))
Others tried out mysterious ways:
oddSum a b c d e = l!!0 + l!!2 + l!!4 where l = [ x | x <- [1..], elem x [a,b,c,d,e]]
oddSum a b c d e = l!!0 + l!!2 + l!!4 where l = [ x | x <- [-1000..], elem x [a,b,c,d,e]]
Fortunately, all of them found the path of the righteous (wo)man in the end.
And in case you wonder: the longest correct solution has 302 tokens and is, well, a bit repetitive.
min2 a b c d e
  | minimum [a,b,c,d,e] == a = minimum [b,c,d,e]
  | minimum [a,b,c,d,e] == b = minimum [a,c,d,e]
  | minimum [a,b,c,d,e] == c = minimum [a,b,d,e]
  | minimum [a,b,c,d,e] == d = minimum [a,b,c,e]
  | minimum [a,b,c,d,e] == e = minimum [a,b,c,d]
max2 a b c d e
  | maximum [a,b,c,d,e] == a = maximum [b,c,d,e]
  | maximum [a,b,c,d,e] == b = maximum [a,c,d,e]
  | maximum [a,b,c,d,e] == c = maximum [a,b,d,e]
  | maximum [a,b,c,d,e] == d = maximum [a,b,c,e]
  | maximum [a,b,c,d,e] == e = maximum [a,b,c,d]
oddSum a b c d e = a + b + c + d + e - min2 a b c d e - max2 a b c d e
Wrapping up: what have we learned this week?
 Conciseness should not be mistaken for understandability or efficiency. But conciseness is somewhat easier to measure than efficiency, and a lot easier to measure than understandability. That is why it is the perfect criterion for the competition. (But we'll come to efficiency eventually.)
 Tokens are not characters. Kill needless parentheses. Keep useful spaces.
 Use lists to handle many elements uniformly.
 Sometimes it is helpful to structure your program in subtasks. Here: assemble a list, sort, and extract the result. Optimize each subtask separately.
Task of week 2
Lists have still not been spotted in the lecture so far. So let's do some number theory instead.
The digital product of a number n
is the product of its
(base 10
) digits. The computation of the digital product can be iterated until a single digit is obtained. The single digit is called the (multiplicative) digital root, whereas the number of iterations needed is called the (multiplicative) persistence.
For example:
999 has digital product 9 * 9 * 9 = 729
729 has digital product 7 * 2 * 9 = 126
126 has digital product 1 * 2 * 6 = 12
12 has digital product 1 * 2 = 2
Thus the digital root of 999
is 2
and the persistence is 4
(as we had to compute four digital products).
Your task is to implement the function persistenceRoot :: Integer -> (Integer, Integer)
that computes both the persistence (first component) and the digital root (second component). E.g., persistenceRoot 999 = (4,2)
. (For a change we want to use Haskell's Integer
type of unbounded integer numbers instead of the machine integers Int
.)
As last week, the MC is looking for the shortest solution token-wise. (Next week we will move on to some other criterion.) His best attempt needs 30 tokens. Just beat it!
Results of week 2
Here is the overdue top 10 for week 2. This week 66 students have submitted a solution.
Top 10 of week 2  

#  name  #tokens  points 
1.  Mike Boss  19  10 
Matthias Brun  19  10  
Chris Buob  19  10  
Gabriel Giamarchi  19  10  
Moritz Knüsel 🎩  19  10  
Simon Yuan 📐  19  10  
7.  Tynan Richards  21  4 
8.  Roger Baumgartner  24  3 
9.  Loris Reiff 🏆  27  2 
Antonio Russo  27  2 
You may wonder about the emojis used above. The MC will explain. But first let us look at a solution that reflects the state of the lecture at the submission date. It was submitted by the still anonymous professor with the comment "I didn’t try to be clever." and counts 57 tokens.
persistenceRoot x = aux 0 x
aux n x
  | x < 10 = (n,x)
  | otherwise = aux (n+1) (dp x)
dp x
  | x < 10 = x
  | otherwise = (dp (x `div` 10)) * (x `mod` 10)
Without trying to be clever, there are some obvious further token optimizations possible in the above solution. (E.g., what have we learned last week about needless parentheses?) Altogether this might save around 10–15 tokens, which would still be miles away from the top 10.
To get into the top 10, we should rather try to be clever. Being clever for the whole problem at once may cause some headaches, so let's
look at a subproblem first: computing the digital product (i.e., what the above dp
function computes). Using div
and mod
is a natural way to approach this subproblem. Another natural approach to get the digits of a number x
is
to print the number: show x
. Well, almost. This only gives one a list of Char
s, which we would want to convert to a
list of Integer
s. Unfortunately, it seems that there is no function to do this conversion directly, i.e., we need to take the
detour over Int
s: map toInteger (map digitToInt (show x))
. Finally, to compute the product of the digits we can
conveniently use the product
function. Making all of this point-free, we obtain the digital product function with 9 tokens:
product . map toInteger . map digitToInt . show
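As a quick sanity check, here is the pipeline wrapped in a named function (the name dp is just for illustration):

```haskell
import Data.Char (digitToInt)

-- the 9-token digital product pipeline from above, given a name
dp :: Integer -> Integer
dp = product . map toInteger . map digitToInt . show

main :: IO ()
main = print (map dp [999, 729, 126, 12])  -- [729,126,12,2]
```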
This expression or a slight variation of it occurs in solutions submitted by Roger Baumgartner and Antonio Russo, as well as in the MC's own 30-token solution. It is good, but not exactly the stuff dreams are made of. The stuff dreams are made of for this exercise could be found on Hackage in the Data.Digits module: the digits
function gives one the digits of a number with respect to a given base. Hence, one digital product costs only 4 tokens, as noticed by everyone else in the top 10:
product . digits 10
But one digital product is not enough. We need to iterate it and count the number of iterations until we reach a single digit.
Until, hmm until
... Here is the approach of the MC:
persistenceRoot = flip curry 0 $ until ((<10) . snd) $ (+1) *** product . map read . map return . show
Or, equivalently, written in a bit saner and a bit less point-free fashion:
persistenceRoot x = until ((<10) . snd) ((+1) *** product . map read . map return . show) (0, x)
The function until
is another perfect match for this exercise: until test f x
computes the first y = f (... f (f x) ...)
for which test y
is satisfied. The MC is a bit embarrassed to have wasted 5 tokens on the digital product and 3 tokens on the (+1)
function, which exists in the Prelude
under the name succ
.
Tynan Richards was more careful and used a bunch of funky functions from somewhat exotic modules. And he wrote a nice explanation that speaks for itself. Interestingly, Tynan's solution is actually recursive, i.e., he did not use a combinator (à la until
) to iterate digital products.
-- The online IDE fails to import these packages
import Data.Digits (digits)
import Data.Tuple.Update (upd1)
import Data.Tuple.Extra (dupe, first)
import Data.Bool.Extras (bool)
import GHC.Integer (leInteger)

{- The point-free style makes it hard to see exactly what this function does, so
   I'll briefly try to explain...
   'bool' is a function that works like an if-then-else statement. It checks the
   value of 'leInteger 10', which is equivalent to '(<=) 10'. If that condition
   is false, then the input is smaller than 10 and we have the digital root, as
   well as persistence 0, so we return the value calculated by 'upd1 0 . dupe'.
   'dupe x' gives the tuple (x, x), 'upd1 0 .' sets the first value of that
   tuple to 0, giving (0, x).
   If the input is greater than 10, we instead return the value calculated by
   'first succ . persistenceRoot . product . digits 10', which recursively
   calculates the digital product ('persistenceRoot . product . digits 10'),
   then adds one to the first element of the tuple ('first succ') when it
   returns.
   The <*> operator applies the function's argument to each of the expressions
   of bool, which enables the complete lack of parentheses. -}
persistenceRoot :: Integer -> (Integer, Integer)
persistenceRoot = bool . upd1 0 . dupe <*> first succ . persistenceRoot . product . digits 10 <*> leInteger 10
Let's finally look at the winning solutions. Five of the six were slight variations of the following one by Simon Yuan. Again, it is beautifully documented.
import Data.List
import Data.Bifunctor
import Data.Digits

-- 19 Tokens
persistenceRoot :: Integer -> (Integer, Integer)
{- The code explained:
   "product . digits 10" calculates the digital product (let's call this function dp).
   digits 10 n is evaluated to an Integral list, with the digits (in base 10) of n as
   its elements. product then takes the product of all the elements in that list.
   "iterate dp n" is evaluated to [n, dp n, dp (dp n), ...]
   This list contains the input n as well as all the intermediate steps.
   "break (<10)" splits the list into 2; it is a tuple of 2 lists. The breakpoint is
   at the first element for which (<10) is true. As soon as an element x is less than
   10 (x is the first 1-digit number), x will be equal to dp x. Hence x is the
   digital root.
   "bimap genericLength head (a,b)" is evaluated to (length a, head b).
   The length of a is how many intermediate steps it took to reach a 1-digit number
   (= persistence). The head of b (or any element of b) is the digital root.
   To save 2 tokens, make the function point-free by omitting the input twice. -}
persistenceRoot = bimap genericLength head . break (<10) . iterate (product . digits 10)
Above genericLength
is used instead of length
to obtain an Integer
rather than an Int
. The iterate
function deserves special attention: yes iterate f x
just iterates the function f
as in [x, f x, f (f x), ...]
. Remarkably, the iteration never stops, i.e., the output of iterate f x
is unconditionally an infinite list (more on this in a few weeks in the course). The lazy evaluation strategy of Haskell saves us from waiting infinitely long for it to finish. In the following computation we only need to observe a finite prefix of the infinite list.
The remaining 19-token solution by Gabriel Giamarchi is somewhat different:
import Data.Digits
import Data.Tuple

{- The idea is to avoid costly (in tokens) termination criteria in recursive
   functions.
   We use an easily seen property, namely that the digital persistence of x is
   smaller or equal to x (except for zero!!).
   This allows us to write the following function, where we zip from the right so
   that the minimum (lexicographical probably) selects the correct tuple.
   Luckily this also works for zero :) -}
persistenceRoot :: Integer -> (Integer, Integer)
persistenceRoot x = swap $ minimum $ iterate (product . digits 10) x `zip` enumFromTo 0 x
This one beautifully relies on a mathematical property of the persistence (to prove the above "easily seen" property, prove first that the digital product of a number is smaller than the number itself for numbers > 10) and the default lexicographic ordering (i.e., (a,b) ≤ (c,d) iff a < c || (a = c && b ≤ d)
) on pairs in Haskell. Moreover, it is not even pointfree, which usually hints that it is possible to beat it in terms of tokens. And indeed, here is MC's rewrite of Gabriel's solution that saves 1 token:
import Data.Digits
import Data.Tuple
import Control.Arrow
import Control.Monad

persistenceRoot = swap <<< minimum <<< zip `liftM2` iterate (product . digits 10) `arr` enumFromTo 0
Here, <<<
is another operator for composition .
(with precedence 1, whereas .
has 9) and arr
is another operator for application $
(with precedence 9 as any function written in backquotes, whereas $
has 0). Finally, the general function liftM2
, whose type cannot be explained at this point, behaves in this particular instance as liftM2 z f g x = z (f x) (g x)
.
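A tiny concrete instance of this behavior in the function monad (the name ex is for illustration only):

```haskell
import Control.Monad (liftM2)

-- in the ((->) r) monad, liftM2 z f g x == z (f x) (g x):
-- here (5 * 2) + (5 + 3)
ex :: Int
ex = liftM2 (+) (* 2) (+ 3) 5

main :: IO ()
main = print ex  -- 18
```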
Great, now to the emojis. They can be considered special awards for a distinguished achievement in a particular exercise. The precise meaning and value (e.g., in terms of Schoggi) of these awards is as yet unclear (even to the MC), but hey, now you have a nice emoji stuck to your name.
First of all, Simon Yuan receives the mathematician award 📐. One of his early submissions read ... . take 100000 . iterate ...
and he wrote lengthy justifications why this is safe:
If you are interested to read, I have also done some research (you really don't have to read this ;) ): It is conjectured that 11 is the max persistence for base 10. 277777788888899 is the first number with persistence 11, and they have tested it up to 10^233. I actually tried to prove that 100000 is way more than enough by proving an upper limit directly (i.e. I tried to prove the conjecture (before I knew it was a conjecture)) but couldn't really get anywhere. Then I tried to come up with a lower bound on the number of digits needed for it to have persistence at least p. ... [something about the number of atoms in the universe] ...
The MC had advised Simon not to use this program and not to rely on the size of the universe. Clearly, we want to write programs (especially in this competition) that will work forever and under all circumstances. And as we know, the universe is expanding. Incidentally, taking the first x elements of that list when computing the persistence of x (instead of the fixed 100000 first elements) will always suffice—this was also exploited in Gabriel's solution.
The second award goes to Moritz Knüsel. He is a true black hat 🎩, as his 13-token solution demonstrates (comments: his, censorship: the MC's).
import System.Process
import System.IO.Unsafe
import Text.Printf

{- If this isn't allowed, please disregard this submission -}
persistenceRoot :: Integer -> (Integer, Integer)
{- very slow, gets killed after completing the last test; also using ghci incurs
   an extra token, because it prints extra stuff unless told not to
persistenceRoot = read . unsafePerformIO . readCreateProcess (shell "ghci -v0") . printf "\
  \import Data.List\nimport Control.Arrow\n\
  \genericLength *** head $ span (>=10) $ iterate (product <<< fmap read <<< fmap return <<< show) %d\n"
-}

-- this goes faster, but kinda defeats the purpose of a
-- haskell programming exercise
persistenceRoot = read . unsafePerformIO . readProcess "bc" [] . printf "\
  \define digprod (x) {\n\
  \if (x==0) return 1\n\
  \return (x %% 10)*digprod(x/10)\n\
  \}\n\
  \define droot (x) {\n\
  \if (x >= 10) return droot(digprod(x))\n\
  \return x\n\
  \}\n\
  \define drootcnt (x) {\n\
  \if (x >= 10) return 1+drootcnt(digprod(x))\n\
  \return 0\n\
  \}\n\
  \define void printit(x) {\n\
  \print \"(\", drootcnt(x), \",\", droot(x), \")\n\"\
  \}\n\
  \printit(%d)\n"
In principle, Moritz would have won this exercise, had it not been for the censored "back door" he used. To Moritz's defense: this was not explicitly prohibited in the rules (though of course it is in the unspoken rules of the competition), and Moritz was already regretting his usage of such an evil and unportable function, as his first comment indicates. Challenge for everybody: can you write a solution in such a style (i.e., using an external interpreter) without using the back door? For this to work, the interpreter would need to be pure (i.e., a side-effect-free calculator) and expressive enough to compute the digital root and persistence. If something like that is possible, the MC will have to redefine the notion of tokens to take the length of a literal string into account.
The last award goes to Loris Reiff. He receives the fairness award 🏆 not for a solution he submitted (his was rather unspectacular), but rather for a solution that he did not submit:
Btw, today I was looking for solutions online for some inspiration for next time. I found one on Rosetta Code which only uses 21 tokens. This is obviously not a submission, since copying is no fun ;) [and I've already submitted anyway] but it might be something for the blog (if no one has submitted this solution or a better one).

-- 21 tokens
-- slightly adapted from https://rosettacode.org/wiki/Digital_root/Multiplicative_digital_root#Haskell
import Data.List
import Data.Digits
import Control.Arrow

mpmdr :: Integer -> (Integer, Integer)
mpmdr = (genericLength *** head) . span (> 9) . iterate (product . digits 10)
The MC feels guilty. His task is to come up with exercises that are original enough not to be on Rosetta Code. It is surprising that even the type signature matches. Fortunately, the solution there is not token-optimal, yet it is uncomfortably close to the optimum. The MC promises to research such "related work" better next time.
Lessons of the week, apart from that the MC should do his related work research properly, are:
 To get to the top 10 one typically has to try to be clever.
 Walk down the right back alley on Hackage and you can find anything.
 Include a comment describing your implementation to make MC's job easy and the MC happy. But be careful: the comment might end up on a public webpage.
 The definition of tokens should be reconsidered with regard to literal strings.
Task of week 3
Yay, finally some lists in the lecture!
Here is the task of week 3: Given a finite directed graph G and a subset of its vertices S, define the function that computes the minimal elements in S with respect to G. An element n is minimal in S with respect to G if there is no other element m in S, such that there exists a path from m to n in G (note that only the start point m and end point n of such a path are required to be in S).
We represent directed graphs by their adjacency matrix using the type [(Int,[Int])]
.
For example, the entry (1,[2,3,4])
in such an adjacency matrix means that vertex 1
has outgoing edges to the
vertices 2
, 3
, and 4
. And the entry (2,[])
means that 2
has no outgoing edge (to the same effect we could have just omitted that entry). You may assume that for each vertex there is at most one entry in the adjacency matrix. Moreover, we are using lists [Int]
to represent sets of vertices.
So we are looking for a function of type graphMin :: [(Int,[Int])] > [Int] > [Int]
that performs the computation described above. The function may return the minimal elements in any order (and even with duplicates).
For example:
import Data.List
sort $ nub $ graphMin [(0,[1,2]),(1,[3]),(2,[4])] [0,1,2,3,4] = [0]
sort $ nub $ graphMin [(0,[1,2]),(1,[3]),(2,[4])] [1,2,3,4] = [1,2]
sort $ nub $ graphMin [(0,[1,2]),(1,[3]),(2,[4])] [1,3,4] = [1,4]
sort $ nub $ graphMin [(0,[1,2]),(1,[3]),(2,[4])] [0,3,4] = [0]
sort $ nub $ graphMin [(0,[1,2]),(1,[3]),(2,[4])] [0,1,2,3,4,5,6] = [0,5,6]
sort $ nub $ graphMin [] [0,1,2,3,4] = [0,1,2,3,4]
sort $ nub $ graphMin [(0,[1,2]),(1,[0])] [1,2,3] = [1,3]
sort $ nub $ graphMin [(0,[1,2]),(1,[0])] [1,0,3] = [3]
The evaluation criterion is time efficiency. Your function will be tested on large graphs, small graphs, dense graphs, sparse graphs, short lists, and longer lists (but probably not too long lists). The solution of the MC is fairly unoptimized. It can handle lists of up to a few hundred elements for larger graphs (100000 edges) in a few seconds. There is some room for improvement.
Hint: :set +s in GHCi is useful to obtain some timing and memory measurements.
Results of week 3
This week there were only 26 student submissions (plus one by the MC). The MC is slightly disappointed. A decrease of 30% compared to the previous week is ok; a decrease of 60% compared to the previous week indicates that the exercise was simply too hard or not perceived as interesting. And this even though the idea behind this competition problem originated from a small "real-world" subproblem that the MC's alter ego has encountered in his recent (not yet published) research: compute the minimal elements of a set with respect to a preorder.
Additionally, the upcoming quiz and the associated preparations can presumably be blamed as well. (Although not for the 100% decrease in professorial participation. The professors forfeited in view of the thus far absolutely dominant student performance.)
The MC hopes for at least a nondecrease in participation next week. Even if you have close to 0 points at this point, it is still possible to get into the top 10 of the semester, although it will probably be hard to get into the top 5.
Anyway, enough rambling. Let's get down to business.
Top ≈10 of week 3

#      name                 points
1.     Moritz Knüsel        10
1+ε.   Miro Haller          ➖🧟
2.     Mike Boss            9
       Simon Yuan           9
4.     Roger Baumgartner    7
5.     Loris Reiff          6
6.     Matthias Brun        5
       Antonio Russo        ➖5
       Zur Shmaria          5
9.     Noah Berner          2
       Jorel Elmiger        ➖2
       Tynan Richards       2
How are these results coming together? Through a sequence of increasingly harder tests. We start with a soundness test. For that purpose the MC uses his favorite random testing library: QuickCheck. Here is a ghci script that the MC ran on each of the submissions, after briefly ensuring that the submissions won't attempt to erase his file system (please don't do this: it would cost you a zillion competition points, but also earn you a 🎩).
:m + Test.QuickCheck Data.List
let getAll xs x = nub $ concat [zs | (y,zs) <- xs, x == y]
let dom = map (abs . fst)
let union xs = map (\x -> (x, filter (/= x) $ map abs $ getAll xs x)) $ dom xs
let test sol = quickCheckWith stdArgs { maxSize = 10, maxSuccess = 3000 } (\xs ys0 -> let ys = nub $ map abs ys0 in let g = union xs in (sort $ nub $ graphMinSol g ys) == (sort $ nub $ sol g ys))
:set +s
test GraphMin.graphMin
The MC should explain the quickCheckWith invocation. QuickCheck will execute its function argument in this invocation on 3000 randomly generated integer lists ys0 and graphs xs of length at most 10 (also all randomly chosen integers in those lists are between -10 and 10). To ensure that the randomly generated graphs satisfy the promised property of having at most one adjacency matrix entry per vertex, the MC used the union function defined above. For each of the 3000 tests, the output of the MC's unoptimized solution graphMinSol (shown below) is compared (modulo sorting and duplicates) to the output of the contestant's solution GraphMin.graphMin.
import Data.List

path g = go []
  where
    go vis x y
      | x == y = True
      | otherwise = case lookup x g of
          Nothing -> False
          Just zs -> let vis' = x : vis in or [go vis' z y | z <- zs \\ vis']

graphMinSol :: [(Int,[Int])] -> [Int] -> [Int]
graphMinSol g qs =
  let ps = nub qs
  in filter (\p -> all (\q -> p == q || not (path g q p)) ps) ps
Only 17 participants (including the MC) passed this test. If your name does not occur somewhere below, try running QuickCheck on your solution to see the actual bugs. Three of the remaining participants would have failed the test if the MC had not removed the negative numbers from the random inputs (via abs). Since nothing in the exercise stated that all numbers will be non-negative, these solutions could, in principle, have been directly disqualified. In an unexpected bout of mercy, the MC decided to see where these contestants would end up and potentially penalize them with a point deduction later. As a reminder, let's put a "heavy minus" emoji next to their names:
Antonio Russo ➖, Jorel Elmiger ➖, Miro Haller ➖.
The whole truth about Miro Haller's submission is even more shocking. He put great effort into making his solution very efficient. Yet a minor bug (triggered even on positive numbers) compromised his whole work. When the MC reported the bug to Miro after the submission deadline, he quickly came up with a fix that still failed on negative numbers, but seemed to be correct otherwise. It is clear that the MC cannot give points to such a late submission, but he decided to at least evaluate this submission's efficiency. In other words: here is Miro's well-deserved zombie emoji 🧟.
Something similar happened also to Matthias Brun's efficient solution. However, he was suspecting a bug and took precautions. He submitted an alternative "naive" solution and asked the MC to grade that one. A wise decision to keep some points.
OK, next test. First, here are three particular (families of) graphs:
k i = map (\k -> (k, [1..k-1] ++ [k+1..i])) [1..i]  -- komplete [*]
c i = map (\k -> (k, [(k + 1) `mod` i])) [0..i-1]   -- cycle
l i = map (\k -> (k, [k + 1])) [1..i]               -- line
[*] Wikipedia is not sure whether complete graphs are typically abbreviated with a "k", because of the German word komplett (which seems to be a false friend) or the mathematician Kuratowski.
The MC then tested the following three graphs.
let i = 20 in (sort $ nub $ GraphMin.graphMin (k i) [1..i]) == []
let i = 300 in (sort $ nub $ GraphMin.graphMin (c i) [0..i-1]) == []
let i = 300 in (sort $ nub $ GraphMin.graphMin (l i) [1..i]) == [1]
And here are the results (in no particular order).
TIMEOUT           Dan Kluser
0.01 0.01 0.21    Loris Reiff
0.05 0.10 0.05    Roger Baumgartner
0.01 1.48 4.05    Sandro Panighetti
0.01 0.01 0.01    Moritz Knüsel
TIMEOUT           Patrick Ziegler
TIMEOUT           Silvan Niederer
0.01 0.01 0.01    Mike Boss
0.01 0.01 0.02    Zur Shmaria
0.02 4.35 4.32    Noah Berner
0.02 5.32 2.61    Tynan Richards
0.01 0.04 0.07    Antonio Russo ➖
0.02 5.32 1.92    Jorel Elmiger ➖
0.01 0.28 0.56    MC
0.02 0.34 0.14    Matthias Brun
0.01 0.01 0.01    Miro Haller 🧟
0.06 0.01 0.01    Simon Yuan
Three timeouts and a bit of fluctuation otherwise. Not really conclusive. But you can already see three solutions that are equally fast on all graphs: those by Moritz Knüsel, Mike Boss, and our dead man walking. The timeouts are, of course, eliminated before the next round.
The MC also played around with some permutations and subsequences of the above arguments. The test let i = 20 in (sort $ nub $ GraphMin.graphMin (k i) [i,i-2..0]) == [] has revealed a bug in Sandro Panighetti's solution, which had managed to survive the QuickCheck tests. A good reminder that testing is good, but proving is better. The MC decided at this point that the remaining 11 contestants will all end up in the top 10. So the next tests are about the final placing.
Here is the next challenge: lines with many complete subgraphs along them.
:m + Data.List
let getAll xs x = nub $ concat [zs | (y,zs) <- xs, x == y]
let dom = map (abs . fst)
let union xs = map (\x -> (x, filter (/= x) $ map abs $ getAll xs x)) $ dom xs
let k i j = map (\k -> (k, [i..k-1] ++ [k+1..j])) [i..j]
let l i = map (\k -> (k, [k + 1])) [1..i]
:set +s
let i = 100 in (sort $ nub $ GraphMin.graphMin (union (l i ++ concatMap (\i -> k i (i + 9)) [0,10..i-10])) [0..i]) == []
let i = 100 in (sort $ nub $ GraphMin.graphMin (union (l i ++ concatMap (\i -> k i (i + 9)) [0,10..i-10])) [0,10..i]) == [0]
let i = 100 in (sort $ nub $ GraphMin.graphMin (union (l i ++ concatMap (\i -> k i (i + 9)) [0,10..i-10])) [0,20..i]) == [0]
let i = 300 in (sort $ nub $ GraphMin.graphMin (union (l i ++ concatMap (\i -> k i (i + 9)) [0,10..i-10])) [0..i]) == []
let i = 300 in (sort $ nub $ GraphMin.graphMin (union (l i ++ concatMap (\i -> k i (i + 9)) [0,10..i-10])) [0,10..i]) == [0]
let i = 300 in (sort $ nub $ GraphMin.graphMin (union (l i ++ concatMap (\i -> k i (i + 9)) [0,10..i-10])) [0,20..i]) == [0]
let i = 600 in (sort $ nub $ GraphMin.graphMin (union (l i ++ concatMap (\i -> k i (i + 9)) [0,10..i-10])) [0..i]) == []
let i = 600 in (sort $ nub $ GraphMin.graphMin (union (l i ++ concatMap (\i -> k i (i + 9)) [0,10..i-10])) [0,10..i]) == [0]
let i = 600 in (sort $ nub $ GraphMin.graphMin (union (l i ++ concatMap (\i -> k i (i + 9)) [0,10..i-10])) [0,20..i]) == [0]
let i = 900 in (sort $ nub $ GraphMin.graphMin (union (l i ++ concatMap (\i -> k i (i + 9)) [0,10..i-10])) [0..i]) == []
let i = 900 in (sort $ nub $ GraphMin.graphMin (union (l i ++ concatMap (\i -> k i (i + 9)) [0,10..i-10])) [0,10..i]) == [0]
let i = 900 in (sort $ nub $ GraphMin.graphMin (union (l i ++ concatMap (\i -> k i (i + 9)) [0,10..i-10])) [0,20..i]) == [0]

And the corresponding results:
0.02 0.02 0.01 0.07 0.14 0.09 0.19 0.73 0.50 0.57 2.35 1.46    Loris Reiff
0.04 0.01 0.02 0.20 0.12 0.07 0.87 0.27 0.30 1.98 0.71 0.66    Roger Baumgartner
0.02 0.01 0.01 0.04 0.07 0.06 0.22 0.11 0.14 0.32 0.26 0.38    Moritz Knüsel
0.06 0.02 0.02 0.11 0.12 0.15 0.42 0.43 0.21 0.50 0.75 0.52    Mike Boss
0.03 0.02 0.02 0.13 0.09 0.10 0.61 0.25 0.47 1.52 0.68 0.99    Zur Shmaria
0.27 0.04 0.03 6.14 0.67 0.37 49.61 TIMEOUT                    Noah Berner
0.26 0.05 0.03 5.79 0.64 0.37 42.66 TIMEOUT                    Tynan Richards
0.03 0.02 0.02 0.12 0.11 0.12 0.48 0.36 0.43 0.90 0.58 0.50    Antonio Russo ➖
0.37 0.02 0.02 9.44 0.14 0.09 TIMEOUT                          Jorel Elmiger ➖
TIMEOUT                                                        MC
0.07 0.02 0.01 0.79 0.15 0.11 5.47 0.78 0.52 17.09 2.27 1.35   Matthias Brun
0.06 0.01 0.02 0.06 0.06 0.05 0.23 0.23 0.22 0.48 0.48 0.33    Miro Haller 🧟
0.03 0.01 0.01 0.07 0.06 0.05 0.21 0.20 0.21 0.46 0.38 0.34    Simon Yuan
The MC is defeated. And with him three other contestants.
Next, we'll try something more intuitive and realistic. How about the prime numbers?
:m + Data.List Data.Hashable
let multiples k b = [k * i | i <- [1..b `div` k + 1]]
let g i = map (\k -> (k, multiples k i)) [1..i]
let g1000 = g 1000
let g5000 = g 5000
let g10000 = g 10000
:set +s
hash $ GraphMin.graphMin g1000 [2..length g1000]
hash $ GraphMin.graphMin g5000 [2..length g5000]
hash $ GraphMin.graphMin g10000 [2..length g10000]
Note how the hash function is used to avoid measuring the time needed to print the results. The minimal elements of the above graphs are exactly the prime numbers (up to the bound b). Perhaps this is the best algorithm for computing primes?
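For the curious reader, this prime connection is easy to reproduce at small scale. The following self-contained sketch is modelled on the naive graphMinSol above (it is not any contestant's fast solution) and recovers the primes up to 30 as the minimal elements of the divisibility graph:

```haskell
import Data.List (nub, sort, (\\))

-- divisibility graph: vertex k points to the multiples of k (as in the tests above)
multiples :: Int -> Int -> [Int]
multiples k b = [k * i | i <- [1 .. b `div` k + 1]]

g :: Int -> [(Int, [Int])]
g i = map (\k -> (k, multiples k i)) [1 .. i]

-- naive DFS reachability check with an explicit visited list
path :: [(Int, [Int])] -> Int -> Int -> Bool
path gr = go []
  where
    go vis x y
      | x == y = True
      | otherwise = case lookup x gr of
          Nothing -> False
          Just zs -> let vis' = x : vis in or [go vis' z y | z <- zs \\ vis']

-- minimal elements of a set with respect to the graph: keep p if no other
-- element q of the set reaches p
graphMinNaive :: [(Int, [Int])] -> [Int] -> [Int]
graphMinNaive gr qs =
  let ps = nub qs
  in filter (\p -> all (\q -> p == q || not (path gr q p)) ps) ps

-- sort (graphMinNaive (g 30) [2..30]) yields exactly the primes up to 30
```

Of course, this quadratic-number-of-DFS approach is only good for toy sizes; the whole point of the competition was to do better.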
0.23 5.25 22.53   Loris Reiff
0.09 0.32 1.09    Roger Baumgartner
0.04 0.11 0.24    Moritz Knüsel
0.04 0.15 0.48    Mike Boss
TIMEOUT           Zur Shmaria
TIMEOUT           Antonio Russo ➖
TIMEOUT           Matthias Brun
0.04 0.18 0.45    Miro Haller 🧟
0.08 0.42 1.87    Simon Yuan
Three students gone. Making the divisibility graph even bigger (e.g., let g100000 = g 100000), while keeping the list size (hash $ GraphMin.graphMin g100000 [2..10000]), also makes Loris Reiff's and Roger Baumgartner's solutions time out.
Time for a final showdown. Let's go Big Data. In an intermediate submission, Moritz Knüsel claims to have tested his solutions on graphs with 20 million edges. The divisibility graph with 2 million vertices has roughly 31 million edges.
:m + Data.List Data.Hashable
let multiples k b = [k * i | i <- [1..b `div` k + 1]]
let g i = map (\k -> (k, multiples k i)) [1..i]
let gx = g 2000000
:set +s
hash $ GraphMin.graphMin gx [30000,29998..10000]
And here are the final results:

13.84     Moritz Knüsel
TIMEOUT   Mike Boss
21.79     Miro Haller 🧟
TIMEOUT   Simon Yuan
Pretty amazing, huh? Note that those huge graphs are processed without any parallelization on a usual laptop.
Finally, let's briefly sample a few of the best solutions. They are a bit long, and the MC will (as usual) rely on the comments provided by the contestants.
Mike Boss uses Haskell's Data.Graph library:
module GraphMin where

{- Grade this (for real now)
   Longer first part is just the conversion to type Graph.
   Problems with non-existing vertices and renaming are fixed.
   - notInG: all endpoints of edges which are not in the graph
   - allOfG: all vertices from the graph (even ones with no outgoing edges)
   - sButNotG: vertices which are in the set but not in the graph => are minimums
   - sAndG: vertices in the set which are also in the graph
   - convertSet: fixes renaming
   Second part is the actual algorithm: deletes all vertices reachable from
   the vertices in the set, which means they aren't minimums. -}

import Data.Graph
import Data.Maybe
import qualified Data.IntSet as Set

graphMin :: [(Int,[Int])] -> [Int] -> [Int]
minimumInGraph :: Graph -> Set.IntSet -> Set.IntSet -> [Int]

graphMin graph set = [x | (x, _, _) <- result] ++ (Set.toList sButNotG)
  where
    realSet = Set.fromList set
    edges = Set.fromList (concat $ map snd graph)
    vertices = Set.fromList (map fst graph)
    notInG = Set.difference edges vertices
    allOfG = Set.union vertices notInG
    sButNotG = realSet Set.\\ allOfG
    sAndG = realSet Set.\\ sButNotG
    correctedGraph = [(s, []) | s <- Set.toList notInG] ++ graph
    (builtGraph, v, k) = graphFromEdges (map (\(a, b) -> (a, a, b)) correctedGraph)
    convertSet = Set.map (fromJust . k) sAndG
    result = map v (minimumInGraph builtGraph convertSet convertSet)

minimumInGraph graph set result
  | Set.null set = Set.toList result
  | Set.null result = []
  | Set.size set == 1 = Set.toList (result Set.\\ reachFromS)
  | otherwise = minimumInGraph graph (Set.delete head set) (result Set.\\ reachFromS)
  where
    reachFromS = Set.delete head (Set.fromList (reachable graph (head :: Int)))
    head = Set.findMin set
Miro Haller's corrected solution has a nice complexity analysis, which the MC didn't attempt to verify. First Miro has to fix the negative number problems, anyway.
-- I use containers-0.5.10.2 (https://hackage.haskell.org/package/containers-0.5.10.2)
module GraphMin where

import Prelude hiding (fromList)
import Data.IntMap as IntMap
import Data.Sequence as Seq

-- I use IntMap because they have (almost) constant access time I think.
-- It depends on the number of bits the integers use.
-- Also I rely on the fact that the nodes are numbered in a more or less
-- ascending sequence of natural numbers
graphMin :: [(Int,[Int])] -> [Int] -> [Int]
graphMin graph subset = findUnreachable subset []
  where
    -- O(n)
    graphNodes = [node | (node, xs) <- graph]
    -- O(n)
    maxNodeNr = maximum (graphNodes ++ subset)
    -- Visited array O(n) (I assume n = O(maxNodeNr))
    visited = IntMap.fromList [(node, False) | node <- [0..maxNodeNr]]
    -- Convert out-node lists to sequences O(n^2)
    graphSeq = [(node, Seq.fromList outNodes) | (node, outNodes) <- graph]
    -- Out edges O(n)
    outEdges = IntMap.fromList graphSeq
    -- True if node is in subset, O(n)
    isInSubset = IntMap.fromList [(node, True) | node <- subset]
    -- Array with, for each node, the nodes from which it is reachable, O(n)
    reachableFromNodesInit = IntMap.fromList [(node, []) | node <- [0..maxNodeNr]]
    -- DFS-like algorithm, write from node to reachable nodes
    -- O(n^2 * log(n))
    dfsIter [] reachableFromNodes = reachableFromNodes
    dfsIter (subsetNode:subsetRest) reachableFromNodes =
        dfsIter subsetRest newReachableFromNodes
      where
        -- O(1)
        connectedNodes = findWithDefault Seq.Empty subsetNode outEdges
        -- O(n)
        newReachableFromNodes = dfsFindReachable connectedNodes reachableFromNodes visited
          where
            dfsFindReachable Seq.Empty reachableFromNodes visitedLoc = reachableFromNodes
            dfsFindReachable todoNodesSeq reachableFromNodes visitedLoc
              -- Cases checking O(1)
              | findWithDefault False nextNode isInSubset =
                  dfsFindReachable nodesRest extendedReachableFromNodes visitedNew
              | findWithDefault False nextNode visitedLoc =
                  dfsFindReachable nodesRest reachableFromNodes visitedNew
              | otherwise = dfsFindReachable extendedListRest reachableFromNodes visitedNew
              where
                updateFn key x = True
                visitedNew = adjustWithKey updateFn nextNode visitedLoc
                -- Both O(1) because it is the first entry
                nextNode = Seq.index todoNodesSeq 0
                nodesRest = Seq.deleteAt 0 todoNodesSeq
                -- O(1)
                extendedReachableFromNodes = IntMap.adjust (subsetNode :) nextNode reachableFromNodes
                -- O( log( min{n1, n2} ) ) where n1,n2 are the lengths of the sequences
                extendedListRest = (findWithDefault Seq.Empty nextNode outEdges) >< nodesRest
    -- Comp. stands above
    reachableFromNodes = dfsIter subset reachableFromNodesInit
    -- Find all nodes that are at most reachable from themselves (through a path through
    -- nodes which are not in the subset S)
    -- O(n^2)
    findUnreachable [] unreachableNodes = unreachableNodes
    findUnreachable (subsetNode:subsetRest) unreachableNodes
      | hasOtherNode (findWithDefault [] subsetNode reachableFromNodes) =
          findUnreachable subsetRest unreachableNodes
      | otherwise = findUnreachable subsetRest (subsetNode : unreachableNodes)
      where
        -- Check if a node is reachable from another node, O(n)
        hasOtherNode [] = False
        hasOtherNode (otherNode:listRest) =
          if subsetNode == otherNode then hasOtherNode listRest else True
Finally, Moritz Knüsel has apparently found the right balance between using lazy and eager data structures. His solution also nicely handles edge cases separately.
{-# LANGUAGE BangPatterns #-}
module GraphMin where

import Data.List
import qualified Data.IntSet
import qualified Data.IntMap.Lazy

-- final submission

type LG = [(Int,[Int])]

graphMin :: LG -> [Int] -> [Int]
graphMin g s = graphMin11 g s

{- Basic Idea:
   from every node in S, we walk the graph and find all elements in S that are
   reachable from that particular node, except the node itself,
   since a minimal element may still have paths coming back to itself.
   The union of all nodes discovered like this is then subtracted from S,
   yielding only the nodes that are not reachable from any other.

   This would obviously be way too slow, so, on a walk starting at a node v,
   we save all nodes we've seen, except the ones that are on a path from v to v.
   These nodes are saved in a map, together with v,
   so we can avoid walking that path again later.
   This collection of nodes and the map with shortcuts is
   then passed along for the search starting at the next node in S.
   Because of stack overflow problems some strictness annotations
   have been added, which seemed to solve them. -}

-- shortcuts for all the datastructures and functions for accessing them
type Set2 = Data.IntSet.IntSet
ins2 = Data.IntSet.insert
empty2 = Data.IntSet.empty
toLst2 = Data.IntSet.toList
fromLst2 = Data.IntSet.fromList
del2 = Data.IntSet.delete
isIn2 = Data.IntSet.member
union2 = Data.IntSet.union
size2 = Data.IntSet.size
diff2 = Data.IntSet.difference

type G2 = Data.IntMap.Lazy.IntMap [Int]
lgToG2 = Data.IntMap.Lazy.fromList
getEdges2 g v = Data.IntMap.Lazy.findWithDefault [] v g

type GoesToMap2 = Data.IntMap.Lazy.IntMap Int
emptyGTM2 = Data.IntMap.Lazy.empty
memberGTM2 = Data.IntMap.Lazy.member
getGTM2 = Data.IntMap.Lazy.findWithDefault 0
unionGTM2 = Data.IntMap.Lazy.union
fromSetGTM2 = Data.IntMap.Lazy.fromSet
mergeSetGTM2 gtm set v = unionGTM2 gtm (fromSetGTM2 (const v) set)

-- recursively look at all nodes reachable from a given node, recording the path taken
fun11 :: GoesToMap2  -- contains nodes that, in earlier searches,
                     -- were found to loop back to where the search started
      -> Set2        -- contains all nodes that go back to the value this search started at,
                     -- needed in case of loop-back paths with cycles in them
      -> G2          -- the graph we're looking at
      -> Set2        -- the set S
      -> Set2        -- the set of points we've already visited. This set may
                     -- contain nodes that are in lgtm, since these will only be
                     -- removed later, so it's important to check lgtm first
      -> Set2        -- the nodes on the path we've come down
      -> Int         -- the current node
      -> Int         -- the node we started at
      -> (Set2,Set2) -- returns the set of nodes seen and a set of nodes that lead back
                     -- to sval. If the current node is not on such a path, this will be empty
fun11 gtm lgtm g s !seen pset v sval
  -- we've come back around, so we need to return the nodes we took to get here
  | sval == v && not vInSeen = (seen, vpset)
  -- we've come back around, but sval is already known not to be minimal
  | sval == v && vInSeen = (seen, empty2)
  -- the current node is on a path that leads to sval, so we need to save how we got here
  | mlGTMv = (seen, vpset)
  -- we've already visited this node at some point
  | vInSeen = (seen, empty2)
  -- the current node is in S, we can't go further
  | isIn2 v s = (vseen, empty2)
  -- the current node leads to an element in S, but we can skip going down the entire path:
  -- since all other nodes on that path are not in S, we don't need to add all of them to seen
  | mGTMv = (ins2 gGTMv vseen, empty2)
  -- recurse on v's neighbors
  | otherwise = descend11 gtm lgtm g (getEdges2 g v) s vseen vpset sval
  where
    vInSeen = isIn2 v seen
    vpset = ins2 v pset
    vseen = ins2 v seen
    mGTMv = memberGTM2 v gtm
    mlGTMv = isIn2 v lgtm
    gGTMv = getGTM2 v gtm

-- helper function for the recursion
-- since, in the function above, we check lgtm before seen,
-- we can avoid removing elements from seen here
descend11 :: GoesToMap2 -> Set2 -> G2 -> [Int] -> Set2 -> Set2 -> Set2 -> Int -> (Set2,Set2)
descend11 _ lgtm _ [] _ seen _ _ = (seen, lgtm)
descend11 gtm lgtm g (e:es) s !seen pset sval =
  let (nseen, npset) = fun11 gtm lgtm g s seen pset e sval
  in descend11 gtm (union2 lgtm npset) g es s nseen pset sval

-- searches on all elements in S
-- if a search discovered paths that go back to its starting node, all nodes on such paths
-- are taken out of seen (if we changed some guards around in fun11, this would not be necessary,
-- but some quick tests indicate it helps if s is large and g is dense) and added to the map
gmhelp11 :: G2 -> [Int] -> Set2 -> Set2 -> GoesToMap2 -> Set2
gmhelp11 g [] s seen gtm = seen
gmhelp11 g (e:es) s !seen gtm =
  let (nseen, npset) = descend11 gtm empty2 g (getEdges2 g e) s seen empty2 e
  in gmhelp11 g es s (diff2 nseen npset) (mergeSetGTM2 gtm npset e)

-- transforms the graph and S into the datastructures used above, calls gmhelp11 and
-- subtracts the result from S
graphMin11 :: LG -> [Int] -> [Int]
-- get trivial cases out of the way right away
graphMin11 _ [] = []
graphMin11 _ [x] = [x]
graphMin11 [] s = s
graphMin11 g s = toLst2 $ diff2 s' (gmhelp11 (lgToG2 g) s s' empty2 emptyGTM2)
  where s' = fromLst2 s
Oh, and the heavy minuses translate into a one point deduction (truncated at 0).
What have we learned?
 Quickly check your program using <insert bad pun about a great library>. Seriously: soundness is indispensable. Random testing does not ensure soundness, but it helps eliminate silly mistakes.
 Almost all correct (well: not obviously wrong) solutions got into the top 10.
 Remember, remember, the negative numbers.
 Using the library is good. Outsmarting the library is better.
 When the MC evaluates efficiency, he makes random, subjective choices. Don't try to guess what he will evaluate, but just be efficient in all cases.
Task of week 4
A string s is a subsequence of another string t, if one can obtain s by removing characters from t. For example: The string "foo" is a subsequence of "fofo", "frohlocken", and "funktionale Programmierung is einfach toll". It is not a subsequence of "ofo", "FoO", "oooo", and "tofu".
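As a quick sanity check of this definition: the subsequence relation coincides with isSubsequenceOf from Data.List (available since base-4.8), so the examples above can be replayed directly in GHCi. For illustration:

```haskell
import Data.List (isSubsequenceOf)

-- the positive and negative examples from the task statement
checks :: [Bool]
checks =
  [ "foo" `isSubsequenceOf` "fofo"
  , "foo" `isSubsequenceOf` "frohlocken"
  , not ("foo" `isSubsequenceOf` "ofo")
  , not ("foo" `isSubsequenceOf` "FoO")   -- case matters
  , not ("foo" `isSubsequenceOf` "oooo")
  , not ("foo" `isSubsequenceOf` "tofu")
  ]
```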
A list of strings xs is subsumed by another list of strings ys, if for each string s from xs, there exists a string t in ys, such that s is a subsequence of t. Moreover, you may not use strings from ys more than once (as witnesses for the above "exists").
Write a function subsub :: [String] -> [String] -> Bool that checks whether its first argument is subsumed by its second argument.
All of the following examples should evaluate to True:

subsub ["a", "ab", "abc"] ["false", "bab", "cdb", "cabAdAbc", "true"]
subsub ["FM", "FP"] ["Ferrovia Mesolcinese", "Microsoft FrontPage"]
subsub ["FP", "FM"] ["Ferrovia Mesolcinese", "Microsoft FrontPage"]
subsub ["1+2=2"] ["12+12=24"]
not $ subsub ["FM", "FP"] ["FMFP"]
subsub ["FM", "FP"] ["FMFP", "FMFP"]
subsub ["FM", "FM"] ["FM", "Ferrovia Mesolcinese"]
not $ subsub ["FM", "FM"] ["FM"]
Evaluation criterion: efficiency. Your solution will be tested with short and long lists and short and long strings.
Results of week 4
Evaluation time! The MC apologizes for taking so much time.
As in the previous week, there were 26 student submissions. A 0% decrease is exactly what the MC hoped (and asked) for. Maybe for next week, the MC should ask for a slight (say 10%) increase in participation.
Here is the top 16:
Top 16 of week 4

#     name                  points
1.    Moritz Knüsel         10
2.    Gabriel Giamarchi 🖥   9
3.    Matthias Brun         8
      Miro Haller 🔝         8
      Loris Reiff           8
      Börge Scheel          8
7.    Simon Yuan            4
8.    Felix Huber           3
      Patrick Ziegler       3
10.   Roger Baumgartner     1
      Mathias Blarer        1
      Mike Boss             1
      Chris Buob            1
      Kevin De Keyser       1
      Tynan Richards        1
      Stefan Zemljic        1
As usual, the MC first consulted his good old friend QuickCheck, to rule out incorrect submissions. Here are the tests that each submission had to pass:
:m + Test.QuickCheck Data.List Safe
let alt xs ys = case xs of [] -> ys ; (x:xs) -> x : alt ys xs
let qc = quickCheckWith stdArgs { maxSize = 10, maxSuccess = 3000 }
let t0 sol = qc (\xs ys -> subsub xs ys == sol xs ys)
let t1 sol = qc (\xs ys -> sol xs (map (alt ys) xs))
let t2 sol = qc (\xs -> sol xs (reverse xs))
let t3 sol = qc (\xs -> sol xs (sort xs))
let t4 sol = qc (\xs -> sol xs (concat xs : tailSafe xs))
let t5 sol = qc (\xs -> sol xs (replicate (length xs) $ concat xs))
:set +s
t0 SubSub.subsub
t1 SubSub.subsub
t2 SubSub.subsub
t3 SubSub.subsub
t4 SubSub.subsub
t5 SubSub.subsub
Only t0 compares the outcome of a student submission SubSub.subsub with the following naive solution subsub by the MC.
import Data.List

subseq [] _ = True
subseq _ [] = False
subseq (x : xs) (y : ys)
  | x == y = subseq xs ys
  | otherwise = subseq (x : xs) ys

subsub :: [String] -> [String] -> Bool
subsub [] ks = True
subsub (l : ls) ks =
  any (\k -> if subseq l k then subsub ls (delete k ks) else False) ks
All other tests do not require a "master solution", i.e., they could have easily been run by the participants. The function tailSafe from the Safe module (cabal install safe) returns [] for the input [].
Only 17 student submissions passed these tests. All but one of them have received some points this week. The unlucky one—Valentin S.—who passed the QuickCheck tests but didn't get a point, was caught producing a wrong result on one of the subsequent (nonrandomized) tests. A good reminder that testing can only prove the presence of bugs, but not their absence!
The naive solution by the MC is exponential in the worst case (more precisely, something like (length ks)!, where ! is a factorial and not an exclamation!): each call to subsub may yield up to length ks recursive calls. The 1-point solutions are all variations of this approach (with more or fewer optimizations for special cases, e.g., sorting the lists of strings by length and filtering out too-short strings from ks, or using Sequences instead of lists). Regardless of whether the optimizations were there, all exponential solutions time out on one of the following simple performance tests:
:m + Data.List
let s = "FMFP"
let xs = subsequences s
let n = length xs
:set +s
SubSub.subsub xs (replicate n s)
not $ SubSub.subsub xs (map (\_ -> 'x') s : replicate (n - 2) s)
not $ SubSub.subsub xs (map (\_ -> 'x') s : map (\_ -> 'x') s : replicate (n - 2) s)
:q
For transparency, here are the timing results (not sorted in any interesting order).
0.02 0.01 0.01   Felix Huber
0.01 0.00 0.01   Loris Reiff
TO               Roger Baumgartner
0.01 0.00 0.01   Gabriel Giamarchi
TO               Mathias Blarer
0.01 0.00 0.00   Börge Scheel
0.02 0.00 0.00   Miro Haller
0.01 0.00 0.01   Moritz Knüsel
0.04 0.02 0.02   Patrick Ziegler
0.03 0.01 0.03   Simon Yuan
BUG              V. Stoppiello
TO               Kevin De Keyser
TO               Mike Boss
TO               Chris Buob
TO               Tynan Richards
TO               Stefan Zemljic
0.01 0.00 0.01   Matthias Brun
0.01 0.01 0.01   MCmemo
TO               MCnaive
TO               MCsorted
Actually, the above statement about all exponential solutions timing out is not entirely true. The MC's solution MCmemo is a refinement of his MCsorted solution, which in turn is a refinement of MCnaive. All of them are exponential in the worst case. MCmemo uses memoization to avoid recomputing recursive calls with the same parameters, which allows it to handle these test cases efficiently. Memoization has a cost though, which becomes especially visible when considering memory consumption (not shown here). But let's leave the optimization of memory consumption for another competition task.
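For readers who have not seen memoization in Haskell before, the technique can be illustrated, independently of the memoize package used in the MC's solution below, with the classic Fibonacci function, where the recursion is backed by a lazily built list of results:

```haskell
-- naive version: exponentially many recursive calls
fibNaive :: Int -> Integer
fibNaive n
  | n < 2 = fromIntegral n
  | otherwise = fibNaive (n - 1) + fibNaive (n - 2)

-- memoized version: every result is computed once and shared
-- through the lazily evaluated list 'memo'
fibMemo :: Int -> Integer
fibMemo = (memo !!)
  where
    memo = map fib [0 ..]
    fib n
      | n < 2 = fromIntegral n
      | otherwise = fibMemo (n - 1) + fibMemo (n - 2)
```

The memoize package generalizes this idea from Int-indexed functions to arbitrary (enumerable) argument types, such as the pairs of string lists in MCmemo.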
import Data.List
import Data.Function.Memoize

subseq [] _ = True
subseq _ [] = False
subseq (x : xs) (y : ys)
  | x == y = subseq xs ys
  | otherwise = subseq (x : xs) ys

subsub :: [String] -> [String] -> Bool
subsub [] _ = True
subsub _ [] = False
subsub ls ks
  | length ks < length ls = False
  | otherwise = subsub' (prepare ls) (prepare ks)
  where
    prepare = sortOn minimum . filter (/= [])
    subsub' = memoize2 subsub''
    subsub'' [] ks = True
    subsub'' (l : ls) ks =
      let ms = dropWhile (\k -> minimum k > minimum l) ks
      in any (\k -> if subseq l k then subsub' ls (delete k ms) else False) ms
As the above comments suggest, everybody else used a more tractable algorithm to solve the problem. To quote Felix Huber, "the problem statement screamed for a flow graph approach", and indeed all the remaining solutions have used one or another algorithm solving the maximum-cardinality bipartite matching problem. Namely: if one views the strings in the two lists as vertices in the two partitions of a bipartite graph, where two vertices are connected if one is a subsequence of the other, then the problem at hand immediately reduces to finding a matching of a certain size.
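To make this reduction concrete, here is a minimal sketch of the augmenting-path idea (the MC's reconstruction for the blog, not any contestant's submission): greedily match each string from xs to some superstring in ys, rerouting earlier matches when necessary; subsumption holds iff every string in xs ends up matched. This is Kuhn's algorithm, the simple O(V·E) cousin of Hopcroft-Karp.

```haskell
import qualified Data.Map as Map

-- subsequence test as in the task statement
subseq :: String -> String -> Bool
subseq [] _ = True
subseq _ [] = False
subseq (x:xs) (y:ys)
  | x == y = subseq xs ys
  | otherwise = subseq (x:xs) ys

-- maximum bipartite matching by augmenting paths (Kuhn's algorithm):
-- left vertices are indices into xs, right vertices indices into ys,
-- 'match' maps each matched right vertex to its left partner
subsubMatching :: [String] -> [String] -> Bool
subsubMatching xs ys = go [0 .. length xs - 1] Map.empty
  where
    adj i = [j | (j, t) <- zip [0 ..] ys, subseq (xs !! i) t]
    go [] _ = True
    go (i:is) match = case fst (augment i match []) of
      Nothing -> False              -- i cannot be matched: not subsumed
      Just match' -> go is match'
    -- try to match left vertex i; 'seen' collects the right vertices already
    -- visited in this augmentation attempt and is threaded through explicitly
    augment i match seen = walk (adj i) seen
      where
        walk [] s = (Nothing, s)
        walk (j:js) s
          | j `elem` s = walk js s
          | otherwise = case Map.lookup j match of
              -- j is free: match it to i
              Nothing -> (Just (Map.insert j i match), j : s)
              -- j is taken by i': try to reroute i' to another right vertex
              Just i' -> case augment i' match (j : s) of
                (Just m', s') -> (Just (Map.insert j i m'), s')
                (Nothing, s') -> walk js s'
```

With subsubMatching in place of subsub, the examples from the task statement (e.g., the FMFP ones) behave as specified; of course, a serious submission would also replace the quadratic adjacency construction with something less wasteful.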
Side remark: When he was coming up with this exercise, the MC had indeed first started with the NP-complete subsumption problem, in which the subsequence test is replaced by a matching test (on predicate logic terms). The MC decided to ask the simpler problem, since algebraic datatypes (useful to represent terms) had not been covered in the lecture by the due date of this exercise. However, the MC didn't bother to change his exponential solution, which was originally crafted for an NP-complete problem. After all, the task of the MC is to leave the stage for the competing participants and their creativity. (This is a cheap way of saying that the MC didn't realize how much simpler the problem became complexity-wise.)
Certainly, not all polynomial solutions are equally fast. We scale the previous test up by starting from let s = "FMFPPFMF" instead of let s = "FMFP". This eliminates two competitors as well as the MC's memoizing solution:
TO               Felix Huber
7.04 6.86 5.74   Loris Reiff
6.13 0.25 5.23   Gabriel Giamarchi
6.23 0.00 5.47   Börge Scheel
0.67 0.00 0.78   Miro Haller
2.45 1.90 1.80   Moritz Knüsel
TO               Patrick Ziegler
6.85 6.41 6.47   Simon Yuan
0.79 0.82 0.78   Matthias Brun
TO               MCmemo
Interestingly, the two eliminated solutions both use the Data.Graph library. Apparently, it is not a very optimized library for this task.
A better library seems to be available in a separate, not-easy-to-install package. It implements an augmenting-path algorithm for computing the maximum matching. Five of the remaining seven submissions have used this package, which says in a comment: "More efficient than using MaxFlow from FGL with constant weight and additional source and sink nodes. But not as good as Hopcroft–Karp would be."
Miro Haller's submission uses another library, which implements the Hopcroft–Karp algorithm. At 365 lines of code, Miro earns the award for going hopelessly over the top with a competition exercise 🔝. (Or in other words: that's the spirit!) And even if you deduct the 200 lines that come from the Hopcroft–Karp library (as acknowledged in a comment by Miro), this is still the longest solution. But is it also the best?
To be honest, the MC had a hard time finding the fine line between those 7 solutions, which were all pretty efficient. On the other hand, he didn't want to give full points to all 7. The solution to this dilemma is very pragmatic. The MC decided to select a few tests (completely at random, unbiased, and very objective), measure the time for all the submissions, take the sum, and see where we end up. For this ingenious approach, the MC awards himself the dice of true randomness 🎲.
Here are the selected tests. First, something pseudorandom:
:m + Data.List Data.Char Data.Hashable Data.List.Split
let next seed xs = chr ((hash xs * seed) `mod` 26 + 97)
let mk seed xs = let x = next seed xs in x : mk ((seed * ord x) `mod` 500 + seed) (x : xs)
let len = 100
let num = 400
let seed = 42
let ys = take num $ chunksOf len $ mk seed ""
let xs = take (num `div` 2) (map (take (len `div` 10)) ys)
:set +s
SubSub.subsub xs (reverse ys)
This yields a bipartite graph with 200+400 vertices and 19009 (roughly evenly distributed among the 200 nodes in the smaller part of the graph) edges.
Second, something with similar characteristics, but yielding False.
:m + Data.List Data.Char Data.Hashable Data.List.Split
let next seed xs = chr ((hash xs * seed) `mod` 26 + 97)
let mk seed xs = let x = next seed xs in x : mk ((seed * ord x) `mod` 500 + seed) (x : xs)
let len = 100
let num = 400
let seed = 42
let ys = take num $ chunksOf len $ mk seed ""
let xs = take (num `div` 2) (map (take (len `div` 10)) ys)
:set +s
not $ SubSub.subsub xs (map (filter (/='m')) $ reverse ys)
Third, something with a lot of duplicated junk.
:m + Data.List
let alt xs ys = case xs of [] -> ys ; (x:xs) -> x : alt ys xs
let xs = map (flip replicate 'a') [5 .. 300]
let ys = concatMap (flip replicate "abc") [5 .. 300]
let zs = alt xs ys
:set +s
SubSub.subsub xs zs
Fourth, something with quite some junk.
:m + Data.List Data.Char Data.Hashable Data.List.Split
let next seed xs = chr ((hash xs * seed) `mod` 26 + 97)
let mk seed xs = let x = next seed xs in x : mk ((seed * ord x) `mod` 500 + seed) (x : xs)
let alt xs ys = case xs of [] -> ys ; (x:xs) -> x : alt ys xs
let xs = map (flip replicate 'a') [5 .. 300]
let ys = take 4514 $ chunksOf 4 $ mk 42 ""
let zs = alt xs ys
:set +s
SubSub.subsub xs zs
Fifth, something that is close to a full bipartite graph with 300+300 vertices.
:m + Data.List Data.Char
let max = 299
let len = 10
let alph = map chr [0 .. max]
let xs = map (\i -> replicate len (chr i) ++ [chr ((i + 1) `mod` (max + 1))]) [0 .. max]
let ys = map (\i -> concatMap (replicate len) (drop i alph ++ take i alph)) [0 .. max]
:set +s
SubSub.subsub xs ys
And here are the results that translate to points as shown above. To create the ranking, the MC had to make several tough decisions. In particular, the fairness award from exercise 2 helped Loris Reiff to be grouped together with Miro Haller's solution.
name | 1 | 2 | 3 | 4 | 5 | Sum
Moritz Knüsel | 23.98 | 22.92 | 5.54 | 4.55 | 4.59 | 61.58
Gabriel Giamarchi | 15.59 | 17.33 | 30.43 | 11.85 | 11.02 | 86.22
Miro Haller | 26.52 | 19.77 | 19.37 | 7.99 | 48.20 | 121.85
Matthias Brun | 19.97 | 18.03 | 21.97 | 8.17 | 57.71 | 125.85
Börge Scheel | 20.61 | 16.61 | 23.20 | 11.85 | 59.40 | 131.67
Loris Reiff | 19.91 | 17.90 | 31.74 | 17.74 | 58.60 | 145.89
Simon Yuan | 20.10 | 19.75 | 49.10 | 31.43 | 59.92 | 180.30
We have not talked about the winner yet. Moritz Knüsel has implemented something halfway homebrewed. From his comments, he did not seem to believe that his solution would beat the commonly used package. Well, it seems that the dice of true randomness was friendly to Moritz. Anyway, here is his solution.
module SubSub (subsub) where

import Data.List
import Control.Arrow
import Data.Char
import qualified Data.IntMap.Lazy as M
import qualified Data.IntSet as S

{- based on an algorithm for bipartite matching found at
   https://olympiad.cs.uct.ac.za/presentations/camp2_2017/bipartitematchingrobin.pdf
   unfortunately I couldn't come up with anything faster than
   Data.Graph.MaxBipartiteMatching
   (https://hackage.haskell.org/package/maxBipartiteMatching-0.1.0.0/candidate/docs/Data-Graph-MaxBipartiteMatching.html)
   but here goes -}

-- In order to speed up the subseq check, we build a map from characters to the
-- indices of these characters for every string in Y
-- Later, when we check a string from X, we can look up the position
-- of the next character we need
subOf :: String -> M.IntMap S.IntSet -> Bool
subOf xs y = aux xs y (Just (-1))
  -- aux takes as its last argument the found position of the previous character
  where aux _ _ Nothing = False -- prev char wasn't found
        aux [] _ _ = True       -- all characters found
        -- get all indices of x in y, and then get the smallest one that's greater
        -- than the index of the previous character. If x doesn't occur in y, we
        -- stop immediately
        aux (x:xs) y (Just yidx) =
          maybe False (aux xs y . S.lookupGT yidx) (M.lookup (ord x) y)

-- turn a string into a char -> indices map
strToIdxMap :: String -> M.IntMap S.IntSet
strToIdxMap y = M.map S.fromList $ M.fromListWith (++) $ map (ord *** (:[])) $ zip y [0..]

-- define a new type to pass all the information and state around
type Work = (Int, Int, M.IntMap [Int], S.IntSet, M.IntMap Int)

-- accessors for work
xn (n,_,_,_,_) = n
yn (_,n,_,_,_) = n
em (_,_,m,_,_) = m
sn (_,_,_,s,_) = s
ma (_,_,_,_,m) = m

-- do we already have a match for i, and if yes what is it?
match :: Work -> Int -> Maybe Int
match w i = M.lookup i (ma w)

-- have we already seen i?
seen :: Work -> Int -> Bool
seen w i = S.member i (sn w)

-- mark i as seen
doSeen :: Work -> Int -> Work
doSeen (a,b,c,sn,e) i = (a,b,c,S.insert i sn,e)

-- match up i and j
doMatch :: Work -> Int -> Int -> Work
doMatch (a,b,c,d,ma) i j = (a,b,c,d,M.insert i j ma)

-- get all neighbors of i, if it has any
edges :: Work -> Int -> Maybe [Int]
edges w i = M.lookup i (em w)

-- empty the seen set
seendel :: Work -> Work
seendel (a,b,c,_,e) = (a,b,c,S.empty,e)

-- checks if there is a match for i by looking for an unmatched neighbor, and if
-- the neighbor is already matched, we try to find another match for
-- said neighbor's match
-- Tried to make the state management a bit neater, so instead of returning
-- a pair of the changed state and the actual result, we pass functions that
-- are to be applied to the changed state, depending on the result
-- Doesn't seem to matter performance wise, but doesn't seem to hurt either
findMatchCont :: Int -> Work -> (Work -> Bool) -> (Work -> Bool) -> Bool
-- I'm pretty sure we can just return False here if a string has no superseqs in y
findMatchCont i w tcont fcont = maybe False (\es -> fmchelp es i w) $ edges w i
  where fmchelp (j:js) i w
          -- we've seen j, check remaining neighbors
          | seen w j = fmchelp js i w
          -- if j already has a match, try to find a match for j's match
          -- if that search succeeds, this search also succeeds and matches j to i
          -- if that search doesn't succeed, try the remaining neighbors of i
          | otherwise = let w' = doSeen w j in
              maybe (tcont (doMatch w' j i))
                    (\n -> findMatchCont n w' (\z -> tcont (doMatch z j i)) (fmchelp js i))
                    (match w' j)
        -- no neighbors remaining, search has failed
        fmchelp _ i w = fcont w

-- find a match for all elements in x
aux :: Int -> Work -> Bool
aux n w
  -- all matched, we're good
  | n == xn w = True
  -- find a match for element n, if the search succeeds, find a match for (n+1),
  -- if not, return false
  | otherwise = findMatchCont n (seendel w) (aux (succ n)) (const False)

-- the hope here is that lazy evaluation helps us by not checking every
-- combination of x and y unless it's needed
subsub :: [String] -> [String] -> Bool
subsub xs ys = aux 0 (length xs, length ys, em, S.empty, M.empty)
  where em = M.fromList $ zip [0..] [[yidx | (yidx,y) <- zip [0..] yl, subOf x y] | x <- xs]
          where yl = map strToIdxMap ys
For comparison, here is Gabriel Giamarchi's solution. Sometimes a bit too wide (Gabriel must have a wide screen 🖥), but nicely commented. The MC specifically appreciates the commented-out Test.QuickCheck import, which indicates that QuickCheck has been used to test this solution.
module SubSub where

-- import Test.QuickCheck
import Data.List
import Data.Map.Strict
import Data.IntSet

{- The problem has 2 main phases
   [1]. Identify which strings are subsequences of what strings and build a
        (bipartite) graph representing this relation.
   [2]. Decide if a matching with the cardinality of the first set of strings
        exists in this graph.
   The solutions I use are:
   [1]. We preprocess the second list of strings first to speed up the checks
        (see subsequencep and process)
   [2]. During the building of the graph we use Hall's marriage theorem
        partially to be able to say "false" quickly if we can (see buildGraph
        and buildGraphHelper)
        Then, when the graph is built, I use a slightly modified version of
        https://hackage.haskell.org/package/maxBipartiteMatching-0.1.0.0
        to check if the matching exists. Since the package is on hackage I
        deemed it was okay to use it, but I hand included the code to prevent
        the imports and also because I modified it very slightly for our purpose.
        Using Hopcroft-Karp would be better and apparently there is an
        implementation: https://github.com/ivant/graphmatching
        However, I didn't understand the code so I do not use it. -}

-- Preprocessing we do on strings, used by our subsequence predicate
-- It transforms a string into a map of char -> IntSet where each char maps to
-- the set of indices it appears at (starting at 1)
-- This allows us to check if a string is a subsequence of another with
-- complexity based mainly (not entirely) on the length of the first string.
-- By using (++) and not (flip (++)) we are appending at the end of the list of
-- length 1 (better!) but then we cannot use fromDistinctAscList...
process str = Data.Map.Strict.map Data.IntSet.fromList
  (Data.Map.Strict.fromListWith (++) [(a,[b]) | (a,b) <- zip str [1..]])

-- Predicate that checks if the string (c:cs) is a subsequence of x where x is
-- a preprocessed string. Complexity should mainly depend on the length of the
-- first string.
-- t is a helper variable used during the recursion, the function should begin
-- with t=0
-- t indicates the index of the last character we used in the second string
-- -> we can only take characters after t
subsequencep _ [] _ = True
subsequencep t (c:cs) x -- t is the value we must be bigger than
  | Data.Map.Strict.null x = False -- we checked the case (c:cs) null before
  | otherwise = let indices = Data.Map.Strict.lookup c x in
      maybe False
            (\ind -> maybe False (\b -> subsequencep b cs x)
                           (Data.IntSet.lookupGT t ind))
            indices

-- A recursive graph-building procedure with preprocessing.
-- This builds the graph representing the relation on strings "is included in".
-- We will then search for a matching in this graph.
-- The function first applies preprocessing to the entire second list of
-- strings to speed up the subsequencep checks
buildGraph proc pred strsLeft strsRight =
  let idxedProcStrsR = Data.List.map proc strsRight `zip` [1..]
      indexedStrsL = strsLeft `zip` [1..]
  in buildGraphHelper pred indexedStrsL idxedProcStrsR 0 Data.IntSet.empty

-- We check during execution that the number of neighbors of all the elements
-- computed so far (unioned into the set rset) are bigger or equal in number to
-- the number of elements checked (lnum)
-- -> If not, Hall's marriage theorem tells us that the matching we want is not
-- possible.
buildGraphHelper pred [] strsRight lnum rset = Just Data.Map.Strict.empty
buildGraphHelper pred (sl:sls) strsRightProcessedIdxed lnum rset =
  let neighbors = (\(str,lidx) -> Data.List.map snd
        (Data.List.filter (pred 0 str . fst) strsRightProcessedIdxed)) sl
      newrset = Data.IntSet.union rset (Data.IntSet.fromList neighbors)
      newlnum = lnum + 1
  in if Data.List.null neighbors || (newlnum > Data.IntSet.size newrset)
     then Nothing -- Partial application of Hall's marriage theorem
     else maybe Nothing
            (\oldgraph -> Just (Data.Map.Strict.insert (snd sl) neighbors oldgraph))
            (buildGraphHelper pred sls strsRightProcessedIdxed newlnum newrset)

-- Main function
subsub :: [String] -> [String] -> Bool
subsub [] _ = True
subsub strsL strsR =
  maybe False alphaMatchedPred (buildGraph process subsequencep strsL strsR)

-- Matching functions --

-- | Description : Maximum cardinality bipartite matching
-- | Copyright : © 2012 Stefan Klinger <http://stefanklinger.de/>
-- | License : GNU AGPL 3 <http://www.gnu.org/licenses/agpl-3.0.html>
-- | Maintainer : http://stefanklinger.de
-- | Stability : experimental
-- |
-- | Find a maximum cardinality matching on a bipartite graph, using an
-- | augmenting path algorithm. More efficient than using MaxFlow from FGL
-- | with constant weight and additional source and sink nodes. But not as
-- | good as Hopcroft–Karp would be.
alphaMatchedPred semilist
  | Data.Map.Strict.null semilist = False
  | otherwise = Data.Map.Strict.size (matchingfromSemiAdjList semilist)
                  == Data.Map.Strict.size semilist

-- We do not need to convert to a 'fwd' mapping since we can directly build one
-- in the buildGraph function!
matchingfromSemiAdjList semilist =
  opt (Data.Map.Strict.keys semilist, []) semilist Data.Map.Strict.empty

-- Optimization func
opt :: (Ord a, Ord b) => ([a],[a]) -> Data.Map.Strict.Map a [b]
    -> Data.Map.Strict.Map b a -> Data.Map.Strict.Map b a
opt (x:free,failed) fwd mat =
  either (flip (opt (free,x:failed)) mat) (opt (free++failed,[]) fwd)
    $ right fwd [] x
  where
    right rem path x =
      maybe (Left rem) (left $ Data.Map.Strict.delete x rem)
        $ Data.Map.Strict.lookup x rem
      where
        left rem [] = Left rem
        left rem (y:ys) =
          maybe (Right $ Data.List.foldr (uncurry $ flip Data.Map.Strict.insert) mat path')
                (either (flip left ys) Right . right rem path')
                (Data.Map.Strict.lookup y mat)
          where path' = (x,y):path
opt ([],failed) fwd mat = mat
Normally, Miro Haller's 🔝 solution would be shown here as well, for being quite unique. However, this blog has a strict line limit per week, which we are approaching. And we definitely want to fit the lessons of the week:
 You can use QuickCheck without having a master solution at hand. Please do use it!
 Polynomial beats exponential. And even if exponential algorithms are heuristically tweaked to handle special cases fast, the MC will find the worst case in which the exponential nature shows up.
 Even competition experts use accessors. (Seems like this should count as a correct answer for exercise 4.1 from now on.)
 The MC 🎲 sometimes makes random, unscientific decisions, which his alter ego may not approve of. But he is the MC and this is his power. And who told you this would be a fair and scientifically correct competition anyway?
Task of week 5
The MC loves regular expressions. But sometimes he manages to insert sputid tysop in his text that cause a later regex search to fail. In this task, your goal is to develop a regex matcher that is immune to scrambled letters.
We consider the following algebraic datatype (wait for the lecture on Tuesday, where those will be introduced) of regular expressions that should be familiar from a Theoretical Computer Science course:
data RE = Zero          -- Zero matches nothing
        | One           -- One matches only the word []
        | Atom Char     -- Atom c matches only the word [c]
        | Plus RE RE    -- Plus r s matches all words matched by r or s
        | Concat RE RE  -- Concat r s matches all words w = u ++ v
                        -- such that u is matched by r and v is matched by s
        | Star RE       -- Star r matches [] and all words w = u ++ v
                        -- such that u is matched by r and v by Star r
Hence, for a standard definition of match :: RE -> String -> Bool we would have:
match (Star (Atom 'a' `Concat` Atom 'b')) ""
match (Star (Atom 'a' `Concat` Atom 'b')) "ab"
match (Star (Atom 'a' `Concat` Atom 'b')) "abab"
match (Star (Atom 'a' `Concat` Atom 'b')) "ababababab"
not $ match (Star (Atom 'a' `Concat` Atom 'b')) "a"
not $ match (Star (Atom 'a' `Concat` Atom 'b')) "abba"
not $ match (Star (Atom 'a' `Concat` Atom 'b')) "baba"
not $ match (Star (Atom 'a' `Concat` Atom 'b')) "ababa"
However, here we are not interested in the standard match, but rather in a dyslexicMatch :: RE -> String -> Bool function that accepts a string s exactly if there is a permutation t of s (of the same length, with the same multiplicities for duplicated characters) that would be accepted by the usual match function.
For example, all the following should yield True:

dyslexicMatch (Star (Atom 'a' `Concat` Atom 'b')) ""
dyslexicMatch (Star (Atom 'a' `Concat` Atom 'b')) "ab"
dyslexicMatch (Star (Atom 'a' `Concat` Atom 'b')) "abab"
dyslexicMatch (Star (Atom 'a' `Concat` Atom 'b')) "ababababab"
not $ dyslexicMatch (Star (Atom 'a' `Concat` Atom 'b')) "a"
dyslexicMatch (Star (Atom 'a' `Concat` Atom 'b')) "abba"  -- consider "abab"
dyslexicMatch (Star (Atom 'a' `Concat` Atom 'b')) "baba"  -- consider "abab"
not $ dyslexicMatch (Star (Atom 'a' `Concat` Atom 'b')) "ababa"
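A brute-force reference implementation follows this specification literally: try every permutation. A sketch, assuming a textbook match and useful at most as a test oracle:

```haskell
import Data.List (inits, tails, permutations)

data RE = Zero | One | Atom Char | Plus RE RE | Concat RE RE | Star RE

-- textbook matcher, trying all splits of the word
match :: RE -> String -> Bool
match Zero _ = False
match One w = null w
match (Atom c) w = w == [c]
match (Plus r s) w = match r w || match s w
match (Concat r s) w = or [match r u && match s v | (u, v) <- zip (inits w) (tails w)]
match (Star r) [] = True
match (Star r) w =  -- only nonempty prefixes, so the recursion terminates
  or [match r u && match (Star r) v | (u, v) <- tail (zip (inits w) (tails w))]

-- the specification, literally: some permutation must match
dyslexicMatch :: RE -> String -> Bool
dyslexicMatch r = any (match r) . permutations
```

With n! permutations and an exponential match, this is only usable on very short words, but it pins down the expected behaviour.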
Evaluation criterion: feficiency. Your solution will be tested with short and long strings and simple and deeply nested regular expressions.
Results of week 5
What a strange week! At first there were 15 submissions. In addition, an anonymous professor attempted a comeback, which was severely constrained by the fact that the professor could literally spend only 10 minutes on the task.
In the first round of QuickCheck testing something shocking happened: Only three solutions passed the tests. So for a change we will not start with a Hall of Fame.
Casualties of QuickCheck

name | regex | string | wrong output
anonymous professor | Star (Star One) | "" | ⊥
Roger Baumgartner | Concat (Star (Atom 'a')) (Star (Atom 'b')) | "" | False
Mathias Blarer | Star (Star (Atom 'a')) | "a" | ⊥
Mike Boss | Concat One One | "" | False
Matthias Brun | Plus (Star Zero) One | "a" | 💥
Gabriel Giamarchi | Star Zero | "" | False
Moritz Knüsel | Concat (Atom 'a' `Plus` Atom 'b') (Atom 'a') | "" | True
Silvan Niederer | Concat (Star Zero) (Star (Atom 'a')) | "a" | False
Loris Reiff | Concat One One | "" | False
Valentin Stoppiello | Star (Plus (Atom 'a') One) | "b" | ⊥
Jan Wiegner | Concat One One | "" | False
Patrick Ziegler | Star Zero | "" | False
Simon Yuan | Star (Star (Atom 'a' `Concat` Atom 'b') `Concat` Atom 'b') | "ab" | True
Above, ⊥ denotes nontermination (or at least "exceeding the MC's patience") and 💥 denotes a "Non-exhaustive patterns in function glushkovVars" runtime failure.
Addendum: The MC was asked about the details of his QuickCheck tests. Indeed this is useful information that the MC should share.
When using QuickCheck there are two main questions to answer. How to generate arbitrary values of a type? And what properties to test? The latter is easy to answer if you have a master solution (or an unoptimized solution for which you are certain that it is correct) at hand. In this case, the MC was testing the property \r w -> sol r w == submission r w, where sol was his naive solution using derivatives (see below) and submission is the solution under test. To test this property, QuickCheck must be able to generate random regular expressions and strings. For strings there is some standard setup; for our own data type RE a manual setup is necessary.
Generation of random values in QuickCheck is based on the type class Arbitrary. I.e., we need to make RE an instance of Arbitrary. Unfortunately, there is no built-in way in GHC to do this automatically via deriving (as we can do for Eq or Ord). But of course there is a Hackage package to do this automatically. Here are the three magic lines to make the above test work.
{-# LANGUAGE TemplateHaskell #-}
...
import Data.DeriveTH
...
derive makeArbitrary ''RE
This will generate random regular expressions, presumably using the uniform distribution to sample the next constructors (and recursion for the recursive components). The MC used this Arbitrary instance for the first test. The generation of the regular expression and of the word are thereby completely independent of each other. As a consequence, most match problems generated that way should evaluate to False. This is not ideal (and very incomplete), but was sufficient to discover all the above bugs with one exception. The bug in Simon Yuan's program was found much later during the actual performance test (still with randomly generated regular expressions and words, but restricting the occurring letters to come from a small alphabet).
To restrict the alphabet and avoid the Zero constructor, the MC had used the following custom Arbitrary instance in his later tests. The definition uses some combinators for generating random values from Test.QuickCheck; the MC recommends looking at the Hackage documentation for sized, frequency, and elements.
import Control.Monad
import Test.QuickCheck
...
instance Arbitrary RE where
  arbitrary = sized regexp

regexp :: Int -> Gen RE
regexp 0 = frequency
  [ (1, return One)
  , (4, Atom `fmap` elements "abcde") ]
regexp n = frequency
  [ (2, Plus `fmap` subexp `ap` subexp)
  , (2, Concat `fmap` subexp `ap` subexp)
  , (1, Star `fmap` regexp (n - 1)) ]
  where subexp = regexp ((n - 1) `div` 2)
So it seems that only Kevin De Keyser, Miro Haller, and Börge Scheel are going to get any points in this exercise. However, the MC was in a good mood and decided to take a closer look at the wrong submissions. It turned out (or rather, again: QuickCheck with a custom RE generator helped find out) that three of the failing ones were correct if the input regular expression did not contain the Zero constructor. For both Gabriel Giamarchi and Patrick Ziegler, the problems with Zero came from the fact that they would rewrite Star Zero to Zero (whereas One would be correct), a quite common mistake that MC's alter ego has encountered while being a TA for introductory automata theory courses. Silvan Niederer's solution seems to suffer from similar problems, although the MC could not pinpoint the source of the problem as precisely. Matthias Brun performed this rewrite correctly. However, his function that would eliminate all Zeros from an expression (or rewrite it to Zero) turned out to be highly incomplete: he forgot some recursive calls. And the rest of his code relied on the fact that there are no Zeros, which explains the 💥.
The MC decided to leave these incorrect submissions in the competition (without using Zeros from now on). However, in the end there will be a severe penalty. In particular, no buggy solution should get more points than a correct solution, no matter how well it performs in terms of efficiency.
How to rank the different submissions? At first, the MC tried what he usually does: picking hand-selected examples to see which solutions time out and which don't. After doing this for several rounds, the MC noticed that some solutions perform better on long words and simple expressions, while others can handle complex expressions for shorter words. It was hard to differentiate between those two categories.
This is why the MC decided to do something more automatic, but also more random. He used QuickCheck's random generator machinery to generate 10 random regular expressions of sizes 5, 10, and 30 (over the alphabet "abcde") and random words of length 10 and 1000 (over all different nonempty subsets of the alphabet "abcdf"). For each generated regex-word pair, the MC ran all 8 (including his own) submissions, with a tight timeout of 1 second. (I.e., long-running executions were not penalized too much, which may not be ideal. But the MC didn't have a cluster at his disposal to perform the evaluation, so it had to work within 1h on the MC's laptop.)
And here are the results, with numbers indicating how many seconds each submission needed to process (or time out on) all the examples:
  29.909823999999986  Gabriel Giamarchi
  57.285384999999906  Matthias Brun
 134.8980010000001    Patrick Ziegler
 157.62982699999995   MC
 310.188045           Silvan Niederer
 340.03123100000005   Börge Scheel
 779.5406349999998    Kevin De Keyser
1962.9649770000012    Miro Haller
Since this is based on random expressions, the MC reran this evaluation several times. The individual times varied from run to run, but the trends remained the same. The above results translate to the following ranking of the week, taking the Zero bugs into account. (We put the 💥 next to Matthias Brun to remind us that bugs can happen even to the competition's top performers.) Funnily, the solutions that worked correctly with Zeros are the worst ones in terms of performance (yet they get the most points). Of course, this is just a coincidence: the speedup was certainly not achieved by treating Zero incorrectly.
Top 10 (naja) of week 5

# | name | points
1. | Börge Scheel | 10
2. | Kevin De Keyser | 9
3. | Miro Haller | 8
4. | Gabriel Giamarchi | 7
5. | Matthias Brun 💥 | 6
6. | Patrick Ziegler | 5
7. | Silvan Niederer | 4
The MC loves derivatives of regular expressions. If you sneak around his (alter ego's) webpage, you will find many of his papers that use these derivatives in matching and other algorithms. The derivative of a regular expression with respect to an alphabet letter c computes the regular expression describing what remains to be matched after reading c. To go dyslexic, we can use the following version of the derivative, which reads (or consumes) a letter c occurring at any position in the expression.
der :: RE -> Char -> RE
der Zero _ = Zero
der One _ = Zero
der (Atom a) c = if a == c then One else Zero
der (Plus r s) c = der r c `Plus` der s c
der (Concat r s) c = (der r c `Concat` s) `Plus` (r `Concat` der s c)
der (Star r) c = der r c `Concat` Star r

Moreover, we can easily decide whether a regular expression accepts the empty string:
eps :: RE -> Bool
eps Zero = False
eps One = True
eps (Atom a) = False
eps (Plus r s) = eps r || eps s
eps (Concat r s) = eps r && eps s
eps (Star r) = True

Those two functions are enough to obtain a nice but inefficient version of dyslexicMatch:

dyslexicMatch :: RE -> String -> Bool
dyslexicMatch r = eps . foldl der r

The actual solution by the MC is a bit smarter. It eagerly simplifies the expressions (in particular exploiting that for our task Concat can be considered commutative) and memoizes the der function (using some TemplateHaskell black magic to make the code concise). Moreover, it splits the result of the derivative across the Plus constructor into a set of expressions. If you think of der :: RE -> Char -> RE as symbolically computing the next state of a deterministic automaton whose states are labeled with regular expressions, you can think of the version that splits the Plus constructor as computing the next state of a nondeterministic automaton.
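Why the eager simplification and memoization pay off can be quantified: without them, each application of der roughly triples the size of the expression. A self-contained sketch restating the naive definitions, with a size helper (a name introduced here) to count constructors:

```haskell
data RE = Zero | One | Atom Char | Plus RE RE | Concat RE RE | Star RE

-- naive commutative derivative, exactly as above
der :: RE -> Char -> RE
der Zero _ = Zero
der One _ = Zero
der (Atom a) c = if a == c then One else Zero
der (Plus r s) c = der r c `Plus` der s c
der (Concat r s) c = (der r c `Concat` s) `Plus` (r `Concat` der s c)
der (Star r) c = der r c `Concat` Star r

-- number of constructors in an expression
size :: RE -> Int
size (Plus r s) = 1 + size r + size s
size (Concat r s) = 1 + size r + size s
size (Star r) = 1 + size r
size _ = 1

main :: IO ()
main = do
  let r = Star (Atom 'a' `Concat` Atom 'b')
  -- sizes of the iterated derivatives of r along prefixes of "ababab"
  print [size (foldl der r w) | w <- ["", "ab", "abab", "ababab"]]
```

The Concat case duplicates both operands on every step, which is exactly what the MC's simplification (and the splitting into sets of partial derivatives) keeps in check.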
{-# LANGUAGE TemplateHaskell #-}
module DyslexicRegex where

import Prelude hiding (concat)
import qualified Data.Set as S
import Data.Function.Memoize
import Data.DeriveTH

data RE = Zero
        | One
        | Atom Char
        | Plus RE RE
        | Concat RE RE
        | Star RE
  deriving (Show, Read, Eq, Ord)

deriveMemoizable ''RE

eps Zero = False
eps One = True
eps (Atom a) = False
eps (Plus r s) = eps r || eps s
eps (Concat r s) = eps r && eps s
eps (Star r) = True

plus r Zero = r
plus Zero r = r
plus (Plus r s) t = plus r (plus s t)
plus r (Plus s t) =
  if r == s then Plus s t
  else if s < r then plus s (plus r t) else Plus r (Plus s t)
plus r s = if r == s then r else if s < r then Plus s r else Plus r s

concat r Zero = Zero
concat Zero r = Zero
concat r One = r
concat One r = r
concat (Concat r s) t = concat r (concat s t)
concat r (Concat s t) = if s < r then concat s (concat r t) else Concat r (Concat s t)
concat r s = if s < r then Concat s r else Concat r s

star Zero = One
star One = One
star (Star r) = Star r
star r = Star r

norm (Plus r s) = norm r `plus` norm s
norm (Concat r s) = norm r `concat` norm s
norm (Star r) = star $ norm r
norm r = r

pder = memoFix2 (\d r a -> case r of
  Zero -> S.empty
  One -> S.empty
  Atom b -> if a == b then S.singleton One else S.empty
  Plus r s -> d r a `S.union` d s a
  Concat r s -> S.map (`concat` s) (d r a) `S.union` S.map (r `concat`) (d s a)
  Star r -> S.map (`concat` Star r) (d r a))

epss = not . S.null . S.filter eps

pders rs a = S.foldl S.union S.empty (S.map (\r -> pder r a) rs)

dyslexicMatch :: RE -> String -> Bool
dyslexicMatch r = epss . Prelude.foldl pders (S.singleton $ norm r)

Now, let us look at the best performing solution by Gabriel Giamarchi, who doesn't sound too optimistic (or, to put it positively, sees much room for improvement).
module DyslexicRegex where

{- This program is probably horribly inefficient, but it *should* be correct.
   I didn't have time to finish it, but the idea was as follows:
   Considering permutations of strings means only the number of occurences of
   each char has importance
   -> This creates surprising syntactical properties of regexes, for instance:
   (notation (Atom 'a') := a, Concat written with . Plus with + and Star with *)
   (a+b)* = a*.b*
   a.(a+b) = a.a+a.b
   and so on
   The idea was to use these rules to expand (syntactically) the regex into
   something easier to manage in the form:
   r1+r2+...+r_n where r_i has the form (abcd)*.(abd)*.(xag)*...(zaggq)*.(abcd)
   Each r_i would then be a knapsack problem but hopefully easier to solve than
   the whole thing, especially if we could memoize/remove duplicate r_is
   This form for regexes is referred to in the following code as normalform_a
   (I had originally intended normalforms_b, c and so on :) )
   There are a LOT of problems in the following code;
   Nothing is optimized and the program is too slow on big regexes.
   Also the solution of the knapsack problem is the trivial solution. -}

import Data.List
import Data.Set
import Data.Maybe
import Data.Map.Strict as M
-- import Control.Monad
-- import Test.QuickCheck

-- You may add 'deriving' clauses but should not modify the RE type otherwise
data RE = Zero          -- Zero matches nothing
        | One           -- One matches only the word []
        | Atom Char     -- Atom c matches only the word [c]
        | Plus RE RE    -- Plus r s matches all words matched by r or s
        | Concat RE RE  -- Concat r s matches all words w = u ++ v
                        -- such that u is matched by r and v is matched by s
        | Star RE       -- Star r matches [] and all words w = u ++ v
                        -- such that u is matched by r and v by Star r
  deriving (Show, Eq)

-- An improved type for our normal form
data NFARE = NFAZero
           | NFAAtom (Set [Char],[Char])
           | NFAPlus NFARE NFARE
           | NFAConcat NFARE NFARE
           | NFAStar NFARE
  deriving (Eq, Ord)

instance Show NFARE where
  show = pprint2

-- Converts a list to a map of elems -> num occurences
listToMap s = M.fromListWith (+) (zip s [1,1..1])

-- Checks that all the keys in the list are in the map and remove them from
-- the map. If this is not possible, then return Nothing
-- TODO: Make this more efficient
mapdifference map [] = Just map
mapdifference map (x:xs) =
  if M.member x map
  then let v = fromJust (M.lookup x map) in
    if v == 0 then Nothing else mapdifference (M.insert x (v-1) map) xs
  else Nothing

-- Given a tuple derived from an atom in nfa (via toCondList) and an occurence
-- map, returns True or False if it matches
-- We basically need to solve a knapsack problem in the first argument
matchAtomicCond _ Nothing = False
-- In this case the string represented by occmap must be empty
matchAtomicCond ([],[]) (Just occmap) = Data.List.sum (M.elems occmap) == 0
matchAtomicCond ([], necessaryLetters) (Just occmap) =
  matchAtomicCond ([],[]) (mapdifference occmap necessaryLetters)
matchAtomicCond (scond:sconds,necessaryLetters) (Just occmap)
  | Data.List.null necessaryLetters = -- Knapsack problem
      matchAtomicCond (scond:sconds,[]) (mapdifference occmap scond)
        || matchAtomicCond (sconds,[]) (Just occmap)
  -- If there are both necessary letters and star conditions, we need to
  -- remove the necessary ones first
  | otherwise = matchAtomicCond (scond:sconds,[]) (mapdifference occmap necessaryLetters)

toNfare :: RE -> NFARE
toNfare Zero = NFAZero
toNfare One = NFAAtom (Data.Set.empty,[])
toNfare (Atom c) = NFAAtom (Data.Set.empty,[c])
toNfare (Plus r s) = NFAPlus (toNfare r) (toNfare s)
toNfare (Concat r s) = NFAConcat (toNfare r) (toNfare s)
toNfare (Star r) = NFAStar (toNfare r)

pprint :: RE -> String
pprint Zero = "0"
pprint (Plus x y) = "(" ++ pprint x ++ "+" ++ pprint y ++ ")"
pprint (Concat x y) = pprint x ++ "" ++ pprint y
pprint (Star x) = "(" ++ pprint x ++ ")" ++ "*"
pprint (Atom c) = show c

pprint2 :: NFARE -> String
pprint2 NFAZero = "0"
pprint2 (NFAPlus x y) = "(" ++ pprint2 x ++ "+" ++ pprint2 y ++ ")"
pprint2 (NFAConcat x y) = pprint2 x ++ "" ++ pprint2 y
pprint2 (NFAStar x) = "(" ++ pprint2 x ++ ")" ++ "*"
pprint2 (NFAAtom c) = show c

-- A ``normal'' regex matching algorithm.
match :: RE -> String -> Bool
match Zero _ = False
match One w = Data.List.null w
match (Atom c) w = w == show c
match (Plus r s) w = match r w || match s w
match (Concat r s) w =
  any (\(x,y) -> match r x && match s y) $ zip (inits w) (tails w)
match (Star r) [] = True
match (Star r) [c] = match r [c] -- The next case bugs for one letter words
match (Star r) w =
  any (\(x,y) -> match r x && match (Star r) y) $ tail $ zip (inits w) (tails w)
match (Star r) w = True

-- A brute-force dyslexic matching algorithm, used to test correctness.
bruteDyslexicMatch :: RE -> SafeString -> Bool
bruteDyslexicMatch r ws = let w = show ws in any (match r) $ permutations w

-- NormalformA is r1+r2+...+r_n where r_i has the form (abcd)*(xyz) [NO + in r_i]
-- If re is of the form (Op re1 re2) where re1 and re2 are already in normalform
-- -> Outputs the normalform of re
normalforma2 :: NFARE -> NFARE
normalforma2 (NFAPlus NFAZero y) = y -- Zero neutral for plus
normalforma2 (NFAPlus y NFAZero) = y -- Zero neutral for plus
normalforma2 (NFAPlus x y) = removeNFADuplicates (NFAPlus x y) -- We want all plusses at the top level
normalforma2 (NFAStar NFAZero) = NFAZero -- star is identity on zero
normalforma2 (NFAStar (NFAPlus x y)) = -- Distributivity of star for plus
  normalforma2 (NFAConcat (normalforma2 (NFAStar x)) (normalforma2 (NFAStar y)))
normalforma2 (NFAStar (NFAAtom (as,an))) =
  if Data.List.null an then NFAAtom (as,[])
  else if Data.Set.null as then NFAAtom (Data.Set.insert an as,[])
  else NFAPlus (NFAAtom (Data.Set.insert an as, an)) (NFAAtom (Data.Set.empty,[]))
normalforma2 (NFAConcat NFAZero x) = NFAZero -- Zero in ``multiplication``
normalforma2 (NFAConcat x NFAZero) = NFAZero -- Zero in ``multiplication``
normalforma2 (NFAConcat (NFAAtom (as,an)) (NFAAtom (bs,bn))) =
  NFAAtom (Data.Set.union as bs,an++bn) -- Property of concat on nfaatoms
normalforma2 (NFAConcat x (NFAPlus y z)) = -- NFAConcat distributivity left (and commutativity already used)
  NFAPlus (normalforma2 (NFAConcat x y)) (normalforma2 (NFAConcat x z))
normalforma2 (NFAConcat (NFAPlus y z) x) = -- NFAConcat distributivity right
  NFAPlus (normalforma2 (NFAConcat y x)) (normalforma2 (NFAConcat z x))
normalforma2 (NFAConcat x (NFAStar y)) = NFAConcat (NFAStar y) x -- Commutativity of concat
normalforma2 x = x -- Otherwise

-- Two horribly slow functions to remove duplicates :/
removeNFADuplicates NFAZero = NFAZero
removeNFADuplicates (NFAAtom a) = NFAAtom a
removeNFADuplicates plustree =
  Data.Set.foldr (\x re -> if re == NFAZero then x else NFAPlus x re) NFAZero
    (plustreeToSet plustree)

plustreeToSet (NFAPlus (NFAAtom a) (NFAAtom b)) = Data.Set.fromList [NFAAtom a, NFAAtom b]
plustreeToSet (NFAPlus (NFAAtom a) (NFAPlus x y)) =
  Data.Set.insert (NFAAtom a) (plustreeToSet (NFAPlus x y))
plustreeToSet (NFAPlus (NFAPlus x y) (NFAAtom a)) =
  Data.Set.insert (NFAAtom a) (plustreeToSet (NFAPlus x y))
plustreeToSet (NFAPlus (NFAPlus x y) (NFAPlus w z)) =
  Data.Set.union (plustreeToSet (NFAPlus x y)) (plustreeToSet (NFAPlus w z))

-- Computes the normalform_a of the regex
-- Now we need to basically compute a tree fold using normalforma2
tonfa2 :: NFARE -> NFARE
tonfa2 (NFAPlus x y) = normalforma2 (NFAPlus (tonfa2 x) (tonfa2 y))
tonfa2 (NFAConcat x y) = normalforma2 (NFAConcat (tonfa2 x) (tonfa2 y))
tonfa2 (NFAStar x) = normalforma2 (NFAStar (tonfa2 x))
tonfa2 x = normalforma2 x

-- Once the normalform_a of the regex has been computed, we convert it to a
-- list of tuples of the form ([[starcond],[...]],[necessary])
-- Then, we will know that the regex matches iff one of these matches, and each
-- subproblem is basically a knapsack problem which we solve by lin-programming
-- toCondList :: NFARE -> [...]
toCondList (NFAPlus x y) = toCondList x ++ toCondList y
toCondList (NFAAtom (starset, necessaryset)) = [(Data.Set.toList starset, necessaryset)]
toCondList NFAZero = []
toCondList x = undefined -- We should NEVER need to go there

-- Main function
dyslexicMatch re str =
  let occmap = listToMap str
  in Data.List.any (\x -> matchAtomicCond x (Just occmap))
       (toCondList (removeNFADuplicates (tonfa2 $ toNfare re)))
Besides the buggy clause normalforma2 (NFAStar NFAZero) = NFAZero (Star Zero matches the empty word, so it should normalize to the equivalent of One, not Zero), the solution rewrites expressions in an interesting way, exploiting our particular setting in this exercise. The MC believes that this was the key to victory, as most "random" expressions would be rewritten to something trivial (or at least very simple).
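The core subproblem that each normal-form summand (abc)*(xyz) boils down to can be sketched in isolation. The following is a simplified stand-in for illustration, not Gabriel's actual code; the names matchSummand, subtractWord, and counts are made up:

```haskell
import qualified Data.Map.Strict as M

-- A multiset of letters: how often each character occurs.
type Counts = M.Map Char Int

counts :: String -> Counts
counts s = M.fromListWith (+) [(c, 1) | c <- s]

-- Try to subtract the letters of a word from the multiset; Nothing if impossible.
subtractWord :: Counts -> String -> Maybe Counts
subtractWord m [] = Just m
subtractWord m (c:cs) = case M.lookup c m of
  Just n | n > 1 -> subtractWord (M.insert c (n - 1) m) cs
  Just 1         -> subtractWord (M.delete c m) cs
  _              -> Nothing

-- Does some permutation of the input match (w1 + ... + wk)* wN ?
-- After removing the necessary letters wN, every remaining letter must be
-- covered by whole copies of the starred words (a small knapsack-style search).
matchSummand :: [String] -> String -> String -> Bool
matchSummand starWords necessary input =
  case subtractWord (counts input) necessary of
    Nothing -> False
    Just m  -> go m
  where
    go m | M.null m  = True
         | otherwise = any go [m' | w <- starWords
                                  , not (null w)
                                  , Just m' <- [subtractWord m w]]
```

For example, matchSummand ["ab"] "c" "bca" is True (one copy of "ab" plus the necessary "c" uses up all letters), while matchSummand ["ab"] "" "aab" is False.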
Finally, Matthias Brun's solution is quite nice, too. It uses Glushkov automata, a.k.a. position automata, which are related to the approach in the MC's favorite paper on regex matching. If you don't know anything about these automata, skip the code below at first, but read the linked paper! The MC assumes that the automaton gets large for some of the random expressions. This solution would also benefit from the kind of sophisticated rewriting that Gabriel Giamarchi has performed.
module DyslexicRegex where

import Data.List (sort, group, groupBy, sortBy)
import Data.List.Extra (nubOrd)
import Data.Maybe (fromJust)
import Data.Char (ord, chr)
import Data.Sequence (Seq, (|>))
import qualified Data.Sequence as S
import Data.Vector (Vector, (!))
import qualified Data.Vector as V
import Data.IntMap.Strict (IntMap)
import qualified Data.IntMap.Strict as IM
import Control.Arrow ((&&&), first)

-- The idea behind my solution is to build a Glushkov automaton, as described on pages 105ff of
-- http://nlp.cs.swarthmore.edu/~richardw/cs93f07/flexiblePatternMatchingInStrings.pdf.
-- (My code is largely based on that chapter but has some significant differences.)
-- With this automaton I can efficiently [O(n^3)] match regular expressions.
-- For matching all permutations of a string one notices that this is equivalent to matching all the
-- input characters but in any arbitrary order. So rather than computing the permutations and
-- attempting to match them I just keep track of what characters are left and how many of them. During
-- the state transitions I then check all transitions (which read an available input character) rather
-- than just the one that would come next in the original input.
-- I haven't commented my code too extensively but I think it should be fairly understandable, if one
-- knows how Glushkov automata work (which, presumably, the MC does) or has glanced over the
-- book I mentioned. (Link)
-- There are still some optimizations I didn't get around to implementing but it's Saturday 23:24 now,
-- so that's that I guess.

type State = Int
type Input = IntMap Int
type First = [State]
type Last = [State]
type Empty = Bool
type Delta = Seq [State]

-- StatesF provides a mapping from state to what atom it represents.
-- States are first accumulated in a list (using cons) and then converted to a vector for O(1) access
type States = [Int]
type StatesF = IntMap Int

data RE = Zero           -- Matches nothing
        | One            -- Matches only the word []
        | Atom Char      -- Atom c matches only [c]
        | Plus RE RE     -- Plus r s matches all words matched by r or s
        | Concat RE RE   -- Concat r s matches all words w = u ++ v such that u is matched by r and v is matched by s
        | Star RE        -- Star r matches [] and all words w = u ++ v such that u is matched by r and v by Star r
  deriving (Eq, Show)

-- Any regular expression containing zeroes can either be rewritten as one without zeroes
-- or as just Zero. That's what `purgeZero` does.
purgeZero :: RE -> RE
-- Plus Zero Zero matches nothing
purgeZero (Plus Zero Zero) = Zero
-- Plus left Zero matches only left
purgeZero (Plus left Zero) = left
-- Symmetric case
purgeZero (Plus Zero right) = right
-- Concat with a Zero matches nothing
purgeZero (Concat Zero _) = Zero
purgeZero (Concat _ Zero) = Zero
-- Star Zero only matches []
purgeZero (Star Zero) = One
purgeZero (Plus left right)
  | left' == Zero && right' == Zero = Zero
  | left' == Zero = right'
  | right' == Zero = left'
  | otherwise = Plus left' right'
  where
    left' = purgeZero left
    right' = purgeZero right
purgeZero (Concat left right)
  | left' == Zero = Zero
  | right' == Zero = Zero
  | otherwise = Concat left' right'
  where
    left' = purgeZero left
    right' = purgeZero right
purgeZero (Star r)
  | r' == Zero = One
  | otherwise = Star r'
  where r' = purgeZero r
purgeZero x = x

-- The getInput function transforms the input into an IntMap of characters with an associated count of
-- how many times they can still be matched. This essentially provides an input string but in a format
-- that is more useful for my algorithm.
getInput :: [Char] -> Input
getInput = IM.fromList . map (ord . head &&& length) . group . sort

-- A helper function to decrement the count of a character in the input and delete it if there was only
-- one left.
decAndDelIfLast :: Int -> Input -> Maybe Input
decAndDelIfLast key input
  | val == Just 1 = Just $ IM.delete key input
  | val == Nothing = Nothing
  | otherwise = Just $ IM.adjust pred key input
  where val = IM.lookup key input

-- merge is a small helper function that merges (wow!) two lists and removes duplicates.
-- (i.e. basically set union but I think sets would have been overkill here)
merge :: Ord a => [a] -> [a] -> [a]
merge xs ys = nubOrd $ xs ++ ys

-- glushkovVars is a translation of the function with the same name in the book that I mentioned. Since
-- the book provides an imperative implementation, this is somewhat reflected in the way this is
-- structured. I'm sure it could be done more elegantly.
-- The function provides the states, the delta function and the final states. It also tells us whether
-- the regex accepts the empty word.
glushkovVars :: RE -> (Delta, Int, States) -> (First, Last, Empty, Delta, Int, States)
glushkovVars (Plus l r) (delta, pos, states) =
    (first' `merge` first'', last' `merge` last'', empty' || empty'', delta'', pos'', states'')
  where
    (first', last', empty', delta', pos', states') = glushkovVars l (delta, pos, states)
    (first'', last'', empty'', delta'', pos'', states'') = glushkovVars r (delta', pos', states')
glushkovVars (Concat l r) (delta, pos, states) =
    (firstF, lastF, empty' && empty'', deltaF, pos'', states'')
  where
    (first', last', empty', delta', pos', states') = glushkovVars l (delta, pos, states)
    (first'', last'', empty'', delta'', pos'', states'') = glushkovVars r (delta', pos', states')
    firstF = if empty' then first' `merge` first'' else first'
    lastF = if empty'' then last' `merge` last'' else last''
    deltaF = foldr (S.adjust' (merge first'')) delta'' last'
glushkovVars (Star r) (delta, pos, states) =
    (first', last', True, deltaF, pos', states')
  where
    (first', last', empty', delta', pos', states') = glushkovVars r (delta, pos, states)
    deltaF = foldr (S.adjust' (merge first')) delta' last'
glushkovVars One (delta, pos, states) = ([], [], True, delta, pos, states)
glushkovVars (Atom c) (delta, pos, states) =
    ([pos'], [pos'], False, delta |> [], pos', ord c : states)
  where pos' = pos + 1

-- A small wrapper to make glushkovVars easier to call.
glushkovVars' :: RE -> (Last, Delta, Empty, States)
glushkovVars' r = (last, S.update 0 first delta, empty, states)
  where (first, last, empty, delta, pos, states) = glushkovVars r (S.singleton [], 0, [0])

-- The actual match function uses glushkovVars' to generate the automaton and then goes through all the
-- state transitions to check whether or not the input is accepted.
dyslexicMatch :: RE -> String -> Bool
dyslexicMatch r s = if r' == Zero then False else match' (getInput s) [0]
  where
    r' = purgeZero r
    (last, delta, empty, states) = glushkovVars' r'
    states' = V.fromList . reverse $ states
    match' :: Input -> [State] -> Bool
    match' i [] = (not . IM.null) i
    match' i ys
      | IM.null i && ys == [0] = empty
      | IM.null i = any (`elem` last) ys
      | otherwise = any (uncurry match') newstates
      where
        newstates = map (first fromJust) . filter ((/= Nothing) . fst) .
                    map ((\(x:_) -> decAndDelIfLast (states' ! x) i) &&& id) .
                    groupBy stateeq . sortBy statecmp $ reachable
        reachable = (nubOrd . concatMap (delta `S.index`)) ys
        stateeq = \a b -> (states' ! a) == (states' ! b)
        statecmp = \a b -> (states' ! a) `compare` (states' ! b)
Lessons of the week:
 As the tasks get more difficult, correctness gets more rewarding (in terms of competition points).
 We should make QuickCheck part of the FMFP curriculum, to ensure that more people use it.
 Different solutions optimize different aspects of a problem. Thus, it is difficult to compare them with respect to efficiency in general.
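The kind of property QuickCheck would check here is easy to state. Below is a hedged sketch: plain enumeration of permutations stands in for QuickCheck's random generation, and a naive textbook matcher stands in for a contestant's solution (all names in this block are made up for illustration):

```haskell
import Data.List (inits, tails, permutations)

data RE = Zero | One | Atom Char | Plus RE RE | Concat RE RE | Star RE

-- A naive textbook matcher (exponential, but fine as a test oracle).
match :: RE -> String -> Bool
match Zero _ = False
match One w = null w
match (Atom c) w = w == [c]
match (Plus r s) w = match r w || match s w
match (Concat r s) w = or [match r u && match s v | (u, v) <- splits w]
match (Star _) [] = True
match (Star r) w = or [match r u && match (Star r) v | (u, v) <- tail (splits w)]

-- All ways to split a word into a prefix and a suffix.
splits :: String -> [(String, String)]
splits w = zip (inits w) (tails w)

-- The dyslexic matching specification: some permutation of w matches r.
dyslexicMatch :: RE -> String -> Bool
dyslexicMatch r w = any (match r) (permutations w)

-- The property a dyslexic matcher must satisfy: its verdict is invariant
-- under permutations of the input word.
permutationInvariant :: RE -> String -> Bool
permutationInvariant r w =
  all (== dyslexicMatch r w) [dyslexicMatch r w' | w' <- permutations w]
```

With QuickCheck one would generate random pairs of a regex and a short word and check permutationInvariant, comparing a contestant's fast matcher against the brute-force specification.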
Task of week 6
This task is rather different from the previous ones: implement an AI for paper soccer. There is no Code Expert project this time. Instead, please submit your solution (with an extended Easter deadline on Monday, 2nd April, 23:59) by email to the MC. Please use "FP Competition Paper Soccer" as the title of your email.
Paper soccer is a two-player strategy game in which, unsurprisingly, the goal is to score a goal. A description follows below. The MC recommends also reading the Wikipedia article and actually trying out the game before jumping into the AI. Our game takes place on a field of the shape and size shown below. Player 1 tries to score in the right goal, whereas player 2 tries to score in the left one.
A turn of a player consists of moving the ball (black dot) to one of the 8 neighboring grid points (including diagonal moves). The (possibly diagonal) edge along which a move took place is marked persistently. It is prohibited to move the ball along marked edges a second time. The players do not strictly alternate in their turns. Instead, the turn changes only if the ball moves to a previously unvisited grid point. In other words, a player continues if (s)he moves the ball to an already visited grid point (i.e., one with an adjacent marked edge). Such "bounces" also apply to the border (i.e., we count the border grid points as visited by default).
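The bounce rule above can be sketched with simplified types (this is an illustration, not the Soccer module; the names Turn and nextTurn are made up):

```haskell
import qualified Data.Set as Set

type Pos = (Int, Int)

data Turn = Player1 | Player2 deriving (Eq, Show)

-- A move hands the turn to the opponent only if the target grid point was
-- previously unvisited; otherwise the same player continues ("bounce").
nextTurn :: Set.Set Pos -> Pos -> Turn -> Turn
nextTurn visited target current
  | target `Set.member` visited = current  -- bounce: same player moves again
  | otherwise = if current == Player1 then Player2 else Player1
```

Border grid points would simply be placed in the visited set from the start, which is why bounces also apply there.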
There are several winning conditions for the players. To win, you have to either score a goal, force the opponent to score an own goal, or force the opponent into a dead end. Moreover, each player has only 5 minutes for the game in total.
We model the game in the module Soccer. You might need to install a few Hackage packages to load the module; the instructions are in the source. The state (State) of the game is the position (Pos = (Int, Int)) of the ball together with an assignment of marked adjacent edges for each grid point. We represent this assignment using a matrix indexed by positions ((0,0) is the top left corner of the field; (5,4) is the initial position of the ball; the actual goals are not part of the matrix) that stores an 8-bit word for each grid point. Each bit corresponds to a direction (Dir)—the conversion between directions and bit indices can be done using fromEnum and toEnum. If a bit is set for a grid point, then the edge going in the corresponding direction is marked. Note how the borders are modeled as preset words. The size parameters (xMax and yMax) set in this module are fixed and will not change for the competition.
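The direction-to-bit encoding can be sketched as follows. The Dir constructors and their order here are an assumption for illustration (the real Soccer module defines them); only the fromEnum/toEnum convention is taken from the description above:

```haskell
import Data.Bits (testBit, setBit)
import Data.Word (Word8)

-- Eight directions, one per bit; deriving Enum gives the bit index via fromEnum.
-- NOTE: constructor order is assumed, not taken from the Soccer module.
data Dir = N | NE | E | SE | S | SW | W | NW
  deriving (Show, Eq, Enum, Bounded)

-- Is the edge in direction d already marked at this grid point?
marked :: Word8 -> Dir -> Bool
marked w d = testBit w (fromEnum d)

-- Mark the edge in direction d.
mark :: Word8 -> Dir -> Word8
mark w d = setBit w (fromEnum d)

-- Valid moves from a grid point are the unmarked directions.
validDirs :: Word8 -> [Dir]
validDirs w = [d | d <- [minBound .. maxBound], not (marked w d)]
```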
An AI for paper soccer is a pure function of type step :: Double -> State -> m -> (m, Dir) that inputs the remaining time (in seconds), a state, and some polymorphic memory slot, which you may instantiate with any type you want and use to store reusable information. The function should return the direction in which the ball should move next, together with an update of the memory. Be careful to produce only valid moves (producing invalid ones is an additional losing condition).
A player (Player) is then represented by such a function extended with a name (please use your name, as in "<FirstnameLastname>") and an initial memory content. The idea is that evaluating the initial memory is instant. (This is not yet enforced by the module, but please do not pack a big table or the like in there right from the start, in order to survive potential updates of the Soccer module.)
As examples, the MC has prepared a few dumb AIs that don't even try to think strategically. You can certainly do better than that. Please submit a single file called Player<FirstnameLastname>.hs which implements two Players: player1, which targets the right goal (and has the first move), and player2, which targets the left goal (and goes second). In principle, you may use the same strategy for player 2 as for player 1 by using the inversePlayer function (which rotates the field, computes a direction, and inverts it again). However, you may also use different strategies for the two players.
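The direction-inversion half of inversePlayer can be sketched like this. Everything here is an assumption for illustration: the constructor order and the convention that opposite directions sit 4 apart in the Enum are not taken from the Soccer module:

```haskell
-- Assumed cyclic ordering of the eight directions (a half turn away is +4).
data Dir = N | NE | E | SE | S | SW | W | NW
  deriving (Show, Eq, Enum, Bounded)

-- Inverting a direction is a half turn: add 4 modulo 8 in the Enum encoding.
invertDir :: Dir -> Dir
invertDir = toEnum . (`mod` 8) . (+ 4) . fromEnum
```

Under this assumption, invertDir is an involution: applying it twice yields the original direction.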
The submissions will be evaluated in a tournament, similarly to the one shown in the Main module.
Oh, and graphical visualizations of the state that are nicer than the box-drawing ASCII art of the MC are very welcome contributions, too. The MC is not sure if they would yield any competition points, though (it depends on the beauty).
Results of week 6
Let the games begin! We have 11 student contestants. The MC did not participate this time: he decided to spend his energy on carrying out this evaluation, which includes some time-consuming-to-generate drawings.
The MC has decided on a league-like tournament: everybody plays 21 matches (11 as the first player and 11 as the second player, including one match against oneself). The final states of all 121 games are visualized in the table below. The rows are headed by the first players, the columns by the second ones. A black border around a picture means that the second player has won that game; otherwise the first player is the winner. The best players are thus those with many borderless entries in their row and many bordered entries in their column.
Each picture is clickable and opens a gif (produced by a modified Soccer.hs) that replays the match. The replays are not in real time: if the turn changes, there is a 1 second pause; otherwise the ball moves every 0.2 seconds, independently of how long the contestant's AI was "thinking" to produce that move. The green bar highlights the player to move. Also the contestants' initials are clickable and bring up the (as usual well-documented) implementations.
As an alternative to clicking through 121 gifs, you can also go to this webpage and watch all the games simultaneously. For this you will need a big screen (we know that Gabriel 🖥 has one) and a good internet connection (it will load over 200 MB of gifs).
[11×11 grid of final-state pictures: rows are headed by the first players and columns by the second players, both listed as BS, GG, LR, MBl, MBr, MHa, MHo, MK, PZ, SN, SY.]
We have a clear winner! In total, we obtain the following scores (=number of won games) and ranking. The MC had a nice Easter break and decided to award points to all 11 contestants.
Top 11 of week 6

 #    name                ranking (1st/2nd/total)   points
 1.   Gabriel Giamarchi   11/10/21                  11
 2.   Matthias Brun       9/10/19                   10/2 = 5
 3.   Miklós Horváth      9/8/17                    9
 4.   Silvan Niederer     7/7/14                    8/2 = 4
 5.   Miro Haller         6/6/12                    7
 6.   Börge Scheel        5/6/11                    6
 7.   Mathias Blarer      5/4/9                     5/2 = 2.5
 8.   Simon Yuan          2/4/6                     4
 9.   Moritz Knüsel       1/4/5                     3
10.   Loris Reiff         2/2/4                     2
11.   Patrick Ziegler     1/2/3                     1/2 = 0.5
We see four "/2" punishments. In three instances, the AI was sometimes (roughly: for half of the games) failing with an index-out-of-bounds runtime error. Two of those were invoking the MC's move function without respecting its (unwritten) preconditions: the ball may not leave the field (and scoring a goal counts as leaving the field, although the direction of the move is valid). The third one was an even more blatant array indexing error. The above sources are free of those errors, because the Easter-MC was kind enough to fix them (and keep the contestants in the competition).
The fourth punishment is for Mathias Blarer's solution, which at first would time out in every game. Mathias's solution adjusts the search depth based on the remaining time; apparently, Mathias messed up the function that computes the adjustment. The above source is his second version (which arrived after the deadline, after the MC asked about the timeouts) and now times out in only half of the games.
To play the tournament, the MC at first used a high degree of parallelization. All games were then evaluated in about 5 minutes (many games didn't take much time to play due to simplistic AIs). However, the MC's simplistic time measurement using the timeit package was not functioning accurately. In the end, the MC discarded the results of the parallel games in favor of a single-threaded evaluation (which took a few hours, but did not suffer from spurious timeouts and other glitches).
Interestingly, most solutions did not take the time parameter of the step function into account. These solutions behaved deterministically (and independently of the number of threads). A few others (e.g., Gabriel or Mathias), however, based their search depth on the time parameter, which resulted in (mildly) pseudorandom behavior. The whole situation reminded the MC of the following anecdote about the Vampire prover for first-order logic. Vampire takes part in a competition in which provers have a fixed amount of time (say, 5 minutes) to solve a particular problem. The organizer of the competition was in the process of determining a suitable timeout. He ran Vampire on a particular problem with a timeout of 5 minutes, and Vampire solved the problem in 4:59. Next, he tried a timeout of 3 minutes, and Vampire solved it in 2:59. This went on and on until Vampire was able to solve the problem within seconds with a timeout of a few seconds. How could this happen? Vampire was aware of the timeout and adjusted its strategies (or heuristics) for solving the problem accordingly. Only the very last strategy tried was successful for the problem at hand, and it was typically tried only in the last few seconds.
Finally, a few words about the techniques used: minimax, alpha-beta pruning, and game tree algorithms were recurring themes in the better performing solutions. All these generic, advanced (but not too difficult to implement) techniques crucially rely on a good cost function for the game's states. Moritz Knüsel's solution didn't have a good cost function and performed badly despite using such techniques (implemented in the Hackage package gametree). Oh, and Moritz used TABs in his solution; probably that's another source of problems. In turn, Gabriel seems to have found a very good cost function, and the MC sees no TABs.
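The pattern shared by the stronger solutions, search with alpha-beta pruning over a cost function, can be sketched generically. This is a toy sketch (negamax form, which assumes strictly alternating turns; the bounce rule means the contestants had to adapt it), not any contestant's code:

```haskell
-- Generic negamax with alpha-beta pruning. 'moves' generates successor states
-- for the player to move; 'value' is the cost function from that player's
-- point of view. Leaf positions are scored by 'value'.
negamax :: Int -> (a -> [a]) -> (a -> Int) -> Int -> Int -> a -> Int
negamax depth moves value alpha beta s
  | depth == 0 || null succs = value s
  | otherwise = go alpha (minBound `div` 2) succs
  where
    succs = moves s
    go _ best [] = best
    go a best (t:ts)
      | best' >= beta = best'           -- beta cutoff: opponent avoids this line
      | otherwise = go (max a best') best' ts
      where
        best' = max best
          (negate (negamax (depth - 1) moves value (negate beta) (negate a) t))

-- A toy game to exercise it: take 1 or 2 from a pile; whoever cannot move loses.
nimMoves :: Int -> [Int]
nimMoves n = [n - k | k <- [1, 2], n - k >= 0]

-- Cost function: a position with no moves is lost for the player to move.
nimValue :: Int -> Int
nimValue n = if null (nimMoves n) then -1 else 0
```

For example, negamax 10 nimMoves nimValue (-100) 100 3 evaluates to -1 (a pile of 3 is lost for the player to move), while a pile of 4 evaluates to 1.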
Any lessons? Sure!
 Measuring time in a multithreaded environment is not easy.
 Time is a fairly good source of pseudorandomness (for the purpose of having an AI that behaves differently).
 An AI for a game must know the game's good and bad states (as precisely as possible).
 Do not use TABs!
Task of week 7
The following is the final task of the competition. The submission deadline is Sunday, 8th April, 23:59. The results of this task will be announced (and followed by the award ceremony) in the last lecture of the FP part of the course on Tuesday, 10th April.
The actual task is very simple: Be original!
You are required to submit by email to the MC a picture with any content you want (in one of the following formats: jpg, png, svg, pdf, gif) and a Haskell program (a single file: Art.hs) that was used to generate the picture. Please use "FP Competition Art" as the title of your email. Here are some examples of graphics libraries that you are encouraged to use (others are welcome, too).
The pictures will be evaluated with respect to the aesthetics of the artwork (by an independent but highly subjective jury) and the aesthetics of the code (by the independent but highly subjective MC).
Hint: Such an artistic task is a traditional ending of a competition. You can look at previous year's submissions for some inspiration. Various fractals have been fairly popular among the past submissions. If done properly (i.e., beautifully), they will guarantee you a secure place somewhere in the middle. But typically, to win this task one has to create something more original and surprising.