Postgres SQL Lessons From Advent of Code Challenges
Disclaimer: We did not complete all 25 days in SQL (judging by the links from this HN thread, it looks like pretty much no one did), but we still think the things we learned about SQL are useful and worth sharing, especially for non-experts. If you’re an expert, you may not learn anything here.
Window function ranges are awesome
First, a quick refresher on window functions: you probably know that window functions perform calculations across a set of table rows that are somehow related to the current row. The example from the Postgres docs is calculating the department-specific average salary for a particular employee. The SQL looks like this:
In the above example, the “window” of rows considered are all rows whose department matches the department of the current row, but while working on the challenges, we learned that it’s also possible to define windows using ranges.
For example, on day 1, we’re asked to count the number of times the sum of values in a sliding window of three elements increases. You can express a sum of a sliding window of three elements using window function ranges. The SQL looks like this:
And the overall solution looks like this:
Ranges are just the tip of the iceberg. There are many other methods of defining windows. Check out this page of the Postgres docs for more.
Common Table Expressions for readability and iteration
Common Table Expressions or CTE's are temporary, scoped, and aliased tables created via a simple query expression. Subqueries can achieve the same result in most cases, and although subqueries are more common in textbooks and database documentation than CTEs, we found that using CTE's instead of subqueries eliminated a great deal of nesting.
There are plenty of resources that explain why deeply nested code is a code smell. Nesting can quickly become difficult to reason about. Following the flow of execution adds additional cognitive load when the reader has to jump around to get context. CTEs are sequential, and it's easier for us humans to follow sequential steps.
Another advantage of using CTEs is that they can refer to their own output if you use the
RECURSIVE keyword. The docs recommend using recursive CTEs to work with arbitrarily nested hierarchical data, but they can also be used for iteration.
We learned how to use CTEs for iteration while working on day 3. In the second part, we’re asked to:
Start with a list of binary numbers and consider just the first bit of those numbers.
Discard numbers whose first bit does not match the most common bit in the first-bit position of all remaining numbers
If you only have one number left, stop; this is the value you’re looking for
Otherwise, repeat the process, considering the next bit to the right.
The SQL for this is below. The interesting part where we use a recursive CTE is lines 14 - 40:
There are plenty of good explanations of how recursive CTEs work out there, so we won’t repeat those explanations here.
How to think relationally instead of iteratively
On day 4, we’re given a list of numbers and a list of bingo boards and we’re asked to find which bingo board wins first. When we started writing the SQL portion, we assumed that we would need to use a recursive CTE so that we could iterate over the boards, but the exercise of solving the problem in SQL helped us see the problem through a new lens. We learned that just because there's an iterative aspect in the problem statement doesn't mean we can't abstract it away.
Breaking down the problem into smaller pieces allowed us to see another way. We needed to find the first winning board, but what does that mean? Well, first, we need to think about how a board wins. A board wins when all the numbers in a column or row have been called. So, we need to find out when each column and row won.
Breaking it down in this way allowed us to see that it was a sorting problem. We already knew the order the numbers were called in. You can use the number order to assign a “time” to each column and row that represents when that column or row won. Then we just find the min of that time for each board. Once you have that, you know when the board won. Then you sort the boards by when they won and you have your answer!
Translating this to SQL looks like this:
This exercise was a great way to add additional tools to our algorithmic toolboxes.
SQL is surprisingly compact
And here’s the same solution in SQL:
We had another goal in writing this post. We’re hiring and these learnings provide a nice glimpse into the engineering culture at Heap. We use a lot of Postgres. We participate in Advent of Code challenges together. We have a growth mindset and love learning new things; if that sounds like a neat place to work, check us out.
*This post was co-authored by Heap Software Engineers Matt Dupree and Amanda Murphy.