In a previous article we quickly glanced through array fundamentals. We saw how array are presented in RAM, the difference between static and dynamic arrays and how an array can be used to build a data structure called a stack. Here we are going to dive a bit more deeper into different algortihmic patterns for arrays and their solutions.
The Kadane’s algorithm
Imagine you have a list of numbers, something like daily changes in your bank account. We have some positive (you earned money) and some negative (you spent money). You want to find the best consecutive sequence of days where your total gain was the highest. This is exactly what the Kadane’s Algorithm does. Instead of checking every possible sequence (which would be slow), it scans the list just once. At each step, it asks:
- Is it better to start a new sequence here?
- Or to add this number to the sequence I’m already building?”
It keeps track of the highest sum it has seen so far and, by the end of the list, we get our answer.
This algorithm while simple is widely used in finance, signal processing, or anywhere you need to find the “best contiguous segment” in a series of numbers.
The brute force
In the leetcode language, problem should be framed as followed.
Find a non-empty sub-array with the largest sum
Given the array [4, -1, 2, -7, 3, 4] we can quikcly find a brute force approach using two pointers i and j.
1 | def brut_force(nums: list[int]) -> int: |
The tome complexity of this algorithm is O(n^2). Each loop is O(n) so we have O(n * n).
The kadane’s way
The idea behind the kadane’s algorithm is pretty simple. If your sum is below 0. Reset your max to 0 and keep adding the element of the list until the end. The code will look like this.
1 | def kadane(nums: list[int]) -> int: |
This algorithm is very smart and allows you to obtain the answer in O(n). Simple and efficient.
Let’s suffer
A set of exercises to practice the concept.
Sliding window
The Kadane’s algorithm finds the maximum sum of any contiguous subarray in one pass. But looking at it closer, you may realize that what the code is doing is ust managing a window.
Indeed, t does not recalculate from zero every time. Instead, it evaluates to “Keep going… or start fresh here?” If the running sum is good (namely based on your rules, before was not < to 0), it extends. If it goes below 0, it resets. This is the pattern of a sliding with the right edge always moving ahead. We can acutally rewrite the kadane algorithm by expleciting the widnow:
1 |
|
And once you see it here, you will see windows everywhere:
- Longest substring without repeats? Sliding window.
- Max average over k elements? Sliding window.
- Highest-scoring DNA segment? Sliding window again.
Problems with static windows
In some situation your windows will have a defined size. A typical problem maybe:
Given an array, return True if there are two element withing a window of size k that are equal
There are ways of brute forcing it with two for loops but the trick here to use the right data structure: a hashSet (set() in python).
In Python, the set data structure is an unordered collection of unique elements. It is implemented using a hash table, which allows for average-case constant time complexity (O(1)) for operations such as insertions, deletions, and membership tests. REMEMBER:
O(1)does not mean “one operation”. It means that the speed of operation is not dependent of the number of element present in your data structure.
1 | def find_duplicate(nums: list[int]): |
We could acutally have done the same with a dictionnary.
1 | def find_duplicate(nums: list[int]): |
Practice with these exercises
Problems with dynamic windows
At Sunrise, I built an algorithm to measure hypoxic burden. The idea was to scan the SpO2 values. For that we used a dymanic windows:
- When SpO2 drops below a threshold (e.g., 90%)
- Next, we keep it open as long as levels stay low
- Then we close it only when oxygen recovers and stabilizes.
- Finally we calculate severity (area under the curve) before resetting the window.
Knowing how to play with dynamic sliding window is very convenient and can be applied to many other situations:
- You need to fin enriched regions in ChIP-seq or ATAC-seq data? Slide a window, but adjust its size based on local signal-to-noise.
- You are detecting bursts of activity? Expand the window while traffic is high for analysis and contract it when it the traffic descrease.
This pattern is common. Let’s see some basic problem examples and the code to solve them.
Find the length of the longest subarray with identical values
1 | def find_longest_subarray(nums: list[int]): |
Find the length of the smallest subarray where the sum of its elements is greater than or equal to a given target. Assume all values are positive.
1 | def find_min_subarray(nums: list[int], target: int): |