Binary search with multiple matches
Suppose you’re doing a binary search on an array that contains multiple matching elements. How can you control which element you end up at?
Common setup
We’ll search through array, which contains multiple copies of target. Because of these multiple copies, we’ll refer to the one that we actually end up picking as the final answer.
To perform the binary search, we initialize lo and hi to the lower and upper bound, respectively, of the region that might contain the final answer. Our binary search will continue narrowing this range until it only contains one element.
array = [0, 1, 2, 3, 4, 4, 4, 4, 5, 6, 7, 8, 9]
target = 4
lo = 0
hi = len(array) - 1Let’s create some helper functions to help us visualize what’s happening.
def stringify(lo, hi, idx, num):
prefix = "|" if idx == lo else " "
suffix = "|" if idx == hi else " "
return f"{prefix}{num}{suffix}"
def print_state(lo, hi, array):
s_array = [stringify(lo, hi, idx, x) for idx, x in enumerate(array)]
print(" ".join(s_array))Standard binary search finds the leftmost copy
while lo != hi:
print_state(lo, hi, array)
mid = lo + (hi - lo) // 2
if target <= array[mid]:
hi = mid
else:
lo = mid + 1
print_state(lo, hi, array)|0 1 2 3 4 4 4 4 5 6 7 8 9|
|0 1 2 3 4 4 4| 4 5 6 7 8 9
0 1 2 3 |4 4 4| 4 5 6 7 8 9
0 1 2 3 |4 4| 4 4 5 6 7 8 9
0 1 2 3 |4| 4 4 4 5 6 7 8 9The standard binary search that most people learn has the behavior of finding the leftmost copy of the target:
- We first compute the midpoint
midofloandhi.mid = lo + (hi - lo) // 2is similar tomid = (lo + hi) // 2, except that it avoids overflow. - We compare the element at
midto thetarget.- If
target <= array[mid], then nothing to the right ofarray[mid]can be the leftmost copy oftargetand we can safely sethi = mid. We can’t sethi = mid - 1becausearray[mid]itself still might be the final answer. - Otherwise, we know that
target > array[mid]. This means thatarray[mid]itself (and everything to the left) can’t possibly contain the final answer, so we can safely setlo = mid + 1.
- If
- The loop is guaranteed to terminate because each loop iteration guarantees that the range of possible elements gets smaller.
- If
hi - lo > 1, thenlo < mid < hiand we’ll definitely either increaseloor decreasehi. - If
hi - lo == 1, thenlo == mid < hi. We will either setlo = mid + 1, which increaseslo, orhi = mid, which decreaseshi.
- If
Finding the rightmost copy
while lo != hi:
print_state(lo, hi, array)
mid = lo + (hi - lo + 1) // 2
if array[mid] <= target:
lo = mid
else:
hi = mid - 1
print_state(lo, hi, array)|0 1 2 3 4 4 4 4 5 6 7 8 9|
0 1 2 3 4 4 |4 4 5 6 7 8 9|
0 1 2 3 4 4 |4 4 5| 6 7 8 9
0 1 2 3 4 4 4 |4 5| 6 7 8 9
0 1 2 3 4 4 4 |4| 5 6 7 8 9This is quite a small modification!
- The high-level logic is the same as before.
- Previously when we were looking for the leftmost copy of
target, observing thattarget <= array[mid]allowed us to discard everything to the right ofarray[mid]. - Now that we’re looking for the rightmost copy of
target, observing thattarget >= array[mid]allows us to discard everything to the left ofarray[mid]. This means one of our update conditions will belo = mid. - To guarantee that the loop terminates, we need at least one of
loorhito move on each loop iteration. This implies that the other update condition must behi = mid - 1. - This in turn implies that when
hi - lo == 1, we need to setmid = hi. We achieve this viamid = lo + (hi - lo + 1) // 2.