• The Sorting Problem
  • Selection Sort, Heapsort
  • Mergesort
  • Insertion Sort
  • Quicksort
  • Shell’s Sort (Extra)

Sorting – Definitions

A sort is a permutation (re-arrangement) of a sequence of elements that brings them into order according to some total order. A total order \(\preceq\) is

  • Total: \(x \preceq y\) or \(y \preceq x\) for all \(x, y\).
  • Reflexive: \(x \preceq x\)
  • Antisymmetric: \(x \preceq y\) and \(y \preceq x\) implies \(x = y\).
  • Transitive: \(x \preceq y\) and \(y \preceq z\) implies \(x \preceq z\).

Unequal items may be “equivalent” under these definitions. e.g. given strings sorted by length, either of two equal-length strings may be put first. Strings of equal length are equivalent under this ordering, but not equal.

Sorting: An Alternate Viewpoint

An inversion is a pair of elements that are out of order.

Example: 0 1 1 2 3 4 8 6 9 5 7 has 6 inversions out of the 55 possible ones. The inversions are: 8-6, 8-5, 8-7, 6-5, 9-5, 9-7.

So the goal of sorting is: given a sequence of elements with Z inversions, perform a sequence of operations that reduces the number of inversions to 0.
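A brute-force sketch of this definition in Python (a \(\Theta(N^2)\) counter, for illustration only — counting inversions can be done in \(\Theta(N\log N)\), as noted later):

```python
def count_inversions(seq):
    """Brute force: check every pair (i, j) with i < j. Theta(N^2)."""
    n = len(seq)
    return sum(1 for i in range(n)
                 for j in range(i + 1, n)
                 if seq[i] > seq[j])

# The example sequence from above has 6 inversions:
print(count_inversions([0, 1, 1, 2, 3, 4, 8, 6, 9, 5, 7]))  # 6
```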

Performance Definitions

Characterizations of runtime efficiency are sometimes called the time complexity of an algorithm. e.g. DFS has time complexity \(\Theta(V+E)\).

Characterizations of the “extra” memory usage of an algorithm are sometimes called the space complexity of an algorithm. e.g. DFS has space complexity \(\Theta(V)\). Note that the graph itself takes up \(\Theta(V+E)\) space, but we don’t count this as part of DFS’s space usage.

Selection Sort

  • Find the smallest item of the unsorted portion.
  • Move this item to the front of the unsorted portion (i.e. the end of the sorted portion of the array) and “fix” it.
  • Repeat for unfixed items until all items are fixed.

Time complexity: \(\Theta(N^2)\) if we use an array (or a similar data structure). This is inefficient because we re-scan almost the whole unsorted portion every time we look for the min.

Space complexity: \(\Theta(1)\) since we don’t need extra space.
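The three steps above can be sketched in Python (a minimal in-place version):

```python
def selection_sort(a):
    """Selection sort sketch: Theta(N^2) time, Theta(1) extra space."""
    n = len(a)
    for i in range(n):
        # Find the smallest item in the unsorted portion a[i:].
        smallest = i
        for j in range(i + 1, n):
            if a[j] < a[smallest]:
                smallest = j
        # Move it to the end of the sorted portion and "fix" it.
        a[i], a[smallest] = a[smallest], a[i]
    return a
```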


Heapsort

Idea: Make getting the minimum fast by using a heap.

Note: for reasons that’ll become clear later, we’ll use a MaxHeap instead of a MinHeap.

Naive heapsorting N items:

  • Insert all items into a MaxHeap, and discard the input array. \(\Theta(N\log N)\) time.
  • Create output array. \(\Theta(N)\) time.
  • Repeat N times:
    • Delete largest item from the MaxHeap \(O(\log N)\) time.
    • Put it at the end of the unused part of the output array.

What’s the total runtime of Heapsort?

\(\Theta(N\log N + N + N \log N) = \Theta(N\log N)\). Much better than Selection sort!
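The naive procedure can be sketched in Python. One caveat: the standard-library `heapq` only provides a min-heap, so negating keys simulates the MaxHeap:

```python
import heapq

def naive_heapsort(items):
    """Naive heapsort sketch: insert everything into a max-heap
    (simulated by negating keys), then delete max N times."""
    heap = []
    for x in items:                    # N inserts: Theta(N log N)
        heapq.heappush(heap, -x)
    out = [0] * len(items)             # fresh output array: Theta(N)
    for i in range(len(items) - 1, -1, -1):
        out[i] = -heapq.heappop(heap)  # delete max: O(log N) each,
    return out                         # placed at the end of the unused part
```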

Why MaxHeap?

Because deleting the max frees up a slot at the end of the array, which is exactly where the max belongs. With a MinHeap, the minimum would need to go at the front, and we can’t fluidly swap it to the end.

In-place Heapsort

  • Rather than inserting into a new array of length N + 1, we can use a process known as “bottom-up heapification” to convert the array into a heap (without using any extra memory).
  • This avoids the need for an extra copy of all the data.

Bottom-up Heapify

  1. Sink nodes in reverse level order (go through the array backwards).
  2. After sinking the node at position k, the tree rooted at position k is guaranteed to be a heap.

But after we bottom-up heapify the input array, the array is still not sorted. Here’s the magic: we removeMax from the heap, and the max lands in the slot just freed at the end of the heap region. Repeating this fills the array from back to front in sorted order.
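The whole in-place procedure, sketched in Python (0-indexed heap, so the children of node k are 2k+1 and 2k+2; a sketch, not any particular course's reference code):

```python
def sink(a, k, n):
    """Sink a[k] within the heap region a[0:n]."""
    while 2 * k + 1 < n:
        child = 2 * k + 1
        if child + 1 < n and a[child + 1] > a[child]:
            child += 1                     # pick the larger child
        if a[k] >= a[child]:
            break
        a[k], a[child] = a[child], a[k]
        k = child

def heapsort(a):
    """In-place heapsort: bottom-up heapify, then repeatedly move
    the max into the slot the shrinking heap frees at the end."""
    n = len(a)
    for k in range(n // 2 - 1, -1, -1):    # sink in reverse level order
        sink(a, k, n)
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]        # removeMax into freed slot
        sink(a, 0, end)
    return a
```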

In-place Heapsort Runtime


  • Bottom-up heapification: \(O(N\log N)\) time. (For experts: prove that this is actually \(\Theta(N)\) in the worst case.)
  • Selecting largest item: \(\Theta(1)\) time.
  • Removing largest item: \(O(\log N)\) for each removal.

Time complexity: \(O(N\log N)\).

Memory complexity: \(\Theta(1)\). (Note: if done recursively, actually \(\Theta(\log N)\) to track recursive calls.)



Mergesort

  • Split items into two roughly even pieces
  • Mergesort each half
  • Merge the two sorted halves

Time complexity: \(\Theta(N\log N)\).

Memory complexity: \(\Theta(N)\).
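The three steps above, sketched in Python (the merge allocates the \(\Theta(N)\) extra space):

```python
def mergesort(a):
    """Mergesort sketch: Theta(N log N) time, Theta(N) extra space."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = mergesort(a[:mid]), mergesort(a[mid:])
    # Merge the two sorted halves.
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:      # <= keeps the sort stable
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]
```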

Faster than Heapsort? How? Because it’s faster by a constant factor (visible in tilde notation), for reasons to be explained in 61C. (Hint: Heapsort has bad cache performance.)

Insertion Sort


  • Start with an empty output sequence
  • Add each item from the input, inserting it into the output at the right point.

Naive approach: if the output sequence contains k items, the worst-case cost to insert a single item is k, because we need to move everything over.

Efficient method: do everything in place using swapping.

In-place Insertion Sort


  • Repeat for i = 0 to N - 1:
    • Designate item i as the traveling item.
    • Swap the traveler backwards until it is in the right place among all previously examined items.

Flavor: Some travelers move very little, some move far.
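The traveling-item process, sketched in Python:

```python
def insertion_sort(a):
    """In-place insertion sort. Each swap fixes exactly one inversion,
    so total work is Theta(N + K) for K inversions."""
    for i in range(1, len(a)):
        j = i
        # Swap the traveling item backwards until it's in place.
        while j > 0 and a[j - 1] > a[j]:
            a[j - 1], a[j] = a[j], a[j - 1]
            j -= 1
    return a
```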

Insertion Sort Runtime

\(\Omega(N)\), \(O(N^2)\).

Picking the Best Sort

Suppose some evil entity gives you a list that is sorted except for one element. Which sorting algorithm should you use?

Use Insertion Sort. Insertion sort works very well for almost-sorted arrays because if an array has \(K\) inversions, the work needed for Insertion sort is \(\Theta(N+K)\).

  • Define an almost sorted array as one in which the number of inversions \(\leq cN\) for some \(c\).
  • Less obvious observation: For small arrays (\(N<15\) or so), the empirical fact is that insertion sort is fastest.

Q: For an array that is sorted, I pick a random element and change it, then which algorithm is the best?

A: Insertion Sort, because its worst case here is linear time (changing one element creates at most \(N-1\) inversions), while merge sort always takes \(\Theta(N\log N)\).

Insertion Sort has a special property: its runtime is proportional to the number of inversions. \(\Longrightarrow \Theta(N+K)\) runtime, where \(K\) is the number of inversions.

As a result, even though its worst case is \(O(N^2)\), it can be really good in certain cases.

Less obvious thing: For small arrays (N < 15 or so), insertion sort is fastest. The rough idea: Heapsort and Mergesort are divide-and-conquer algorithms with too much overhead, whereas insertion sort goes straight to the conquest.


Quicksort

Much stranger core idea: Partitioning

Invented by Tony Hoare, a novice programmer at the time.


To partition an array a[] on element x=a[i] is to rearrange a[] so that:

  • x moves to position j (which may be same as i)
  • All entries to the left of x are less than or equal to x
  • All entries to the right of x are greater than or equal to x
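This definition can be sketched directly in Python (a non-in-place partition for illustration; the in-place scheme comes later):

```python
def partition(a, i=0):
    """Partition a on pivot x = a[i]: smaller items to the left,
    greater-or-equal items to the right. Returns x's final position j."""
    x = a[i]
    rest = a[:i] + a[i + 1:]
    left = [y for y in rest if y < x]
    right = [y for y in rest if y >= x]
    a[:] = left + [x] + right
    return len(left)

a = [5, 9, 1, 7, 3]
j = partition(a)    # pivot 5 lands at index 2: a becomes [1, 3, 5, 9, 7]
```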

Interview Question (Partitioning)

Given an array of colors where the 0th element is white and the remaining elements are either red or blue, how can you rearrange the array so that all red squares are to the left and all blue squares are to the right of the white square? The algorithm must take \(\Theta(N)\) time, with no restriction on space.

Approach #1: Red pointer, blue pointer approach

  • Create an empty array of size N. Set red pointer at 0 and blue pointer at N - 1.
  • Go through the input array: if an item is red, write it at the red pointer and increment the red pointer; if blue, write it at the blue pointer and decrement the blue pointer. The one remaining slot gets the white item.

Approach #2: Two array solution.

  • One array for red, one array for blue. At the end, concatenate the two arrays with the white element in the middle.

Partition Sort (a.k.a. Quicksort)

  • Notice that a pivot (say, 5) is already in its correct final place once we first partition on it.
  • Also note that the left half and right half are each partially sorted, in the sense that every item is on the correct side of the pivot.
  • That means we can just sort the two halves separately, using the leftmost item of each half as its pivot.

Quicksort Algorithm

  • Partition on leftmost item
  • Quicksort left half
  • Quicksort right half
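A direct (non-in-place) Python sketch of this recursion:

```python
def quicksort(a):
    """Quicksort sketch: partition on the leftmost item, then
    recursively sort each half. Uses Theta(N) extra space per level."""
    if len(a) <= 1:
        return a
    pivot = a[0]
    left = [x for x in a[1:] if x < pivot]
    right = [x for x in a[1:] if x >= pivot]
    return quicksort(left) + [pivot] + quicksort(right)
```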

Quicksort Runtime

Theoretical analysis:

  • Partitioning costs \(\Theta(K)\) time, where \(K\) is the number of elements being partitioned.
  • Interesting twist: the overall runtime is non-deterministic; it depends on where the pivot lands.

Best case: Pivot always lands in the middle. Then the runtime is \(\Theta(N\log N)\).

Worst case: Pivot always lands at beginning of array. Then the runtime is \(\Theta(N^2)\).

Now compare this to Mergesort: it seems that Quicksort isn’t really fast. Or is it?

Argument #1: 10% Case (In General)

Suppose pivot always ends up somewhere at least 10% away from either edge.

Then runtime is \(O(NH)\) where \(H \approx \log_{\frac {10} 9} N = O(\log N)\). So overall \(O(N\log N)\).

Argument #2: Quicksort is really just Insertion into a BST

It turns out Quicksort is just BST Sort. This idea might take a while to sink in…

  • Couldn’t we just count the inversions and decide whether to use Insertion Sort or Quicksort?
  • It turns out counting inversions takes \(\Theta(N\log N)\) time too.

Quicksort Performance

The performance of Quicksort depends critically on:

  • choice of pivot
  • how you partition around the pivot
  • other optimizations you might add

Now, here comes the question:

Do we need to worry about the worst case of giving Quicksort an already sorted array?

  • No? We could check whether it’s already sorted before running Quicksort.
  • Yes? Even if everything except one element is sorted, Quicksort is still slow.
  • So using Quicksort is fast, but kinda risky…

Avoiding the Worst Case

If pivot always lands somewhere “good”, then Quicksort is \(\Theta(N\log N)\). So what can we do?

Hint: Recall that the fundamental issue is that the leftmost item is always chosen as pivot, and after partitioning, items on each side of pivot appear in the same relative order.

Possible Answers:

  • Don’t pick the leftmost item.
    • Pick a random item as pivot.
    • Scramble the array before starting.
    • Pick a smarter pivot, maybe pick 3 items from array and pick the middle value. Maybe use the median.

Hug’s Answers:

  • Randomness: Pick a random pivot or shuffle before sorting.
  • Smarter pivot selection: Calculate or approximate the median.
  • Introspection: Switch to a safer sort if recursion goes too deep.
  • Try to cheat: If the array is already sorted, don’t use Quicksort.

Philosophy 1: Randomness (Preferred Approach)

We know that rare \(\Theta(N^2)\) cases do happen in practice. e.g.:

  • Bad ordering: Array already in sorted order
  • Bad elements: Array with all duplicates

Two strategies to deal with bad ordering:

  1. Pick pivots randomly
  2. Shuffle before you sort

Strategy 2 requires care in partitioning code to avoid \(\Theta(N^2)\) behavior on arrays of duplicates. (This is a common bug in textbooks. See A Level Problems)

Philosophy 2: Smarter Pivot Selection (Linear time pivot pick)

The best possible pivot is the median, because it splits the problem into two subproblems, each of size \(N/2\).

This raises an interesting question: how do you compute the median of an array?

Is it possible to find the median in \(\Theta(N)\) time? Yes! Use “BFPRT” (called PICK in the original paper). In practice, though, this algorithm is rarely used.

Philosophy 2b: Approximate the Median

The bad news: for any pivot selection strategy that is deterministic and constant time, the resulting Quicksort has a family of dangerous inputs that an adversary could easily generate.

See McIlroy’s A Killer Adversary for Quicksort.

Philosophy 3: Introspection

Keep an eye on the recursion depth; if it goes too deep, switch to merge sort.

Tony Hoare’s In-place Partitioning Scheme

  • Best case: \(\Theta(N\log N)\).
  • Worst case if always leftmost pivot: \(\Theta(N^2)\).
  • Random pivot or Shuffle before sorting: \(\Theta(N\log N)\)
  • Non-random Quicksort with any constant-time, deterministic pivot selection has a family of “dangerous” inputs.
  • Space usage: \(\Theta(N)\).

We can actually improve the space complexity, and make the algorithm faster by a constant factor, by using in-place partitioning.

This approach’s Main Idea:

  • Start with using the leftmost item as the pivot.

  • The left pointer loves small items and hates large or equal items (as compared to the pivot).
  • The right pointer loves large items and hates small or equal items (as compared to the same pivot).
  • The two pointers walk towards each other, each stopping on a hated item. When both pointers have stopped, swap their items and move each pointer inward by one.
  • When the pointers cross, swap the pivot with the item at the right pointer, putting the pivot in its final place.

Using this partitioning scheme alone will not make things fast; we still need to pick pivots intelligently. Shuffling and then using the leftmost item is pretty good.
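A sketch of the two-pointer scheme in Python (details such as the exact stopping conditions vary between presentations; this version stops on equal items, which helps avoid \(\Theta(N^2)\) behavior on arrays of duplicates):

```python
def hoare_partition(a, lo, hi):
    """In-place two-pointer partition on pivot a[lo].
    Left pointer stops on items >= pivot, right on items <= pivot."""
    pivot = a[lo]
    left, right = lo + 1, hi
    while True:
        while left <= right and a[left] < pivot:   # walk past loved items
            left += 1
        while left <= right and a[right] > pivot:
            right -= 1
        if left >= right:
            break
        a[left], a[right] = a[right], a[left]      # swap two hated items
        left += 1
        right -= 1
    a[lo], a[right] = a[right], a[lo]              # pivot into final place
    return right

def quicksort_inplace(a, lo=0, hi=None):
    """Quicksort using the in-place partition above."""
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        p = hoare_partition(a, lo, hi)
        quicksort_inplace(a, lo, p - 1)
        quicksort_inplace(a, p + 1, hi)
    return a
```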

Quick Select

Remember that we can avoid the worst case of \(\Theta(N^2)\) if we can find the median really fast, but so far no one has found a median algorithm fast enough to be worth it.

The Selection Problem

Given an array of N items, find the \(k\)th smallest item for an arbitrary \(k\).

How would we do this?

  • For \(k=1\), find the smallest item.
  • For \(k=2\), either sort the array and take the 2nd smallest, or keep two variables: the smallest and the 2nd smallest seen so far.
  • For \(k = \frac N 2\) (the median)? Many approaches take \(\Theta(N\log N)\), but can we do it in \(\Theta(N)\) time somehow?

Quick Selection

Goal: Find the median.

Worst-case performance? Give an array that causes worst-case performance.

  • If our array is already sorted, like [1, 2, 3, ..., N], and we always pivot on the leftmost item, the subproblem sizes shrink by only one each time: N, N - 1, …, down to N/2.
  • So the runtime is \(\Theta(N^2)\) with this naive leftmost-pivot strategy.

Expected performance? \(\Theta(N)\) time. Why? Because on average we first do ~N compares, then ~N/2, then ~N/4, and so on, and \(N + N/2 + N/4 + \cdots \leq 2N\).
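A sketch of quickselect with a random pivot (non-in-place for clarity; \(k = 0\) gives the minimum):

```python
import random

def quickselect(items, k):
    """Return the kth smallest item (k = 0 is the minimum).
    Recurse into only the side that contains the answer:
    expected ~N + ~N/2 + ~N/4 + ... = Theta(N) compares."""
    pivot = random.choice(items)
    left = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    if k < len(left):
        return quickselect(left, k)
    if k < len(left) + len(equal):
        return pivot
    right = [x for x in items if x > pivot]
    return quickselect(right, k - len(left) - len(equal))

# Median of an odd-length array: quickselect(a, len(a) // 2)
```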


Stability

Definition: A sort is stable if the order of equivalent items is preserved.

Is Insertion sort stable? Yes: when an equivalent item in the later (more rightward) part of the array gets picked as the traveler and moves left, it never moves past its equivalent brethren.

Is Quick sort stable? Maybe. Depends on the partitioning strategy.

Is Merge sort stable? Yes.
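For instance, Python’s built-in `sorted` (TimSort underneath) is stable, which we can see by sorting strings by length, the equivalent-but-not-equal example from earlier:

```python
# Python's sorted() (TimSort) is guaranteed stable.
words = ["bb", "a", "cc", "d", "aa"]
by_length = sorted(words, key=len)
# Equal-length strings keep their original relative order:
print(by_length)  # ['a', 'd', 'bb', 'cc', 'aa']
```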


Arrays.sort

In Java, Arrays.sort uses:

  • Merge sort (TimSort actually), when sorting Objects.
  • Quick sort, when sorting primitives.

Why? Obviously it has to do with stability, but it’s good to look at the A Level problems to come up with an ELI5 explanation.

Optimizing Sorts

There are additional tricks we can play:

  • Switch to insertion sort with array size < 15
  • Make the sort adaptive: Exploit existing order in the array (Insertion Sort, SmoothSort, TimSort (the sort used in Python and Java)).


Shuffling

There are many ways to shuffle.

Easiest way: associate each item with a random number, and sort the items by their associated random numbers.
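A sketch of this shuffle-by-sorting idea (it costs \(\Theta(N\log N)\); Fisher–Yates, i.e. Python’s `random.shuffle`, does the same job in \(\Theta(N)\)):

```python
import random

def shuffle_by_sorting(items):
    """Pair each item with a random key, then sort by the keys."""
    keyed = [(random.random(), i, x) for i, x in enumerate(items)]
    keyed.sort()                     # index i breaks rare key ties
    return [x for _, _, x in keyed]
```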

Hey, kudos for making it this far! If you've liked this, you might also like Asymptotics.