Algorithms to Live By: Book Review

Algorithms to Live By: The Computer Science of Human Decisions (2016), Brian Christian and Thomas Griffiths. An algorithm is a set of rules followed in calculations or other problem solving, particularly by computers. For me, algorithms were a set of if-then statements to get my statistical programs working. Christian and Griffiths take a casual definition (sequence of steps to solve a problem) and go off in many different directions. Following a recipe is an algorithm. They start out trying to find an apartment in the hot market of San Francisco. How do you make an informed decision, when you know little but multiple people are ready to make offers? There solution was to spend 37% of the hunt on looking, then seriously shopping, called “optimal shopping,” which they consider an algorithm. It’s the right balance between impulse and overthinking, the explore/exploit tradeoff. Sorting, caching, and scheduling are yet to come. “Tackling real-world tasks requires being comfortable with chance, trading off time with accuracy, and using approximations” (p. 5).


Chapter 1: Optimal Stopping: When to stop liking. Again, 37% rule, but based on the secretary problem for hiring the single best applicant. Don’t stop too early or too late. It creates the look-then leap rule, the threshold rule (must satisfy a particular criterion). Cost-benefit analysis of the waiting game.


Chapter 2: Explore/Exploit: The latest vs. the greatest. “Exploration is gathering information, and exploitation is using the information” (p. 32). The multi-armed bandit problem is based on a casino full of slots, looking for the best expected value. One scenario is win-stay, lose-shift. This can apply to picking a restaurant or other decision, like reliving favorite experiences or new ones. Studies make sequels because of a guaranteed fan base. Unilever asked John Gittens to optimize drug trials to find which compounds likely are effective. The Gittins index was a dynamic allocation index. The upper confidence bound algorithm picks the option with the top confidence interval is highest.

A/B testing of a webpage starts with different drafts, then uses specific metrics like click-rate, used by Google and Amazon.

Tuskegee Syphilis Study ran from 1932-72, untreating black men. Ethical studies have been proposed, like adaptive trials, a version of win-stay, lose-shift.

“Life should get better over time. What an explorer trades off for knowledge is pleasure” (p. 58); like shifting to favorite things.


Chapter 3: Sorting: Making order. “Sorting is at the very heart of what computers do” (p. 60). Herman Hollerith invented the Hollerith Machine to count and sort using punched manila cards, which was adopted for the 1890 census. This became IBM. Google as a search engine is a sorting machine. Sorting is the opposite of economies of scale, unit costs rise. For computers it’s factorial time, even worse that exponential time. Bubble sort uses quadratic time moving each item at a time. Insertion sort is a bit better. IBM invented collators that could merge two separately ordered stocks in 1936, a merge sort. The Preston Sort Center for the Library of Congress moves 167 books a minute using a bar code scanner into 96 bins; a similar system is used by the New York Public Library, a bucket sort.


A central tradeoff in computers is sorting and searching. Key point is sorting is unnecessary for something that will never be searched in a waste, suggesting the need for up-front sorting. Single elimination tournament is a form of sorting, common in several sporting events. Alternatives include round-robin format, ladder tournaments and bracket tournaments, merge sorts. Randomness happens, which is noise in computer science. Comparison counting sort is a quadratic-time algorithm, like a round-robin.


“Displacement happens when an animal uses its knowledge of the hierarchy to determine that a particular confrontation simply isn’t worth it” (p. 80), based on pecking order, a search-sort tradeoff. Dominance hierarchies are information hierarchies. The Fortune 500 list is a corporate hierarchy.


Chapter 4: Caching: Forget About it. What to keep and how to arrange it. On a computer memory space is limited. Folders form hierarchies. They use caching, making a size, speed tradeoff. There are small, fast working memory and larger storage memories. Computers and tablets have six-layer memory hierarchies, with replacement policies or evection policies, based on Belady’s Algorithm. Alternatives are least recently used or first-in, first-out. Akamai is in the caching business and handled a quarter of all internet traffic. Content distribution networks (CDNs) maintain copies of popular websites, including Akamai. Amazon has its own patent for shipping popular items. Psychology takes notice. Practicing helps memory, then the forgetting curve as memory fades.


Chapter 5. Scheduling: First Things First. Study of scheduling began with Frederick Taylor in 1874, to use machinery more efficiently, based on scientific management. Henry Gantt built Gantt charts for construction projects. Washing and drying was an early example of an optimal algorithm. Goals must be explicit. Earliest due date, shortest processing time. Apply to debt: after debt avalanche. Single highest interest rate. Coming up with the right goals can be difficult. Switching tasks is a context switch. A CPU works on one program at a time but can swap quickly. Becomes a scheduling and caching problem, dealing with responsiveness and throughput.


Chapter 6: Bayes’s Rule: Predicting the future. Start with a prediction from a single data point. Thomas Bayes was an English minister born in 1701. His math emphasized hypotheticals based on likelihood. Pierre-Simon Laplace, expended Bayes on probability, creating Laplace’s Law: for a possible drawing of w winning tickets in n attempts, the expectation is wins plus one, divided by attempts plus two: (w+1)/(n+2). This is an attempt to use small data in the world. Bayes’ Rule combines preexisting beliefs with observed evidence and multiplying their probabilities together. Start with prior probabilities: Assign a prior probability. A human lifetime follows a Gaussian (normal) distribution. Many distributions do not follow a bell curve, like income or wealth levels which are power functions. In Bayes having an idea of the distribution is useful.


“For any power-law distribution, Bayes’ Rule indicates that the appropriate prediction strategy is a multiplicative rule: multiply the quantity observed so far by some constant factor” (p. 139). For a normal curve, it’s an average rule. The book shows six distributions (p. 141). People made fairly good predictions when they have good priors. Accurate information makes the best decisions. The media is not good at representing the frequency of events.


Chapter 7: Overfiting: When to think less. The authors start with Darwin and Franklin putting the pros and cons of marriage. Psychologists go with the two-factor model for marriage and happiness. Problem with noise and mismeasurement. Taste is a proxy measure for health and important nutrients. Overfitting can be a problem with incentive structures, which create consequences, like dysfunctional consequences of performance measures. In law enforcement and military, rote training is used on certain motions and tactics. One strategy for detecting overfitting is cross-validation, how well a model fits the data. Standardized tests offer one measure of education success. The tests can be cross-validated say with oral exams or essay tests. One way to choose a model is Occum’s razor, the simplest hypothesis or model. Heuristics (rules of thumb) are a substitute for overfitting, like invest 50/50 in bonds and stocks. Fads are not likely the best antidote.


Chapter 8: Relaxation: Let it slide. Constrained optimization problem. Lincoln rode the judicial circuit, preferring to visit all necessary towns in as few miles as possible, akin to the traveling salesman problem. “A problem is considered tractable if we know how to solve it using an efficient algorithm” (p. 172). Constrained relaxation: remove some constraints to make the problem easier. “Finding the shortest route under these looser rules produces what’s called the minimum spanning tree” (p. 174). Planners try to place fire trucks so every house can be reached within a fixed amount of time. Lagrangian relaxation. Use a scoring system, reducing impossible to costly. Scheduling major league baseball games, including bending the rules.


Chapter 9: Randomness: When to leave it to chance. Using random number tables. Laplace: estimate by sampling. Monte Carlo method of simulation developed at Los Alamos. John Rawls tried to reconcile liberty and equality: should a society be more free or more equal to be just? His approach was the veil of ignorance. Choose society before you know your status. Persuasion usually is based on cherry-picked anecdotes and aggregate statistics. There are negotiated tradeoffs: like searching versus caching. Greedy algorithm or myopic algorithm. Baseline itinerary versus alternatives. Local versus global maximums—in hill climbing algorithm. Analogy-based approach and simulations.


Chapter 10: Networking: How we connect. The internet is a collection of protocols, beginning with transmission control protocol or TCP, building on the idea of telephones using circuit switching. Then packet-switching where messages are broken into packets and merge them. Problem of bufferbloat to smooth out bursts, using queues. Email was designed by Ray Tomlinson.


Chapter 11: Game Theory: The minds of others. Game theory led to algorithmic game theory. Game theory is: “two-person contests where one player’s gain is another player’s loss. Mathematicians analyzing these games seek to identify a so-called equilibrium: that is, a set of strategies that both players can follow such that neither player would want to change their own play, given the play of their opponent” (p. 233). John Nash determined that every two-player game has at least one equilibrium, a Nash equilibrium. That doesn’t mean that an equilibrium is trackable or can be found.


The most well-known problem is the prisoner’s dilemma, where both confessing is the stable solution. “The optimist proclaims that we live in the best of all possible worlds; and the pessimist fears this is true” (p. 237). People will all overgraze their cattle on commons of public pasture. Mechanism design (reverse game theory): what rules will give us the behavior we want to see? Some equilibrium strategies might be rational but bad for everyone.

“How selfish soever man may be supposed, there are evidently some principles in his nature, which interest him in the fortune of others, and render their happiness necessary to him, though he derives nothing from it, except the pleasure of see it” (Adam Smith, the theory of moral sentiments). Involuntary selflessness. Capitalists, economists, and sociopaths follow self-interest, but then go to Machiavelli.

Auctions: sealed-bid first-price auction: presumably the winner always overpays. Dutch auction, descending auction (common in flower auctions). English auction: ascending auction. Problem is information cascade (people make the same decision in a sequence; similar to herd behavior). Vickrey auction is a sealed bid, but winner pays the amount of the second-place bidder.


Conclusions: Computational kindness as a design principle. If its hard, do what makes the most sense in the least amount of time.