Literature Review on the Iterated Prisoner’s Dilemma

Jon Kaplan

The possibility of war looms over two rival countries like a thick cloud of smog as rising diplomatic tensions continue to exacerbate the effects of rampant jingoism. Neither country wants a military conflict, but both countries must be prepared in case the foe launches an attack. The problem facing both entities here is none other than the prisoner’s dilemma, a fundamental topic in the field of game theory.

Game theory emerged as an academic discipline in 1928, when the mathematician John von Neumann, later based at Princeton, published his first paper on the subject (Dixit and Nalebuff 2008). Essentially, it is the science of strategy: it employs mathematics and logic to determine the optimal moves for people (players) to make in various situations (games) (Dixit and Nalebuff 2008). In every game, the overall outcome depends on the decisions made by all players, and every player attempts to maximize his own gains. Games include pastimes such as chess and rock-paper-scissors, but they also arise in many real-life situations.

The prisoner’s dilemma is one such game, in which two criminal suspects (Bonnie and Clyde) are separated by the police and given a choice: each can either confess and betray the other, or remain silent and deny everything. Three distinct outcomes are possible: if both deny everything, they will each serve 1 year in prison; if one confesses and the other denies (say Bonnie confesses and Clyde denies), the confessor goes free and the denier spends 3 years in jail; if both confess to the crime, they will each serve 2 years. Confessing is referred to as a dominant strategy, because it improves a player’s outcome regardless of the other player’s move: if Bonnie confesses, then Clyde would also be wise to confess and get 2 years instead of 3, and if Bonnie denies, then Clyde’s best move is still to confess and go free. Confessing is therefore the rational decision for each individual, but if both players arrive at this decision and confess, they each serve 2 years, as opposed to the 1 year they would serve if they cooperated with each other and denied everything. This is the dilemma: the dominant strategy, followed by both players, leaves each of them worse off than mutual cooperation would.
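The dominance argument above can be checked mechanically. The following is a minimal sketch in Python that uses the sentences from the paragraph above; the names `YEARS`, `best_reply`, and `dominant_move` are illustrative, and the outcome in which Clyde confesses while Bonnie denies is filled in by symmetry, which is an assumption.

```python
# Years in prison for (Bonnie's move, Clyde's move), as (Bonnie, Clyde).
# The ("deny", "confess") entry is assumed by symmetry with the stated case.
YEARS = {
    ("deny", "deny"): (1, 1),
    ("confess", "deny"): (0, 3),
    ("deny", "confess"): (3, 0),
    ("confess", "confess"): (2, 2),
}
MOVES = ("deny", "confess")


def best_reply(opponent_move):
    """Return the move that minimizes Bonnie's sentence against a fixed opponent move."""
    return min(MOVES, key=lambda move: YEARS[(move, opponent_move)][0])


def dominant_move():
    """Return the move that is the best reply to every opponent move, or None."""
    replies = {best_reply(opponent_move) for opponent_move in MOVES}
    return replies.pop() if len(replies) == 1 else None


print(dominant_move())                # confess
print(YEARS[("confess", "confess")])  # (2, 2): both follow the dominant strategy
print(YEARS[("deny", "deny")])        # (1, 1): both cooperate and do better
```

Because the game is symmetric, checking Bonnie's best replies is enough; the same reasoning applies to Clyde.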

Using prisoners is just one of many ways to model this game—it can also be represented by the pricing choices of competing oligopolistic companies (Dixit and Nalebuff 2008). In all other situations, the equivalent of prisoners denying is referred to as cooperating, and the equivalent of confessing is referred to as defecting. If company A sets low prices and company B does not, company A profits more due to having more customers. If both defect and set their prices low, both companies profit less than if they cooperated and left their prices high. The arms race of the Cold War is another example; neither country benefits from all the money spent on weapons, but both appear to be better off arming themselves (Dixit and Nalebuff 2008).

Even more fascinating is the iterated case of the prisoner’s dilemma, in which the same two players engage in multiple rounds, making long-term trust a vital factor. If a player defects in one round, he may benefit temporarily but will likely lose the other player’s trust, reducing the chance of mutual cooperation and decreasing his potential future payoff (or reward). Every player in this game must have a strategy, or a set of predetermined responses to the outcomes of previous rounds. Robert Axelrod’s initial exploration of this variation in 1981 showed that greedy strategies (frequent defection) fail over long periods of time while altruism (frequent cooperation) thrives (Axelrod and Hamilton 1981). Axelrod used this result to explain the origin of altruism through natural selection: uncooperative behaviors diminished in favor of collaboration, which allowed populations to grow.

Because of its prolonged nature, the iterated prisoner’s dilemma is an exceptional model for cooperative tendencies in the real world. For instance, guppies inspect predators by swimming together in a pair, taking turns advancing and stopping only when the other guppy stops first after getting a sufficiently good look at the predator (Sinervo 1997). The strategy they employ is called tit-for-tat, which entails cooperating in the first round and then mirroring the opponent’s previous move in every later round. Other animals that use prisoner’s dilemma strategies include bees, ants, termites, naked mole rats, and woodpeckers (Sinervo 1997), and so their populations can be studied using the prisoner’s dilemma as a mathematical model. As research builds a deeper understanding of the relevant external factors and strategic possibilities, these models can be improved substantially, which could allow humans to intervene in real-world population crises caused by problematic strategies.
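Tit-for-tat is simple enough to simulate directly. The sketch below is one possible Python rendering, not a reproduction of any cited study: it reuses the prison sentences from the Bonnie and Clyde example as per-round costs, and the ten-round match length and the names `YEARS`, `tit_for_tat`, `always_defect`, and `play` are assumptions made for illustration.

```python
# Years added per round for (my move, opponent's move).
# "C" = cooperate (deny), "D" = defect (confess); lower totals are better.
YEARS = {("C", "C"): 1, ("C", "D"): 3, ("D", "C"): 0, ("D", "D"): 2}


def tit_for_tat(opponent_history):
    """Cooperate in the first round, then copy the opponent's previous move."""
    return opponent_history[-1] if opponent_history else "C"


def always_defect(opponent_history):
    """Defect in every round, whatever the opponent has done."""
    return "D"


def play(strategy_a, strategy_b, rounds=10):
    """Play repeated rounds and return the total years served by each player."""
    moves_a, moves_b = [], []
    total_a = total_b = 0
    for _ in range(rounds):
        # Each strategy sees only the other player's past moves.
        a = strategy_a(moves_b)
        b = strategy_b(moves_a)
        moves_a.append(a)
        moves_b.append(b)
        total_a += YEARS[(a, b)]
        total_b += YEARS[(b, a)]
    return total_a, total_b


print(play(tit_for_tat, tit_for_tat))      # (10, 10)
print(play(tit_for_tat, always_defect))    # (21, 18)
print(play(always_defect, always_defect))  # (20, 20)
```

Under these assumed numbers, tit-for-tat is exploited only in the first round against a constant defector, while two tit-for-tat players serve half as long as two constant defectors.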

Several studies have been conducted on the effects of external factors on outcomes of the iterated prisoner’s dilemma. In a 2014 study, researchers Çetin and Bingol required participants to either consistently cooperate in every single game or defect in every single game. Players could choose new partners between rounds and refuse to work with certain people—here, the important external factor was the ability to remember who does what. Naturally, if a cooperator partners with a defector he gets nothing, so the optimal strategy is to remember the defectors and refuse to play with them. To be clear, a cooperator is still a cooperator if he refuses to play with a defector—the term only refers to selecting the cooperation choice within the game itself. The results showed that cooperator payoffs were unaffected by memory capabilities, but defectors suffered from having opponents with exceptional memories. Additionally, having more defectors led to more competition between them with less average payoff per player. Because the players here had to remain cooperators or defectors for the entire study, the defectors were faced with social isolation against cooperators with strong memories. Applying these results to an animal population, the initial defectors may end up cooperating once they begin to feel the negative effects of social isolation. This ties back into Axelrod’s discovery that selfish behaviors are snuffed out in favor of altruistic tendencies over time (Axelrod and Hamilton 1981).
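The isolation effect described above can be illustrated with a deliberately simple toy model. The sketch below is not Çetin and Bingol’s model: the payoff values (`reward`, `temptation`), the population sizes, the number of rounds, and the first-in-first-out memory are all assumptions chosen only to make the mechanism visible.

```python
from collections import deque


def simulate(n_cooperators=3, n_defectors=4, memory_size=2, rounds=5,
             reward=1, temptation=2):
    """Toy model: cooperators refuse any defector held in their bounded memory.

    All parameter values are illustrative assumptions, not data from the study.
    Returns (cooperator payoffs, defector payoffs).
    """
    coop_payoff = [0] * n_cooperators
    defect_payoff = [0] * n_defectors
    # deque(maxlen=k) forgets its oldest entry once it holds k items.
    memory = [deque(maxlen=memory_size) for _ in range(n_cooperators)]

    for _ in range(rounds):
        # Cooperators always play one another and each earns the reward.
        for c in range(n_cooperators):
            coop_payoff[c] += reward * (n_cooperators - 1)
        # Each defector approaches each cooperator once per round.
        for d in range(n_defectors):
            for c in range(n_cooperators):
                if d in memory[c]:
                    continue  # refused: neither player earns anything
                defect_payoff[d] += temptation  # the exploited cooperator earns 0
                memory[c].append(d)
    return coop_payoff, defect_payoff


print(simulate(memory_size=0))  # ([10, 10, 10], [30, 30, 30, 30])
print(simulate(memory_size=4))  # ([10, 10, 10], [6, 6, 6, 6])
```

In this toy setting the cooperators earn the same amount at both memory sizes, while each defector’s total falls sharply once the cooperators can remember every defector, which mirrors the qualitative pattern reported in the study.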

A similar 2014 study by Duffy and Smith explored the effect of a different external factor: the need to retain information unrelated to the game. Players were given either a two-digit or a seven-digit number to memorize at the start of the game and had to recall it at the end. Those with the longer number were found to be less strategic, more prone to changing strategies throughout the game, and less able to use previous results to adopt a new strategy. These two studies show the limits of mental capacity and expose one of the imperfections in the standard assumptions of the academic problem: in real life, players do not always behave rationally. Factors that realistically add mental load are abundant; for example, animals often have to remember when their predators hunt and where they can hide. The researchers concluded that there are indeed “brains in games,” and that outcomes depend heavily on the capabilities of those brains to manage information. By incorporating what is known about how these external factors affect strategizing, population models can be improved to allow for quicker human intervention before an incoming external factor (like having to remember where food is located during the winter) drastically affects an animal population.

In a slightly different vein, much research on the game is conducted on paper and in simulations. In a 2011 study by Brown and Ashlock, three strategies (always cooperate, always defect, and tit-for-tat) were evaluated by their respective abilities to produce a dominant strategy, or one that is ideal regardless of the opponent’s play (like confessing in the prisoner’s dilemma). It was found that players who always defect are dominated by anti-nice players, or players who never cooperate first, and tit-for-tat players are dominated by nice players, or players who never defect first. No conclusion was stated for dominance of players who always cooperate, but based on Axelrod’s investigations, an anti-nice player would most likely dominate by taking advantage of the cooperator’s naiveté (Axelrod and Hamilton 1981). This ties back into Çetin and Bingol’s study, in which cooperators shied away from defectors in order to avoid being exploited (Çetin and Bingol 2014). The Brown and Ashlock study is highly applicable to the behaviors of interacting animals and the stability of their populations. For instance, if bunnies always run away from wolves (always defect), the wolves may benefit most from always chasing (anti-nice); if the wolves stop chasing to take a break (thereby not being anti-nice), they may give the bunnies too much of a lead, making the benefit of resting lower than that of chasing. Information like this helps to eliminate unknowns about strategy success rates and improve models that analyze increasing and decreasing populations to predict problems and potentially allow early human intervention.

Often a situation arises in which one player has leverage over another. In a 2014 study by Chen and Zinger, the capabilities of extortion were explored. Extortion refers to a situation in which one player always receives a higher payoff and can ensure that the other player’s best choice is to cooperate. This paper specifically looked at the scenario in which an extortionist plays against an adapting player who may not know that his best decision is to cooperate. It was shown that the extortionist will always be able to force the other player into cooperation and receive a similarly high payoff eventually, regardless of how long it takes for the unknowing player to decide to cooperate. Unlike previous studies, this one examined the case in which one player has more power than the other, so these results apply only to situations in which an extortionist animal interacts with a weaker one, and can potentially be expanded to model whole animal populations controlling large payoffs, such as food sources. This information can help models predict when the extortion becomes problematic for the underdog (likely when it realizes it needs to cooperate), and allow for humans to intervene to prevent unnecessary extinction.

Another potential condition on the iterated prisoner’s dilemma involves limiting how many previous rounds the players can remember. In a 2014 study, Kim, Roh, and Son examined all 32 possible strategies in an iterated prisoner’s dilemma in which the players have a memory span of one round. In the first round a player can either cooperate or defect, and each round has four possible results: CC (both cooperate), CD (one cooperates), DC (the other cooperates), and DD (both defect). Each of these results can lead to a different response in the following round, yielding five separate binary decisions per strategy (the opening move plus one response for each of the four results), or 2^5 = 32 unique strategies. Plugging every possibility into a simulation showed that a win-lose circulation loop, much like in rock-paper-scissors, appears in the presence of one strategy (cooperate first and then only cooperate again if the opponent defected in round one). This means that certain strategies beat others (rock beats scissors), while remaining vulnerable to still-different strategies (paper beats rock), and so the population oscillates up and down when the oscillation strategy is present. When applied to a scenario in which 1% of a population (the invading population) employs the oscillation strategy while 99% of the population utilizes a different strategy, the circulation loop causes repeated oscillation. However, when the simulation is run on a homogeneous population (one with equal strategy distributions), the population will not fluctuate because the oscillation strategy is quickly overcome by other strategies. The results for invading strategies are applicable for determining the effects of invasive species. For example, if a small number of green iguanas are introduced to the Everglades in Florida, they may employ an invasive strategy in interactions with other species that try to defend against the iguanas’ negative influence (Cubie 2009). In the homogeneous population, however, little oscillation is present due to the plethora of interacting strategies, and this can be observed in real life through stable populations in areas with high species diversity (Kim et al. 2014).
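The count of 32 strategies follows from five independent binary choices: an opening move plus a response to each of the four possible results of the previous round. A short Python sketch of that enumeration is given below; the representation of a strategy as a pair (first move, response table) and the convention that the first letter of each result is the player’s own previous move are assumptions made for illustration, not details taken from the study.

```python
from itertools import product

# Results of the previous round, written as (own move, opponent's move).
OUTCOMES = ("CC", "CD", "DC", "DD")


def all_memory_one_strategies():
    """Enumerate every (first move, response table) pair: 2**5 = 32 in total."""
    strategies = []
    for first, *responses in product("CD", repeat=5):
        # Map each previous-round result to the move played next.
        strategies.append((first, dict(zip(OUTCOMES, responses))))
    return strategies


strategies = all_memory_one_strategies()
print(len(strategies))  # 32

# Tit-for-tat in this representation: open with C, then copy the opponent.
tit_for_tat = ("C", {"CC": "C", "CD": "D", "DC": "C", "DD": "D"})
print(tit_for_tat in strategies)  # True
```

Extending the memory span to two rounds would enlarge the response table considerably, which is one reason exhaustive simulation is practical mainly for short memories.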

Although all this research is substantive, it raises many more questions to be explored. How would the invading strategy in Kim’s network study affect population stability if that 1% was bogged down by an external factor similar to the one presented in Duffy’s study? Kim’s network study could also potentially be expanded into a two-round memory game to understand the behavior patterns of more advanced creatures. Finding the answers to these questions can lead to better modeling of more complex populations (in which extortion may become prevalent), and expanding the memory further can eventually lead to mapping out widespread human interactions.

All of these studies are building blocks that add to the body of knowledge in game theory. Through studies with human participants, scientists can analyze the factors that affect cooperation on an individual level, whereas studies with mathematics and models allow researchers to uncover the effects of different behaviors of subpopulations on the population as a whole. As more external factors and strategies are characterized, more behaviors can be understood, allowing for the development of better models. Stronger models can in turn be used more effectively to mitigate the damage caused by harmful strategies in various populations. The next step is to ensure that humans intervene only when necessary and otherwise observe carefully.

Works Cited

Axelrod, Robert, and William D. Hamilton. “The Evolution of Cooperation.” Science 211 (1981): 1390-1396. Web. 8 Sept. 2014.

Brown, J.A., and D.A. Ashlock. “Domination in Iterated Prisoner’s Dilemma.” 24th Canadian Conference on Electrical and Computer Engineering (2011): 1125-1128. Web. 06 Sept. 2014.

Çetin, Uzay, and Haluk O. Bingol. “Iterated Prisoners Dilemma with Limited Attention.” Computer Science and Game Theory (2014): 1-6. Cornell University Library. Web. 07 Sept. 2014.

Chen, Jing, and Aleksey Zinger. “The Robustness of Zero-Determinant Strategies in Iterated Prisoner’s Dilemma Games.” Journal of Theoretical Biology 357 (2014): 46-54. Web. 07 Sept. 2014.

Cubie, Doreen. “Everglades Invasion.” National Wildlife. National Wildlife Federation, 1 Feb. 2009. Web. 13 Sept. 2014.

Dixit, Avinash, and Barry Nalebuff. “Game Theory.” The Concise Encyclopedia of Economics. 2008. Library of Economics and Liberty. Web. 17 Sept. 2014.

Dixit, Avinash, and Barry Nalebuff. “Prisoners’ Dilemma.” The Concise Encyclopedia of Economics. 2008. Library of Economics and Liberty. Web. 17 Sept. 2014.

Duffy, Sean, and John Smith. “Cognitive Load in the Multi-Player Prisoner’s Dilemma Game: Are There Brains in Games?” Journal of Behavioral and Experimental Economics 51 (2014): 47-56. Web. 06 Sept. 2014.

Kim, Young Jin, Myungkyoon Roh, and Seung-Woo Son. “Network Structures between Strategies in Iterated Prisoners’ Dilemma Games.” Journal of the Korean Physical Society 64.3 (2014): 341-345. Web. 06 Sept. 2014.

Sinervo, Barry. “Societal Evolution.” Macroevolutionary Patterns and Phylogeny. UCSC, 1997. Web. 12 Sept. 2014.

 
