The essence of cooperation is described by the Prisoner’s Dilemma. Two players have a choice between cooperation, C, and defection, D. If both players cooperate they get more than if both defect, but defecting against a cooperator leads to the highest payoff, while cooperating with a defector leads to the lowest payoff. One way to construct a Prisoner’s Dilemma is by assuming that cooperation implies paying a cost for the other person to receive a benefit, while defection implies taking something away from the other person.
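As a minimal sketch of the construction just described, the Python below builds the 2×2 payoff matrix and checks the ordering that makes it a Prisoner’s Dilemma (temptation > reward > mutual defection > sucker’s payoff). It assumes that a cooperator pays a cost `c` for the partner to gain `b`, and that a defector gains `d` while the partner loses `d`. The symbols and function names are our own labels for illustration.

```python
# Sketch only: b, c, d and the function names are hypothetical labels.
# Each move has a fixed effect on the mover and on the partner; a player's
# payoff in a round is the effect of their own move plus that of the partner's move.


def pd_payoffs(b: float, c: float, d: float) -> dict[tuple[str, str], float]:
    """Row player's payoff for each (own move, partner move) pair."""
    on_self = {"C": -c, "D": d}      # cooperating costs c; defecting gains d
    on_partner = {"C": b, "D": -d}   # cooperating gives b; defecting takes d away
    return {(m, n): on_self[m] + on_partner[n] for m in "CD" for n in "CD"}


def is_prisoners_dilemma(p: dict[tuple[str, str], float]) -> bool:
    """True if temptation > reward > mutual defection > sucker's payoff."""
    temptation = p[("D", "C")]
    reward = p[("C", "C")]
    mutual_defection = p[("D", "D")]
    sucker = p[("C", "D")]
    return temptation > reward > mutual_defection > sucker


if __name__ == "__main__":
    payoffs = pd_payoffs(b=2, c=1, d=1)
    print(payoffs)
    print(is_prisoners_dilemma(payoffs))
```

With b = 2, c = 1 and d = 1 the payoffs are 3 > 1 > 0 > −2, so the check passes; if b were smaller than c, mutual cooperation would no longer beat mutual defection and the game would not be a Prisoner’s Dilemma.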
Without any mechanism for the evolution of cooperation, natural selection favors defection. But a number of such mechanisms have been proposed, including direct and indirect reciprocity [7]. Direct reciprocity means that there are repeated encounters between the same two individuals, and my behavior depends on what you have done to me [1-6]. Indirect reciprocity means that there are repeated encounters within a group, and my behavior also depends on what you have done to others.
Costly (or altruistic) punishment, P, means that one person pays a cost for another person to incur a cost. People are willing to use costly punishment against others who have defected [8-18]. Costly punishment is not a mechanism for the evolution of cooperation [7], but requires a mechanism for its own evolution [19-23]. Like reputation effects [24], costly punishment is a form of direct or indirect reciprocity. If I punish you because you have defected against me, direct reciprocity is at work. If I punish you because you have defected against others, indirect reciprocity is at work. The concept of costly punishment suggests that the basic game should be extended from two possible behaviors (C and D) to three (C, D and P). Here we investigate the consequences of this extension for the repeated Prisoner’s Dilemma.
A total of 104 subjects participated in repeated Prisoner’s Dilemma experiments at the Harvard Business School Computer Lab for Experimental Research. Participants interacted anonymously in pairwise encounters via computer screens. Subjects did not know how long each interaction would last, but knew that the probability of another round was 0.75 (as in [25]). In any given round, the subjects chose simultaneously among all available options, which were presented in neutral language. After each round, the subjects were shown the other person’s choice as well as both payoff scores. At the end of each interaction, the participants were shown the final scores and then randomly re-matched for another interaction.
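Because each round is followed by another with probability 0.75, the number of rounds in an interaction follows a geometric distribution with mean 1 / (1 − 0.75) = 4. The Python sketch below computes this mean and checks it by simulation; the function names are hypothetical, and the random seed is arbitrary and only makes the run reproducible.

```python
import random


def expected_rounds(w: float) -> float:
    """Mean number of rounds when each round is followed by another with probability w."""
    if not 0 <= w < 1:
        raise ValueError("w must be in [0, 1)")
    return 1 / (1 - w)


def sample_rounds(w: float, rng: random.Random) -> int:
    """Draw one interaction length: play one round, then continue with probability w."""
    rounds = 1
    while rng.random() < w:
        rounds += 1
    return rounds


if __name__ == "__main__":
    rng = random.Random(0)  # arbitrary seed, for reproducibility only
    draws = [sample_rounds(0.75, rng) for _ in range(100_000)]
    print(expected_rounds(0.75))      # analytic mean
    print(sum(draws) / len(draws))    # simulated mean, close to the analytic one
```

Players therefore faced interactions of about four rounds on average, but could never be sure that any given round was the last.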
We performed two control experiments (C1 and C2) and two treatments (T1 and T2). In the control experiments, people played a standard repeated Prisoner’s Dilemma: in each round they could either cooperate or defect. Cooperation meant paying 1 unit for the other person to receive 2 units (in C1 and T1) or 3 units (in C2 and T2). Defection meant gaining 1 unit at a cost of 1 unit to the other person. In the treatments, people had three options in every round: cooperate, defect or punish. Punishment meant paying 1 unit for the other person to lose 4 units. We used this 4:1 punishment technology because it has been shown to be more effective in promoting cooperation than other ratios [13]. The resulting payoff matrices are shown in the accompanying figure; see the Supplementary Information for more details.
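The payoff matrices follow directly from the rules just stated. The Python sketch below reconstructs them from those rules rather than copying them from the figure: each move has a fixed effect on the mover and on the partner, and a player’s round payoff is the sum of the effect of their own move and that of the partner’s move. The function names are hypothetical.

```python
def effects(b: int) -> dict[str, tuple[int, int]]:
    """(effect on mover, effect on partner) for each move."""
    return {
        "C": (-1, b),   # pay 1 so the partner receives b (2 in C1/T1, 3 in C2/T2)
        "D": (1, -1),   # gain 1 at a cost of 1 to the partner
        "P": (-1, -4),  # pay 1 so the partner loses 4 (treatments only)
    }


def payoff_matrix(b: int, with_punishment: bool) -> dict[tuple[str, str], tuple[int, int]]:
    """Map (row move, column move) to (row payoff, column payoff)."""
    e = effects(b)
    moves = "CDP" if with_punishment else "CD"
    return {
        (r, c): (e[r][0] + e[c][1], e[c][0] + e[r][1])
        for r in moves
        for c in moves
    }


if __name__ == "__main__":
    experiments = {"C1": (2, False), "T1": (2, True), "C2": (3, False), "T2": (3, True)}
    for name, (b, punish) in experiments.items():
        print(name)
        for (r, c), (pr, pc) in payoff_matrix(b, punish).items():
            print(f"  {r} vs {c}: {pr:+d}, {pc:+d}")
```

In T1, for example, punishing a cooperator yields +1 for the punisher and −5 for the cooperator, and mutual punishment costs each player 5 units.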
The accompanying figure shows some examples of games that occurred in treatments T1 and T2. A number of games were all-out cooperation. Sometimes cooperation could be maintained by forgiving an opponent’s defection. At other times, defection in response to defection was able to restore cooperation. Typically, costly punishment did not re-establish cooperation. In some cases, costly punishment provoked counter-punishment, thereby ensuring mutual destruction. Giving people the option of costly punishment can also lead to unprovoked first strikes with disastrous consequences.
Comparing the two control experiments, C1 and C2, we find that the frequency of cooperation increases with the benefit-to-cost ratio: 21.2% of decisions are cooperation in C1, compared with 43.0% in C2. For both parameter choices, cooperation is a subgame perfect equilibrium. Comparing each control experiment with its corresponding treatment, we find that the option to punish increases the frequency of cooperation: in T1 and T2, 52.4% and 59.7% of all decisions, respectively, are cooperation.
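One textbook way to confirm that cooperation can be sustained in the control games is the grim-trigger check; it is not necessarily the authors’ own derivation. A player compares cooperating forever, worth reward / (1 − w), with defecting once and then facing permanent mutual defection, worth temptation + w · mutual defection / (1 − w). This rearranges to w ≥ (temptation − reward) / (temptation − mutual defection). The payoffs below are derived from the stated rules (cooperation: pay 1, partner receives b; defection: gain 1, partner loses 1); the function names are hypothetical.

```python
from fractions import Fraction


def critical_w(temptation: int, reward: int, mutual_defection: int) -> Fraction:
    """Smallest continuation probability at which a one-shot deviation does not pay."""
    return Fraction(temptation - reward, temptation - mutual_defection)


def cooperation_sustainable(temptation: int, reward: int, mutual_defection: int,
                            w: Fraction) -> bool:
    """Compare cooperating forever with defecting once against a grim-trigger partner."""
    cooperate_forever = reward / (1 - w)
    deviate_once = temptation + w * mutual_defection / (1 - w)
    return cooperate_forever >= deviate_once


if __name__ == "__main__":
    w = Fraction(3, 4)  # probability of another round in the experiments
    # (temptation, reward, mutual defection) for C1 (b = 2) and C2 (b = 3)
    games = {"C1": (3, 1, 0), "C2": (4, 2, 0)}
    for name, payoffs in games.items():
        print(name, critical_w(*payoffs), cooperation_sustainable(*payoffs, w))
```

The thresholds are 2/3 for C1 and 1/2 for C2, both below 0.75. `Fraction` keeps the comparison exact.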
Punishment, however, does not increase the average payoff. In T1 and T2, 7.6% and 5.8% of decisions, respectively, are punishment, P. We find no significant difference in average payoff when comparing C1 with T1 or C2 with T2. Therefore, punishment provides no benefit to the group, which makes it hard to argue that punishment might have evolved by group selection [22].
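Because every move has a fixed effect on both players, each decision changes the pair’s combined payoff by a fixed amount: b − 1 for C, 0 for D and −5 for P. This helps explain how extra cooperation can be offset, since one punishment destroys 5 units, while one cooperative decision creates only 1 unit (T1) or 2 units (T2). The Python below is a rough sketch using the rounded frequencies quoted in the text. It is our own accounting and not the authors’ statistical comparison. The function names are hypothetical.

```python
# Each decision changes the two players' combined payoff by:
#   C: -1 for the mover, +b for the partner  -> b - 1
#   D: +1 for the mover, -1 for the partner  -> 0
#   P: -1 for the mover, -4 for the partner  -> -5
# Averaging over all decisions gives the mean payoff per player per round.


def net_effect(b: int) -> dict[str, int]:
    return {"C": b - 1, "D": 0, "P": -5}


def mean_payoff_per_decision(b: int, freq: dict[str, float]) -> float:
    """Frequency-weighted net effect; `freq` maps moves to their share of all decisions."""
    effect = net_effect(b)
    return sum(share * effect[move] for move, share in freq.items())


if __name__ == "__main__":
    # Rounded decision frequencies quoted in the text; D takes the remainder.
    reported = {
        "C1": (2, {"C": 0.212}),
        "T1": (2, {"C": 0.524, "P": 0.076}),
        "C2": (3, {"C": 0.430}),
        "T2": (3, {"C": 0.597, "P": 0.058}),
    }
    for name, (b, freq) in reported.items():
        freq = {**freq, "D": 1 - sum(freq.values())}
        print(name, round(mean_payoff_per_decision(b, freq), 3))
```

This ignores variation between players and says nothing about statistical significance. It only shows how the cost of punishment enters the group total.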
Examining the data of experiments T1 and T2 at the individual level, we find no correlation between payoff and the use of cooperation or defection, but a strong negative correlation between payoff and the use of punishment. In experiment T1, the five top-ranked players, who earned the highest total payoff, never used costly punishment. In both experiments, the players who end up with the lowest payoff tend to punish most often. Hence, to maximize overall income it is best never to punish: winners don’t punish.
It could be that the winners of our experiment were merely lucky, in that they were paired with people against whom punishment was not necessary. To test this hypothesis, we analyzed the correlation between payoff and the first-order conditional strategies used by people. As the accompanying figure illustrates, there is a strong negative correlation between payoff and the probability of using punishment, P, after the opponent has defected, D. Winners tend to respond to D with D, while losers respond to D with P. The response to another person’s defection is the only strategic feature that is clearly correlated with winning or losing the game. Winners play a tit-for-tat-like strategy [2,4], while losers use costly punishment.
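A first-order conditional strategy describes how a player responds in round t + 1 to what the partner did in round t. The Python sketch below estimates a player’s response to a partner’s defection. It assumes that each interaction is recorded as a list of (own move, partner move) pairs, and it pools all of a player’s interactions. The data layout, the function name and the example sequences are hypothetical.

```python
from collections import Counter

Round = tuple[str, str]  # (own move, partner's move), each "C", "D" or "P"


def response_probabilities(interactions: list[list[Round]],
                           partner_prev: str = "D") -> dict[str, float]:
    """Estimate P(own move in round t+1 | partner played `partner_prev` in round t).

    Pools consecutive round pairs across all interactions; returns an empty dict
    if the partner never played `partner_prev` in a round that was followed by another.
    """
    counts: Counter[str] = Counter()
    for rounds in interactions:
        for (_, partner_now), (own_next, _) in zip(rounds, rounds[1:]):
            if partner_now == partner_prev:
                counts[own_next] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {move: counts[move] / total for move in "CDP"}


if __name__ == "__main__":
    # Hypothetical sequences, for illustration only.
    tit_for_tat_like = [[("C", "D"), ("D", "C"), ("C", "C")]]
    punisher = [[("C", "D"), ("P", "D"), ("P", "C")]]
    print(response_probabilities(tit_for_tat_like))
    print(response_probabilities(punisher))
```

The tit-for-tat-like player always answers D with D, while the punisher answers D with P. The analysis in the text compares these conditional probabilities with each player’s payoff.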
[Figure: Tit-for-tat prevails over costly punishment]
It could be that costly punishment becomes more beneficial as the game progresses. To test this possibility, we separately analyzed the data from the last quarter of all interactions. Again, there is a strong negative correlation between an individual’s payoff and his or her use of costly punishment.
In previous experiments, punishment was usually offered as a separate option following one or several rounds of a public goods game. The public goods game is a multi-person Prisoner’s Dilemma in which each player can invest a certain sum into a common pool, which is then multiplied by a factor and divided equally among all players, irrespective of whether they have invested or not [26]. After the public goods game, people are asked whether they want to pay money for others to lose money. People are willing to use this option to punish those who have invested nothing or very little, and the presence of this option has been found to increase contributions [8,10].
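The two-stage design described above can be sketched in Python as follows. The endowment, the multiplication factor and the 1:3 cost-to-fine ratio of the punishment stage are illustrative assumptions, not parameters of any cited experiment, and the function names are hypothetical. The game is a multi-person Prisoner’s Dilemma when 1 < factor < number of players: each invested unit returns only factor / n < 1 to the investor, but factor > 1 to the group as a whole.

```python
def public_goods_payoffs(contributions: list[float], endowment: float,
                         factor: float) -> list[float]:
    """Each player keeps the uninvested endowment plus an equal share of the multiplied pool."""
    share = factor * sum(contributions) / len(contributions)
    return [endowment - c + share for c in contributions]


def apply_punishment(payoffs: list[float], punishments: list[list[int]],
                     cost: float = 1, fine: float = 3) -> list[float]:
    """punishments[i][j] = units player i spends punishing player j (assumed 1:3 ratio)."""
    result = list(payoffs)
    for i, row in enumerate(punishments):
        for j, units in enumerate(row):
            result[i] -= cost * units   # the punisher pays
            result[j] -= fine * units   # the punished player loses more
    return result


if __name__ == "__main__":
    # Illustrative numbers: four players, endowment 10, pool doubled (1 < factor < n).
    stage1 = public_goods_payoffs([10, 10, 10, 0], endowment=10, factor=2)
    # The three contributors each spend 1 unit punishing the free rider.
    stage2 = apply_punishment(stage1, [[0, 0, 0, 1]] * 3 + [[0, 0, 0, 0]])
    print(stage1)
    print(stage2)
```

In this example the free rider earns 25 against 15 for each contributor. After punishment the figures are 16 and 14: the gap narrows, but both sides pay for it.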
Careful analysis, however, has revealed that in most cases punishment does not increase the average payoff. In some experiments punishment reduces the average payoff [9,10,12,27], while in others it does not lead to a significant change [11,14,15]. Only once has punishment been found to increase the average payoff [13]. The higher frequency of cooperation is usually offset by the cost of punishment, which affects both the punisher and the punished. Our findings are in agreement with this observation: the option of costly punishment does not increase the average payoff of the group. It is possible, however, that in longer experiments and for particular parameter values, punishment might increase the average payoff.
It is sometimes argued that costly punishment is a mechanism for stabilizing cooperation in anonymous, one-shot games. Whether or not this is the case, however, seems to be of little importance, because most of our interactions occur in a context where repetition is possible and reputation matters. For millions of years of human evolution, our ancestors lived in relatively small groups where people knew each other. Interactions in such groups are certainly repeated and open-ended. Thus, our strategic instincts have evolved in situations where others are likely either to observe our actions directly or to find out about them eventually. In modern life, too, most of our interactions occur with people whom we meet frequently, and we can typically never rule out ‘subsequent rounds’. Therefore, if costly punishment is important for the evolution of human cooperation, it must play a beneficial role in the setting of repeated games. Our findings do not support this claim.
We also believe that our current design has some additional advantages over previous ones. In our setting, costly punishment is always one of three options. Hence, there is an opportunity cost to using punishment, because the subject forfeits the opportunity to cooperate or to defect. Our design also minimizes experimenter and participant demand effects [28], because there are always several options [27]. In many previous experiments, retaliation for punishment is not possible [9-16,27], but it is a natural feature of our setting.
In summary, our data show that costly punishment strongly disfavors the individual who uses it, and hence it is opposed by individual selection in cooperation games where direct reciprocity is possible. We conclude that costly punishment might have evolved for reasons other than promoting cooperation, such as coercing individuals into submission and establishing dominance hierarchies [20,29]. Punishment might enable a group to exert control over individual behavior, and a stronger individual could use punishment to dominate weaker ones. People engage in conflicts and know that conflicts can carry costs. Costly punishment serves to escalate conflicts, not to moderate them. Costly punishment might force people to submit, but not to cooperate. It could be that costly punishment is beneficial in these other games, but its use in games of cooperation appears to be maladaptive. We have shown that in the framework of direct reciprocity, winners do not use costly punishment, while losers punish and perish.