Herrnstein, R. J. (1990). Rational choice theory: Necessary but not sufficient. American Psychologist, 45(3), 356.
- The theory of rational choice is normatively useful, but fundamentally insufficient as an account of behavior
- Rational choice theory holds that organisms strive to maximize total utility (behaviorally, this is reinforcement)
- Utility cannot be observed directly but must be inferred by observing choice behavior
- Rational choice theory provides a rule for inferring utility: utility maximization is simply what organisms are doing when they behave, subject to certain constraints
- Most disciplines dealing with behavior rely on the idea that humans and other organisms maximize utility according to the axioms of the rational choice theory
- Rational choice theory evolved to also try to explain irrational behavior not guided by self-interest. This is possible because subjective utility differs from objective value. As a result, maximizing subjective utility may lead to irrational behaviors, such as overeating, alcohol and drug abuse, as well as overspending, which leads to undesirable consequences like obesity, addiction and debt. In this context, rationality is revealed preference
- Rational choice theory also treats probability, like utility, as subjective: the probabilities by which uncertainty discounts value are the decision maker's own estimates. Hence people worry about and overpay to avoid low-probability events, but ignore high-probability events
- Subjectivity of utility is motivational, while that of probability is cognitive
- Why rational choice theory continues to survive:
- It aligns with common sense in simple settings. For instance, FI-5 is better than FI-10 every time [1]
- The axiomatic formalization of the theory is elegant, and this has great appeal to theorists
- If discounting is rational, the rate should be fixed per unit time
- According to the matching law, behavior is distributed across alternatives so as to equalize the reinforcements per unit of behavior invested in each alternative. That is, the proportion of behavior allocated to each alternative tends to match the proportion of reinforcement received from that alternative
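
The matching relation stated above can be sketched in a few lines. This is a minimal illustration of strict matching only; the function name and the reinforcement counts are hypothetical and are not taken from Herrnstein (1990).

```python
def matching_allocation(reinforcement_rates):
    """Predicted share of behavior per alternative under strict matching.

    Each alternative's share of behavior equals its share of the
    total reinforcement obtained.
    """
    total = sum(reinforcement_rates)
    if total == 0:
        raise ValueError("at least one alternative must deliver reinforcement")
    return [r / total for r in reinforcement_rates]


# Hypothetical counts: alternative 1 delivered 30 reinforcers, alternative 2 delivered 10.
rates = [30, 10]
shares = matching_allocation(rates)
print(shares)  # [0.75, 0.25]

# Under matching, reinforcement per unit of behavior is equal across alternatives.
print([r / s for r, s in zip(rates, shares)])  # [40.0, 40.0]
```

With these made-up numbers, three quarters of the behavior goes to the richer alternative, and both alternatives return the same reinforcement per unit of behavior invested.
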
- Experiment described in Herrnstein & Prelec (1989):
- Subjects were presented with concurrent schedules of reinforcement [2] (a few cents whenever the response key was pressed after the trial light was illuminated)
- Trials were separated by an intertrial interval (t + C)
- The intertrial interval following a choice of Key 1 (A) was 2 seconds shorter than that following a choice of Key 2 (B): delay for A = t - 2 + C; delay for B = t + C. In addition, the intertrial interval for either choice was a linear function of the proportion of A choices in the preceding 10 trials. So, if A was chosen continually (the impulsive choice), the delays for both A and B would increase. However, if B was chosen consistently, the delays for A and B would remain the same!
- Optimal “rational” strategy = choose B all the time. Most people did not do this. In fact, some subjects exclusively chose A!
- Subjects knew their choices were influencing the intertrial interval, but did not know what to make of that information.
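
The structure of this experiment can be explored with a small simulation. This is a sketch under stated assumptions: only the 2-second advantage for A and the 10-trial window come from the notes above. The base delay, the size of the penalty per recent A choice, the number of trials, and all names are hypothetical, not the published parameters.

```python
from collections import deque


def mean_delay_per_trial(choose_a, n_trials=200, base_delay=4.0,
                         penalty_per_a=0.4, a_advantage=2.0, window=10):
    """Average intertrial delay (seconds) for a given choice policy.

    choose_a: function (trial_index) -> bool, True means "pick A".
    base_delay and penalty_per_a are hypothetical values for illustration.
    """
    recent = deque(maxlen=window)  # last `window` choices, True = A
    total_delay = 0.0
    for i in range(n_trials):
        picked_a = choose_a(i)
        # The delay shared by both keys grows linearly with recent A choices.
        shared = base_delay + penalty_per_a * sum(recent)
        # On any single trial, A is always `a_advantage` seconds shorter than B.
        total_delay += shared - a_advantage if picked_a else shared
        recent.append(picked_a)
    return total_delay / n_trials


all_a = mean_delay_per_trial(lambda i: True)   # always the locally shorter key
all_b = mean_delay_per_trial(lambda i: False)  # always the locally longer key
print(all_a, all_b)
```

Under these assumed values, always choosing A ends up with a longer average delay per trial than always choosing B, even though A is the shorter option on every individual trial. Since each trial pays the same few cents, shorter delays mean more earnings per unit time.
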
- Organisms allocate more behavior to alternatives that provide higher rates of reinforcement. This is referred to as melioration
- Melioration is commonsensical; however, it does not maximize reinforcement, and it leads to an equilibrium dictated by the matching law
- Melioration suggests that choice is driven by a comparison of the average returns from the alternatives.
- Equilibrium occurs when one alternative has displaced the others (then choice will be at the extremities of the graph) or the alternatives in the choice set are providing equal returns per unit consumption (choice will be in the middle of the graph)
- Because of melioration, organisms tend to disregard the overall returns (global utility) and only focus on the current average returns (local utility) from the alternatives.
- Melioration explains suboptimal behavior, especially in cases of distributed choices where organisms do not make a once-and-for-all decision about alternatives, but rather, repeated choices are made over a period of time.
- No single choice is responsible for obesity, alcoholism, spendthriftness, etc.
- Graph above:
- Allocation to VI: proportion of behavior allocated to the alternative that needs to be sampled only occasionally (the impulsive choice) [3]
- Reinforcement: Rate of reinforcement from impulsive choice while the subject is choosing it. The less time spent on this alternative, the higher the rate of reinforcement when it is eventually sampled. This models a source of reinforcement that gets depleted when it is sampled and restores itself when unsampled, or a motivational state that fluctuates with deprivation and satiation.
- VR [4] linear curve: reinforcement occurs only when the alternative is sampled. There is a fixed rate of return per unit of time invested in it.
- When behavior allocation to VI is low, its rate of return is higher than that of VR [5]. Due to melioration, the subject allocates even more behavior to VI. However, doing this eventually causes the VI rate of return to fall below that of VR, at which point melioration shifts behavior away from VI. Between these extremes is the equilibrium point where both alternatives provide equal rates of return per unit of investment
- To maximize, the subject has to find the highest point on the “joint” curve. That is, the subject would have to resist the temptation to allocate more behavior to VI. In practice, however, most organisms fail to resist this temptation.
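
The gap between the melioration equilibrium and the maximizing allocation can be illustrated numerically. This is a sketch under strong assumptions: the linear form of the VI return curve, the constant VR rate, the adjustment step, and all names are hypothetical stand-ins for the curves in the figure, chosen only to reproduce the qualitative pattern described above.

```python
VR_RATE = 5.0  # hypothetical fixed return per unit of behavior invested in VR


def vi_local_rate(p):
    """Hypothetical VI return per unit of behavior; falls as allocation p to VI rises."""
    return 10.0 - 8.0 * p


def overall_rate(p):
    """Total return when a fraction p of behavior goes to VI and the rest to VR."""
    return p * vi_local_rate(p) + (1.0 - p) * VR_RATE


def meliorate(p=0.1, step=0.01, n_steps=2000):
    """Shift allocation toward whichever alternative currently pays more locally."""
    for _ in range(n_steps):
        p += step * (vi_local_rate(p) - VR_RATE)
        p = min(1.0, max(0.0, p))  # keep the allocation a valid proportion
    return p


def maximize(grid=10000):
    """Brute-force search for the allocation with the highest overall return."""
    return max((i / grid for i in range(grid + 1)), key=overall_rate)


p_mel = meliorate()
p_max = maximize()
print("melioration:", p_mel, overall_rate(p_mel))
print("maximization:", p_max, overall_rate(p_max))
```

With these assumed curves, melioration settles where the two local rates are equal (about 0.625 of behavior on VI), while the overall return peaks at a smaller VI allocation (about 0.31). The meliorating subject therefore over-invests in VI and earns less overall than the maximizer.
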
- Rational choice theory describes distributed choice only in situations where the distributed nature of the choice is immaterial (i.e., returns do not depend on frequency of sampling)
- Rational choice theory can only provide guidance on how choice behavior should be allocated (normative), rather than how it is allocated (positive)
- We may need rational choice theory only because we often act suboptimally
FOOTNOTES:
- [1] In layman’s terms, a fixed interval (FI) schedule implies that a decision maker will get access to a choice option after the passage of some fixed unit of time (e.g., seconds, minutes, hours, etc.). All things being equal, waiting for a fixed interval of 5 minutes before accessing your choice is obviously better than waiting for a fixed interval of 10 minutes
- [2] Schedules of reinforcement refer to the rules governing access to a particular choice option. When they are concurrent, it implies that there are at least two different rules, in operation at the same time, governing access to the available choice options.
- [3] In a variable interval (VI) schedule, the decision maker will get access to a choice option after the passage of some variable unit of time. From the perspective of the decision maker, there is a lot of uncertainty in estimating when that access will be granted. This implies that continually expending effort towards an option governed by a VI schedule is likely to be an exercise in futility, since the passage of time, not effort, determines access.
- [4] In a variable ratio (VR) schedule, the decision maker will get access to a choice after some variable amount of effort. Here, what determines access is the expenditure of effort, rather than the mere passage of time. Therefore, the more effort applied towards accessing a choice option, the greater the likelihood of getting access to it
- [5] Because the choice under the VI schedule is solely dependent on the passage of time, the decision maker is better served by expending little effort on this option.