The Effects of the Change in the NRMP Matching Algorithm

By Alvin E. Roth, Ph.D., University of Pittsburgh, and

Elliott Peranson, M.A.Sc., National Matching Services Inc.

Revised, June 5, 1997

Acknowledgements: The work discussed in Section VII is not part of the study commissioned by the NRMP, but rather is part of the academic investigation of theoretical issues motivated by the study; computational support for that investigation was provided by Aljosa Feldin at the University of Pittsburgh.

Corresponding author: Alvin E. Roth, Dept. of Economics, University of Pittsburgh, Pittsburgh, PA 15260. email: alroth+@pitt.edu Fax: (412) 648-1793, Web: http://www.pitt.edu/~alroth/alroth.html

Context: Following two years of heated controversy about the resident match, the National Resident Matching Program (NRMP) recently voted to replace the existing matching algorithm with a newly designed applicant-proposing algorithm.

Objective: To design an applicant-proposing algorithm for the match and compare it with the existing NRMP algorithm to determine how many applicants and residency programs could be expected to receive more or less preferred matches from the two algorithms, how the different algorithms influence the opportunity for strategic behavior, and what kinds of advice can be given to participants.

Design: Computational experiments compared the newly designed applicant-proposing algorithm with the existing NRMP algorithm on the Rank Order Lists submitted by all applicants and residency programs in the 1993-1996 and 1987 NRMP matches.

Results: Differences in the matchings produced by the two algorithms are small: fewer than one in a thousand applicants would have received a different match. Most (but not all) of the few applicants who are matched to different positions by the two algorithms do better when the applicant-proposing algorithm is used; and the opposite is true for programs. Opportunities for profitable strategic behavior are very rare for both applicants and programs under either algorithm, and the differences between the two algorithms are small. With either algorithm, both applicants and programs can be advised that trying to get a preferred match by behaving strategically is far more likely to harm than to help them.

Conclusions: The existing NRMP algorithm and the newly designed applicant-proposing algorithm perform similarly. Both algorithms make it sensible for applicants and residency programs to arrange their Rank Order Lists based solely on their preferences for possible matches. The choice of algorithms will systematically effect the matches of only a small group of applicants (less than one tenth of one percent). The NRMP's recent decision to use the applicant proposing algorithm starting in 1998 reflects a judgment about the impact of this difference on applicants and programs.

Key words: resident match, NRMP




In May of 1997 the board of directors of the National Resident Matching Program (NRMP) voted to replace the existing matching algorithm with a newly designed, applicant proposing algorithm, starting in 1998. This resolved two years of heated and sometimes acrimonious controversy about the match, beginning most visibly with an exchange in Academic Medicine in 1995.1-4 In reaction to this exchange, groups such as the American Medical Student Association, Public Citizen's Health Research Group, and the Medical Student Section of the AMA advocated that the matching algorithm be changed and/or that the description of the match be changed to give applicants more accurate advice about how to participate.5- 6 At issue was how much the former (pre-1998) matching algorithm is biased towards residency programs at the expense of applicants, whether it would be feasible to replace that algorithm with one more favorable to applicants, how much this would change the outcome of the match, and what advice can be given to applicants about how to fill out their Rank Order Lists. The NRMP asked the authors of the present paper to study these questions.

This paper addresses the above questions by comparing the former NRMP algorithm with the newly designed applicant proposing algorithm capable of handling the NRMP's special features. These special features lie at the heart of the controversy, because they make the NRMP more complex than the simpler "two-sided matching markets" primarily discussed in the game theory literature, to which all parties in the controversy have referred.7 Critics of the NRMP interpreted this literature as showing that the existing algorithm was biased and could easily be improved, and that the advice given by the NRMP to applicants was flawed. The most impassioned of the criticisms went on to suggest that there was an element of perfidy in the reluctance of the NRMP to make the changes demanded, and in the history of how the match had been constructed and advertised to past generations of participants. In contrast, defenders of the NRMP were inclined to view the matching algorithm as having evolved from the traditional recruitment process in which programs offer positions to applicants, and adapted to changes in the medical market, picking up special features as required.

The NRMP match is complicated by four special features that create linkages between applicants or programs that do not exist in simpler matching markets. These are: couples seeking two related positions; applicants who simultaneously seek a Program Year 2 (PGY-2) position and a complementary program year 1 (PGY- 1) position from supplemental Rank Order Lists; residency programs whose positions revert to other programs if unfilled; and programs that request an even or odd number of matches. Because of these "match variations," the theory for simpler markets provides only an approximate guide to the NRMP.

Computational experiments were therefore conducted on the Rank Order Lists submitted by all applicants and residency programs in the matches from 1993-1996 and 1987. The recent matches reflect contemporary preferences; 1987, which had the lowest rate of unmatched US seniors in the available dataset (6.0%, as opposed to the historically high rate of 7.5% for 1996), provides a comparison over a longer period.

BACKGROUND

Stable matchings in simple and complex markets

Centralized matching algorithms are mostly unsuccessful unless they produce "stable" matchings.8, 9, 10 Briefly, a matching between residents and residency programs is stable if there are no applicant-program pairs such that the applicant prefers the program to his/her current match, and the program also prefers the applicant to one of its current matches.11 The former NRMP algorithm produces stable matches.8, 10 So, too, do the algorithms used in some regions of the National Health Service in the U.K.; other regions tried algorithms that produced unstable matchings, and these mostly failed and were abandoned.9, 12, 13 Both the present study and the controversy that preceded it focus on algorithms that produce stable matchings.

For simple markets without the NRMP match variations, we know the following about stable matchings and algorithms.7

1. At least one stable matching can always be found, no matter what Rank Order Lists are submitted.

2. The set of stable matchings always contains a "program optimal" stable matching, and an "applicant optimal" stable matching.

3. The program optimal stable matching is produced by a "program proposing" algorithm, in which residency programs make offers to applicants, starting from the top of each program's rank order list, and allowing applicants to hold at any point in the algorithm the most preferred offer among those so far received. The applicant optimal stable matching is produced by an analogous "applicant proposing" algorithm.

4. The same applicants are unmatched and the same positions are unfilled at every stable matching.

5. When the applicant-proposing algorithm is used, no applicant can possibly improve his match by submitting a Rank Order List (ROL) different from his true preferences. No parallel assertion can be made about residency programs offering more than one position.

When the NRMP match variations are present, none of these five assertions are correct in all cases. Markets with match variations can be constructed for which: no stable matchings exist8; no optimal stable matchings for either side exist (even when stable matchings exist14) and therefore; no algorithm always produces an "optimal" stable matching for either side of the market. Furthermore, different stable matchings may have different applicants unmatched and positions unfilled, and no algorithm always guarantees applicants that stating their true preferences is an optimal strategy.

A major focus of this study was to assess the extent to which these possibilities occur in the NRMP matches. While a stable matching has been found in every NRMP match at least since the mid 1970s, it appears that no stable matching is precisely program- or applicant-optimal in any of the years examined. However we will see that applicant- and program-proposing algorithms continue to perform approximately as in simple markets.

The former NRMP algorithm

The NRMP algorithm used through the 1997 match handled match variations through a three-phase process. The first phase produces an initial match by ignoring most match variations, using a program-proposing algorithm, modified to let couples temporarily hold many offers. This match will generally be unstable because of the way couples are handled, and because other match variations are ignored; consequently the second phase of the algorithm identifies instabilities. The third phase fixes these instabilities to produce a stable match. The former NRMP algorithm is a hybrid: it is program-proposing in its first phase (which performs the bulk of the matching), and applicant- proposing in some parts of its third phase.15

DESIGN OF THE NEW APPLICANT-PROPOSING ALGORITHM

A conceptual design for the applicant-proposing algorithm, based on the algorithm outlined by Roth and Vande Vate16 and on components of the existing algorithm, was formulated and circulated for comment.15 To code a working algorithm, choices had to be made concerning the sequencing of operations. Each such decision can be shown to have no effect on the outcome of simple matches, but could effect the outcome when match variations are present (because optimal stable matches may not exist). Computational experiments to test the effects of sequencing were conducted using data from three NRMP matches: 1993, 1994, and 1995. The effects of sequencing on the match produced were found to be both unsystematic and tiny; the modal number of applicants affected when sequencing effects were found was 2. Based on this result, sequencing decisions were made that resulted in the most direct convergence to a stable matching. A full account of the design process and the results of the sequencing experiments is available elsewhere.17

DESCRIPTIVE STATISTICS AND MATCH RESULTS

Table 1 describes the NRMP matches in the years studied. About 4% of the more than 20,000 applicants each year participate as couples, and 8% to 12% submit supplemental ROLs. In the 1990's about 7% of the three to four thousand programs that participate have positions that revert to other programs if they remain unfilled. Match variations are clearly a substantial part of the match.


Table 1: Descriptive Statistics and Original NRMP Match Results1
1987199319941995 1996
APPLICANTS
Participating (Active, ROL Returned)
Primary ROL's20071209162235322937 24749
Applicants with Supplemental ROL's1572251523122098 2436
Results
Primary Matches16117172091772518170 18316
Supplemental PGY-1 Matches57712941152990 725
Couples
Applicants Who Are Coupled694854892998 1008
Coupled Applicants Who Matched646794817899 912
PROGRAMS
Active Programs3225367737153800 3830
Active Programs with ROL Returned3170362236223745 3758
Potential Reversions of Unfilled Positions
Programs Specifying Reversion69247276285 282
Positions to be Reverted if Unfilled225132914671291 1272
Programs Requesting Even/Odd Matching42678
Quotas2
Total Quota Before Match19973227372280122806 22578
Changes During Match Processing
Quota Increases: Programs22120143124 130
Quota Increases: Positions45357357327 336
Quota Decreases: Programs23127142128 138
Quota Decreases: Positions46338338303 326
Total Quota After Match (Final Quota) 19972227562282022830 22588
Results
Positions Filled16694185031887719160 19041
Positions Unfilled3278425339433670 3547
Programs Filled2100230924402599 2589

1ROL indicated rank order list; and PGY-1, program year 1.

2Quotas include positions in active programs with no ROL returned. Changes during the match are caused primarily by reversions. In some cases, 1 position is reverted simultaneously to 2 programs, causing a net increase in the number of positions offered. In addition, a few positions may be dropped from the match during processing to accommodate requests for even/odd matching.


DIFFERENCES IN THE MATCHES PRODUCED BY THE TWO ALGORITHMS

Table 2 compares the matches produced by the former NRMP algorithm and the new applicant-proposing algorithm. Less than 0.1% of applicants are affected by the change in algorithms, and of these most (but not all) prefer the applicant proposing algorithm.

Equally few programs are affected by the change of algorithms- -and these constitute about 0.5% of all programs. Most but not all these programs prefer the former NRMP algorithm, but in 1994 and 1996 slightly more programs would even have preferred the applicant-proposing algorithm to the former algorithm. In most programs that receive a different match only one applicant differs between the matches produced by the two algorithms. The majority of differences involve filling a position with a different applicant; very few positions go from filled to unfilled or vice versa.

To see how match variations cause some applicants to do worse and some programs to do better with the applicant-proposing algorithm, suppose that switching to the applicant-proposing algorithm causes applicant A to improve his match from his second to first choice. It may be that his first choice now requires a supplemental PGY-1 match. If this new supplemental match displaces another applicant less preferred by the PGY-1 program, that applicant is forced to go farther down his/her list (i.e. does worse), while that PGY-1 program does better.



Table 2: Comparison of Results Between Original NRMP Algorithm and Applicant-Proposing Algorithm
1987199319941995 1996
Applicants
Number of Applicants Affected2016201421
Applicant Proposing Result Preferred1216111412
Current NRMP Result Preferred80909
U.S. Applicants Affected1179171218
Independent Applicants Affected37323
Difference in Result by Rank Number2
1 rank12111388
2 ranks31426
3 ranks23223
More than 3 ranks (maximum)2 (9)1 (4)1 (5)2 (6)3 (6)
New Matched00001
New Unmatched10000
Programs
Number of Programs Affected2015231519
Applicant Proposing Result Preferred8012110
Former NRMP Result Preferred121511149
Difference in Result by Rank Number
5 or fewer ranks53963
6 to 10 ranks53353
11 to 15 ranks05131
More than 15 ranks9 (178)4 (36)6 (31)011 (191)
Programs with New Position(s) Filled00211
Programs with New Unfilled Position(s)10200

1U.S. applicants register for the match through a U.S. medical school; for the most part, these are current seniors in a U.S. medical school. Independent applicants do not register for the match through a U.S. medical school; for the most part, these are graduates of foreign, Canadian or osteopathic medical schools.

2 For example, if an applicant matched at the first choice with one algorithm and the third choice with the other algorithm, the difference in the result by rank number would be 2 ranks.

DIFFERENCES IN SENSITIVITY TO PARTICIPANT BEHAVIOR

The above comparisons involve ROLs submitted for matches made using the former NRMP algorithm. A comparison of the two algorithms also requires consideration of whether participants might have reason to submit different ROLs if the new algorithm were used instead. For this purpose, we considered how many participants could have favorably influenced the match, under either algorithm, by submitting different ROLs.

For matching markets without match variations, there are strategic opportunities for applicants when the program proposing-algorithm is used.7 And every applicant who can do better than to submit his true preferences as his ROL can do so by submitting a truncation of his true preferences. Furthermore, the only applicants who can do better than to submit their true preferences are those who would have received a different match from the applicant-proposing algorithm. This identifies an upper bound on the number of applicants who could possibly profit from strategic behavior of this sort. Note that even an applicant who could possibly profit from submitting a strategically chosen ROL would not generally have the information needed either to know this or to identify the optimal ROL to submit. If the applicant submitted a truncation that was one program too short he or she would become unmatched.

The case of residency programs that offer more than one position is not so simple, even in simple matches. Programs may, in theory, profit both from truncating their ROLs and from reducing the number of positions they submit to the match, either by making early arrangements with some applicants or by withholding positions to be filled by unmatched applicants after the match.18, 19

To see how well the theory for simple markets approximates the NRMP, and to assess the size of the effects, again requires computational experiments. In view of the very small differences observed in the outcomes of the two algorithms, it is reasonable to conjecture that, even in the more complex NRMP market, the opportunities for profitable strategic behavior will be vanishingly small under either algorithm. The computational experiments that permitted this conjecture to be verified are described in detail elsewhere.17 These experiments produced upper bounds on the numbers of applicants and programs for whom strategic behavior could even be potentially profitable.

The truncation experiments with applicants' ROLs yield the following upper bounds.

Table 3: Upper Limit of the Number of Applicants Who Could Benefit by Truncating Their Lists at One Above Their Original Match Point:
1987199319941995 1996
Former NRMP Algorithm12221316 11
Applicant-Proposing Algorithm00229

The number of applicants who could even potentially benefit from truncating their ROLs under the former NRMP algorithm in each year is almost exactly equal to the number of applicants who received a preferred match under the applicant proposing algorithm (line 2 of Table 2).

Similar results are obtained by computational experiments with truncations of programs' ROLs, and by reductions in the number of positions they offer through the match.17 The number of participants who can even potentially profit from these kinds of strategic behavior is vastly smaller than those who have no such prospects. For both applicants and programs, attempts to alter match results by strategic behavior in stating ROLs or capacities is far more likely to hurt than to help.

WHY THE DIFFERENCES ARE SMALL: INSIGHTS FROM SIMPLE MARKETS

The small differences between the two algorithms suggests that, in each of the years studied, the set of stable matchings is small, as measured by the number of participants who receive different matches at different stable matchings.7 Computational experiments on hypothetical markets without match variations can help explain why this is so. For this purpose, consider simple markets with n positions and n applicants, as n varies from the size of small specialty markets and submarkets to the size of the full NRMP match.

When preferences are highly correlated--when similar programs tend to agree which are the most desirable applicants, and applicants tend to agree which are the most desirable programs-- the set of stable matchings is small. (For example, if all programs had the same ranking of applicants, and all applicants had the same ranking of programs, then the only stable matching would be the one that filled the highest ranked program with the highest ranked applicants, the second program with the next highest group of applicants, and so on.) As correlation decreases, the set of stable matchings grows, and more participants would be matched differently by the two algorithms, independently of the size of the market.

But the size of the market also plays a critical role. Consider the case of uncorrelated preferences (so the set of stable matchings is large). If every applicant could somehow interview for all positions, and rank order them for the match, the percentage of applicants who could get different stable matchings would grow as the number of applicants and positions grew. But in a real market there is a limit to how many interviews an applicant can go on, or a program can conduct. Taking this into account, computational experiments show that the set of stable matchings becomes very small as the market becomes large.

Specifically, let k equal the number of positions a candidate can interview for, and put on a Rank Order List, and let n equal the number of applicants and positions. Computational experiments on randomly generated simple markets show that, even when preferences are uncorrelated, if k is held fixed at realistic levels then the set of stable matchings becomes small as n becomes large. For example, if k=15 (not an unreasonable approximation for the NRMP) and n > 10,000, fewer than 0.1% of applicants would receive a different match from the applicant- proposing and program-proposing algorithms. Even with uncorrelated preferences, this simple market exhibits the same one-in-a-thousand order of magnitude seen in the NRMP. For simple markets the size of small specialty matches or submarkets with n of about 100 positions, if applicants rank order no more than 10 programs only about 2% of applicants receive different matches from the two algorithms. Especially since preferences are not uncorrelated in the medical matches, the orders of magnitude of the effects seen in the actual matches are comparable to simple matches with similar k and n.

This is important for the present study because, in this theoretical exercise, we can look at agents' true preferences, whereas in the study of real matches we use as data the submitted ROLs. A skeptical interpretation of our results could be that perhaps large opportunities for strategic manipulation exist, but these aren't detectable in the ROLs submitted to the match, because participants have already behaved strategically in an optimal way. If this hypothesis were correct, we should also find large differences when we look at the size of the set of stable matchings in these hypothetical markets. Instead, we see that simple markets provide an explanation of not only the direction of the effects we have been examining, but also their small size.

CONCLUDING REMARKS

There is now a well tested applicant-proposing algorithm capable of handling the NRMP match variations (more than 500 matches on ROL data were conducted during this study). Under either algorithm, both applicants and programs can be confidently advised that, when deciding what ROLs to submit, they are extraordinarily unlikely to be able to influence the match in their favor by submitting a list different from their true preferences. This result is supported both by the computational studies conducted on previous matches and by the new results for simple markets.

The choice between the two algorithms does not involve consequential differences for the match as a whole, except that a small group of applicants can be expected to do better under the applicant-proposing algorithm, and a small number of programs may do worse. The policy decision to adopt the newly designed applicant-proposing algorithm focused on this difference between the two algorithms and its impact on applicants and programs.

REFERENCES

1. Peranson, E. and Randlett, R.R. "The NRMP matching algorithm revisited: Theory versus practice," Academic Medicine, 1995, 70, 477-484.
2. Peranson, E. and Randlett, R.R. "Comments on Williams' `A Reexamination of the NRMP Matching Algorithm'," Academic Medicine, 1995, 70, 490-494.
3. Williams, K.J. "A Reexamination of the NRMP Matching Algorithm," Academic Medicine, 1995, 70, 470-476.
4. Williams, K.J. "Comments on Peranson and Randlett's `The NRMP Matching Algorithm Revisited: Theory versus Practice'," Academic Medicine, 1995, 70, 485-489.

5. American Medical Students Association and Public Citizen Health Research Group, "Report on Hospital Bias in the NRMP," http://pubweb.acns.nwu.edu/~alan/nrmp2.html, 1995.
6. AMA-MSS Resolutions for the 1995 Interim Meeting, http://www.bcm.tmc.edu/ama-mss/i95res.htm#11

7. Roth, A.E. and Sotomayor, M. Two-Sided Matching: A Study in Game-Theoretic Modeling and Analysis, Cambridge University Press, 1990.

8. Roth, A.E. "The Evolution of the Labor Market for Medical Interns and Residents: A Case Study in Game Theory," Journal of Political Economy, 1984, 92, 991-1016.

9. Roth, A.E. "New Physicians: A Natural Experiment in Market Organization," Science, 1990, 250, 1524-1528.

10. Roth, A.E. "The National Resident Matching Program as a Labor Market," Journal of the American Medical Association, 275, April 3, (Pulse), 1996, 1054-1056.

11. Gale, D. and Shapley, L. "College Admissions and the Stability of Marriage," American Mathematical Monthly, [1962],69, 9-15.

12. Roth, A. "A Natural Experiment in the Organization of Entry Level Labor Markets: Regional Markets for New Physicians and Surgeons in the U.K.," American Economic Review, 1991, 81, 415-440.

13. Roth, A and Xing, X. "Jumping the Gun: Imperfections and Institutions Related to the Timing of Market Transactions," American Economic Review, 1994, 84, 992- 1044.

14. Aldershof, B. and Carducci, O.M. "Stable Matchings with Couples," Discrete Applied Mathematics, 1996.

15. Roth, A. Interim Report #1: Evaluation of the current NRMP algorithm, and preliminary design of an applicant-processing algorithm. http://www.pitt.edu/~alroth/interim1.html, 1996.

16. Roth, A.E. and Vande Vate, J.H. "Random Paths to Stability in Two-Sided Matching," Econometrica, 1990, 58, 1475-1480.

17. Roth, A.E. "Report on the design and testing of an applicant proposing matching algorithm, and comparison with the existing NRMP algorithm," http://www.pitt.edu/~alroth/phase1.html, 1996.

18. Sonmez, T. "Manipulation via Capacities in Two-Sided Matching Markets," discussion paper, Department of Economics, University of Michigan, 1996.

19. Sonmez, T. "Can Pre-Arranged Matches be Avoided in Two-Sided Matching Markets?" discussion paper, Department of Economics, University of Michigan, 1996.