Yale School of Management

The Authors Respond to Critiques of “The Number of Undocumented Immigrants in the United States: Estimates Based on Demographic Modeling with Data from 1990 to 2016”

In new research published in PLOS ONE, Yale’s Edward H. Kaplan and Jonathan S. Feinstein and MIT’s Mohammad M. Fazel-Zarandi find that the number of undocumented immigrants in the United States is substantially higher than previous estimates. Below, the authors respond to critiques of their research published in a Formal Comment in PLOS ONE.

Read the original study: “The Number of Undocumented Immigrants in the United States: Estimates Based on Demographic Modeling with Data from 1990 to 2016”

Read a Yale Insights article about the research and watch the authors discuss their findings.


Critique: The undercount implied by the new model is implausibly high.

The residual method relies on survey responses from undocumented immigrants. It therefore requires both locating undocumented foreign-born immigrants (who do not wish to be found) and having them truthfully report that they are foreign-born (when they clearly have an incentive to misreport, given the consequences of revealing their undocumented status). The commenters claim survey underreporting on the order of 10%. But that number, taken from an unpublished conference presentation with no written record available online, was based on asking known undocumented Mexican immigrants whether they had answered the census questionnaire; only 10% of respondents said they had not. This says nothing about the chance of reaching undocumented immigrants in the first place. One cannot estimate the undercount rate of the residual method by looking only at survey results. Given the double incentive undocumented immigrants have (to not be found, and to not report that they are foreign-born), it is indeed plausible that the census could be missing many more undocumented immigrants than the commenters believe. For a humorous take on this point, see the first 70 seconds of this video.

Critique: The range of model-estimated populations is too large to be useful for policy purposes, while the residual method gives a much smaller range of uncertainty. 

There is indeed a wide range in the model estimates, reflecting the uncertainty underlying the model inputs. But given this wide range, what is notable is that the standard estimate of 11 million undocumented immigrants in the United States is completely excluded (see the figure below). Across one million simulations, the conventional estimate falls below even the smallest model trajectory, demonstrating the implausibility of the conventional estimate. The survey-based residual method suffers from selection bias. Its small variability reflects only the small sampling variation that accompanies large samples; given the selection bias, the residual method is giving a precise estimate of the wrong quantity. It estimates the number of undocumented immigrants who are both located by survey-takers and answer truthfully that they were born outside of the United States. By contrast, the model is giving an approximate estimate of the right quantity: the total number of undocumented immigrants in the United States. Compare the Literary Digest poll of the 1936 election between Alf Landon and Franklin Roosevelt: the Literary Digest mailed surveys to its 10 million readers, received 2.4 million responses, and made a very precise estimate predicting that Landon would win 57% of the vote. Of course, Roosevelt won the election with 62% and the Literary Digest went out of business. The residual method is like the Literary Digest poll: it is a biased sample.
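The gap between precision and accuracy in the Literary Digest example can be made concrete with a few lines of arithmetic. The sketch below uses only the figures quoted above (2.4 million responses, a 57% prediction for Landon, 62% for Roosevelt) and the textbook standard-error formula for a proportion. That formula assumes simple random sampling, which is exactly the assumption a biased sample violates; the function and variable names are illustrative.

```python
import math


def sampling_standard_error(p: float, n: int) -> float:
    """Standard error of a sample proportion, assuming simple random sampling."""
    return math.sqrt(p * (1.0 - p) / n)


# Figures as quoted in the text above.
responses = 2_400_000      # surveys returned to the Literary Digest
predicted_landon = 0.57    # poll's predicted vote share for Landon
roosevelt_actual = 0.62    # Roosevelt's actual vote share

# Landon's actual share can be no more than what Roosevelt did not win.
landon_actual_max = 1.0 - roosevelt_actual

se = sampling_standard_error(predicted_landon, responses)
margin_95 = 1.96 * se                          # nominal 95% margin of error
miss = predicted_landon - landon_actual_max    # lower bound on the actual error

print(f"nominal 95% margin of error: +/- {100 * margin_95:.3f} points")
print(f"actual miss: at least {100 * miss:.0f} points")
print(f"miss relative to nominal margin: about {miss / margin_95:.0f}x")
```

With these inputs the nominal margin of error is well under a tenth of a percentage point, while the prediction missed by at least 19 points. A large sample shrinks sampling variation but does nothing about selection bias.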

Chart: undocumented immigrant population frequency

Critique: The voluntary emigration rates of undocumented immigrants employed in the model are too low.

The voluntary emigration rates that the commenters suggest in their critique, deduced from the Mexican Migration Project (MMP) and reported for the first time in their comment, imply extremely short sojourn times in the United States (an average of just over four years) and also imply that only 8% of border crossers would remain in the country more than 10 years. These numbers stand in contrast to all other migration estimates in the literature. Furthermore, prior published research that the commenters believe produces the “correct” estimate of the number of undocumented immigrants used emigration rates nearly identical to those we employed; for example, see page 306 of this paper by Robert Warren and John Robert Warren. Were the commenters to apply their newly developed MMP rates to the analysis in the Warren and Warren paper, the resulting undocumented population estimate would no longer yield the “right number,” missing by many millions. It is also important to point out that the newly reported data on which the commenters base their analysis are flawed. The MMP data derive from a highly selected and thus skewed sample. Nearly everyone in the survey was in Mexico at the time of sampling (26,056 households, compared to only 1,057 surveyed in the United States), so while those sampled in Mexico may have been in the U.S. earlier, they had returned to Mexico by the time of the sample. Having returned to Mexico, these individuals necessarily spent less time in the United States than Mexican migrants in the U.S. who have not yet left. Thus, those sampled in the MMP have shorter stays in the U.S., which shows up as higher emigration rates. Individuals who have chosen to remain in the U.S. are not well represented in these data. The very small part of the dataset that does survey individuals in the U.S. is restricted to the Los Angeles area, which is relatively close to the border.
Because the Los Angeles area has a more established base that may make “circular flows” (repeated unauthorized border crossings and returns) easier, this sample is not representative of the undocumented immigrant population further in the interior of the U.S. (we note that the Mexican sample is also close to the border, and thus also will tend to capture individuals more likely to pass back and forth between the two countries). We based our analysis of voluntary emigration on published research, and in particular on research that the commenters themselves believe produces the “correct” estimate of the number of undocumented immigrants. This research used emigration rates nearly identical to (but lower than) those we employed; again, see page 306 of this paper by Robert Warren and John Robert Warren. We also note that the commenters are “cherry-picking” the MMP data, since the border apprehension rates in the MMP are quite low. Rather than using these lower rates, which would produce a higher estimate of the undocumented population, the commenters choose to accept our border apprehension rates, which are higher and quite conservative. In general, one would expect more circular migration when apprehension rates are low, so the comment authors’ use of our high border apprehension rates coupled with the MMP-based high voluntary emigration rates is contradictory.
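To see how an average stay of about four years relates to the 8% figure quoted above, the following sketch works through the arithmetic under one simplifying assumption: a constant departure rate, so that lengths of stay are exponentially distributed. This is an illustration only, not the commenters' or the authors' calculation; the actual MMP-based rates may vary with time since arrival, and the function names are hypothetical.

```python
import math


def fraction_remaining(mean_stay_years: float, years: float) -> float:
    """Share of a cohort still present after `years`.

    Assumes a constant departure hazard of 1 / mean_stay_years,
    i.e. exponentially distributed lengths of stay.
    """
    return math.exp(-years / mean_stay_years)


def annual_departure_probability(mean_stay_years: float) -> float:
    """Probability of leaving within any one year under the same assumption."""
    return 1.0 - math.exp(-1.0 / mean_stay_years)


mean_stay = 4.0   # years; rounded from "just over four years" in the text
horizon = 10.0    # years

remaining = fraction_remaining(mean_stay, horizon)
annual = annual_departure_probability(mean_stay)

print(f"share remaining after {horizon:.0f} years: {100 * remaining:.1f}%")
print(f"implied annual departure probability: {100 * annual:.1f}%")
```

Under this assumption, a four-year average stay leaves roughly 8% of a cohort in the country after ten years, consistent with the figure quoted above, and corresponds to more than one in five migrants leaving each year.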