2023-11-13

To simulate a world is almost a divine act. One Jewish tradition holds that there were multiple worlds created and destroyed before our own. They didn’t quite work out—beta versions perhaps?—and so the universe was restarted again and again, until we got to the current version.

But we will never simulate a world perfectly. First we must contend with the wrench of chaos theory—butterflies flapping and the inherent imprecision of measurement and all that—which has demonstrated that in systems more complicated than a swinging pendulum, even small changes in the initial conditions can cause computational simulations to diverge rapidly over time. Start with a tiny rounding error, or a measurement mistake, and there’s no guarantee that a prediction will be anywhere close to where the system will actually end up.
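To make that sensitivity concrete, here is a minimal sketch using the logistic map, a standard textbook example of chaos. It is a stand-in chosen for brevity, not a system discussed in this essay, and the starting value, the size of the perturbation, and the parameter r = 4 are all illustrative assumptions.

```python
def logistic_trajectory(x0, steps, r=4.0):
    """Iterate the logistic map x -> r * x * (1 - x), starting from x0."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


# Two runs whose starting points differ by one part in ten billion,
# standing in for a tiny rounding or measurement error (assumed values).
baseline = logistic_trajectory(0.2, 100)
perturbed = logistic_trajectory(0.2 + 1e-10, 100)

# Gap between the two runs at every step.
gaps = [abs(x - y) for x, y in zip(baseline, perturbed)]

# First step at which the two runs disagree by more than 0.1.
first_large = next(i for i, g in enumerate(gaps) if g > 0.1)
print("initial gap:", gaps[0])
print("first step with a gap above 0.1:", first_large)
```

The values stay between 0 and 1 throughout, so a gap above 0.1 means the two "predictions" no longer resemble each other, even though they began almost identically.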

But there is also something that might be called the Kitchen Sink Conundrum: as more and more detail—both in modeling features and data—is thrown into a simulation, there is no guarantee that it will get us closer to a good understanding of reality itself (see also kitchen sink regression).

This was clearly articulated decades ago, in a 1979 RAND Corporation report by David Leinweber entitled “Models, Complexity, and Error.” The report examined two types of error: error of measurement and error of specification.

Error of specification refers to how accurate the model is in accounting for the richness of the system being modeled. A more sophisticated model, with more operations on the input, will hopefully correspond better to the real world: it will be more accurate. So as the complexity of the model is increased, it will adhere better to reality and there will be a lower error of specification (though there may be diminishing returns).

On the other hand, there is also error of measurement. The more complex a model, the more likely that any measurement error will compound, and cause the outputs to be wildly inaccurate. As Leinweber notes, “As the models grow larger and more complex, the compounded error in the prediction increases.”

So one curve goes down with complexity and the other goes up. Therefore, the overall error has a minimum: there is an “optimal model complexity” and “[f]urther complicating the model buys nothing.”
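The shape of this argument can be sketched numerically. The functional forms below are assumptions made purely for illustration and are not taken from Leinweber's report: specification error is modeled as a / c, measurement error as b * c, and the two are combined as a root sum of squares, as if they were independent. The names `a`, `b`, and `optimal_complexity` are hypothetical.

```python
import math


def specification_error(complexity, a=16.0):
    # Assumed form: falls as the model grows more complex.
    return a / complexity


def measurement_error(complexity, b=1.0):
    # Assumed form: grows with complexity; b stands for how noisy the data are.
    return b * complexity


def total_error(complexity, a=16.0, b=1.0):
    # Assumed combination: root sum of squares of the two error terms.
    return math.hypot(specification_error(complexity, a),
                      measurement_error(complexity, b))


def optimal_complexity(levels, a=16.0, b=1.0):
    # The complexity level with the smallest total error.
    return min(levels, key=lambda c: total_error(c, a, b))


levels = range(1, 21)
best_clean = optimal_complexity(levels)         # 4 with these toy numbers
best_noisy = optimal_complexity(levels, b=4.0)  # 2 when the data are noisier
print(best_clean, best_noisy)
```

With these toy numbers the total error bottoms out at a complexity of 4, and every level beyond that makes the prediction worse. Making the data four times noisier moves the optimum down to 2: the worse the measurements, the simpler the best model.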

In fact, recognition of this tradeoff goes back even further: the chart in the RAND report is based on a journal article from 1968, “Predicting Best with Imperfect Data” by William Alonso.

This tradeoff between complexity and accuracy is a humbling realization. Do not add complexity in the hope of greater verisimilitude: not only can the returns on this effort diminish, but it can even be entirely counterproductive. Computational and mathematical models are powerful, but they must be used carefully.

As Leinweber notes: “Weak data require simpler models.” Do not succumb to throwing the kitchen sink into a model as one’s default mode. ■

If you enjoy alternate history, you might enjoy Hite’s Law:

“All Change Points, from Xerxes to the last presidential election, create worlds with clean, efficient Zeppelin traffic. Changing history may produce Zeppelins as an inevitable by-product, much as bombarding uranium produces gamma rays. Often, the quickest way to tell if you are in an Alternate History is to look up, rather than at a newspaper or encyclopedia. From this premise, it is not outside the realm of Plausibility that our history between 1900 and 1936 was, in fact, an Alternate History. It would, at least, explain a lot.”

