279 – Garbage in, garbage out?

As the developer of various decision tools, I’ve lost track of the number of times I’ve heard somebody say, in a grave, authoritative tone, “a model is only as good as the information you feed into it”. Or, more pithily, “garbage in, garbage out”. It’s a truism, of course, but the implications for decision makers may not be quite what you think.

The value of the information generated by a decision tool depends, naturally, on the quality of the input data used to drive it. Usually, the outputs from a decision tool are less valuable when the information about the inputs is poor than when it is good.

But what should we conclude from that? Does it mean, for example, that if you have poor quality input information you may just as well make decisions in a very simple ad hoc way and not worry about weighing up the decision options in a systematic way? (In other words, is it not worth using a decision tool?) And does it mean that it is more important to put effort into collecting better input data rather than improving the decision process?

No, these things do not follow from having poor input data. Here’s why.

Imagine a manager looking at 100 projects and trying to choose which 10 projects to give money to. Let’s compare a situation where input data quality is excellent with one where it is poor.

From simulating hundreds of thousands of decisions like this, I’ve found that systematic decision processes that are consistent with best-practice principles for decision making (see Pannell 2013) do a reasonable job of selecting the best projects even when random errors are introduced to the input data. On the other hand, simple ad hoc decision processes that ignore the principles often result in very poor decisions, whether the input data is good, bad or indifferent.
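
The flavour of such a simulation can be sketched in a few lines. The following is my own illustrative Monte Carlo version, not the actual analysis behind the results reported here: projects have true benefits and costs, the decision maker only sees noisy benefit estimates, and we compare a systematic rule (rank on estimated benefit per dollar of cost) with an ad hoc rule that ignores costs.

```python
import random

random.seed(42)

def run(noise_sd, trials=1000, n=100, k=10):
    """Average fraction of the best achievable value captured by two
    selection rules when benefit estimates carry multiplicative error."""
    frac_sys = frac_adhoc = 0.0
    for _ in range(trials):
        benefit = [random.lognormvariate(0.0, 1.0) for _ in range(n)]
        cost = [random.lognormvariate(0.0, 0.5) for _ in range(n)]
        # Noisy observations of the benefits; costs assumed known here
        obs = [b * random.lognormvariate(0.0, noise_sd) for b in benefit]
        true_bcr = [b / c for b, c in zip(benefit, cost)]
        # Value of the best possible choice of k projects (benefit per dollar)
        best = sum(sorted(true_bcr, reverse=True)[:k])
        # Systematic rule: rank on estimated benefit per dollar of cost
        pick_sys = sorted(range(n), key=lambda i: obs[i] / cost[i], reverse=True)[:k]
        # Ad hoc rule: rank on estimated benefit alone, ignoring cost
        pick_adhoc = sorted(range(n), key=lambda i: obs[i], reverse=True)[:k]
        frac_sys += sum(true_bcr[i] for i in pick_sys) / best
        frac_adhoc += sum(true_bcr[i] for i in pick_adhoc) / best
    return frac_sys / trials, frac_adhoc / trials

for sd in (0.0, 0.5, 1.0):
    sys_frac, adhoc_frac = run(sd)
    print(f"noise sd={sd}: systematic {sys_frac:.2f}, ad hoc {adhoc_frac:.2f}")
```

Even in this toy version, the systematic rule degrades gracefully as noise grows, while the cost-blind rule leaves a large share of the achievable value on the table regardless of data quality.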

Not every decision made using a sound decision process is correct, but overall, on average, they are markedly better than quick-and-dirty decisions. So “garbage in, garbage out” is misleading. If you look across a large number of decisions (which is what you should do), then a better description for a good decision tool could be “garbage in, not-too-bad out”. On the other hand, the most apt description for a poor decision process could be “treasure or garbage in, garbage out”.

An interesting question is, if you are using a good process, why don’t random errors in the input data make a bigger difference to the outcomes of the decisions? Here are some reasons.

Firstly, poorer quality input data only matters if it results in different decisions being made, such as a different set of 10 projects being selected. In practice, over a large number of decisions, the differences caused by input data uncertainty are not as large as you might expect. For example, in the project-selection problem, there are several reasons why data uncertainty may have only a modest impact on which projects are selected:

  • Uncertainty doesn’t mean that the input data for every project is wildly inaccurate. Some projects’ data are wildly inaccurate, but some, by chance, are only slightly off, and some are in between. The good projects with only slightly inaccurate data still get selected.
  • Even if the data is moderately or highly inaccurate, it doesn’t necessarily mean that a good project will miss out on funding. Some good projects look worse than they should as a result of the poor input data, but others are actually favoured by the data inaccuracies, so of course they still get selected. Data errors that reinforce the right decisions are not a problem.
  • Some projects are so outstanding that they still seem worth investing in even when the data used to analyse them is somewhat inaccurate.
  • When ranking projects, there are a number of different variables to consider (e.g. values, behaviour change, risks, etc.). There is likely to be uncertainty about all of these to some extent, but the errors won’t necessarily reinforce each other. In some cases, the estimate of one variable will be too high, while the estimate of another variable will be too low, such that the errors cancel out and the overall assessment of the project is about right.

So input data uncertainty means that some projects that should be selected miss out, but many good projects continue to be selected.

Even where there is a change in project selection, some of the projects that come in are only slightly less beneficial than the ones that go out. Not all, but some.

Putting all that together, inaccuracy in input data only changes the selection of projects for those projects that: happen to have the most highly inaccurate input data; are not favoured by the data inaccuracies; are not amongst the most outstanding projects anyway; and do not have multiple errors that cancel out. Further, the changes in project selection that do occur only matter for the subset of incoming projects that are much worse than the projects they displace. Many of the projects that are mistakenly selected due to poor input data are not all that much worse than the projects they displace. So input data uncertainty is often not such a serious problem for decision making as you might think. As long as the numbers we use are more-or-less reasonable, results from decision making can be pretty good.

To me, the most surprising outcome from my analysis of these issues was the answer to the second question: is it more important to put effort into collecting better input data rather than improving the decision process?

As I noted earlier, the answer seems to be “no”. For the project choice problem I described earlier, the “no” is a very strong one. In fact, I found that if you start with a poor quality decision process, inconsistent with the principles I’ve outlined in Pannell (2013), there is almost no benefit to be gained by improving the quality of input data. I’m sure there are many scientists who would feel extremely uncomfortable with that result, but it does make intuitive sense when you think about it. If a decision process is so poor that its results are only slightly related to the best possible decisions, then of course better information won’t help much.

Further reading

Pannell, D.J. and Gibson, F.L. (2014) Testing metrics to prioritise environmental projects, Australian Agricultural and Resource Economics Society Conference (58th), February 5-7, 2014, Port Macquarie, Australia. Full paper

Pannell, D.J. (2013). Ranking environmental projects, Working Paper 1312, School of Agricultural and Resource Economics, University of Western Australia. Full paper

278 – Global wealth inequality

The charity Oxfam recently released a remarkable report on international wealth inequality. Based on data and analysis published by the Swiss financial company Credit Suisse, they highlighted that the aggregate wealth of the world’s richest one percent of people is about the same as the aggregate wealth of the other 99 percent.

This made my head spin, so I wanted to see the graph of wealth distribution. Using the Oxfam/Credit Suisse data, I put together an approximation of the Lorenz curve for the whole world (Figure 1). To create a Lorenz curve, you rank all the people from poorest to richest and plot the proportion of the world’s wealth that they own. The graph shows the proportion of the world’s wealth that is owned by the poorest X percent.
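
The construction is simple enough to sketch with made-up numbers (not the Oxfam/Credit Suisse data): sort the wealth values from poorest to richest and accumulate shares. Note how negative net wealth at the bottom pulls the curve below zero.

```python
def lorenz_points(wealth):
    """Return (population share, cumulative wealth share) pairs,
    poorest first. Negative net wealth pulls the curve below zero."""
    w = sorted(wealth)
    total = sum(w)
    points, cum = [(0.0, 0.0)], 0.0
    for i, x in enumerate(w, start=1):
        cum += x
        points.append((i / len(w), cum / total))
    return points

# Hypothetical mini-population: one person in net debt, several modest
# holdings, and one very rich person holding most of the wealth
sample = [-50, 10, 20, 30, 40, 50, 800]
for pop_share, wealth_share in lorenz_points(sample):
    print(f"{pop_share:4.2f}  {wealth_share:6.3f}")
```

Plotting these pairs gives the curve: a perfectly equal society would trace the diagonal, and the further the curve sags below it, the greater the inequality.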


Figure 1. The percentage of the world’s wealth that is owned by the poorest X percent of the population.


The figure reinforces the remarkable extent of inequality indicated in the headline 1%:99% fact.

For example, it shows that the least-wealthy 70% of people own just a few percent of the world’s wealth between them.

The least-wealthy 90% of people own only a bit more than 10% of the wealth.

The wealth of the bottom 30% is roughly zero. If you look closely, you can see that the line disappears below the axis for the bottom group of people, indicating that they have slightly negative wealth.

At the other extreme, the wealth of the very richest people is astounding. You can’t make this out on the graph, but the richest 80 people in the world – with individual wealth ranging from $13 billion to $76 billion in 2014 – have as much wealth between them as the bottom 50% of people on the planet. That’s 80 people versus 3,500,000,000 people.

However, you might be surprised to learn that the story of the richest 1 percent is not all about billionaires, or even millionaires. To make it into the richest 1 percent, you need wealth of about $800,000. There are 1.8 million such people in Australia. Those of us who live in Australia (or in any developed country) would come across top 1 percenters on a regular basis – they are all around us. They are mostly not people living a jet-set lifestyle. Within a developed-country context, most of them would not be considered especially rich.

That is even more true of the top 10 percent. The wealth you need to make it into that group is only $77,000. As one of my colleagues commented, this reveals that the problem is not “those rich bastards”. It’s us!

This is not to say that the poor are not improving their lot. In many developing countries, the average wealth of poor people, and especially middle-ranked people, has improved over time (see here). It’s just that the wealth of people who are already wealthy is growing more rapidly, not just absolutely but relatively.

Another surprising result is that there are quite a few people from developed countries at the bottom end of the distribution. These are mostly people who have assets, and actually have a pretty good standard of living, but they also have large debts that leave them with negative net wealth. The collapse of house prices in the US associated with the Global Financial Crisis created many such people. Remarkably, about 7% of Americans are in the bottom 10% for net wealth. Only India has more people in this poorest group! Of course, this reveals that net wealth is not the whole story. An American from the bottom 10% is likely to have a much higher standard of living and much greater opportunities for improvement than an Indian from the bottom 10%.

The difficult thing, of course, is the question of what should be done about all this inequality. Oxfam has some proposals, but others have argued that inequality per se is not a problem, as long as the lot of the poor is improving. To me it seems that extreme inequality is a concern in its own right, particularly within a country, but that it would be hard to support measures to dampen inequality if doing so would make poor people worse off. This is a can of worms, of course.

Further reading

Bellù, L.G. and Liberati, P. (2005). Social Welfare Analysis of Income Distributions: Ranking Income Distributions with Lorenz Curves, IDEAS page.

Credit Suisse (2014). Global Wealth Data Book, online here.

Oxfam (2015). Wealth: Having It All and Wanting More, online here.

News reports: here, here, here, here