Monthly Archives: September 2009

159 – The cost of inaccurate data

How much does it matter if you use inaccurate data to inform a decision? It might matter less than you think. Here’s an example.

Not very long ago, I had a discussion with someone involved in prioritising environmental projects. In his view, the quality of the data for variables considered in the prioritisation process is particularly important. If there is no good independent data for a variable, he argued, the variable should just be excluded from the metric used to select projects. (A metric is a formula used to score and rank each project.)

In PD#158 I showed that if you leave out relevant variables like this, the cost, in terms of lost environmental benefits, can be huge. A better metric would deliver much greater environmental benefits – of the order of 100% greater – than one that omits key variables.

But that was assuming you have perfect data. What if the data you feed into those metrics is not accurate? Is it better to include a variable, even if the data for that variable is poor? Or is the cost of data inaccuracy so large that it’s better to leave the variable out?

I did an analysis simulating millions of prioritisation decisions for different situations. Each simulation involved selecting the best projects from a list of 100 randomly generated projects.

The analysis shows that, although bad data might cause you to make individual poor decisions, overall you are far better off including weak data than no data.

In the simulations, I looked at four levels of data inaccuracy: perfect accuracy for all five variables; small errors for all five variables; medium errors; and large errors. See Pannell (2009) for details of how I represented small, medium and large errors.

The results are as follows.

  • Small data errors hardly matter at all.
  • Medium-sized errors matter a bit. If the budget is extremely tight, and you use a good metric for prioritisation, the cost of the errors is around 14% (compared to a 30-60% cost of using the wrong metric for prioritisation). If you use a poor metric (such as omitting a variable, or adding variables when you should multiply them – both very common errors), the extra cost of using inaccurate data is extremely low – around 1%.
  • Large data errors matter a moderate amount, with costs of up to 23% in my simulations, which is still far less than the cost of using a poor metric. If you use a poor metric, the extra cost of also having large data errors is again very small – around 2%.
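To give a concrete feel for this kind of experiment, here is a stylised Python sketch of selecting projects with noisy data. It is not the model from Pannell (2009): the variable ranges, the multiplicative ±20% and ±80% error levels, the budget, and the pool size are all illustrative assumptions.

```python
import random

def make_projects(rng, n=100):
    # Hypothetical true values per project: asset value, probability of
    # success, adoption by landholders, and cost (assumed ranges).
    return [
        (rng.uniform(1, 100), rng.uniform(0.1, 1.0),
         rng.uniform(0.1, 1.0), rng.uniform(1, 10))
        for _ in range(n)
    ]

def true_benefit(p):
    value, success, adoption, _ = p
    return value * success * adoption

def noisy_score(p, rng, error):
    # Each variable observed with multiplicative random error (an assumed
    # error model, not the one used in Pannell 2009)
    value, success, adoption, cost = (
        x * rng.uniform(1 - error, 1 + error) for x in p
    )
    return value * success * adoption / cost

def delivered_benefits(projects, scores, budget=20.0):
    # Fund the highest-scoring projects until the budget runs out,
    # then total the TRUE benefits of what was funded
    order = sorted(range(len(projects)), key=lambda i: scores[i], reverse=True)
    spent, total = 0.0, 0.0
    for i in order:
        cost = projects[i][3]
        if spent + cost <= budget:
            spent += cost
            total += true_benefit(projects[i])
    return total

totals = {0.0: 0.0, 0.2: 0.0, 0.8: 0.0}  # error level -> summed benefits
for seed in range(200):  # average over many random project pools
    rng = random.Random(seed)
    pool = make_projects(rng)
    for error in totals:
        scores = [noisy_score(p, rng, error) for p in pool]
        totals[error] += delivered_benefits(pool, scores)

for error, total in totals.items():
    print(f"error ±{error:.0%}: {total / totals[0.0]:.0%} of error-free benefits")
```

Even in this toy version, the qualitative pattern is the same: selections made with noisy data still capture most of the benefits, because a good metric is fairly forgiving of random errors in its inputs.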

There are some really important messages out of this.

  1. For an organisation wishing to select the best projects, it is crucial to use the right metric (including all relevant variables) to score the projects, rather than dropping a variable because of weak data. (This is assuming that the data errors are random. If you suspect that data has been systematically biased in a particular direction, this conclusion may not hold. But it would be better to work on reducing this bias, rather than omitting the variable.)
  2. If you currently have both a weak metric and weak data, improving the metric is far more important than improving the accuracy of the data. If you don’t use the right metric, there is almost no benefit from improving the accuracy of data. This is true even if the errors in the data are large!
  3. Even if you do use the right metric, the benefits of reducing data errors are only moderate, at best. As long as you consider all relevant factors and combine them appropriately, expert judgments about the values of key variables may be sufficient, rather than requiring highly accurate field measurements.
  4. Once you have reduced data errors down to a moderate level, it is unlikely to be worthwhile trying to get them down to a low level.

These findings strongly reinforce the approach we take in INFFER (Pannell et al., 2009). In the design of INFFER, we made sure that all relevant variables are considered, and that they are combined using a good metric. The simulations show that these two things are crucial. At the same time, we use a simplified approach for each individual variable, accepting that strict accuracy about numerical values is not essential. We do emphasise the importance of using the best available information, but argue that the best available is likely to be good enough to work with, even if it is not highly rigorous scientific data. We are not complacent about bad data, and put an emphasis on the need to fill key knowledge gaps, but we recognise that one can work with lower quality data for now, rather than being paralysed by it.

David Pannell, The University of Western Australia

Further Reading

Pannell, D.J. (2009). The cost of errors in prioritising projects, INFFER Working Paper 0903, University of Western Australia. Full paper (350K)

Pannell, D.J., Roberts, A.M., Alexander, J., and Park, G. (2009). INFFER (Investment Framework For Environmental Resources), INFFER Working Paper 0901, University of Western Australia, Perth. Full paper (74K)

158 – Using the wrong metric to prioritise projects is very costly

Many different types of metrics are used to prioritise projects for funding. In some cases, the metrics used are not much better than completely uninformed random choices.

Imagine yourself in the following situation:

  • You are an environmental manager with a fixed budget trying to decide which projects to fund. You want to choose the projects that will deliver the most valuable environmental outcomes that you can afford.
  • Money is tight. There are hundreds of projects you could potentially fund, but you only have sufficient funds at your disposal to fund a small percentage of them.
  • You have the information needed to evaluate the projects: the significance of the environmental assets affected, the degree of degradation they have suffered or are likely to suffer in future, the effectiveness of proposed works in preventing or turning around that degradation, the likely adoption of those works by landholders, the risks of project failure, and the costs of each project.

How should you combine this information to choose the best projects? Does it really matter how you combine it? Of course these questions are relevant to other sorts of projects as well, but I’ll talk about them in an environmental context, which is where I’ve had to deal with them recently.

In principle, the best strategy is to rank the projects according to their overall benefits divided by their costs. This is well known and easy to understand but, remarkably, many exercises in prioritising environmental projects ignore the project costs. In simulations I’ve done recently (Pannell, 2009), I found that if funds are tight (they usually are) the simple error of ignoring project cost would result in choosing projects with about 30% lower environmental values, even if you did everything else perfectly.
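A small Python toy illustrates why ignoring cost is so expensive. This is an illustrative sketch, not the simulation in Pannell (2009): the benefit and cost ranges, the budget, and the pool size are assumptions made up for the example.

```python
import random

def make_projects(rng, n=100):
    # Hypothetical pool: (environmental benefit, cost) per project,
    # with assumed ranges for illustration only
    return [(rng.uniform(1, 100), rng.uniform(1, 10)) for _ in range(n)]

def delivered_benefits(projects, rank_key, budget=20.0):
    # Fund the highest-ranked projects until the budget runs out,
    # then total the benefits of what was funded
    spent, total = 0.0, 0.0
    for benefit, cost in sorted(projects, key=rank_key, reverse=True):
        if spent + cost <= budget:
            spent += cost
            total += benefit
    return total

bcr_total = benefit_only_total = 0.0
for seed in range(200):  # average over many random project pools
    pool = make_projects(random.Random(seed))
    bcr_total += delivered_benefits(pool, lambda p: p[0] / p[1])   # benefit/cost
    benefit_only_total += delivered_benefits(pool, lambda p: p[0])  # ignores cost

print(f"Ranking by benefit alone captures "
      f"{benefit_only_total / bcr_total:.0%} of the benefit:cost ranking's total")
```

The benefit:cost ranking funds many cheap, high-payoff projects; ranking by benefit alone blows the tight budget on a handful of expensive projects, so total benefits fall well short.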

Another potential error is to ignore one or more crucial variables. For example, in the world of environmental management, it is remarkably common for people to fail to explicitly consider both the adoptability of the proposed on-ground works, and their technical feasibility in addressing the environmental problem. My simulations showed that leaving out a couple of crucial variables when choosing projects would mean that you would probably lose around 50% of the potential environmental benefits from the investment!

Next there is the question of how to calculate the overall project benefits. In particular, should the relevant variables be combined by multiplication or addition? This depends on how the variables are related to overall benefits. Commonly, overall benefits are proportional to the relevant variables. For example, benefits would be proportional to the measure used to score asset significance or value, the level of degradation that is avoided by the on-ground works, and the probability of project success, and they would be approximately proportional to the adoption of works by the community. For cases like these where benefits are proportional to the variables, the appropriate mathematical formula is clear-cut: the variables should be multiplied together.

In real-world environmental programs, estimating project benefits using a weighted additive metric is far more common than using a multiplicative metric, even when the constituent variables are likely to be proportional to benefits. Indeed, as far as I am aware, the only practically used tools in Australia that use multiplication are the Benefit:Cost Index in INFFER and the Project Prioritisation Protocol (Joseph et al., 2009), although the latter could be more comprehensive in the factors it considers. Perhaps the preference for addition reflects the popularity of Multi-Criteria Analysis, in which weighted additive benefit scoring is by far the most common approach, and is often used for all variables in the equation.

My simulations showed that using a weighted additive benefits index when a multiplicative one should be used results in losses of up to 55% of the potential environmental benefits. This is quite a remarkable result. Even if you have perfect information about the projects, and you do everything else correctly, adding when you should multiply can lose you more than half of the potential benefits of the program investment.
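The add-versus-multiply point can also be demonstrated with a toy simulation. Again, this is a stylised sketch with assumed ranges, an assumed budget, and arbitrary equal weights in the additive index, not the analysis in Pannell (2009).

```python
import random

def make_projects(rng, n=100):
    # Hypothetical pool: asset value, probability of success, adoption by
    # landholders, and project cost. All ranges are illustrative assumptions.
    return [
        {
            "value": rng.uniform(1, 100),
            "success": rng.uniform(0.1, 1.0),
            "adoption": rng.uniform(0.1, 1.0),
            "cost": rng.uniform(1, 10),
        }
        for _ in range(n)
    ]

def true_benefit(p):
    # Benefits are proportional to each variable, so the variables multiply
    return p["value"] * p["success"] * p["adoption"]

def multiplicative_score(p):
    return true_benefit(p) / p["cost"]

def additive_score(p):
    # Weighted additive index (equal weights after crude scaling - illustrative)
    return (p["value"] / 100 + p["success"] + p["adoption"]) / p["cost"]

def delivered_benefits(projects, score, budget=20.0):
    # Fund the highest-scoring projects until the budget runs out,
    # then total the TRUE benefits of what was funded
    spent, total = 0.0, 0.0
    for p in sorted(projects, key=score, reverse=True):
        if spent + p["cost"] <= budget:
            spent += p["cost"]
            total += true_benefit(p)
    return total

mult_total = add_total = 0.0
for seed in range(200):  # average over many random project pools
    pool = make_projects(random.Random(seed))
    mult_total += delivered_benefits(pool, multiplicative_score)
    add_total += delivered_benefits(pool, additive_score)

print(f"Additive metric delivers {add_total / mult_total:.0%} "
      f"of the multiplicative metric's benefits")
```

The additive index can rank a project with one near-zero variable (say, works that nobody will adopt) above a genuinely strong project, because high scores on the other variables paper over the weakness; multiplication correctly drives the score towards zero.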

I’m not saying that addition should never be used. For example, it would be appropriate to add together the benefits for different stakeholder groups. However, addition should not be used as the default for all variables. Within each stakeholder group, there will be variables that must be multiplied if the results are to be sensible.

It is striking how sensitive the benefits of investment are to the way that project rankings are calculated. If you do pretty much anything wrong, you are likely to lose around 30-50% of the potential benefits, and combinations of errors (which are the norm in practice) may push the losses up towards 60%.

That is not all that much better than you get from completely uninformed random selection of projects: you lose 70 to 80% of benefits under the dumbest possible strategy.

Clearly, environmental managers need to pay a lot more attention to the metrics they use to prioritise projects. The costs to the environment from getting it wrong are huge.

The simulations also show how important it is to prioritise environmental projects. Under major national programs like Landcare, the Natural Heritage Trust and the National Action Plan for Salinity and Water Quality, the philosophy was to try to engage with as many people as possible, without seriously examining whether they could make a difference to key environmental outcomes. Compared to a more systematic targeted approach, this inclusive strategy probably meant that the programs achieved around 70% less in valuable environmental outcomes than they could have generated, even assuming (bravely!) that the projects were well designed and well implemented.

The current program, Caring for our Country, is at least targeted to particular environmental assets, although there remains room for improvement in how the target assets and projects are chosen. This study shows that targeting itself is not enough. If you don’t do the targeting well, it hardly helps!

David Pannell, The University of Western Australia

Further Reading

Joseph, L.N., Maloney, R.F. and Possingham, H.P. (2009). Optimal allocation of resources among threatened species: a project prioritisation protocol, Conservation Biology 23(2), 328-338.

Pannell, D.J. (2009). The cost of errors in prioritising projects, INFFER Working Paper 0903, University of Western Australia. Full paper (350K)

Pannell, D.J., Roberts, A.M., Alexander, J., and Park, G. (2009). INFFER (Investment Framework For Environmental Resources), INFFER Working Paper 0901, University of Western Australia, Perth. Full paper (74K)

157 – Is the community an environmental “asset”?

In working with environmental managers and policy makers in the INFFER (Investment Framework for Environmental Resources) project, we aim to get clarity of thinking about investment priorities. A common source of ambiguity is the practice of calling the community an “asset”.

In INFFER, we use the term “asset” to refer to the things that the environmental program is ultimately meant to protect or enhance. In other words, they are natural assets, such as rivers, land, vegetation, species or wetlands.

However, we strike problems because many people involved in environmental programs like to say that the community is an asset. It is clear enough what they mean – in some situations, we need the community to respond to deliver environmental outcomes, and if we can enhance the capacity of the community to respond, we can enhance the environmental outcomes achieved. It also has elements of respecting and valuing the role of the community in environmental programs.

This much is unarguable. The problem is that the term “asset” means something rather different in the two cases (natural assets versus community assets). In the case of natural assets, we’re referring to the outcomes that the program is ultimately meant to achieve. In the case of community assets, we’re talking about something else. The community plays a number of crucial roles in environmental programs, but enhancing community capacity is not, in itself, the purpose of the programs. It’s a means to an end.

Roles for the community include the following.

(a) The community considers different environmental assets to be of different significance or importance. In INFFER, we capture this in community workshops or draw in information from past workshops or surveys.

(b) Particular members of the community provide important local knowledge about assets, such as the degree of current degradation, and the impacts of current management actions.

(c) For some assets, it is primarily up to members of the public to implement the works that would be required to manage the asset.

The figure below helps to illustrate how the three roles of the community are quite different from the role of the natural asset. The water tank represents the natural asset. The water level in the tank represents the overall significance of the asset in its current condition. If the asset had been in pristine condition, the tank would have been full. In this example, the asset has been degraded and lost much of its significance. The ongoing degradation process is represented by the open tap (faucet). Works or on-ground actions can be undertaken to reduce the rate of degradation (turn the handle on the tap). It is up to the community (the hand) to do this.

The three community roles identified above are labelled in red. Role (a) is that the community’s preferences and values determine the overall significance of the asset. Role (b) is that the community may provide local knowledge about the degradation process or the effectiveness of works. Role (c) is the community implementing the works.

In assessing whether an environmental project is worthwhile (one of the aims of INFFER), it is essential to be clear about these different roles and the way they relate to the natural asset. To treat the community as an asset in the same way as a natural asset like a river clearly makes no sense, and would hamper our ability to assess the project properly.

The diagram also emphasises that to be able to evaluate proposed actions by the community, you also need to be clear about which natural assets will be affected. If you cannot relate the actions to particular natural assets, you really have no idea about the environmental merits of the project.

That is why we prefer to reserve the word “asset” for natural assets. We are not de-valuing the community, just trying to be clear about things.

David Pannell, The University of Western Australia