Uncertainty in modeling can originate from several sources, including incomplete knowledge of human-natural systems, misconceptions about the main system processes, lack of data, structural limits of the model itself, the impossibility of finding an optimal set of parameters able to cover multiple space-time resolutions and spans, the level of stochasticity, and/or an incomplete statistical/heuristic learning process (Pontius Jr. and Spencer, 2005; Messina et al., 2008; Pontius Jr. and Neeti, 2010; Pontius Jr. and Petrova, 2010).
The terms verification and validation have often been confused and used interchangeably (Oreskes et al., 1994; Rykiel, 1996). Verification checks whether a model meets the needs of the final user and attempts to identify internal logical and computational errors, while validation measures a model's ability to represent cause-effect relations in site-specific contexts (Coquillard and Hill, 1997).
Validation refers to the comparison between model predictions and observations that were not used to train the model. Thus, during validation the calibrated model is fed with new values, independent of the training dataset. Training and validation datasets are commonly created by randomly splitting observational data into two sub-sets, e.g. 80% of the data for training and 20% for validation (Chung and Fabbri, 2003; Villa-Vialaneix et al., 2012), or by time partitioning, i.e. calibration employs the older records or time span while validation uses the most recent ones (Pontius Jr. and Petrova, 2010).
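The two splitting strategies can be sketched as follows (a minimal NumPy illustration; the array names, the 80/20 fraction, and the split year are arbitrary choices for the sketch, not part of any cited method):

```python
import numpy as np

rng = np.random.default_rng(42)

def random_split(X, y, train_frac=0.8):
    """Randomly partition observations into training and validation sub-sets."""
    idx = rng.permutation(len(y))
    cut = int(train_frac * len(y))
    return (X[idx[:cut]], y[idx[:cut]]), (X[idx[cut:]], y[idx[cut:]])

def time_split(X, y, years, split_year):
    """Time partitioning: calibrate on older records, validate on newer ones."""
    older = years < split_year
    return (X[older], y[older]), (X[~older], y[~older])
```

Either way, the validation subset must stay untouched during calibration, otherwise the resulting accuracy estimate is optimistic.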
Choosing the best goodness-of-fit criterion for validation depends strictly on data type and quality, project goals, and required accuracy. Even if model output is consistent with the training and test datasets, there is no guarantee that the model will perform equally well when tested with a different dataset for predictive purposes. For example, historical data could contain errors (Pontius Jr. and Petrova, 2010), or may be outdated and thus unrepresentative of present and future system dynamics. Alternatively, a paucity of data may force the selection of calibration and validation time intervals much shorter than the future period the model aims to predict (Pontius Jr. and Neeti, 2010). Finally, uncertainty surrounds whether the employed validation method truly measures model accuracy (Hagen-Zanker and Lajoie, 2008).
Spatial models require comparison within a neighborhood context, because maps that do not match exactly cell-by-cell may still present similar spatial patterns, and therefore spatial agreement within a given cell vicinity. To address this issue, several vicinity-based comparison methods have been developed. For example, Costanza (1989) introduced the multiple-resolution fitting procedure, which compares map spatial fitness within windows of increasing size. Power et al. (2001) provided the Fuzzy Inference System, a method based on hierarchical fuzzy pattern matching. Pontius Jr. (2000, 2002) introduced Klocation, which differentiates errors due to location from errors due to quantity. Hagen (2003) developed Kfuzzy, considered equivalent to the Kappa statistic, and the Fuzzy Similarity, which accounts for fuzziness of location and category within a cell neighborhood. Almeida et al. (2008) modified the latter and named it Reciprocal Similarity Comparison, because this metric corresponds to the minimum fuzzy similarity between map 1 of changes versus map 2 of changes and vice versa (Soares-Filho et al., 2009) (Fig. 1). Van Vliet et al. (2011) developed the Kappa Simulation, which assesses the agreement between the simulated and actual land-use maps, conditioned on the original land-use map.
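The core idea of Costanza's (1989) procedure, comparing category counts inside sampling windows rather than cell-by-cell, can be sketched as follows. This is an illustrative simplification: the published procedure also averages the fit over many window sizes with a weighting term, omitted here, and the function name is ours:

```python
import numpy as np

def fit_at_window(map1, map2, w):
    """Goodness of fit at one window size: within each w-by-w sampling
    window, compare how many cells of each category the two maps contain,
    regardless of where those cells sit inside the window."""
    cats = np.union1d(map1, map2)
    rows, cols = map1.shape
    scores = []
    for i in range(0, rows - w + 1, w):
        for j in range(0, cols - w + 1, w):
            b1 = map1[i:i + w, j:j + w]
            b2 = map2[i:i + w, j:j + w]
            # Total absolute disagreement in per-category cell counts
            diff = sum(abs(int((b1 == c).sum()) - int((b2 == c).sum()))
                       for c in cats)
            scores.append(1.0 - diff / (2.0 * w * w))
    return float(np.mean(scores))
```

At window size 1 this reduces to plain cell-by-cell agreement; at larger windows, maps whose patterns merely shift within a window still score well, which is exactly the behavior the vicinity-based methods aim for.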
We evaluated six map comparison methods using a set of synthetic maps (Fig. 2). The best methods are those that yield the highest contrast between the reference map (Final) and a randomly blurred pattern (neutral model), but also capture similarity between maps that closely resemble each other, such as the line1 × line2 map pair (Table 1). The best method is the Reciprocal Similarity Comparison with an exponential decay function truncated at a 19×19-cell window. This method was chosen because it focuses only on the goodness-of-fit of the changes rather than the whole map, clearly distinguishes well-matched spatial patterns from a neutral model (a random map with the same number of changes; probability for all cells = 0.5), and is virtually independent of the comparison window size when large windows are employed.
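A reciprocal fuzzy similarity of this kind can be sketched as below, assuming a Chebyshev cell distance, a decay constant k = 0.5, and a 19×19 search window; these are illustrative assumptions, and the exact decay and aggregation of the published metric may differ:

```python
import numpy as np

def fuzzy_sim(changes_a, changes_b, half=9, k=0.5):
    """One-way similarity: for each changed cell of map A, take an
    exponentially decayed credit for the nearest changed cell of map B
    within a (2*half+1) x (2*half+1) window, then average."""
    rows, cols = changes_a.shape
    sims = []
    for i, j in zip(*np.nonzero(changes_a)):
        best = 0.0
        for di in range(-half, half + 1):
            for dj in range(-half, half + 1):
                r, c = i + di, j + dj
                if 0 <= r < rows and 0 <= c < cols and changes_b[r, c]:
                    d = max(abs(di), abs(dj))  # Chebyshev distance
                    best = max(best, np.exp(-k * d))
        sims.append(best)
    return float(np.mean(sims)) if sims else 1.0

def reciprocal_similarity(changes_a, changes_b, **kw):
    """Minimum of the two one-way similarities (A vs B and B vs A),
    so neither map can score well by simply predicting few changes."""
    return min(fuzzy_sim(changes_a, changes_b, **kw),
               fuzzy_sim(changes_b, changes_a, **kw))
```

Because only change maps enter the computation, cells that persist in both maps contribute nothing to the score, which is what makes the metric focus on the goodness-of-fit of the changes.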
Most comparison metrics are not mutually comparable, nor are the scores of a specific metric comparable when applied to different models, given their different contexts (Hagen-Zanker and Lajoie, 2008). Our selected metric intrinsically incorporates a neutral model of permanence and can be tested against a neutral model of random allocation to check whether the model predicts better than chance. Yet, a benchmark for accepting or rejecting a model remains an open question.
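The test against random allocation can be sketched as follows: build a random map with the same number of changed cells as the observed change map, score both the simulation and the random map against the observations, and check that the simulation scores higher. Here a plain cell-wise hit rate is a deliberately simple stand-in for whichever similarity metric is actually used:

```python
import numpy as np

rng = np.random.default_rng(0)

def neutral_model(changes_obs):
    """Neutral model of random allocation: same number of changed cells
    as the observed change map, placed at random locations."""
    flat = np.zeros(changes_obs.size, dtype=bool)
    flat[rng.choice(changes_obs.size, int(changes_obs.sum()),
                    replace=False)] = True
    return flat.reshape(changes_obs.shape)

def hit_rate(pred, obs):
    """Fraction of observed change cells also predicted as change,
    a simple placeholder for a vicinity-based similarity metric."""
    return float((pred & obs).sum()) / max(int(obs.sum()), 1)
```

A model worth keeping should beat the neutral score consistently across many random realizations, not just once; how much better it must do remains, as noted above, an open question.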
Almeida, C., Gleriani, J., Castejon, E., Soares-Filho, B., 2008. Neural networks and cellular automata for modeling intra-urban land use dynamics. International Journal of Geographical Information Science 22, 943–963.
Chung, C., Fabbri, A., 2003. Validation of spatial prediction models for landslide hazard mapping. Natural Hazards 30, 451–472.
Coquillard, P., Hill, D., 1997. Modélisation et simulation d'écosystèmes. Des modèles déterministes aux simulations par événements discrets. Masson, Paris.
Costanza, R., 1989. Model goodness of fit: a multiple resolution procedure. Ecological Modelling 47, 199–215.
Hagen, A., 2003. Fuzzy set approach to assessing similarity of categorical maps. International Journal of Geographical Information Science 17, 235–249.
Hagen-Zanker, A., Lajoie, G., 2008. Neutral models of landscape change as benchmarks in the assessment of model performance. Landscape and Urban Planning 86, 284–296.
Messina, J.P., Evans, T.P., Manson, S.M., Shortridge, A.M., Deadman, P.J., Verburg, P.H., 2008. Complex systems models and the management of error and uncertainty. Journal of Land Use Science 3, 11–25.
Oreskes, N., Shrader-Frechette, K., Belitz, K., 1994. Verification, validation, and confirmation of numerical models in the earth sciences. Science 263, 641–646.
Pontius Jr., R.G., 2002. Statistical methods to partition effects of quantity and location during comparison of categorical maps at multiple resolutions. Photogrammetric Engineering and Remote Sensing 68, 1041–1049.
Pontius Jr., R.G., Neeti, N., 2010. Uncertainty in the difference between maps of future land change scenarios. Sustainability Science 5, 39–50.
Pontius Jr., R.G., Petrova, S., 2010. Assessing a predictive model of land change using uncertain data. Environmental Modelling & Software 25, 299–309.
Pontius Jr., R.G., Spencer, J., 2005. Uncertainty in extrapolations of predictive land change models. Environment and Planning B 32, 211–230.
Power, C., Simms, A., White, R., 2001. Hierarchical fuzzy pattern matching for the regional comparison of land use maps. International Journal of Geographical Information Science 15, 77–100.
Rykiel, E., 1996. Testing ecological models: the meaning of validation. Ecological Modelling 90, 229–244.
Van Vliet, J., Bregt, A., Hagen-Zanker, A., 2011. Revisiting Kappa to account for change in the accuracy assessment of land-use change models. Ecological Modelling 222, 1367–1375.
Villa-Vialaneix, N., Follador, M., Ratto, M., Leip, A., 2012. Metamodels comparison for the simulation of N2O fluxes and N leaching from corn crops. Environmental Modelling & Software 34, 51–66.