Browse to a topic that interests you; perhaps a question that you have been pondering for a while is answered here. See also our videos on decision-support modelling and the use of PEST.
Regularization
What is regularization?
In the broadest sense, “regularisation” is whatever it takes to obtain a unique solution to an ill-posed inverse problem – that is, to a problem in which we are trying to estimate more parameters than the available information can support. When do we do this? All the time really, for the earth is a complicated place, and our models must be made parameter-simple if their parameters are to be estimated uniquely. How do we regularise? We can do it ourselves by reducing the number of parameters that we try to estimate. Or we can use a simple, lumped-parameter model rather than a complex, physically-based model. Or we can let PEST do the regularisation for us. If done properly, the latter alternative has the following advantages.
- We get maximum information out of our data, thereby achieving something that is close to a parameter set of minimum error variance;
- By retaining many parameters in the model, we have the ability to explore the extent to which parameters, and any prediction that we make using the model, may be wrong. (When analyzing the uncertainty of a decision-critical prediction, the parameters that we cannot estimate are as important as those that we can.)
What is the null space?
Mathematically, the null space of a matrix X is the set of all vectors δp for which Xδp = 0. Now suppose that X represents a model acting on its parameters. (The Jacobian matrix is such a representation.) Suppose further that we have calibrated that model, thereby estimating a parameter set p which gives rise to a good fit between model outputs and field measurements. Then we can add any δp within the null space of X to p to obtain a parameter set which also calibrates the model.
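The idea is easy to demonstrate numerically. The following is a minimal sketch (using NumPy, with made-up numbers; it is not part of the PEST suite): perturbations built from the trailing right singular vectors of X leave simulated observations unchanged.

```python
import numpy as np

X = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.1]])        # 2 observations, 3 parameters: ill-posed
p = np.array([1.0, 1.0, 1.0])          # a "calibrated" parameter set

U, s, Vt = np.linalg.svd(X)
null_vectors = Vt[np.sum(s > 1e-10):]  # rows of Vt that span the null space of X

delta_p = 0.5 * null_vectors[0]        # any combination of null-space vectors
print(np.allclose(X @ p, X @ (p + delta_p)))   # True: model outputs are unchanged
```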
It follows that if a model has a null space, its parameters are not uniquely estimable. Regularisation (of one kind or another) must then be used to calibrate that model. However, regularized inversion finds just one of the millions of parameter sets which fit a calibration dataset equally well. Which one will it find? Hopefully, if regularisation is properly done, it will find something approaching the “minimum error variance” solution to the inverse problem of model calibration. This does not mean that the parameters comprising this solution are likely to be right! It only means that the potential wrongness of each of these parameters is roughly symmetrically disposed with respect to it.
When do models have null spaces? All the time. The world is a complex place. The fact that models, and the spatial and temporal relationships between parameters that they use, can be made simple enough for a unique solution of the inverse problem to exist (using, for example, zones of piecewise constancy or mathematical regularisation devices such as singular value decomposition), does not make the inherent complexity of the real world go away. However parameter simplification of one kind or another is necessary if a model is to be calibrated. Differences between the simplified, calibrated parameter set on the one hand, and the complex set of real-world hydraulic properties on the other hand, are the largest source of error of most model predictions. Fortunately, this source of error can be explored as an adjunct to calibration achieved through regularised inversion using linear and nonlinear uncertainty analysis software supplied with the PEST and PEST++ suites.
How does PEST regularise?
There are two broad types of mathematical regularisation. Both are available through the PEST and PEST++ suites. They can be used either individually or together.
Tikhonov regularisation “fixes up” an ill-posed inverse problem by adding information to it, this information pertaining directly to system parameters. For example, preferred values can be suggested for all parameters. Alternatively, or as well, preferred relationships between parameters (such as smoothness relationships) can be suggested. The user then sets a “target measurement objective function” that defines a desired level of model-to-measurement misfit. This can be set in accordance with the expected level of measurement/structural noise associated with the calibration dataset so that overfitting does not occur. PEST then re-formulates the inversion process as a constrained optimisation process; it minimizes the so-called “regularisation objective function” (thereby maximizing the extent to which preferred parameter values and/or parameter relationships are respected) while attempting to achieve a measurement objective function that is as close as possible to, but no lower than, the user-supplied target measurement objective function.
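As a hedged illustration only (PEST itself re-computes the regularisation weight factor every iteration so that the target measurement objective function is respected; the toy Jacobian and fixed weight factor below are invented for the example), Tikhonov regularisation can be viewed as penalised least squares:

```python
import numpy as np

X = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.1]])        # Jacobian: 2 observations, 3 parameters
d = np.array([6.0, 12.2])              # "measurements"
p_pref = np.array([1.0, 1.0, 1.0])     # preferred (prior) parameter values
mu = 0.1                               # fixed regularisation weight factor

# Stack the measurement equations on top of the preferred-value equations and
# solve the combined system in a least-squares sense.
A = np.vstack([X, mu * np.eye(3)])
b = np.concatenate([d, mu * p_pref])
p_est, *_ = np.linalg.lstsq(A, b, rcond=None)
print(p_est)        # unique, fits d well, and deviates little from p_pref
```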
In contrast to Tikhonov regularisation, subspace regularisation (of which singular value decomposition - or SVD - is the flagship) “fixes up” an ill-posed inverse problem by eliminating inestimable combinations of parameters from that problem. That is, it releases the parameter estimation process from having to estimate combinations of parameters about which the calibration dataset is uninformative. First, it subdivides parameter space into two orthogonal subspaces. One of these (the “calibration solution space”) is spanned by parameter combinations that are uniquely estimable on the basis of the current calibration dataset. The other (the “calibration null space”) is spanned by parameter combinations that cannot be uniquely estimated. SVD regularizes an ill-posed inverse problem by simply not estimating the latter combinations of parameters; they thus retain their initial values.
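The following sketch (again with invented numbers, and ignoring observation weights for simplicity) shows truncated SVD doing exactly this: only solution-space parameter combinations are updated, while null-space components of the initial parameter set are left untouched.

```python
import numpy as np

X = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.1]])        # Jacobian
d = np.array([6.0, 12.2])              # "measurements"
p0 = np.array([1.0, 1.0, 1.0])         # initial (preferred) parameter values

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = np.sum(s > 1e-6 * s[0])            # truncation point (PEST's EIGTHRESH plays a similar role)
residual = d - X @ p0
# Update solution-space components only; null-space components of p0 are kept.
p_est = p0 + Vt[:k].T @ ((U[:, :k].T @ residual) / s[:k])
print(p_est)
```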
In some ways SVD resembles what we do ourselves when we regularise an ill-posed inverse problem; that is, we omit from the inverse problem parameters that we cannot estimate. However, SVD is a little smarter than we are; it thinks in terms of (orthogonal) parameter combinations rather than in terms of individual parameters. This allows more information to be extracted from a calibration dataset and, if properly implemented, minimizes perturbations from the original parameter set. If the latter is assigned on the basis of best prevailing expert knowledge, this can lead to estimated parameters of minimized error variance.
How should I set the target measurement objective function?
Often this is a matter of trial and error. Ideally, it should be set at a level that is commensurate with the level of measurement noise. For example, if each observation is given a weight equal to the inverse of the standard deviation of expected measurement noise, then the expected value of each squared weighted residual is about 1.0; the expected value of the objective function is then about equal to the number of observations.
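This expectation is easy to verify with synthetic numbers (a hedged sketch; the noise level and number of observations are invented):

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs = 500
sigma = 0.3                                  # standard deviation of measurement noise
residuals = rng.normal(0.0, sigma, n_obs)    # model-to-measurement residuals
weights = np.full(n_obs, 1.0 / sigma)        # weight = 1 / standard deviation

phi = np.sum((weights * residuals) ** 2)     # weighted sum of squared residuals
print(phi, n_obs)                            # phi comes out close to n_obs
```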
The trouble is, most observation “noise” that we encounter when calibrating a model is in fact model structural noise. A model simply cannot replicate every nuance of system behaviour, and we are often quite happy to consider our model calibrated with a model-to-measurement misfit which is considerably greater than that which would be incurred by measurement noise alone. Furthermore, we normally find out what this fit actually is (and hence the level of structural noise that actually exists) during the calibration process itself. In classical parameter estimation where we use the principle of parsimony to do our regularisation, the relationship between measurement noise and weights is then expressed by the “reference variance” obtained through the calibration process, after we have achieved the best fit that we can.
With highly parameterised inversion, where regularisation is done mathematically, things are slightly different. Here we can often get as good a fit as we like. However, this can lead to over-fitting, and to a rapid rise in the potential error of predictions made by the calibrated model. In practice, the level of fit that we are justified in achieving is chosen by us on the basis of the parameter values that give rise to that level of fit. If the cost of a good fit is weird parameter values, then the fit is too good (because predictions made on the basis of those weird parameter values may be wildly in error). On the other hand, if hydraulic properties inferred through the calibration process are realistic, and show realistic spatial variability, then the parameter set (and the fit to which it gives rise) may indeed be acceptable. However, this does not mean that model predictions will not be in error. It may mean, however, that the error variance of at least some predictions is substantially smaller for having calibrated the model than it would have been had we not calibrated it.
What if I have no idea what to set the target measurement objective function at because this is the first time I have run PEST on this particular problem?
Set the FRACPHIM control variable to 0.1. This will set the target measurement objective function at 0.1 times the current measurement objective function during every optimisation iteration, but no lower than the PHIMLIM control variable. Then set PHIMLIM as low as you like.
In the end, PEST will try to achieve a very low objective function, because that is what you have told it to do. However as the parameter estimation process progresses, and PEST “throws away” regularisation information by assigning it a very low weight, the inversion process will eventually run out of steam as it becomes ill-conditioned. In the meantime you will probably have obtained a level of model-to-measurement misfit that is better than you deserve, given the (measurement and structural) noise associated with your data. Your parameter field may look a little ugly as a consequence. However on the next occasion that you run PEST, you will be able to use a suitable target measurement objective function, because you are now aware of the attainable level of model-to-measurement misfit for your problem (and of the consequences to parameters of achieving this fit).
When should I use LSQR instead of SVD?
If you are estimating more than about two thousand parameters, SVD can become very slow. This is not the case with LSQR, which can operate very quickly even where many thousands of parameters are estimated. The disadvantages of LSQR are that it cannot provide quite the same guarantee as SVD of achieving a minimum error variance solution to the inverse problem, and that a resolution matrix is not available after the inversion process has reached completion. In most circumstances neither of these is of any concern.
SVD-Assist
What is the difference between SVD and SVD-Assist?
When PEST uses truncated singular value decomposition (SVD) to solve the inverse problem of model calibration, it decomposes parameter space into two orthogonal subspaces spanned by orthogonal unit vectors. The unit vectors which span the calibration solution space represent combinations of parameters that are estimable on the basis of the current calibration dataset. Those spanning the null space comprise inestimable parameter combinations. Using standard techniques, PEST only estimates those parameter combinations which are in fact estimable, leaving the values of inestimable parameter combinations unchanged. (Actually, PEST works in terms of parameter perturbations from initial user-supplied parameter values rather than absolute parameter values - but the principle is the same.)
Decomposition of parameter space in this manner takes place on the basis of the Jacobian matrix. This is the matrix of sensitivities of model outputs (those corresponding to observations comprising the calibration dataset) with respect to all adjustable parameters. In normal nonlinear parameter estimation, this matrix is re-computed during every iteration of the inversion process. Where sensitivities are calculated using finite parameter differences, this requires at least as many model runs as there are adjustable parameters on each occasion that the Jacobian matrix is re-computed.
When PEST implements “SVD-assisted” parameter estimation, it computes the global Jacobian matrix only once. Then it decomposes parameter space into estimable and inestimable parameter combinations. It then reformulates the whole calibration problem so that, from that moment on, it estimates only “super parameters”. Only as many of these are required as there are dimensions in the calibration solution space; alternatively, only as many of these need to be defined as you have computing resources to estimate. Through the use of super parameters you can get all of the benefits of highly parameterized inversion with a comparatively small run time burden. Your model can have hundreds, or even thousands, of parameters, but these may be accommodated with a computational burden of only a few tens of runs per iteration.
What exactly is a super parameter?
Mathematically, a super parameter is the projection of the (real-world) parameter vector onto one of the orthogonal axes spanning the calibration solution space. Only as many of these can be estimated as there are dimensions in that space. Null space parameter components can never be known.
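A minimal sketch of the idea (a random matrix stands in for the weighted Jacobian, and observation weights are ignored; nothing here reproduces PEST's internals): super parameters are the coefficients of the leading right singular vectors, and only they need to be estimated.

```python
import numpy as np

X = np.random.default_rng(1).normal(size=(30, 500))   # 30 observations, 500 parameters
U, s_vals, Vt = np.linalg.svd(X, full_matrices=False)
n_super = 10                                           # chosen by the modeller
V1 = Vt[:n_super].T                                    # 500 x n_super basis of the solution space

p0 = np.zeros(500)                                     # initial parameter values

def native_parameters(super_params):
    """Map the small vector of super parameters back to native model parameters."""
    return p0 + V1 @ super_params

# From here on, only n_super values need be estimated instead of 500.
print(native_parameters(np.ones(n_super)).shape)       # (500,)
```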
Can I use Tikhonov regularization when undertaking SVD-Assisted parameter estimation?
Yes; in fact this is recommended practice. When estimating super parameters, Tikhonov regularization constraints can be applied to native parameters that are actually used by the model. Thus estimated parameter fields are constrained to deviate minimally from those encapsulating a “default parameter condition” defined by the modeller on the basis of site characterization studies and expert knowledge.
How many super parameters should I estimate?
Estimate as many as you have computing resources for. Of course, if you try to estimate more super parameters than there are valid dimensions in the calibration solution space, this strategy runs the risk of incurring numerical instability and overfitting. However if you tell PEST to use SVD to estimate super parameters and set the EIGTHRESH control variable to 1E-6 or 1E-7, this will maintain stability of the parameter estimation process. You can also add Tikhonov regularization to prevent over-fitting and further enhance credibility of estimated parameters.
How can I gain some estimate of the dimensions of the calibration solution space prior to undertaking SVD-assisted parameter estimation?
Use the SUPCALC utility. This is what it was written to do.
Create a PEST input dataset in the normal way. This may, or may not, include Tikhonov regularization. Set the NOPTMAX variable to -2 in the “control data” section of the PEST control file, and then run PEST. PEST will run the model as many times as there are adjustable parameters in order to calculate a Jacobian matrix based on initial parameter values. Then it will cease execution. Next run the SVDAPREP utility to build a new PEST input dataset based on super parameters. You are then ready to go.
If I alter the original PEST dataset, do I need to re-calculate the Jacobian matrix before running SVDAPREP to set up SVD-assisted parameter estimation?
Not necessarily. If alterations to the original PEST dataset are restricted to the following, then you can create a new Jacobian matrix corresponding to the altered PEST dataset using the JCO2JCO utility.
- Removal, tying or fixing of parameters;
- Alterations to the transformation status of any parameters;
- Removal of observations, or alterations to their weights;
- Addition or removal of prior information.
Pilot Points
Is it possible to “over fit” when using pilot points?
It certainly is. But if you are using regularization, this is easily prevented. If you don’t like the parameter field that PEST comes up with, then raise the target measurement objective function (PHIMLIM) so that a poorer fit is sought, for it would appear that you are fitting noise rather than information.
Where should I put pilot points when parameterizing a groundwater model domain?
Here are a few simple rules to follow.
- There is no need to be parsimonious; use as many pilot points as you can. Let PEST do the regularization (use Tikhonov regularisation for parameter sensibility, and SVD for numerical stability).
- Use more pilot points in those parts of the model domain where information has a high spatial density (for example where there are more observation wells).
- Where pilot points represent hydraulic conductivity, try to place them between head measurement wells along the direction of groundwater flow. Where they represent specific yield or storage coefficient, place them at the measurement wells themselves.
- Make sure to place hydraulic conductivity pilot points between measurement wells and any downstream boundary, for it is the hydraulic conductivity of the material between these wells and the boundary that determines the heads in those wells.
- Don’t leave any large gaps in the final pilot points pattern.
- If you use preferred value regularisation (for example if you use the ADDREG1 utility to add Tikhonov constraints to a PEST input dataset), consider employing a covariance matrix instead of weights with prior information equations. See the PPCOV and PPCOV_SVA utilities supplied with the PEST Groundwater Data Utilities Suite.
- If you are absolutely sure that no heterogeneity can arise over a distance of less than x, then don’t place any pilot points closer together than a distance of x.
How can I avoid “bull’s eyes” when I use pilot point parameterization?
Paradoxically, bull’s eyes arise when you use too few pilot points, rather than too many. Many people who are new to regularized inversion, and who are habituated to traditional parameter estimation based on parameter parsimony, use only a few pilot points. This is a mistake, and it almost guarantees that a parameter field based on circles will arise. Use a lot of pilot points in conjunction with regularization to parameterize the model domain.
Sometimes the estimation of outlandish values for some pilot points can provide an indication of some conceptual inadequacy in the model. The location of an aberrant pilot point can help pinpoint the source of the conceptual error that gives rise to the aberration. This is another advantage associated with the use of highly parameterized inversion in general, and pilot points in particular; weird parameter values tend to be local to the aberrations which cause them.
What sort of regularization should I use with pilot points?
A number of programs from the PEST Groundwater Utility Suite automate the addition of regularisation prior information equations to an existing PEST control file in which some or all of the parameters are based on pilot points. These include PPKREG, PPKREG1 and the very versatile GENREG utility. Sophisticated geostatistically-based regularization can be added to a PEST input dataset using all of these utilities.
By far the easiest way to add regularization to a PEST control file, however (whether or not parameters in that file are based on pilot points), is to use the ADDREG1 utility supplied with PEST. This provides a prior information equation for every parameter featured in a PEST control file; in each case, this equation assigns a preferred value to that parameter which is equal to its initial value. This strategy works fine with pilot point parameters in most modelling contexts, irrespective of whether the model is single- or multi-layered, and of whether one or many parameter types are being estimated.
If using ADDREG1, consider using the PPCOV or PPCOV_SVA utilities (supplied with the PEST Groundwater Utilities) to build one or a number of geostatistically-based covariance matrices to use with these prior information equations instead of individual weights; this will allow heterogeneity to arise in ways that are geologically meaningful. Also, assign prior information equations pertaining to different pilot point families (e.g. families that pertain to different layers or different parameter types) to different observation groups. Then set the IREGADJ variable in the “regularisation” section of the PEST control file to 1. This allows PEST to exert constraints on parameters more tightly where information on those parameters within the calibration dataset is weak or absent altogether.
Uncertainty Analysis
What is the difference between predictive error and predictive uncertainty?
“Error” is the more appropriate concept to employ when we take a single model output and act on the basis of that output. This is what happens when we calibrate a model and then use this calibrated model to make predictions of future environmental behaviour. This prediction is likely to be wrong, its “potential wrongness” arising from the fact that a model, and the parameters that it uses, are simplifications of reality (even if the calibrated model fits the historical observation dataset well), and from the fact that the historical dataset is contaminated by measurement noise. Tools available through the PEST suite allow quantification of the potential for error in predictions made by a calibrated model.
“Uncertainty” is slightly different. It is an intrinsic property of the data on the basis of which a model is parameterized and calibrated. Conceptually, it can be quantified using a Bayesian approach. However, it is often easier to use error as a substitute for uncertainty, especially when working with complex, highly parameterized models. Predictive uncertainty is smaller than potential predictive error; but predictive error variance is easier to calculate. However, it is important to realize that it is possible for predictive error variance to be artificially magnified through inappropriate or untidy regularisation, whether this is implemented through manual or mathematical means. (Here we define “regularisation” as whatever it takes to obtain a unique solution to a fundamentally nonunique inverse problem.)
Uncertainty analysis requires choice of a covariance matrix of innate parameter variability. Where do I get this from? What if I don’t have a variogram for the area? Or what if I choose one that is wrong?
If you choose an incorrect variogram, that is OK. But you have to choose some characterization of innate parameter variability, for there can be no uncertainty analysis without it. Remember that a stochastic characterisation of the innate variability of hydraulic properties at a particular study site can simply reflect the fact that you don’t know the range of likely values at that site; you don’t have to be a specialist geostatistician to express the fact that you are not sure how high or how low a hydraulic property is likely to be, given your current knowledge. If you don’t know something, then that defines uncertainty after all. You will, however, nearly always feel a sense of disquiet if one or more parameter values is “weird”; in most cases you will know “weird” when you see it. So let “weirdness” be your guide to setting parameter limits for the purpose of uncertainty analysis.
Furthermore, it is not always necessary that the matrix of innate parameter variability be based on a variogram. In many cases no off-diagonal elements will be necessary. However, if you think that hydraulic properties within a study area do in fact possess innate spatial continuity, then simply have a guess at what the length dimension of this continuity may be, and at any preferential direction that it may have.
If I want to use a variogram for characterization of innate variability of pilot point parameters, how do I do this?
If you have some idea of a suitable variogram for your study site, record this in a "structure file"; see PEST documentation for details of this file type. Then use the PPCOV utility from the PEST Groundwater Utility Suite to generate a covariance matrix for your pilot points on the basis of that variogram. Alternatively, use the PPCOV_SVA utility to express parameter variability over model domains where the extent, anisotropy and direction of this variability can vary from place to place.
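For orientation only, here is a hedged sketch of the kind of calculation involved (it is not the PPCOV algorithm itself; the coordinates, sill and range are invented, and an exponential variogram is assumed):

```python
import numpy as np

points = np.array([[0.0, 0.0],
                   [100.0, 0.0],
                   [0.0, 250.0],
                   [300.0, 300.0]])          # pilot point coordinates (m)
sill, a = 1.0, 200.0                         # variogram sill and (practical) range

d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
gamma = sill * (1.0 - np.exp(-3.0 * d / a))  # exponential variogram
C = sill - gamma                             # covariance = sill - variogram
print(np.round(C, 3))
```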
What is the best way to use PEST to do model predictive uncertainty analysis?
There are a number of alternatives available here.
For over-determined inverse problems (i.e. for problems in which there is no null space and a unique parameter set can be found because the steps necessary to ensure parameter uniqueness through parameter parsimony have been taken), PEST can be run in predictive analysis mode. When used in this mode, PEST maximizes or minimizes a user-specified prediction while maintaining the model in a calibrated state. Formulas are available for defining an objective function at which a model is no longer deemed to be calibrated. However this is often best done visually; where model-to-measurement misfit is dominated by structural noise, traditional statistical analysis is not applicable.
In the under-determined parameter estimation context (which is far more representative of the innate complexity of real-world systems), a variety of PEST-suite methodologies can be used for implementation of post-calibration uncertainty analysis. Linear uncertainty analysis can be accomplished very easily using the GENLINPRED utility and/or through the PREDUNC suite of utility programs. Nonlinear uncertainty analysis can be undertaken by first generating parameter samples using a linear approximation to the posterior covariance matrix (calculated using the PREDUNC7 and/or PNULPAR utilities), and then adjusting these parameter sets to satisfy calibration constraints using a pre-calculated Jacobian matrix. (This process is sometimes referred to as “null space Monte Carlo”.) Alternatively, use the PESTPP-IES iterative ensemble smoother. This has the ability to explore post-calibration parameter uncertainty very efficiently.
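For the linear analysis mentioned above, the underlying arithmetic is that of Bayesian conditioning. The sketch below (random matrices stand in for real sensitivities; it is not PREDUNC itself) computes prior and posterior variances of a prediction whose parameter sensitivities are held in the vector y:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(20, 50))        # Jacobian of observations w.r.t. parameters
y = rng.normal(size=50)              # sensitivities of the prediction to parameters
Cp = np.eye(50)                      # prior parameter covariance matrix
Ce = 0.1 * np.eye(20)                # covariance matrix of measurement noise

prior_var = y @ Cp @ y
gain = Cp @ X.T @ np.linalg.inv(X @ Cp @ X.T + Ce)
post_var = prior_var - y @ gain @ X @ Cp @ y
print(prior_var, post_var)           # calibration reduces, but does not remove, uncertainty
```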
Though its outcomes are only qualitative in nature, perhaps the easiest way to use PEST to infer the likelihood (or otherwise) of a predictive possibility is to undertake a “predictive calibration” exercise. Include a prediction of interest as an “observation” in a revised calibration process, and see if you can still calibrate the model satisfactorily to the historical dataset. If you cannot, or if the parameters required to achieve calibration with the “predictive observation” included in the calibration dataset are too unrealistic, then the hypothesis that the prediction is possible has been tested, and can be rejected. This process is also referred to as “direct predictive hypothesis testing”.
Can I use PEST for optimization of data acquisition?
Yes indeed.
What is the best data to gather? It is that which reduces predictive uncertainty by the greatest amount. PEST provides special utilities (PREDUNC5 and PREDVAR5) which allow a user to compare the efficacy of different data acquisition strategies in lowering the uncertainty of predictions of interest. Furthermore, in doing this, PEST takes full account of the so-called “null space contribution” to predictive uncertainty. This is essential if attempts to optimise data acquisition are to have integrity.
Pareto Mode
When can I run PEST in Pareto mode?
Pareto mode can be useful in two contexts. One is when exploring the post-calibration uncertainty of a prediction. The other is when imposing Tikhonov constraints on parameters during a highly-parameterized calibration exercise.
How is Pareto mode used in exploring predictive uncertainty?
Exploration of predictive uncertainty using PEST’s Pareto mode is a lot like hypothesis-testing. Remember that the scientific method is based on the testing of hypotheses. If a hypothesis contradicts prevailing data, then it can be rejected. If it does not, then it is retained (but not necessarily accepted as a good description of reality because other hypotheses may also be consistent with all available data).
In environmental management, decisions are often made on the basis of avoiding unwanted future occurrences. Can a model say for sure that a certain occurrence will not happen? Often not, because there is too much uncertainty associated with predictions of future environmental behaviour. However, we can use the model to test the hypothesis that the unwanted happening will occur. To do this, we “observe” its occurrence by including it as a member of the calibration dataset, alongside the historical measurements of system state that normally comprise that dataset. If, with the inclusion of this “predictive observation” in the calibration dataset, we can still attain a good fit with the historical component of the calibration dataset, and if achievement of this fit does not result in unrealistic parameter values, then the hypothesis that the bad thing will happen cannot be rejected. This is good to know when it comes to making important decisions.
Use of PEST in Pareto mode allows a modeller to focus on the value of an important prediction (for example the flood peak of a river, or the quality of surface water or groundwater following changed environmental management), and to vary that value while assessing how its probability of occurrence depends on that value. As the prediction gets worse and worse, hopefully this worsening predictive outcome comes at the cost of larger and larger misfit with the historical calibration dataset, and/or of the requirement that the model be endowed with more and more unrealistic parameters for the prediction to occur. Either on the basis of statistical analysis, or simply through informed intuition, there may come a point where the modeller is able to say “it is most unlikely that the prediction will ever be this bad”.
Unfortunately, this process may also demonstrate that the possibility of an untoward environmental happening cannot in fact be excluded. However analysis of Pareto results will at least allow a modeller to assess the likelihood or otherwise of the feared occurrence. Meanwhile, in monitoring the values that PEST assigns to model parameters in order to create the pre-conditions for the unwanted prediction to occur, valuable information is provided on which scientific design of an early-warning monitoring network can be based.
How is Pareto mode used in applying Tikhonov regularization constraints?
Regularization constitutes a trade-off situation. Where there is insufficient data to estimate all model parameters uniquely (which is always the case given the innate complexity of reality) the shortfall in information must be made up by the modeller. That is not to say that he/she can provide correct (or even nearly correct) values for parameters when making up for this information shortfall. But he/she can provide values for parameters, or for relationships between parameters, that are of “minimum error variance” when all of his/her expertise is taken into account. Note that “minimum error variance” does not mean “small error variance”. It only means that the potential for being wrong is minimized because that potential has been made symmetrical with respect to the estimates provided. Thus there is as much chance of being wrong one way as there is of being wrong the other way.
So when we calibrate a highly parameterized model, we supply a backup plan for all parameters. In doing this we provide backup estimates for parameters (or parameter combinations) that are inestimable on the basis of the current calibration dataset. These estimates are normally provided in the form of prior information equations. But what weight should be placed on these prior information equations? If it is too high, PEST will ignore the calibration data and conform to the parameter constraints embodied in the prior information. In contrast, if the weight applied to prior information is too weak, PEST will inject too much parameter variability into the calibrated parameter field in order to provide a glorious fit with the data. This may yield an unlikely set of parameters that is as much reflective of noise in the measurements and inadequacies in the model as it is of information in the data. In neither of these cases can the calibrated parameter field be considered optimal.
So just how tightly should Tikhonov constraints be enforced? There are statistical answers to this question. But statistics often don’t help us much in the real world. After all, what is the covariance matrix of model structural noise? What is the variogram that applies to reality at our specific site? We don’t know, and so statistics abandon us. Normally the best that we can do is examine a suite of different parameter fields that fit the data to greater or lesser degrees and, based on our knowledge of site conditions, choose a parameter field that has some - but not too much - variability, and that leads to a fit with historical data that is good - but not too good.
The information that PEST provides when it runs in Pareto mode allows the modeller to make this subjective judgement, because it places the trade-off information that he/she needs at his/her fingertips. Judgements on what constitutes an “optimal parameter field” in any particular modelling context can then be made by an individual, or a group of people, with all information to hand.
But it is important to remember just what an “optimal parameter field” means. Don’t be under any illusion that “an optimal parameter field” leads to correct predictions of future system behaviour. This cannot happen, because many facets of future system behaviour depend on more parameters than we can ever know the values of. An “optimal parameter field” is one that leads to predictions that possess minimum error variance - predictions that are probably wrong, but whose potential for wrongness has been minimized. If a prediction is sensitive to parameters for which information is weak or non-existent, this potential may still be very large. But at least it will be symmetric about the prediction made by the model.
Bad Model Derivatives
What do you mean by “bad model derivatives”?
If a model parameter is varied incrementally, but dependent model outputs vary in a somewhat irregular manner, then the finite-difference derivatives calculation process is corrupted. Irregular variation of model outputs is most often caused by artificial thresholds in model algorithms, model numerical instability, or lack of full convergence of model iterative solvers. Sometimes the last of these can be precluded by setting model solution convergence criteria to smaller values. However this may not always work, as the solver may simply refuse to converge to these tighter criteria.
What steps can I take to improve model performance in order to ameliorate problems with bad derivatives?
In some cases there are steps that you can take to considerably increase the chances that model outputs will be differentiable with respect to parameter values. Here are a few basic (but very important) suggestions; a small numerical illustration of the underlying precision problem follows the list. Remember, when considering these, that finite-difference derivatives (as calculated by PEST) are computed by subtracting one (possibly large) number from another (possibly large) number. This is an operation that is easily contaminated by numerical error.
- If the model has an iterative solver, set the convergence criterion for that solver to a small value.
- If the model uses adaptive time stepping, consider setting the model time step manually to an interval that is smaller than any which the model’s adaptive time-stepping scheme is likely to undercut.
- Make sure that the model writes to its output files with maximum precision. This is between 6 and 7 significant figures for single precision arithmetic. Use exponential notation rather than fixed field notation for writing numbers if possible, for leading zeros are a waste of valuable space, and cause significant figures to be lost at the right hand side of the number.
- If the model is comprised of one or more executables run in succession in a batch or script file, make sure that files written/read by these executables in passing data to each other also record numbers with maximum available precision.
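Here is the illustration promised above: a toy “model” whose outputs are rounded to six significant figures, as they might be when written to a fixed-format output file. The model, increment and numbers are invented; the point is only that subtraction of nearly equal numbers magnifies rounding error.

```python
def model_output(p):
    return 1000.0 + 0.5 * p            # a toy "model" with a large output offset

def round_sig(x, sig=6):
    return float(f"{x:.{sig}g}")       # mimic writing only 6 significant figures

p, dp = 10.0, 1e-3
exact = (model_output(p + dp) - model_output(p)) / dp
rounded = (round_sig(model_output(p + dp)) - round_sig(model_output(p))) / dp
print(exact, rounded)                  # about 0.5 versus a completely corrupted 0.0
```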
What options are available in PEST for improving its performance in the face of bad model derivatives?
PEST provides the following options for improving its performance in the face of problematical model derivatives.
- Optional use of a five point, minimum error variance, finite-difference derivatives stencil.
- “Split slope analysis” in which finite differences that are suspected of contamination by bad model numerical performance are rejected (see the SPLITTHRESH, SPLITRELDIFF and SPLITACTION control variables).
- A version of its “automatic user intervention” functionality in which parameters with large (and therefore possibly wayward) derivatives are ignored during a particular optimization iteration.
If PEST does not appear to be working well, how do I know whether the problem is caused by bad model derivatives, or by something else?
There are some telling signs that bad derivatives may be the reason for PEST malperformance. If the parameter estimation process begins with the objective function falling fast but then, at an early iteration, falls no further or even rises, this is often a sign that PEST is having to work with poor model derivatives. Ensure that you are using SVD or LSQR for solution of the inverse problem (regardless of whether you are employing Tikhonov regularization). This eliminates numerical instability arising from problem ill-posedness. Under these circumstances, bad PEST performance is probably caused by bad numerical derivatives.
How can I know for sure that finite-difference derivatives are problematical?
If you suspect that your model is not providing good derivatives, use the JACTEST and POSTJACTEST utilities to explore the matter further.
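The kind of check that JACTEST automates can be sketched as follows (a hypothetical run_model function stands in for writing a parameter value to the model input files, running the model, and reading back an output of interest):

```python
import numpy as np

def run_model(p):
    # stand-in for a real model run; the threshold term mimics solver noise
    return np.sin(p) + 1e-4 * np.round(100.0 * p)

p0, rel_increment, n_steps = 1.0, 1e-3, 10
values = p0 * (1.0 + rel_increment * np.arange(n_steps))
outputs = np.array([run_model(p) for p in values])
slopes = np.diff(outputs) / np.diff(values)
print(slopes)        # erratic jumps in slope indicate problematic derivatives
```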
What options are available to me if a model’s derivatives are so bad that PEST simply will not work?
Use the SCEUA_P or CMAES_P global optimisers provided with PEST. They can be used interchangeably with PEST; hence no modification of a PEST input dataset is required prior to their use. They are also parallelisable.
General
Do initial parameter values matter?
If you are doing traditional parameter estimation where you formulate a well-posed inverse problem, and if your model is not too nonlinear, then initial parameter values may not matter too much. In this case there will only be one objective function minimum, this corresponding to a unique set of parameter values. PEST will probably find this minimum irrespective of initial parameter values.
However, if your model is very nonlinear, and there are multiple objective function minima, then your initial parameter values should be chosen more carefully. Choose parameter values that you consider to be realistic on the basis of your expert judgement.
If doing highly parameterized inversion, the same rule applies. Choose initial parameter values that are your preferred parameter values on the basis of expert knowledge or site characterization information. Then let the parameter estimation process introduce perturbations to this preferred parameter set, with Tikhonov regularization ensuring that these perturbations are the minimum required to fit model outputs to field data.
How do I create a set of model input files that contain optimized parameter values?
If PEST stops naturally, or if it stops in response to a PSTOPST command, it will undertake a final model run using optimized parameter values. Model input files will therefore be loaded with optimized parameter values, and numbers recorded in model output files will have been calculated on the basis of these values.
When PEST undertakes SVD-assisted parameter estimation, or if its execution is halted prematurely using the PSTOP command, it does not carry out one final model run on the basis of optimised parameters before shutting down. Therefore model input files do not necessarily contain optimised parameters, and model-generated files do not contain best-fit model outputs.
In this case, use the PARREP utility to create a new PEST control file containing optimised parameter values. (Optimised parameter values are stored in the PAR file if undertaking normal parameter estimation, and in the BPA file when undertaking SVD-assisted parameter estimation.) Set the NOPTMAX control variable to zero in this new PEST control file and then run PEST. PEST will run the model once, compute the objective function, and then cease execution.
Note that if the PEST control file instructs PEST to run in regularisation mode, the measurement objective function will be the same as the lowest achieved in the previous PEST run. The regularisation objective function will differ, however, because when asked to undertake just one model run PEST does not calculate an optimal regularisation weight factor (as it did during each iteration of the previous parameter estimation process).
I get an error message which says “phi gradient zero”. What does this mean?
Technically, this means that all of your parameters are totally insensitive; that is, it means that none of them have any effect on model outputs. What it actually signifies on most occasions, however, is that you have given PEST the name of the wrong model input file corresponding to a certain template file in the “model input/output files” section of the PEST control file. Thus PEST writes updated parameter values to one file, while the model obtains parameter values from another. Alternatively, if your model is a batch file, this condition signifies that there is some “break” in data linkages between model components as a result of file misnaming, or that there is a breakdown in the running of an executable program that forms part of the overall model.
What precautions should I take when the “model” run by PEST is actually a batch or script file comprised of more than one executable program?
The following precautions are recommended.
- Ensure that data is transferred between executable programs with maximum precision. Thus numbers written to all model component output files which constitute input files for another model component should have at least six to seven significant figures. (Use exponential notation to represent numbers if possible.)
- Begin the model batch file by deleting all files that are written by one model component and read by another. Thus if one model component fails to run, the next component will not read an old output file, mistaking it for a new one. Instead it will crash. Its output files will not therefore be written and following model components will crash. Finally, when the files that PEST expects to read are not present, it will cease execution with an appropriate error message. (Note that PEST deletes all model output files that it must read prior to running the model for this same reason - that is, so that it does not read an old model output file mistaking it for a new one.)
How does PEST perform in the face of local objective function minima?
Before answering this, it is important to note that local minima are often a function of non-differentiability of model outputs with respect to parameter values. In many cases it would not take too much trouble to eradicate this problem if some small steps were taken in model algorithmic design to remove thresholds and discontinuities that are “manufactured” by the model itself. This is well explained in the following article:
Kavetski, D., Kuczera, G. and Franks, S.W. (2006). Calibration of conceptual hydrologic models revisited: 1. Overcoming numerical artefacts. Journal of Hydrology, 320 (1), pp173-186.
Having noted this, however, it must also be admitted that the problem is real – and would still exist (though to a lesser extent) even if all models were designed in such a way as to avoid artificial thresholds. It is also true that the Gauss-Marquardt-Levenberg method, on which PEST is based, cannot provide a guarantee that a local minimum will not be found instead of the global objective function minimum. However, the problem is not as bad as many people make out.
For those occasions on which local optima and/or model non-differentiability cause problems which simply cannot be overcome, two so-called “global optimisers” are supplied with the PEST suite. These are the SCEUA_P and CMAES_P optimisers. Neither relies on the use of derivatives for finding the objective function minimum. Both can be used interchangeably with PEST; and both can undertake model runs in parallel.
When should I use prior information?
There is no single answer to this question.
Conceptually, prior information should be used whenever you have some idea of what parameter values are reasonable. Of course, this happens all of the time. The problem with using prior information, however, is that you must assign weights to it. And what weight do you assign to prior information? If it is too large, PEST will ignore the calibration dataset and simply respect the prior information, failing to give you a good fit between model outputs and field data. On the other hand, if the weights assigned to prior information equations are too small, then prior information won’t do much good, either in forcing parameter values to be realistic, or in providing numerical stability to a potentially ill-posed inverse problem.
Prior information is a great thing, for it is important that all available information be included in the calibration process, even if some of it is qualitative. However it is often best to employ it as part of a pervasive Tikhonov regularisation scheme. If supplied as part of a regularisation scheme, PEST gets to assign weights to prior information itself. It assigns weights that are as strong as they need to be to stabilize the problem, but not so strong that they compromise the sought-after level of model-to-measurement fit as specified by the PHIMLIM regularisation control variable.