# Surrogate Models - Stanford University

## Surrogate Models

A surrogate model of a function is an approximation of the function that is less costly to evaluate. The surrogate model can then be used to help direct the search for the optimum of the real objective function.

## Fitting Surrogate Models

Given a set of design points $x^{(1)}, \dots, x^{(m)}$ and function evaluations $y^{(1)}, \dots, y^{(m)}$, find the model parameters $\theta$ that minimize the prediction error. This optimization problem is called regression.

## Fitting Surrogate Models

Example: approximating sampled points with a quadratic function.

## Linear Models

A linear model has the form

$$\hat{f}(x) = \theta_0 + \sum_{i=1}^{n} \theta_i x_i$$

For an $n$-dimensional design space, this model has $n+1$ parameters and requires at least $n+1$ samples to fit unambiguously.

A common simplification is to prepend 1 to each design point, giving $x \leftarrow [1, x_1, \dots, x_n]$ so that $\hat{f}(x) = \theta^\top x$.

## Linear Models

Fitting a linear model to data is called linear regression, and it is very common. The solution can be found analytically: stack the (1-prepended) design points as the rows of a matrix $X$ and the function evaluations into a vector $y$. If $X^\top X$ is invertible,

$$\theta = (X^\top X)^{-1} X^\top y$$
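A minimal sketch of this normal-equation solution in Python; the dataset and the true coefficients below are invented purely for illustration:

```python
import numpy as np

# Hypothetical data: exact evaluations of f(x) = 1 + 2*x1 - 3*x2.
rng = np.random.default_rng(0)
points = rng.uniform(-1.0, 1.0, size=(20, 2))
y = 1.0 + 2.0 * points[:, 0] - 3.0 * points[:, 1]

# Prepend 1 to each design point so the intercept is part of theta.
X = np.hstack([np.ones((len(points), 1)), points])

# theta = (X^T X)^{-1} X^T y; X^T X is invertible here because
# we have 20 samples for only 3 parameters.
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # recovers [1, 2, -3]
```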

If $X X^\top$ is invertible,

$$\theta = X^\top (X X^\top)^{-1} y$$

## Basis Functions

More general surrogate models can be constructed as linear combinations of basis functions evaluated at the design point. For linear models, the basis functions are simply the components of the design point itself.

## Basis Functions

Any surrogate model represented as a linear combination of basis functions can be fit using regression:

$$\min_{\theta} \|B\theta - y\|_2^2, \qquad B_{ij} = b_j\!\left(x^{(i)}\right)$$

## Basis Functions: Polynomials

A simple one-dimensional polynomial model of degree $k$ has the form

$$\hat{f}(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \dots + \theta_k x^k$$

In two dimensions, the basis functions have the form $b_{ij}(x) = x_1^i x_2^j$.
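As a sketch, a degree-2 polynomial surrogate in one dimension can be fit by building the basis matrix and reusing ordinary least squares; the "expensive" function and sample locations here are invented for illustration:

```python
import numpy as np

# Hypothetical objective evaluated at a handful of design points.
x = np.linspace(-2.0, 2.0, 9)
y = x**2 - x + 0.5          # the "expensive" function, invented here

# Basis matrix B with columns b_i(x) = x^i for i = 0, 1, 2.
k = 2
B = np.vander(x, k + 1, increasing=True)

# Same least-squares machinery as linear regression.
theta, *_ = np.linalg.lstsq(B, y, rcond=None)
print(theta)  # recovers [0.5, -1, 1]
```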

Polynomial surrogate models are fit with linear regression: the model is nonlinear in $x$ but linear in the parameters $\theta$, so in the lifted space of polynomial basis features it is an ordinary linear model.

## Basis Functions: Sinusoidal

Any continuous function over a finite domain can be represented using an infinite set of sinusoidal basis functions. If the function is univariate and integrable over a domain of length $a$, it can be described using a Fourier series:

$$f(x) = \frac{\theta_0}{2} + \sum_{i=1}^{\infty} \left[ \theta_i^{(\sin)} \sin\!\left(\frac{2\pi i x}{a}\right) + \theta_i^{(\cos)} \cos\!\left(\frac{2\pi i x}{a}\right) \right]$$

## Basis Functions: Sinusoidal

The formulas for the sinusoidal coefficients are

$$\theta_0 = \frac{2}{a}\int_0^a f(x)\,dx, \qquad \theta_i^{(\sin)} = \frac{2}{a}\int_0^a f(x)\sin\!\left(\frac{2\pi i x}{a}\right)dx, \qquad \theta_i^{(\cos)} = \frac{2}{a}\int_0^a f(x)\cos\!\left(\frac{2\pi i x}{a}\right)dx$$
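In practice a truncated sinusoidal basis can be fit with the same regression machinery instead of evaluating the integrals; this sketch regresses onto the first three sine/cosine pairs over an assumed domain $[0, a]$, with a target function chosen so it lies exactly in the basis span:

```python
import numpy as np

a = 2.0 * np.pi                      # assumed domain length
x = np.linspace(0.0, a, 200)
f = np.sin(2.0 * np.pi * x / a) + 0.5 * np.cos(4.0 * np.pi * x / a)

# Basis: constant term theta_0/2 plus sin/cos pairs for i = 1..3.
cols = [np.ones_like(x) / 2.0]
for i in range(1, 4):
    cols.append(np.sin(2.0 * np.pi * i * x / a))
    cols.append(np.cos(2.0 * np.pi * i * x / a))
B = np.column_stack(cols)

# Least squares recovers the coefficients [0, 1, 0, 0, 0.5, 0, 0].
theta, *_ = np.linalg.lstsq(B, f, rcond=None)
print(np.round(theta, 6))
```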

## Basis Functions: Radial

A radial function $\psi$ depends only on the distance of the input from some center point $c$, so that $b(x) = \psi(\|x - c\|)$. Radial functions are convenient for describing local landscapes of complicated functions.

When using radial functions as basis functions, it is common to use the evaluated design points as the center points.
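A sketch of a radial-basis surrogate using Gaussian bases $\psi(r) = \exp(-r^2 / 2\sigma^2)$ centered at the design points themselves; the objective, sample locations, and $\sigma$ are all invented for illustration:

```python
import numpy as np

def rbf_design(xs, centers, sigma=0.5):
    """Matrix of psi(||x - c||) for each 1-D point x and center c."""
    r = np.abs(xs[:, None] - centers[None, :])   # pairwise distances
    return np.exp(-r**2 / (2.0 * sigma**2))

# Hypothetical 1-D objective sampled at a few design points.
xs = np.linspace(-1.0, 1.0, 7)
ys = np.sin(3.0 * xs)

# Use the evaluated design points as centers, then solve for theta.
# With one center per sample, B is square and invertible.
B = rbf_design(xs, xs)
theta = np.linalg.solve(B, ys)

# The surrogate interpolates the training data exactly.
yhat = rbf_design(xs, xs) @ theta
print(np.max(np.abs(yhat - ys)))  # ~0
```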

## Fitting Noisy Objective Functions

Performing regression on noisy data can result in models that capture noise in addition to the underlying function. A preference for smoothness can be encoded in the regression problem by adding a regularization term to the objective:

$$\min_{\theta} \|B\theta - y\|_2^2 + \lambda \|\theta\|_2^2$$

where $\lambda$ is a smoothing parameter. This results in an analytical solution of the form

$$\theta = (B^\top B + \lambda I)^{-1} B^\top y$$
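A sketch of the regularized solution $\theta = (B^\top B + \lambda I)^{-1} B^\top y$ on noisy data; the noise level, basis degree, and $\lambda$ are chosen arbitrarily for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 30)
y = x**3 + 0.1 * rng.normal(size=x.size)    # noisy hypothetical objective

# Degree-8 polynomial basis: flexible enough to fit the noise.
B = np.vander(x, 9, increasing=True)

def fit(lam):
    # Regularized normal equations.
    return np.linalg.solve(B.T @ B + lam * np.eye(B.shape[1]), B.T @ y)

theta_plain = fit(0.0)
theta_smooth = fit(1e-2)

# Regularization shrinks the parameter vector, preferring smoother fits.
print(np.linalg.norm(theta_plain), np.linalg.norm(theta_smooth))
```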

## Model Selection

After a model is fit to data, its quality still must be evaluated. Data used to fit a model is called training data. Models are compared based on generalization error, which is the predictive error on data not in the training set.

## Model Selection

One way to quantify generalization error is as the expected squared error of predictions. Since this requires knowledge of the objective function's values at points outside the training set, it can be tempting to estimate generalization error using the training error instead, which is the mean squared error (MSE) of the model's predictions on the training data. However, models with low training error can still perform poorly outside of the regions containing training data.

## Model Selection: Holdout

The holdout method estimates generalization error by partitioning the available data into a training set and a test set. Data from the training set is used to fit the model, and data from the test set is used to estimate generalization error. The model designer must decide how to partition the data into these two sets:

- If the training set is too small, the fitted model will be poor.
- If the training set is too large, the test set is small and the generalization error estimate will be poor.

In random subsampling, the holdout method is applied multiple times with randomly chosen partitions, and the resulting error estimates are averaged.
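A sketch of a single holdout estimate, using an invented noisy objective and an 80/20 split; the cubic surrogate here is an arbitrary modeling choice:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-2.0, 2.0, size=50)
y = np.sin(x) + 0.1 * rng.normal(size=x.size)   # hypothetical noisy data

# Random 80/20 split into training and test indices.
idx = rng.permutation(x.size)
train, test = idx[:40], idx[40:]

# Fit a cubic polynomial surrogate on the training set only.
theta, *_ = np.linalg.lstsq(np.vander(x[train], 4, increasing=True),
                            y[train], rcond=None)

# Estimate generalization error as MSE on the held-out test set.
yhat = np.vander(x[test], 4, increasing=True) @ theta
holdout_mse = np.mean((yhat - y[test])**2)
print(holdout_mse)
```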

## Model Selection: Cross Validation

Data can be used more efficiently for both training and validation with $k$-fold cross validation. The original dataset is randomly partitioned into $k$ subsets. At the $i$th iteration, the $i$th partition is selected as the holdout set, and the remaining $k - 1$ partitions are used as the training set to generate a model and compute the $i$th generalization error estimate. The $k$ error estimates are then used to compute a mean and standard deviation.
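The procedure above can be sketched as follows; the dataset and the cubic surrogate are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(-2.0, 2.0, size=60)
y = np.sin(x) + 0.1 * rng.normal(size=x.size)   # hypothetical noisy data

def kfold_mse(x, y, k=5, degree=3):
    idx = rng.permutation(x.size)
    folds = np.array_split(idx, k)          # k random partitions
    errs = []
    for i in range(k):
        test = folds[i]                     # i-th partition held out
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        theta, *_ = np.linalg.lstsq(
            np.vander(x[train], degree + 1, increasing=True),
            y[train], rcond=None)
        yhat = np.vander(x[test], degree + 1, increasing=True) @ theta
        errs.append(np.mean((yhat - y[test])**2))
    return np.mean(errs), np.std(errs)

mean_err, std_err = kfold_mse(x, y)
print(mean_err, std_err)
```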

## Model Selection: Bootstrap

The bootstrap method averages the generalization error over models trained on bootstrap samples and evaluated on the entire dataset. Each bootstrap sample is generated by randomly selecting $m$ data points with replacement from a dataset of size $m$, meaning some data points can be selected multiple times in the same bootstrap sample.
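A sketch of this basic bootstrap estimate, again with an invented dataset and a cubic surrogate; the number of bootstrap samples is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(-2.0, 2.0, size=40)
y = np.sin(x) + 0.1 * rng.normal(size=x.size)   # hypothetical noisy data

def fit_predict(xt, yt, xq, degree=3):
    theta, *_ = np.linalg.lstsq(
        np.vander(xt, degree + 1, increasing=True), yt, rcond=None)
    return np.vander(xq, degree + 1, increasing=True) @ theta

# Average error over b bootstrap samples, each model evaluated
# on the entire dataset.
b, errs = 100, []
for _ in range(b):
    # Sample m indices with replacement from a dataset of size m.
    boot = rng.integers(0, x.size, size=x.size)
    yhat = fit_predict(x[boot], y[boot], x)
    errs.append(np.mean((yhat - y)**2))
print(np.mean(errs))
```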

## Model Selection: Bootstrap

To remove the bias introduced by testing on points that also appear in the bootstrap sample, the leave-one-out bootstrap estimate computes the generalization error using only samples not included in a particular bootstrap sample. This introduces a new bias, since the test sets have different sizes, so the 0.632 bootstrap estimate corrects for it by taking a weighted combination of the leave-one-out bootstrap error and the training error.

## Summary

Surrogate models are function approximations that can be optimized instead of the true, potentially expensive objective function.

Many surrogate models can be represented using a linear combination of basis functions.

Model selection involves a bias-variance tradeoff between models with low complexity that cannot capture important trends and models with high complexity that overfit to noise.

Generalization error can be estimated using techniques such as the holdout method, $k$-fold cross validation, and the bootstrap.