Abstract:
Physics-based simulation codes are widely used in science and engineering to model complex systems that would be infeasible to study otherwise. Such codes provide the highest-fidelity representation of system behavior, but are often so slow to run that insight into the system is limited. For example, conducting an exhaustive sweep over a d-dimensional input parameter space with k-steps along each dimension requires kᵈ simulation trials (translating into kᵈ CPU-days for one of our current simulations). An alternative is directed exploration in which the next simulation trials are cleverly chosen at each step. Given the results of previous trials, supervised learning techniques (SVM, KDE, GP) are applied to build up simplified predictive models of system behavior. These models are then used within an active learning framework to identify the most valuable trials to run next. Several active learning strategies are examined including a recently-proposed information-theoretic approach. Performance is evaluated on a set of thirteen synthetic oracles, which serve as surrogates for the more expensive simulations and enable the experiments to be replicated by other researchers.