Empirical Optimization
Empirical program optimization is the process of using empirical search to make optimization decisions. Such decisions might be the choice of values for optimization parameters or even the choice of an appropriate algorithm. The heart of the search is a generate and test routine. Using a current set of optimization decisions it generates code, compiles it, executes it, and based on its performance chooses a new set of optimization decisions. The process repeats until a satisfactory performance level is achieved.
Some examples of tools that use empirical search for program optimization are ATLAS (for generating high-performance BLAS libraries) and FFTW (for generating FFT libraries). These tools fall in the category of library generators, i.e. they are not general-purpose compilers, but rather they concentrate on a specific problem in a specific domain and generate libraries that provide a high-performance solution to the problem. It is widely believed that empirical optimization is more effective than model-driven optimization.
Our work in the area is in the directions of combining empirical search and traditional model-based optimization and integrating the results into conventional compilers, which currently are not at all competitive with library generators. This is a very ambitious goal and therefore the work has to go into several phases:
- The first step is to understand why library generators are better: is it because they are domain specific or because they use empirical search. We have a PLDI paper comparing model-based and empirical optimization in the context of BLAS libraries. To make such a comparison, we replaced the empirical optimization engine in ATLAS with a model-based optimization engine that used detailed models to estimate values for optimization parameters, and then measured the relative performance of the two systems. Our results show that model-based optimization can be surprisingly effective.
- Second, having realized that model-based and empirical optimization are not at all in conflict, we want to use models to prune the space for empirical search and use empirical observations to refine our models. We have started doing this work in the same context BLAS libraries. Our current results show that if we put our detailed models into ATLAS and do intelligent search starting from the point predicted by the models, we get better results in less time. We are preparing this for publication.
-
Finally, as more long-term goal, we want to extend our work in a number
of directions:
- Compare model-based and empirical optimization in the context of library generators like FFTW, which use search to make a decision between different algorithms, rather than different values for an optimization parameter;
- Extend our approach to all the levels of the memory hierarchy. All our work to date involved changing parts of the ATLAS framework, which itself is constrained to the L1 data cache;
- Use all the insights that we have gathered to develop a parameterizable library generator. Rather than fixing it to a very specific domain like BLAS or FFT, a high-level mathematical description of the problem will be provided as a parameter to the generation process. Our view is that this general purpose library generator will be able to do dense and sparse linear algebra, FFT, sorting and perhaps other algorithms;
The ultimate goal is to incorporate all these results in a general purpose compiler.
Publications:
- An Experimental Study of Self-Optimizing Dense Linear Algebra Software Proceedings of IEEE [Invited Paper], 96(5):832-848, May 2008
- An Experimental Comparison of Cache-oblivious and Cache-aware Programs 19th Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), 2007
- Analytical Models and Empirical Search: A Hybrid Approach to Code Optimization 18th International Workshop on Languages and Compilers for Parallel Computing (LCPC), 2005
- Automatic Measurement of Instruction Cache Capacity 18th International Workshop on Languages and Compilers for Parallel Computing (LCPC), 2005
- X-Ray: A Tool for Automatic Measurement of Hardware Parameters International Conference on Quantitative Evaluation of SysTems (QEST), 09/20/2005
- Think Globally, Search Locally International Conference on Supercomputing (ICS), 06/20/2005
- Automatic Measurement of Memory Hierarchy Parameters International Conference on Measurement & Modeling of Computer Systems (SIGMETRICS), 06/06/2005
- Is Search Really Necessary to Generate High-Performance BLAS? Proceedings of the IEEE, Special issue on ”Program Generation, Optimization, and Adaptation", 02/01/2005
- Automatic Measurement of Hardware Parameters for Embedded Processors Cornell University Computing and Information Science Technical Report TR2005-1974, 01/27/2005
- Automatic Measurement of Memory Hierarchy Parameters (obsoleted by SIGMETRICS'05 version) Cornell University Computing and Information Science Technical Report TR2004-1970, 11/08/2004
- Think Globally, Search Locally (obsoleted by ICS'05 paper) Cornell University Computing and Information Science Technical Report TR2004-1969, 11/05/2004
- X-Ray: Automatic Measurement of Hardware Parameters (obsoleted by TR2005-1974 version) Cornell University Computing and Information Science Technical Report TR2004-1966, 10/06/2004
- A Comparison of Empirical and Model-driven Optimization Programming Language Design and Implementation, 06/09/2003





