Performance Profiles

Performance profiles are a powerful visualization tool for comparing the performance of multiple algorithms (or solver-model combinations) across a set of test problems. They were introduced by Dolan and Moré in 2002^[1] and have become a standard method for benchmarking optimization software.

This page explains:

The general concept of performance profiles
How they are adapted in CTBenchmarks.jl
How to interpret the resulting plots

The General Concept

Motivation

Traditional benchmarking efforts often involve extensive tables displaying solver performance on various metrics. However, these tables face inherent challenges:

The sheer volume of data becomes overwhelming, especially for large test sets
Interpretation of results frequently leads to disagreements
A small subset of problems can dominate the conclusions

Performance profiles address these issues by using performance ratios rather than raw metrics, offering insights into the percent improvement of a solver's metric compared to the best solver.

Mathematical Definition

Consider:

A set of solvers $S$ (or solver-model combinations)
A set of problems $P$
A performance metric $t_{p,s}$ for each solver $s \in S$ on problem $p \in P$

The metric must be a positive value where smaller is better (e.g., CPU time, number of iterations).

Performance Ratio

The performance ratio $r_{p,s}$ of solver $s$ on problem $p$ is defined as:

\[r_{p,s} := \frac{t_{p,s}}{\min_{s' \in S} t_{p,s'}}\]

where $t_{p,s}$ is the value of the chosen metric for solver $s$ on problem $p$.

Key properties:

We have $r_{p,s} \geq 1$ by definition
Having $r_{p,s} = 1$ means that solver $s$ achieved the best performance on problem $p$
When a solver fails to solve a problem (e.g., reaches maximum time or iteration limit), we assign $r_{p,s} = +\infty$

Performance Profile Function

The performance profile is the cumulative distribution function for the performance ratio:

\[\rho_s(\tau) := \frac{1}{n_p} \sum_{p \in P} \mathbb{1}_{r_{p,s} \leq \tau}\]

where:

the parameter $n_p$ is the total number of problems
the indicator function $\mathbb{1}_X$ is defined as $\mathbb{1}_X = 1$ if condition $X$ is true, and $\mathbb{1}_X = 0$ otherwise

In words, $\rho_s(\tau)$ represents the proportion of problems for which solver $s$ has a performance ratio within a factor $\tau$ of the best solver.

Key Properties

By definition, $\rho_s(\tau)$ has several important properties:

Range: $0 \leq \rho_s(\tau) \leq 1$ for all $\tau$
At $\tau = 1$: The value $\rho_s(1)$ gives the percentage of problems where solver $s$ achieved the best performance. The sum across all solvers may exceed 100% in case of ties.
Monotonicity: $\tau \mapsto \rho_s(\tau)$ is a non-decreasing function
Plateau: For $\tau$ sufficiently large, all solvers should reach a plateau representing the percentage of problems solved. If $\rho_s(\tau) < 1$ for all $\tau$, solver $s$ failed on some problems.

Performance Profiles in CTBenchmarks.jl

Adaptation to Our Context

In CTBenchmarks.jl, we adapt the classical Dolan–Moré definition to our specific benchmark structure:

Instances

An instance is a pair $(\mathrm{problem}, \mathrm{grid\_size})$ appearing in the benchmark results, regardless of whether it was successfully solved by any solver.

For example:

(beam, 100)
(beam, 500)
(crane, 200)

are three distinct instances.

Solver-Model Combinations

A solver-model combination $s$ is identified by the pair (model, solver).

For example:

(JuMP, Ipopt)
(ADNLPModels, Ipopt)
(ExaModels, MadNLP)

are three distinct solver-model combinations.

Performance Metric

For each instance $p = (\mathrm{problem}, \mathrm{grid\_size})$ and solver-model $s$:

If the run $(p, s)$ has success == true and a valid benchmark object, we extract the CPU wall time:
\[t_{p,s} = \text{benchmark["time"]}\]
Among all solver-models that succeeded on instance $p$, we compute the best (minimal) time:
\[t_p^* = \min_{s' \in S} t_{p,s'}\]
For every successful run $(p, s)$, we define the performance ratio:
\[r_{p,s} = \frac{t_{p,s}}{t_p^*} \geq 1\]
Instances where solver-model $s$ failed (or has no valid timing) are treated as having $r_{p,s} = +\infty$

Performance Profile Definition

The performance profile of each solver-model combination $s$ is:

\[\rho_s(\tau) = \frac{1}{N} \cdot \#\{ \text{instances } p : r_{p,s} \leq \tau \}\]

where $N$ is the total number of distinct $(\mathrm{problem}, \mathrm{grid\_size})$ instances present in the JSON file, including those where all solvers failed.

Treatment of Failures

This definition has important consequences:

Only instances with success == true and valid timing contribute ratios $r_{p,s}$ and can increase $\rho_s(\tau)$
Instances where a given solver-model fails are counted in $N$ but never in the numerator for that solver-model
Instances where all solver-models fail are still included in $N$ but do not contribute any $r_{p,s}$ for any solver-model

Consequence: If there exist problem-grid instances that are not solved by a given solver-model combination, its curve $\rho_s(\tau)$ will plateau strictly below 1 (100%). If some instances are unsolved by all solver-models, then no curve can reach 1, clearly indicating that there are problems for which none of the tested approaches succeeded.

Implementation overview

In the implementation, CTBenchmarks.jl separates the definition of a performance profile from its computation and visualization.

ProfileCriterion{M} describes the metric:
- a human-readable name,
- a function row::DataFrameRow -> M extracting the metric value from each benchmark row,
- a comparison better(a::M, b::M) indicating when a is strictly better than b (e.g., a <= b for a time or iteration count).
PerformanceProfileConfig{M} describes how to build a profile from a table of benchmark results:
- instance_cols: columns that define an instance (in this documentation: [:problem, :grid_size]),
- solver_cols: columns that define a solver combination (here: [:model, :solver]),
- criterion: the ProfileCriterion{M} to use,
- is_success: predicate selecting successful runs,
- row_filter: additional filtering (e.g., by hardware or scenario),
- aggregate: how to aggregate multiple runs of the same (instance, solver combination) into a single metric value.
PerformanceProfile{M} is an immutable structure that stores:
- the benchmark identifier,
- all instances and successful runs with their performance ratios,
- the list of solver combinations and the Dolan–Moré ratio bounds,
- the PerformanceProfileConfig{M} that was used to construct it.

The main profiles used in this documentation are defined by two configurations:

CPU-time profile (default time-based profile)
- instances: (problem, grid_size),
- solver combinations: (model, solver),
- criterion: CPU time, using the benchmark["time"] field,
- successful runs: those with success == true and a valid benchmark object,
- aggregation: mean CPU time over repeated runs of the same (instance, solver combination).
Iterations profile (iteration-based profile)
- instances: (problem, grid_size),
- solver combinations: (model, solver),
- criterion: Iterations, using the iterations field recorded in benchmark results,
- successful runs: those with success == true and a valid, non-missing iterations count,
- aggregation: mean number of iterations over repeated runs of the same (instance, solver combination).

The corresponding Julia functions are:

compute_profile_default_cpu(bench_id, src_dir) and compute_profile_default_iter(bench_id, src_dir), which build PerformanceProfile{Float64} objects using the respective configurations,
_plot_profile_default_cpu(bench_id, src_dir) and _plot_profile_default_iter(bench_id, src_dir), which call plot_performance_profile to generate the figures,
_analyze_profile_default_cpu(bench_id, src_dir) and _analyze_profile_default_iter(bench_id, src_dir), which call analyze_performance_profile to produce the textual summaries.

In addition to the plot, CTBenchmarks.jl provides an automatic textual analysis generated by _analyze_profile_default_cpu(bench_id) (used via INCLUDE_TEXT blocks in the documentation). This analysis is computed from the performance profile data.

Optimal Control and Discretization

What are Optimal Control Problems?

The benchmarks in CTBenchmarks.jl are based on optimal control problems (OCPs) from the OptimalControlProblems.jl package. An optimal control problem with fixed initial and final times consists of minimizing a cost functional (in Bolza form):

\[J(x, u) = g(x(t_0), x(t_f)) + \int_{t_0}^{t_f} f^{0}(t, x(t), u(t))~\mathrm{d}t\]

where:

the variable $x(t)$ is the state trajectory (e.g., position, velocity)
the variable $u(t)$ is the control input (e.g., force, torque)
the function $g$ is the Mayer cost (terminal cost)
the function $f^0$ is the Lagrange cost (running cost)
the variable $t \in [t_0, t_f]$ is the time interval

The state and control must satisfy the dynamics constraint:

\[\dot{x}(t) = f(t, x(t), u(t))\]

and possibly other constraints:

\[\begin{array}{llcll} x_{\mathrm{lower}} & \le & x(t) & \le & x_{\mathrm{upper}}, & \text{(state box constraints)} \\ u_{\mathrm{lower}} & \le & u(t) & \le & u_{\mathrm{upper}}, & \text{(control box constraints)} \\ c_{\mathrm{lower}} & \le & c(t, x(t), u(t)) & \le & c_{\mathrm{upper}}, & \text{(path constraints)} \\ b_{\mathrm{lower}} & \le & b(x(t_0), x(t_f)) & \le & b_{\mathrm{upper}}. & \text{(boundary constraints)} \end{array}\]

Cost types:

Mayer: Only terminal cost ($f^0 = 0$)
Lagrange: Only integral cost ($g = 0$)
Bolza: Both terminal and integral costs

The Direct Method: From OCP to NLP

The direct method transforms the infinite-dimensional optimal control problem into a finite-dimensional nonlinear programming problem (NLP) by discretizing time. This approach is:

More robust with respect to initialization than indirect methods (Pontryagin's Maximum Principle)
Easier to apply, which explains its widespread use in industrial applications
Less precise than indirect methods, but sufficient for many practical purposes

Discretization Scheme

In OptimalControlProblems.jl, every OCP is discretized using the trapezoidal rule on a uniform grid with $N$ steps:

\[\begin{array}{lclr} t \in [t_0,t_f] & \to & t_0 < t_1 < \dots < t_N=t_f, \text{ with } t_{i}-t_{i-1} = \frac{t_f-t_0}{N}, & i = 1:N \\[0.5em] x(\cdot),\, u(\cdot) & \to & X=\{x_0, \ldots, x_N, u_0, \ldots, u_N\} & \\[1em] \hline \\ \text{step} & \to & \displaystyle h = \frac{t_f-t_0}{N} & \\[0.5em] \text{criterion} & \to & \displaystyle g(x_0, x_N) + \frac{h}{2} \sum_{i=1}^{N} \left( f^0(t_i, x_i, u_i) + f^0(t_{i-1}, x_{i-1}, u_{i-1}) \right) & \\[1em] \text{dynamics} & \to & \displaystyle x_{i} = x_{i-1} + \frac{h}{2} \left( f(t_i, x_i, u_i) + f(t_{i-1}, x_{i-1}, u_{i-1}) \right), & i = 1:N \\[1em] \text{state constraints} & \to & x_{\mathrm{lower}} \le x_i \le x_{\mathrm{upper}}, & i = 0:N \\[1em] \text{control constraints} & \to & u_{\mathrm{lower}} \le u_i \le u_{\mathrm{upper}}, & i = 0:N \\[1em] \text{path constraints} & \to & c_{\mathrm{lower}} \le c(t_i, x_i, u_i) \le c_{\mathrm{upper}}, & i = 0:N \\[1em] \text{boundary constraints} & \to & b_{\mathrm{lower}} \le b(x_0, x_N) \le b_{\mathrm{upper}} & \end{array}\]

This yields a standard NLP of the form:

\[\text{(NLP)} \quad \left\{ \begin{array}{lr} \min \ F(X) \\[1em] X_{\mathrm{lower}} \le X \le X_{\mathrm{upper}}\\[0.5em] C_{\mathrm{lower}} \le C(X) \le C_{\mathrm{upper}} \end{array} \right.\]

where $X$ contains all discretized state and control variables, and $C(X)$ represents the discretized dynamics and constraints.

Grid Size and Problem Instances

The grid size $N$ (number of discretization steps) is a crucial parameter:

Smaller $N$: Faster to solve, but less accurate approximation of the continuous problem
Larger $N$: More accurate, but computationally more expensive

In CTBenchmarks.jl, we benchmark each optimal control problem at multiple grid sizes to assess:

Scalability: How does solver performance degrade as $N$ increases?
Accuracy vs. speed trade-offs: Which solver-model combinations are efficient for coarse/fine grids?

This is why an instance is defined as a pair $(\mathrm{problem}, \mathrm{grid\_size})$: the same optimal control problem discretized with different $N$ values represents different computational challenges.

Available Optimal Control Problems

The OptimalControlProblems.jl package provides a curated collection of optimal control problems from the literature. Each problem is modeled in two ways:

JuMP models: Direct NLP formulation using JuMP.jl
OptimalControl models: High-level OCP description using OptimalControl.jl, with automatic discretization via CTDirect.jl

Examples of problems (see the problems browser for the complete list):

Beam: Minimize vibrations in a flexible beam
Chain: Hanging chain with control forces
Double oscillator: Coupled oscillators with control
Goddard: Rocket ascent problem (maximize altitude)
Robot: Robot arm trajectory optimization
Steering: Vehicle steering with obstacle avoidance
Vanderpol: Van der Pol oscillator control

Each problem has specific characteristics:

State dimension: Number of state variables (e.g., position, velocity)
Control dimension: Number of control inputs
Cost type: Mayer, Lagrange, or Bolza
Constraints: Which types of constraints are present (state, control, path, boundary)

You can explore all problems interactively in the problems browser, which allows filtering by:

Number of state/control variables
Cost type (Mayer, Lagrange, Bolza)
Constraint types (state, control, path, boundary)
Final time (fixed or free)

Models and Solvers

In CTBenchmarks.jl, a model refers to the Julia package used to formulate the NLP:

JuMP: JuMP.jl with automatic differentiation
ADNLPModels: ADNLPModels.jl with automatic differentiation
ExaModels: ExaModels.jl for GPU-accelerated modeling
OptimalControl: OptimalControl.jl with CTDirect.jl for direct transcription

A solver is the optimization algorithm used to solve the NLP:

Ipopt: Ipopt.jl (interior-point method)
MadNLP: MadNLP.jl (interior-point method, GPU-capable)

Different model-solver combinations can have vastly different performance characteristics:

Modeling overhead: Time to build the NLP (automatic differentiation, sparsity detection)
Solver efficiency: Time to solve the NLP (linear algebra, convergence rate)
Memory usage: RAM and GPU memory requirements
Scalability: Performance on large-scale problems (large $N$)

Example: Performance Profile Plot

The following figure shows a typical performance profile from CTBenchmarks.jl, comparing 6 solver-model combinations on a set of optimal control problems with varying grid sizes.

Performance Profile Example

Reading This Example

In this plot:

6 solver-model combinations are compared:
- (ADNLPModels, Ipopt) (blue circles)
- (ADNLPModels, MadNLP) (green diamonds)
- (ExaModels, Ipopt) (red squares)
- (ExaModels, MadNLP) (orange triangles)
- (JuMP, Ipopt) (purple inverted triangles)
- (JuMP, MadNLP) (brown stars)
Multiple problem instances: Each curve represents performance across all $(\mathrm{problem}, \mathrm{grid\_size})$ pairs tested

Key observations:

At $\tau = 1$ (leftmost):
- (ExaModels, Ipopt) (red) is the fastest on ~25% of instances
- (ExaModels, MadNLP) (orange) is the fastest on ~20% of instances
- No single combination dominates all instances
At $\tau = 2$:
- (ExaModels, Ipopt) (red) solves ~60% of instances within 2× the best time
- (ExaModels, MadNLP) (orange) solves ~70% of instances within 2× the best time
Plateaus (rightmost):
- (ExaModels, MadNLP) (orange) reaches ~75%, indicating it solved 75% of all instances
- (ADNLPModels, Ipopt) (blue) reaches ~80%
- (ADNLPModels, MadNLP) (green) reaches ~75%
- (JuMP, Ipopt) (purple) reaches ~70%
- Some combinations fail on 20-30% of instances
Trade-offs:
- ExaModels combinations are often fastest when they succeed, but have lower robustness
- ADNLPModels + Ipopt is more robust (80% success) but rarely the fastest
- JuMP combinations show intermediate performance

Interpretation: There is no clear winner. The choice of solver-model combination depends on your priorities:

For maximum robustness: Choose (ADNLPModels, Ipopt)
For speed on easy problems: Choose (ExaModels, Ipopt) or (ExaModels, MadNLP)
For balanced performance: Consider (ADNLPModels, MadNLP) or (JuMP, Ipopt)

This example illustrates why performance profiles are valuable: they reveal the full spectrum of performance (efficiency, robustness, scalability) in a single plot, enabling informed decisions based on your specific requirements.

Interpreting Performance Profiles

Visual Elements

A typical performance profile plot shows:

X-axis: Performance ratio $\tau$ (log scale, base 2)
- Values like 1, 2, 4, 10, 50, 100
- Factor by which a solver is slower than the best
Y-axis: Proportion of solved instances $\rho_s(\tau)$ (linear scale, 0 to 1)
- Displayed as percentages: 0%, 10%, ..., 100%
Curves: One curve per solver-model combination
- Each curve is piecewise constant and non-decreasing
- Markers help distinguish curves

Reading the Plot

At $\tau = 1$ (leftmost point)

The height of a curve at $\tau = 1$ indicates the proportion of instances where that solver-model was the fastest.

Example:

If (JuMP, Ipopt) has $\rho(\tau=1) = 0.6$, it means JuMP+Ipopt was the fastest on 60% of instances

At intermediate $\tau$

The height at $\tau = 2$ indicates the proportion of instances where that solver-model was within a factor 2 of the best.

Example:

If (ADNLPModels, Ipopt) has $\rho(\tau=2) = 0.8$, it means ADNLPModels+Ipopt solved 80% of instances within twice the time of the best solver

Plateau (rightmost part)

The plateau represents the proportion of instances successfully solved by that solver-model, regardless of performance.

Example:

If (ExaModels, MadNLP) plateaus at 0.95, it means ExaModels+MadNLP solved 95% of all instances (but failed on 5%)

Common Patterns

Pattern 1: Clear Winner

Solver A: ρ(1) = 1.0, plateau at 1.0
Solver B: ρ(1) = 0.0, plateau at 1.0

Solver A was faster on all problems, and both solved all problems. Solver A is clearly preferable.

Pattern 2: Fast but Not Robust

Solver A: ρ(1) = 0.7, plateau at 0.75
Solver B: ρ(1) = 0.3, plateau at 1.0

Solver A is faster on most problems but fails on 25% of them. Solver B is slower but solves everything. Trade-off between speed and robustness.

Robustness: Does the curve reach 100%? If not, which problems failed?

Dataset overview
- Problems: number of distinct optimal control problems.
- Instances: $N = \#\{(\mathrm{problem}, \mathrm{grid\_size})\}$.
- Solver combos: $S = \#\{(\mathrm{model}, \mathrm{solver})\}$.
Profile configuration
- Printed directly from the stored PerformanceProfileConfig{M}:
  - instance definition (columns used to define an instance),
  - solver-combo definition (columns used to define a solver combination),
  - name of the criterion (here: CPU time in seconds).
Successful runs
- A run is a pair (instance, solver combination).
- Successful runs reports $\#\{(p,s) : \text{run } (p,s) \text{ succeeded}\} / (N \times S)$.
- This is the fraction of the full instance–solver grid for which we have a successful metric value.
Successful / unsuccessful instances
- An instance is called successful if at least one solver combination succeeded on it.
- Successful instances reports $\#\{p : \exists s, \text{run } (p,s) \text{ succeeded}\} / N$.
- Unsuccessful instances are those for which no solver combination succeeded; they are listed explicitly.
Robustness (per solver combination)
- For each solver combination $s$, robustness is $\#\{p : \text{run } (p,s) \text{ succeeded}\} / N$.
- This is the percentage of instances that each solver combination manages to solve at least once.
Efficiency (per solver combination)
- For each solver combination $s$, efficiency is $\#\{p : r_{p,s} = 1\} / N$, i.e., the proportion of instances where this solver combination achieves the best metric (ratio $r_{p,s} = 1$).
- This corresponds to the height of its performance-profile curve at $\tau = 1$.
Most robust / most efficient
- The analysis identifies the solver combination(s) with the highest robustness and the highest efficiency and reports ties explicitly.

Together with the performance-profile plot, this textual analysis helps to separate clearly:

how often each solver combination succeeds (robustness),
how often it is best (efficiency),
and how many instances are intrinsically difficult (unsuccessful for all combinations).

Performance Profiles

The General Concept

Motivation

Mathematical Definition

Performance Ratio

Performance Profile Function

Key Properties

Performance Profiles in CTBenchmarks.jl

Adaptation to Our Context

Instances

Solver-Model Combinations

Performance Metric

Performance Profile Definition

Treatment of Failures

Implementation overview

Optimal Control and Discretization

What are Optimal Control Problems?

The Direct Method: From OCP to NLP

Discretization Scheme

Grid Size and Problem Instances

Available Optimal Control Problems

Models and Solvers

Example: Performance Profile Plot

Reading This Example

Interpreting Performance Profiles

Visual Elements

Reading the Plot

At $\tau = 1$ (leftmost point)

At intermediate $\tau$

Plateau (rightmost part)

Common Patterns

Pattern 1: Clear Winner

Pattern 2: Fast but Not Robust

Pattern 3: Very Small Factors

Pattern 4: Multiple Solvers (>2)

What to Look For

References