GCC gOlogy: studying the impact of optimizations on debugging

Alexandre Oliva <aoliva@redhat.com> - 2019-03-12 v1.0.1 (*)

-g-Ology, or gOlogy, stands for the study of how optimization levels (selected by -O flags) affect the quality of debugging information (enabled by -g flags). This report assesses the theoretical and practical impact of various optimizations available in the GNU Compiler Collection version 8 on the debugging experience of applications compiled by it. The goal is to assess the quality of the debug information generated by GCC with optimization enabled, document the effects of optimization passes, and identify and document problems and opportunities to improve it.

GCC offers various optimization levels, from -O0 to -O3, plus -Og, -Os and -Ofast, and well over a hundred independently-controllable optimization flags. Each of the optimization levels enables a subset of the optimization flags; enabling debugging information generation, on the other hand, is not supposed to have any effect whatsoever on the executable code. This report focuses on flags that are enabled by the -O* options, and their effects on (extended) DWARF debug information generated by GCC.

This report is structured as follows. The introduction outlines how GCC gets from source code to output assembly code and debug information, the major internal representation forms used throughout compilation, and several techniques used by GCC to keep track of the mapping from internal representations and output executable code to corresponding source code concepts. Then, the bulk of the report goes through each of the -O flags, and in each of them, through the optimization passes that are enabled or affected by the -O flag, describing the general behavior of the pass and what effects it may have on debug information. The final section highlights and consolidates the most relevant findings.

Introduction


In GCC, language front ends parse a translation unit and deliver to the so-called middle end a number of functions (procedures, methods, subprograms) to compile in a form that, although language-independent, closely resembles a parse tree. Each function then goes through a number of passes, some of which are only executed when certain optimization flags are enabled, or other conditions are met.

The tree form is turned into gimple form, in which each function amounts to a set of basic blocks in a control flow graph, each containing a sequence of stmts represented as tuples. A stmt may be a label definition, a simple assignment, a function call, a conditional or unconditional branch, an asm statement, debug binds or markers, or other less common forms. Scalar variables are versioned and converted to static single assignment (SSA) form, in which each reference to a variable takes a version that links it back to a single definition of that variable version. Additional definitions, called PHI nodes, may be introduced at confluence basic blocks, indicating which version is to be taken when arriving from each incoming block. This is the form in which most of the optimization passes in GCC take place.
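As a rough illustration of the SSA form described above, the C function below is annotated with comments showing how its scalar variable might be versioned. The function name and the version names (x_1, x_2, x_3) are illustrative assumptions, not actual GCC dump output.

```c
/* Hypothetical example: 'x' is assigned on two paths that later merge. */
int clamp_to_zero(int v)
{
    int x;
    if (v < 0)
        x = 0;          /* conceptually: x_1 = 0 */
    else
        x = v;          /* conceptually: x_2 = v */
    /* confluence block: x_3 = PHI <x_1, x_2> */
    return x;           /* conceptually: return x_3 */
}
```

Each use of x refers to exactly one definition, and the PHI node selects the version that matches the incoming block.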

Each function is then expanded to the register transfer language (RTL) form, in which basic blocks are now formed by a sequence of insns, each one corresponding to a machine instruction defined in the target back end, or other machine-independent forms such as debug binds and markers, notes and other forms not relevant for this report. Each insn may contain zero or more computations represented as SETs (one of which may set PC to indicate a branch), a CALL, an ASM, and indicators that additional registers or memory can be used or unpredictably modified. Scalar variables are initially assigned to pseudo-registers, and many RTL optimization passes operate in this form. Register allocation will then map each remaining pseudo-register to a hardware register (if optimizing) or a stack slot, adding spills and reloads as needed to satisfy the requirements of each hardware instruction.

A few RTL passes run after register allocation, and at the end assembly code is output for each insn, while outputting debug information that is to be interspersed with the assembly code, and gathering debug information that is consolidated and output afterwards.

Preserving debug information

There was a time when debugging required disabling optimizations. Debug information formats back then could only assign a single location to each variable, and optimizing out the frame pointer would remove the base reference for all stack-based variables.

GCC has long had the notion that enabling debug information should not cause any changes to executable code. To that end, each stmt and insn carries source location information, i.e., file and line (and, more recently, column) numbers and lexical blocks, even when debug information is not enabled. Without optimization, this makes for single-stepping in a debugger just in the natural order of execution, and all variables are assigned stable memory locations, which makes for a single location per variable throughout its lifetime.

Optimizations introduce complications, combining, simplifying and removing computations, modifying the order of execution, reusing registers and stack slots, duplicating portions of code, introducing alternate induction variables and modifying the iteration order in loop nests. Compiler and debug information formats have evolved over time so as to enable optimized programs to be represented and debugged, with varying levels of success.

For example, automatic variables in optimized programs may live in a register for some time, another register at another time, and a stack slot at other times. DWARF debug information supports location lists, that may indicate a different location for a variable for different, possibly-overlapping executable code ranges. Memory references in gimple and RTL forms carry symbolic expressions used for alias analysis, and also to build location lists; SSA versions, RTL pseudo-registers and hardware registers also carry symbolic references to the variables they refer to. The variable tracking pass identifies, using such symbolic references, situations in which the location of a variable varies throughout its lifetime, and arranges for location lists to be output accordingly.

As location expressions gained the ability to represent value expressions, it became possible to indicate that in a certain range a variable holds a known constant value, or that its value is not available directly, but it can be computed from other locations. Variable tracking at assignments extended variable tracking, introducing debug binds early in compilation that associate a scalar source variable with the location in which its value is stored, arranging for the location/value expressions to be adjusted throughout the compilation (even if computations are removed or moved past the binds, so that the bound value expressions remain accurate) while preserving their natural execution order, and using such binds to generate location lists.

Although each stmt and insn carries source location information, as they're shuffled by optimization, single-stepping may go back to earlier statements, and it becomes impossible to tell when the effects of a statement are complete. Statement Frontier Notes (SFN) are introduced as additional debug notes, emitted (so far only by C and C++ parsers) in the stmt stream to mark the beginning of logical statements, thus after any debug binds associated with previous statements take effect. Their natural execution order is retained by the compiler, so the markers can be used to output source location information marked as recommended stop points (the is_stmt flag in DWARF line number tables), avoiding bouncing and making for predictable observability of side effects.

Given optimization, it is not uncommon for no executable code to remain between inspection points for multiple neighbor statements. This was a problem because, although multiple source locations can be associated with a single address in the line number table, ranges in location lists could only name addresses of executable instructions. Location view (LVu) numbering was introduced to identify each of the entries in the line number table that refer to the same code address, so that they can then be referenced unambiguously in location lists. The representation of such extended location lists requires extensions proposed for DWARF v6, and at the time of this writing, there aren't any debuggers that support such extended location lists. Still, since GCC makes the information available and we expect debuggers to catch up eventually, the analyses that follow assume the disambiguation given by LVu is effective in masking the optimization effects it was created to overcome.

Despite all this effort, it is not realistic to expect the debug experience of a program without optimization to be the same as that of a program optimized even by optimizations regarded as not affecting debugging. For example, a variable assigned to an exclusive stack slot will be available throughout a function, but optimization may assign it to a register during its limited live range, and then it won't be possible to inspect it elsewhere. Setting breakpoints based on addresses of executable code may not work as effectively in optimized programs, because the same spot of the program may have been duplicated by optimization, and then the breakpoint may not hit where expected. Having the value of a variable available in a given location, say its stack slot, does not guarantee it is possible to modify it: it could have just been loaded into a register, which may then be modified by the program and stored back in the stack slot. This might happen even without optimization, but the windows for this possibility are narrower. Furthermore, folding that logically follows from reasoning about what is known about a variable at compile time may no longer be applicable if the variable is modified in the debugger; if a block was removed because the condition guarding it was provably false at compile time, changing a variable so that the condition would evaluate to true will not bring back the code that was optimized out.

So, inspecting variables in optimized programs is more likely to yield "optimized out" because optimizations may expose dead ranges that are not noticed with -O0, and modifying them may always conflict with optimizations. As for breakpoints, using source locations rather than code addresses is less likely to yield surprising results.


In this section, each optimization level is detailed, enumerating the flags incrementally enabled by it over the previous level, and detailing the effects on debugging brought about by each of the optimization levels and flags.

Optimization levels form a nearly-strict crescendo in terms of passes they activate: -O0, -Og, -O1, -Os, -O2, -O3, -Ofast.

Nevertheless, determining when a pass is run is an involved process. Each pass has a gate function, that decides whether to run the pass based on optimization levels and flags. The default_options_table array in gcc/opts.c arranges for flags to be enabled depending on the optimization level, but some flags are enabled by default through their initializer in e.g. gcc/common.opt. Some are also forced enabled or disabled depending on other conditions. However, even if the gate condition of a pass is enabled, it might not run if any enclosing pass group fails its own gate condition.
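The interaction between a pass's own gate and the gate of its enclosing group can be modeled with a small sketch. Everything in it (struct pass, pass_would_run, the demo instances) is hypothetical and only mimics the idea; GCC's real pass classes and gate signatures differ.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a pass; not GCC's actual data structure. */
struct pass {
    const char *name;
    bool (*gate)(int optimize);   /* decides from the optimization level */
    const struct pass *group;     /* enclosing pass group, or NULL */
};

static bool gate_always(int optimize) { (void)optimize; return true; }
static bool gate_optimizing(int optimize) { return optimize >= 1; }

/* A pass runs only if its own gate and every enclosing group's gate agree. */
bool pass_would_run(const struct pass *p, int optimize)
{
    for (; p != NULL; p = p->group)
        if (!p->gate(optimize))
            return false;
    return true;
}

/* Illustrative instances: an always-on pass nested in an optimizing-only group. */
const struct pass demo_group = { "demo_group", gate_optimizing, NULL };
const struct pass demo_pass  = { "demo_pass",  gate_always,     &demo_group };
```

In this model, demo_pass does not run at optimization level 0 even though its own gate always returns true, because the enclosing group is gated off.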

The following outline depicts the optimization passes GCC goes through while compiling a function, in the order they might run; the information is extracted from gcc/passes.def. Indentation indicates grouping of the indented passes within the previous less-indented pass group. Parameters for the pass are indicated between parentheses after the pass name.

      pass_lower_omp see -O1
      pass_lower_eh see -Og
      pass_expand_omp see -Og, and -O1



              pass_ccp(!nonzero) see also --tree-bit-ccp, and --ipa-bit-cp
              pass_cd_dce see also --tree-dce(aggressive)
              pass_cleanup_eh see -Og
              pass_profile see --guess-branch-probability

                  pass_fre see above
                  pass_expand_omp_ssa see -Og, and -O1


      pass_ipa_devirt see also --devirtualize-speculatively
      pass_ipa_cp see also --ipa-bit-cp, --ipa-vrp, and --ipa-cp-clone
      pass_ipa_inline see -Og, --inline-functions-called-once, --inline-small-functions, --indirect-inlining, -Os, -O2, and -O3


          pass_ccp(nonzero) see above
          pass_complete_unrolli see also --tree-loop-ivcanon
          pass_forwprop see above
          pass_fre see above
          pass_merge_phi see above
          pass_dce see above
          pass_merge_phi see above
          pass_phiopt see also --hoist-adjacent-loads
          pass_tail_recursion see above
          pass_ch see above
          pass_thread_jumps see above
          pass_dominator(may_peel_loop_headers) see above
          pass_dse see above
          pass_dce see above
          pass_forwprop see above
          pass_phiopt see above
          pass_ccp(nonzero) see above
          pass_lim see above
          pass_pre see also --code-hoisting, --tree-tail-merge, and --tree-partial-pre
          pass_dce see above
              pass_cd_dce see above
              pass_loop_distribution see also --tree-loop-distribute-patterns
              pass_copy_prop see above
                  pass_lim see above
                  pass_copy_prop see above
                  pass_dce see above
              pass_expand_omp_ssa see above
              pass_ch_vect see also --tree-loop-vectorize
              pass_vectorize see also --vect-cost-model=cheap, and --vect-cost-model=dynamic
                  pass_dce see above
              pass_complete_unroll see also --tree-loop-ivcanon, and --peel-loops
              pass_slp_vectorize see also --vect-cost-model=cheap, and --vect-cost-model=dynamic
              pass_lim see above
              pass_slp_vectorize see above
          pass_lower_vector_ssa see -Og
          pass_reassoc(!insert_powi) see above
          pass_strength_reduction see also --expensive-optimizations
          pass_thread_jumps see above
          pass_dominator(!may_peel_loop_headers) see above
          pass_thread_jumps see above
          pass_vrp(!warn_array_bounds) see above
          pass_phi_only_cprop see above
          pass_dse see above
          pass_cd_dce see above
          pass_forwprop see above
          pass_phiopt see above
          pass_fold_builtins see -Og, and --inline-atomics
          pass_dce see above
          pass_local_pure_const see above
          pass_lower_vector_ssa see above
          pass_ccp(nonzero) see above
          pass_fold_builtins see above
          pass_copy_prop see above
          pass_dce see above
          pass_split_crit_edges see above
          pass_uncprop see above
          pass_local_pure_const see above
      pass_lower_vector see -Og
      pass_cleanup_eh see above

      pass_expand see -Og, --tree-coalesce-vars, --tree-ter, --defer-pop, and --expensive-optimizations

          pass_jump see -Og, and --thread-jumps
          pass_df_initialize_opt see -Og
          pass_cse see also --expensive-optimizations, --rerun-cse-after-loop, and --cse-follow-jumps
          pass_rtl_cprop see above
          pass_cse_after_global_opts see also --cse-follow-jumps
              pass_rtl_move_loop_invariants see also -Og
          pass_rtl_cprop see above
          pass_cse2 see also --cse-follow-jumps
          pass_combine see also --expensive-optimizations
          pass_ira see -Og, --ira-share-save-slots, --omit-frame-pointer, -Os, --expensive-optimizations, --caller-saves, --ipa-ra, and --lra-remat
          pass_reload see -Og, and --expensive-optimizations
              pass_thread_prologue_and_epilogue see -Og, and --shrink-wrap
              pass_jump2 see --crossjumping
              pass_fast_rtl_dce see also -Og
              pass_reorder_blocks see also --reorder-blocks-algorithm=stc
              pass_compute_alignments see --align-loops, --align-jumps, --align-labels, and --align-functions
              pass_shorten_branches see -Og
              pass_final see -Og, --peephole, and --ipa-ra

Before optimizations, the program is parsed so as to build a tree representation, that is then gimplified.

Some optimization passes run such cleanup passes as TODO_cleanup_cfg, TODO_rebuild_alias, and TODO_remove_unused_locals.

There are other flags that affect too many passes to mention, such as --strict-aliasing, --merge-constants and --fast-math, or that cannot be associated with any optimization pass, such as --reorder-functions.

-O0: optimize=0

Disable optimization.

This flag sets the optimization level to 0. This is the base level, the gold standard for the debugging experience, against which other levels are compared. All automatic variables and parameters are allocated to memory, being loaded and, if modified, stored back, at every use. All branches and labels are preserved, and no blocks are duplicated. Functions are not inlined, except for mandatory inlines, e.g., functions marked with attribute always_inline. Source locations preserved from branches or returns only in CFG edges are materialized as NOPs.

-Og: optimize=1 + debug

Perform only very fast optimizations with low impact on debugging.

This flag sets the optimization level to 1, but limited by an option for better debugging that disables a number of optimizations, even some that would otherwise be enabled at optimization level 1.


Optimization enables the selection of the local dynamic TLS model to access thread-local variables known to be defined in the dynamic module being compiled. Without that, the global dynamic TLS model is used instead, but this change has no effect on debugging.

Type conversions attempt to substitute conversions to float of results of standard calls that return double to calls that return float. Likewise, conversions to integral types of results of standard calls that return double (e.g. round, logb) are converted to calls that return integral types (lround, ilogb). These only affect debugging inasmuch as the behavior of the substituted functions is to be inspected.


Optimization brings small changes in the processing of nested functions that enable frame structs and static chains to be optimized away, without impact on debugging, and in representing variable-length arrays in nested functions, which may lose some details about the types.

pass_expand_omp, and pass_expand_omp_ssa

Some OpenMP primitives may also be simplified when optimization is enabled. These are internal implementation details, so they shouldn't affect debugging.


Gimple EH lowering decisions change with optimization, but finally regions may be duplicated either way, and with the same minor effects on debugging: different code addresses for the same source code lines.

pass_split_crit_edges, and pass_cleanup_eh

Critical edges are also split to ease optimizations, and later unsplit if they remain.


Optimization affects slightly the way variables and parameters are remapped when inlining, but these changes have their effects on debug information masked away.


When optimizing, various passes run cleanups of the control flow graph. This may delete unreachable blocks and trivially dead insns like unused sets or copies to self. In gimple mode, the removal of unreachable blocks may propagate SSA defs to uses, but it is hard to imagine that any uses thereof will be reachable, so there should be no impact on debugging. Removed blocks may be missed during debugging: breakpoints can't be set in removed blocks. Cleanup may renumber basic blocks, detect forwarder blocks, remove unused labels and fallthrough forwarder blocks, merge blocks with unconditional fallthrough, replace jumps to returns or jumps with copies of the targets, simplify conditional jumps and remove single-destination jumps. The removal of fallthrough forwarder blocks may discard debug binds and markers, which could make single-stepping or breaking at the source locations represented by the removed markers impossible. Binds might also be lost, though at least in gimple there will often be redundant binds at confluence points, shortly thereafter. A similar negative effect arises when a jump is replaced with a return or another jump, bypassing any debug markers and binds at the original target's block.

When optimizing, NOPs that would materialize CFG edge source locations are not inserted, and extra steps that preserve source locations during gimplification of jumps and labels are not taken. If corresponding debug markers are also dropped, this may remove the possibility of stopping at some goto.


Optimization enables unused local variables and lexical blocks to be released early; it may cause variables and scopes that cannot ever be entered to be omitted altogether from debug information.


Optimization enables the named return value pass, that detects functions that return aggregate types in memory, always returning the same local variable, and unifies that variable with the result, using the name and source location of the variable, and mapping all uses of the variable to the result. This may have an effect on debugging if the variable happens to be taken from an inlined function: in this case, the source name and location mapping is skipped, because it would introduce a name not present in the original function, but the variable is still remapped to the return declaration, so the source location of the variable's declaration is lost.
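A minimal sketch of a function that fits the pattern described above: it returns an aggregate, and every return path returns the same local variable. The type and function names are hypothetical, and whether the aggregate is actually returned in memory depends on the target ABI.

```c
/* Hypothetical aggregate, large enough to be returned in memory on many ABIs. */
struct quad {
    long a, b, c, d;
};

/* The only returned variable is 'r', so it could be unified with the
   function's result and built directly in the return slot. */
struct quad make_quad(long seed)
{
    struct quad r;
    r.a = seed;
    r.b = seed + 1;
    r.c = seed + 2;
    r.d = seed + 3;
    return r;
}
```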


Optimization enables a pass that combines calls to sin, cos and cexpi with the same SSA operand into a single dominating cexpi call, taking the real or imaginary part of the result at each former sin or cos call. This pass also attempts to simplify pow, powi and cabs calls. None of these affect debugging, aside from the ability to step into any of the affected math function calls.


With optimization, a pass simplifies memcpy to memset if the copied-from range is known to be all zeros, turns some stdarg calls into simple pointer operations if va_list is a simple pointer type, and performs other similar transformations that do not affect debugging, aside from stepping into or breaking at simplified functions.
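The memcpy case might look like the sketch below, in which the source buffer is known to hold only zeros. The names are hypothetical, and whether the compiler actually performs the substitution depends on what it can prove about the buffer.

```c
#include <string.h>

enum { N = 16 };

/* The copied-from range is all zeros, so the memcpy could be simplified
   into a memset of 'dst'; a breakpoint on memcpy might then not be hit. */
void clear_via_copy(unsigned char *dst)
{
    unsigned char zeros[N];
    memset(zeros, 0, sizeof zeros);
    memcpy(dst, zeros, sizeof zeros);
}

/* Small driver for checking: returns 1 if all N bytes end up zero. */
int clear_via_copy_works(void)
{
    unsigned char buf[N];
    memset(buf, 0xff, sizeof buf);
    clear_via_copy(buf);
    for (int i = 0; i < N; i++)
        if (buf[i] != 0)
            return 0;
    return 1;
}
```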

pass_lower_vector, and pass_lower_vector_ssa

Optimization enables attempts to optimize divide and modulus operations on vectors of integral types into combinations of vector multiply, shift, and add. It also enables attempts to optimize initialization of vectors to avoid piecewise initialization. None of these affect debugging.


Enabling optimization changes defer_stack_allocation behavior, but its effect on debugging is limited to narrowing the live ranges of dead values.

It also enables reordering of operations in expand, so that those requiring more operands are performed first. This reordering does not involve memory-modifying operations, and debug binds cover affected cases, so it does not affect debugging.

Expand also introduces plenty of pseudos when optimizing, which allows replacement of common subexpressions and whatnot. Conversely, gimplification introduces more temporaries when not optimizing, and it attempts to reuse temporaries when optimizing. The effects on debugging are limited to variations in variable location assignments.

pass_jump, and pass_thread_prologue_and_epilogue

The jump and pro_and_epilogue RTL passes run cleanup_cfg with CLEANUP_EXPENSIVE, given optimize. This performs some more expensive block merging, and simplification of conditional jumps around jumps. The merging has no effect on debugging (indeed, it could reduce the loss of debug markers and binds if done on forwarder blocks), whereas the simplification might drop markers and binds along with the jumps, with impact on debugging similar to that of the other jump simplifications.


Several RTL optimization passes also use dataflow analysis to update notes about unused register definitions, as well as death points of registers. Debug binds that reference registers after their death points or unused sets are detected during this analysis, and debug temporaries are introduced next to the death points to preserve the equivalent expressions for use in the debug binds. This generally improves the debugging experience, enabling bind expressions to resort to the equivalences to express the values bound to user variables even if the register is reused for another purpose and no longer holds the value.


The first CSE (common subexpression elimination) pass is enabled when optimizing. The effects of this pass are described under --rerun-cse-after-loop. A third CSE pass may be activated with --rerun-cse-after-global-opts.


Depending on the selected register allocation model, optimization changes register pressure cost estimates in the RTL loop analyzers, but that's not something that changes the kinds of optimizations made there, or the kinds of impacts on debugging they may have.


Optimization enables the init-regs pass, that adds zero-initialization for pseudos before uninitialized uses, without effects on debugging.


Optimization enables combine, a pass that performs arithmetic substitution of single-use pseudo-set insns into others. After successful substitution, insns become useless and are removed, but if their values are still used in debug binds, the binds are updated accordingly, and markers ensure the bind effects are still visible. Therefore, this pass has no effect on debugging.


It also changes the default register allocation region setting, without effects on debugging.


Optimization enables reload inheritance and removal of redundant reload stores, without effects on debugging.

pass_split_after_reload, and pass_split_before_regstack

Additional insn splitting passes are enabled after reload when optimizing, without any effects on debugging; any impact would have been brought about by later splitting passes anyway.


Several RTL optimization passes run a fast dead code elimination subpass, at the end of the live registers dataflow analysis, as long as --dce is enabled; see --dce(fast) for details.


Optimization enables variable tracking, debug binds and markers, to try to mask the effects of optimizations on debugging. They are not needed without optimization.


When optimizing, insn lengths are estimated with multiple passes that grow lengths as needed, which may result in shorter variants, without effects on debugging.


Final may discard redundant compares when optimizing. It also links back single-use labels to jumps to them, for use in machine-specific transformations such as SH's constant pool placement. These transformations have no effect on debugging.

--tree-ccp: pass_ccp

Enable SSA-CCP optimization on trees.

Conditional constant propagation attempts to determine the value of conditions that control conditional branches. It may simplify (fold) some calls and assigns into constant assignments, and turn conditional branches into unconditional ones, possibly dropping blocks that become unreachable.

The most significant effect on the debugging experience is that setting breakpoints at certain source code ranges may become impossible as the blocks containing them are dropped. The extra folding might make additional lines not be represented by any instructions, but SFN provides markers to stand for them, and VTA and LVu ensure the effects of the optimized-away code can be inspected even without remaining instructions, so the overall impact of this pass on the debugging information is likely negligible.
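A small sketch of the kind of code this pass affects, with hypothetical names. The condition depends only on a known constant, so the guarded assignment is unreachable.

```c
/* 'debug' is a known constant, so the branch can be resolved at compile
   time and the guarded block dropped. */
int scale(int x)
{
    const int debug = 0;
    int factor = 4;
    if (debug)
        factor = 1;     /* may be removed: no breakpoint could be set here */
    return x * factor;  /* may be folded to x * 4 */
}
```

Changing debug in a debugger session would not bring the removed assignment back.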

--tree-fre: pass_fre

Enable Full Redundancy Elimination (FRE) on trees.

This pass uses value numbering to identify and remove redundant SSA computations, replacing them with previously-computed results, while also propagating copies, removing dead computations, folding computations, and resolving conditional branches and indirect calls. Changes are only relevant for debugging sessions that would modify variables to create situations that wouldn't normally arise at runtime. The substitutions and folding have no effect on debugging, unless variables are changed in the debugger so as to break the equivalences. Stmt removals are masked by debug binds, markers and views. Resolving conditional branches may remove entire blocks if they aren't reachable to begin with, but the consequent inability to set breakpoints on them could be surprising, especially if the debugging session were to change variables so as to try to force the execution of the unreachable block. Resolving indirect calls to direct ones might also surprise attempts to modify pointers in a debug session, attempting to cause a different function to be called.
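The redundancy this pass targets can be sketched as follows; the function is a hypothetical example, not taken from GCC's testsuite.

```c
/* Both sums compute a + b, so value numbering could reuse the first
   result instead of recomputing the second. */
int twice_sum(int a, int b)
{
    int first = a + b;
    int second = a + b;   /* redundant: same value as 'first' */
    return first * second;
}
```

If a is modified in the debugger between the two assignments, second may still take the value computed for first, which is the kind of broken equivalence the text refers to.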

--tree-dse: pass_dse

Enable dead store elimination.

This pass removes stores and mem* calls that modify memory that is overwritten without intervening reads. Addressable variables, that might be modified by such removed stmts, are not tracked by debug binds, so debugging sessions might be confusing as expected effects of removed dead stores will not be observable.
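A sketch of a dead store to memory reachable through a pointer, using hypothetical names:

```c
struct point {
    int x, y;
};

/* The first store to p->x is overwritten with no intervening read, so it
   is dead and could be removed. */
void set_point(struct point *p)
{
    p->x = 1;      /* dead store */
    p->y = 2;
    p->x = 3;      /* overwrites the store above */
}

/* Small driver for checking: encodes the final state as x * 10 + y. */
int set_point_result(void)
{
    struct point p = { 0, 0 };
    set_point(&p);
    return p.x * 10 + p.y;
}
```

A debugger stopped between the two stores to p->x might not observe the value 1, since no debug bind tracks the contents of *p.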

--guess-branch-probability: pass_profile

Enable guessing of branch probabilities.

No effect on debugging per se.

--tree-ch: pass_ch, and pass_ch_vect

Enable loop header copying on trees.

This pass copies loop headers, turning the copies into entry tests. Debug binds in the copied blocks are also copied to the post-loop block, modeling the binds introduced after PHI nodes when entering SSA. With those additional bindings, duplicating the header blocks does not impact debugging significantly within the copied blocks or after them. One possibly confusing consequence is that setting a breakpoint at the current program counter, while single-stepping the loop entry test, will not break at subsequent iterations, and vice-versa. This is unlikely to be surprising, and setting breakpoints by line overcomes this effect. User labels, that would not be present in the copy, could make for further confusion, but if they provide for additional edges into the loop header, they will actually stop the transformation from taking place.

When --tree-loop-vectorize is enabled, another ch_vect pass is activated, that differs from the regular ch pass only in deciding which loops are to undergo such header copying, so both passes have essentially the same effects on debugging.
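The effect of header copying can be approximated by hand in C. The second function below is a manual rewrite meant to resemble the transformed shape; it is an assumption about the general form, not compiler output, and both function names are hypothetical.

```c
/* Original shape: the header test runs before every iteration. */
int sum_to(int n)
{
    int sum = 0, i = 0;
    while (i < n) {
        sum += i;
        i++;
    }
    return sum;
}

/* Approximate shape after header copying: a copy of the test guards loop
   entry, and the loop itself tests at the bottom. */
int sum_to_copied(int n)
{
    int sum = 0, i = 0;
    if (i < n) {            /* copied header: entry test */
        do {
            sum += i;
            i++;
        } while (i < n);    /* test for subsequent iterations */
    }
    return sum;
}
```

The two tests come from the same source line but sit at different code addresses, which is why an address breakpoint set while stepping the entry test would not hit on later iterations.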

--tree-dce: pass_dce, and pass_cd_dce

Enable SSA dead code elimination optimization on trees.

This may remove assignments, branches and even some calls that are deemed unused/dead. Dead assignments are propagated into debug stmts before removal, so the removal itself does not affect debugging. Dead branches may cause entire blocks to be removed, making any expectation of stepping through or setting breakpoints at such blocks during debugging impossible to meet. Pure or const calls, as well as malloc and free pairs that are deemed dead, may be removed, frustrating expectations of stepping into them during debugging.
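A sketch combining a dead assignment with a dead call to a side-effect-free function; the names are hypothetical.

```c
/* Has no side effects, so a call whose result is unused may be deleted. */
static int square(int x)
{
    return x * x;
}

int keep_only_live(int a)
{
    int t = square(a);   /* dead: this value of 't' is never read */
    t = a * 2;
    return t;
}
```

With the first assignment removed, stepping into square from keep_only_live would no longer be possible, although a debug bind may still describe the value t would have held.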

--ipa-profile: pass_ipa_profile

Perform interprocedural profile propagation.

This pass propagates execution frequencies from callers to callees. Also, upon identifying the target of an indirect call from execution profiles, it introduces a speculative direct call that can then be inlined or otherwise optimized. None of this affects debugging.

--ipa-pure-const: pass_ipa_pure_const, and pass_local_pure_const

Discover pure and const functions.

Detect and mark functions according to whether they have side effects, loop, or throw, and propagate the information to decide about callers. This, by itself, has no effect on debugging, but it may enable the elision of calls that would return the same value, without any other side effects, of functions that are not explicitly marked as pure or const, and this elision may be slightly confusing for debugging, as such functions may be called (and hit breakpoints) fewer times than expected, and stepping into elided calls will not be possible.
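The elision described above could apply to code like the following sketch, where the callee carries no pure or const attribute but its body makes that property discoverable. The names are hypothetical, and in practice a function this small might simply be inlined instead.

```c
/* Not marked pure or const, but it only reads its argument. */
int cube(int x)
{
    return x * x * x;
}

/* Once 'cube' is found to be const, the two identical calls could be
   merged, so a breakpoint in 'cube' might hit once rather than twice. */
int two_cubes(int x)
{
    return cube(x) + cube(x);
}
```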

--ipa-reference: pass_ipa_reference

Discover readonly and non addressable static variables.

This pass analyses how static variables are used by functions, and propagates the gathered information to callers, so that it can be used in later optimizations. There aren't any effects on debugging.

--tree-copy-prop: pass_copy_prop

Enable copy propagation on trees.

This pass identifies and simplifies expressions based on copy-related SSA names. This may unify multiple variables into a single location, in ranges in which they take up equivalent values, making it impossible to modify them independently in the debugger. The identification of such equivalences may also resolve conditional branches to unconditional ones, removing entire basic blocks and the possibility of overriding the conditions in the debugger.

--tree-sink: pass_sink_code

Enable SSA code sinking on trees.

This pass moves statements down the control flow, closer to uses thereof, when it may be profitable, and removes them when they are unused. As the DEF is removed from a position that dominates a debug bind, the bind is adjusted, masking the effects on debugging, at least as far as scalars are concerned. Addressable variables are not subject to value tracking in debug binds, and so the delaying of stores may actually be observable during debugging.
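
A small, hypothetical example of a statement that might be sunk is given below; the function name and its parameters are assumptions made for illustration only.

```c
int sink_example(int a, int b, int flag)
{
    int product = a * b;     /* computed on every path in the source */
    if (flag)
        return product + 1;  /* the only use: the multiply may sink here */
    return 0;
}
```

Since product is a scalar, a debug bind can still describe it as a * b right after the original assignment, even if the multiply itself is only executed in the flag branch.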

--tree-slsr: pass_strength_reduction

Perform straight-line strength reduction.

This pass replaces computations involving multiplies with ones involving adds, in some cases introducing additional temporaries. In the end, trackable variables end up getting the same values, just computed in a different way, so this does not affect debugging.
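
The sketch below contrasts a source form with a hand-written approximation of a strength-reduced form. Both function names and the temporaries step and t are hypothetical; the code GCC actually generates may differ.

```c
/* Source form: three multiplies by related constants. */
int slsr_source(int x, int *out)
{
    out[0] = x * 3;
    out[1] = x * 5;
    out[2] = x * 7;
    return out[0] + out[1] + out[2];
}

/* Approximation of the reduced form: one multiply for the first value,
   then adds of a shared temporary for the others. */
int slsr_reduced(int x, int *out)
{
    int step = x * 2;   /* additional temporary, not a user variable */
    int t = x * 3;
    out[0] = t;
    t += step;          /* x * 5 */
    out[1] = t;
    t += step;          /* x * 7 */
    out[2] = t;
    return out[0] + out[1] + out[2];
}
```

The user-visible values stored in out are the same in both versions, which is the reason the transformation is not expected to be noticeable in a debugger.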

--tree-coalesce-vars: pass_expand

Enable SSA coalescing of user variables.

This flag allows the compiler to assign to a single pseudo-register SSA versions originally created for different user variables. With the aid of debug binds, this has very little effect on debugging: the impact is limited to early loss of values expected to be about to be overwritten, e.g. when an earlier value of a variable is already dead, and the location holding it is overwritten by a value computed for a temporary or for another variable, before being copied to the former variable. Between the computation point and the binding point, attempting to inspect the variable may indicate it is optimized out at that point, which is perfectly accurate, if undesirable from a debugging perspective.

--tree-ter: pass_expand

Replace temporary expressions in the SSA->normal pass.

This substitutes singly-used SSA defs into their single (non-debug) uses for expand to have larger expressions to select insns from. Debug binds may end up with more complex expressions than needed, bound before the actual computation of the larger expression takes place, but this does not affect debugging.

--defer-pop: pass_expand

Defer popping functions args from stack until later.

No effect on debugging.

--split-wide-types: pass_lower_subreg, and pass_lower_subreg2

Split wide types into independent registers.

This flag enables two RTL lowering passes that explode wide-mode pseudos into multiple word-mode ones. In many cases this modifies insns in place, but it occasionally emits multiple insns to replace a single one. In no such case does it affect debugging. Such splitting may be performed on user variables, and although we can represent variable locations with independent locations for different fragments, such wide variables do not always get debug binds at assignments for tracking throughout compilation. Location inference from DECLs associated with REGs and MEMs is used for fragments of such variables instead, which does correctly identify locations, but not necessarily at points of the program that reflect the recommended inspection points. This may cause debugging sessions to observe changes to such variables too early or too late, which can make debugging confusing.

Adding debug binds for the fragments, and arranging for GCC to aggregate them back, might get more accurate information, but since this would be done at such a late stage, it is possible that the binds would be introduced at points that do not satisfy the usual expectation that side effects would take place between the markers immediately before and after the assignment. There are also issues with dismembered aggregates, mentioned under --tree-sra, that would likely affect such split variables as well.

--forward-propagate: pass_rtl_fwprop, and pass_rtl_fwprop_addr

Perform a forward propagation pass on RTL.

These RTL passes replace uses of a pseudo with its single reaching definition. This in itself has no impact on debugging. If a pseudo is propagated into all uses, it will become unused, but then it will have been substituted into debug binds as well and, if not, the unused def might end up preserved as a debug temp. There is a possibility that, by propagating a pseudo, it becomes dead earlier, and then, after register allocation, debug binds that referenced it while it was still set end up finding the register reused for other purposes earlier than without this transformation. Since the propagation found the source of the definition was available all the way to the propagation point, and the equivalence between the propagated pseudo and its definition is noted by the variable tracking machinery at the definition point, it is very likely that an alternate expression for the register value will be found.

--dse: pass_rtl_dse1, and pass_rtl_dse2

Use the RTL dead store elimination pass.

This flag is enabled by default, but it's only activated when optimizing. The RTL passes enabled by it remove stores in memory that are overwritten without intervening reads, that store the same value as the previous store, or that write a value to the stack that is not read before the function returns. Since it affects addressable variables, global or local, debug binds do not apply, and so the effects of removing these stores are going to be noticeable in debugging, except for the redundant stores.
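
A minimal sketch of a store that might be removed by these passes follows; the struct and function names are hypothetical.

```c
struct point { int x; int y; };

void dse_example(struct point *p, int v)
{
    p->x = 0;        /* overwritten below without an intervening read */
    p->y = v;
    p->x = v + 1;    /* the store that survives */
}
```

If the first store is removed, a debugger stopped between the first and last statements would still see whatever p->x held before the call, rather than the 0 the source suggests.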

--auto-inc-dec: pass_inc_dec

Generate auto-inc/dec instructions.

The flag is enabled by default, but it's only activated when optimizing, and when the target architecture supports auto inc or auto dec addressing modes.

It detects insns that add or subtract a constant or pseudo from a pseudo before or after the pseudo or a copy thereof is used in a memory reference, and it attempts to turn the memory address into a pre- or post-inc, -dec or -mod addressing mode. This may cause one of the pseudos to change earlier or later than expected, and although this is only done when the pseudo is not otherwise used between the original and modified modification insns, debug binds between them are not adjusted, so they will bind to the wrong value, and when the pseudo is modified even that incorrect location may be lost.
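
The classic source pattern that may map onto such addressing modes is sketched below. The function name is hypothetical, and the transformation only applies on targets that offer post-increment addressing.

```c
void copy_words(int *dst, const int *src, int n)
{
    while (n-- > 0)
        *dst++ = *src++;   /* memory access followed by a pointer increment */
}
```

When the load and the increment of src are fused into one insn, the pointer changes at the load rather than at the separate add, so a debug bind placed between the two original insns may describe a value that no longer matches.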

--ira-share-save-slots: pass_ira

Share slots for saving different hard registers.

The flag is enabled by default, but it's only activated when optimizing. It allows registers whose lifetimes do not overlap to be saved in the same slot across calls. This could shorten the apparent live range of variables, making them unavailable at spots in which they might be in the absence of this flag.

--omit-frame-pointer: pass_ira

When possible do not generate stack frames.

This flag attempts to avoid reserving and using a register as a frame pointer, using stack pointer-relative addresses as needed. A frame pointer register used to be essential for debugging, but call frame information obviated it: it is now irrelevant for this purpose, and this optimization has no effect on debugging.

--compare-elim: pass_compare_elim_after_reload

Perform comparison elimination after register allocation has finished.

This pass removes redundant compare insns, relying on insns that set flags as side effects instead. It has no effect on debugging.

--shrink-wrap: pass_thread_prologue_and_epilogue

Emit function prologues only before parts of the function that need it, rather than at the top of the function.

This pass attempts to insert the prologue sequence at a later point than the entry point, which may involve duplicating some blocks and moving non-prologue early insns down to other blocks. The moved insns are simple enough that debug binds can be adjusted and mask the moves, so it does not affect debugging. Block duplication has little to no impact on debugging, though breakpoints set based on code addresses, rather than on logical locations, may notice the difference. The later prologue may confuse debuggers that assume the end of the prologue, noted in debug information, marks the beginning of user code: such debuggers will likely be significantly affected by this optimization.

--combine-stack-adjustments: pass_stack_adjustments

Looks for opportunities to reduce stack adjustments and stack references.

This flag consolidates consecutive stack allocations, consecutive stack deallocations, or deallocations followed by allocations, within single blocks, adjusting stack pointer-relative addresses as needed. It has no effect on debugging.

--cprop-registers: pass_cprop_hardreg

Perform a register copy-propagation optimization pass.

This pass only replaces (pseudos assigned to) hard regs in SET_SRCs with earlier-defined equivalent values, and removes noop moves. Substitutions are made in debug bind insns too. So, aside from noop moves that stood for source lines on their own in non-SFN settings, this shouldn't affect the debugging experience in any way.

--dce(fast): pass_fast_rtl_dce

Use the RTL dead code elimination pass.

This flag is enabled by default, but the fast rtl_dce pass is only activated when optimizing. Insns are regarded as dead if they only set registers and none of them are live. Dead sets used in debug binds are preserved in debug temps, so this does not affect debugging.

--reorder-blocks: pass_reorder_blocks

Reorder basic blocks to improve code placement.

The reorder blocks pass attempts to increase the number of fallthrough edges by moving basic blocks. This may remove the possibility of breaking at explicit goto statements.

--delayed-branch: pass_delay_slots

Attempt to fill delay slots of branch instructions.

This pass moves insns about, attempting to fill delay slots on arches that support them, most often of calls, branches, jumps and returns. It runs after var-tracking, and it may move insns across debug bind notes that would be affected by it, potentially confusing location information. It may create opportunities for jumps to jumps to be redirected to the ultimate jump target, which may invalidate breakpoints that could have been set at the bypassed jumps. On a few arches, calls followed by jumps may have their delay slots filled with insns that modify the register holding the return address for the call, which may confuse debuggers as to the point of the call, including the recovery of entry-point values from the caller frame and location information.

Conditional markers might enable CFG simplifications without invalidating breakpoints, but failing that, it would probably be wise to disable this and return address adjustments at -Og.

--peephole: pass_final

Enable machine specific peephole optimizations.

This flag is enabled by default, but it is only activated if optimization is enabled, on machines that define peepholes, not to be confused with the newer peephole2, handled by --peephole2.

Unlike peephole2, these older peepholes recognize sequences of insns during the final pass and output assembly code directly. Any debug notes between insns that are recognized as a peephole group are moved before or after the peephole output, which keeps markers mostly correct, but may corrupt binds.

--merge-constants: varasm

Attempt to merge identical constants across compilation units.

With this flag, constant pool entries and other constants that do not amount to objects that may have their addresses taken and compared (or --merge-all-constants is given, requesting even such read-only objects to be merged), are emitted in mergeable sections so that the linker can detect and remove duplicates. This may affect debugging inasmuch as the address/identity of the unified objects matters; since so-unified objects are usually string literals and initializers, rather than user-visible variables, this should seldom if ever affect debugging.

-O1: optimize=1

Perform only very fast optimizations.

This option sets the optimization level to 1.

pass_lower_omp, pass_expand_omp, and pass_expand_omp_ssa

With -O0 or -Og, the maximum vectorization factor for OpenMP is limited to 1. At -O1 or higher, target-specific vector sizes are used instead.


Basic blocks containing only PHI nodes, debug binds and markers may be dropped altogether by the mergephi pass. Dropping markers could make some statements impossible to stop at when stepping, and dropping binds makes their side effects not visible, so that earlier binds seem to remain effective. It might be possible to move the binds and markers into the destination block so as to keep them as conditionals.


Pairs of tests guarding conditional blocks in && or || arrangements may be combined into a single test by the ifcombine pass. The block holding the second test becomes unconditional, so any markers and binds in it will take effect even when they shouldn't. Further optimizations are enabled if the then block is a forwarder to the else block, or vice-versa (a forwarder block is empty except for phi nodes, debug binds and markers). These may further confuse debugging by changing the situations in which the forwarder's binds and markers take effect. Conditional binds and markers may alleviate these problems.
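
The sketch below shows a pair of nested bit tests and a hand-written approximation of their combined form. Both function names are hypothetical, and the combined form is only one possible outcome of the ifcombine pass.

```c
/* Source form: two tests in an && arrangement. */
int both_bits_nested(unsigned x)
{
    if (x & 1)
        if (x & 2)      /* block holding the second test */
            return 1;
    return 0;
}

/* Approximation of the combined form: a single test of both bits. */
int both_bits_combined(unsigned x)
{
    if ((x & 3) == 3)
        return 1;
    return 0;
}
```

In the combined form there is no longer a point that is reached only when the first test succeeds, so markers and binds from the block holding the second test have nowhere conditional to live.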


The laddress pass lowers address-taking operations that are not invariant, so as to expose the computations involving offsets and array indexing to optimizers. It has no effect on debugging.

--tree-bit-ccp: pass_ccp

Enable SSA-BIT-CCP optimization on trees.

This flag modifies slightly the behavior of the SSA tree-ccp pass, so that it keeps track of individual bits in SSA registers, rather than just entire registers. This allows some further simplifications, especially of conditional branches based on individual bits.

This does not introduce any new kind of impact on the debugging experience but it may make further blocks unreachable and thus unavailable for breakpointing, and further assignments reduced to reuse of constants without additional code.
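
A minimal, hypothetical example of a branch that bit-level tracking might resolve is shown below.

```c
int bit_ccp_example(unsigned x)
{
    unsigned aligned = x & ~3u;   /* the low two bits are known to be zero */
    if (aligned & 1)              /* can never be true */
        return -1;                /* block may become unreachable */
    return (int)(aligned >> 2);
}
```

The value of aligned is not a constant, but one of its bits is, which is enough to fold the test; a breakpoint on the return -1 line would then have no code to attach to.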

--tree-forwprop: pass_forwprop

Enable forward propagation on trees.

This pass, enabled by default but activated only at -O1 or higher, is run up to 3 times on each function. It substitutes expressions assigned to SSA names into uses thereof, folding statements in place. This doesn't affect debugging, but other transformations made by these passes do. Loads of complex types whose real or imaginary parts are used separately are broken up into separate component loads, but debug binds referencing the complex value loaded from memory are reset, degrading debug information: the bind stmt might be adjusted instead. Stores of complex values are also split up, without effect on debugging. Expressions taking the address of variables, and possibly adding offsets to them, may be substituted into indirections, enabling variables to become non-addressable and turned into SSA form, as in --tree-phiprop. The conditions in conditional branches may be folded to constants, which changes the control flow graph and can render entire blocks unreachable. Likewise, simplifications in switch expressions may rule out some case targets. It may combine memcpy and memset calls to neighbor ranges into a single memcpy, which may affect debugging if the pointer returned by the memset call is referenced in debug binds. Additional specialized transformations involve bit rotations, permutations, bitfield refs and vector constructors, but none of these affect debugging.

--tree-sra: pass_sra_early, and pass_sra

Perform scalar replacement of aggregates.

This flag enables passes that turn members of aggregates that would normally live in memory into stand-alone scalars that can be optimized like registers. The original aggregate object may in some cases be fully taken apart, but when it is still used as a whole, the scalar is "spilled" back in place and "reloaded" as needed.

After assignments to the scalar introduced by these passes, as well as spills and reloads, debug binds are introduced so that var-tracking can keep track of the fragments of the aggregate, so this pass should be transparent as far as debug information is concerned.

Unfortunately, there are problems or limitations in the var-tracking pass that cause us to not use the annotations for the scalarized members, at least in cases in which the aggregate as a whole is small enough to be regarded as an SSA register. Some investigation into var-tracking is needed to determine how to use at least the conflicting notes that apply to both the whole aggregate and the scalarized member, but this may turn out to show significant shortcomings in VTA (variable tracking at assignments) and require some work to make use of the available annotations so as to bring debug information quality of (fully- and?) partially-scalarized aggregates in line with that of scalars.

Another notable limitation introduced by this pass is that dismembered aggregates can no longer be used in inferior calls that expect references or pointers.
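
As an illustration, the hypothetical function below uses a local aggregate that these passes might take apart entirely.

```c
struct pair { int first; int second; };

int sra_example(int a, int b)
{
    struct pair p;            /* local aggregate whose address never escapes */
    p.first = a + b;          /* may become a stand-alone scalar */
    p.second = a - b;         /* likewise */
    return p.first * p.second;
}
```

If p is fully scalarized, it has no memory location of its own, so printing p in a debugger depends on the fragment tracking discussed above, and passing &p to an inferior call is not possible.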

--tree-loop-im: pass_lim

Enable loop invariant motion on trees.

Although this flag is enabled by default, the pass is omitted from the set of passes activated at -Og, so it is only run at -O1 or higher.

This pass moves invariants out of loops, and performs store motion. Floating-point divides and shifts for bit tests may have invariant divisors and shifted bits rearranged for hoisting, without impact on debugging.

Access to memory at an invariant address may be turned into an SSA scalar, with a load at the loop entry and a store at the loop exit; such early loads and delayed stores may be confusing for debugging.

Invariant computations are moved to the edge into the loop from the preheader, after being removed from their original position. The removal triggers propagation into debug binds, which preserves bind equivalences but drops the actual location, and becomes more fragile. With a bit of additional effort, it would be possible to keep the binds unchanged. Still, this movement should have little to no impact on debugging.
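
The sketch below combines an invariant computation and a candidate for store motion. The names total and lim_example are hypothetical, and store motion is only possible if GCC can show that nothing else in the loop touches total.

```c
int total;   /* global variable: lives at an invariant address */

void lim_example(const int *v, int n, int a, int b)
{
    for (int i = 0; i < n; i++) {
        int scale = a * b;       /* invariant: may be hoisted out of the loop */
        total += v[i] * scale;   /* total may be kept in a register, loaded
                                    before the loop and stored after it */
    }
}
```

With store motion applied, inspecting total in the middle of the loop would show the value it had on entry, since the memory location is only updated at the loop exit.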

--tree-dominator-opts: pass_dominator, pass_phi_only_cprop, and pass_uncprop

Enable dominator optimizations.

Although this flag is enabled even at -Og, the passes controlled by it are omitted from the set of passes activated at -Og, so they are only run at -O1 or higher.

It propagates constants and copies into uses, folds expressions, attempts to resolve conditionals, eliminates redundant computations and redundant stores, replaces inequalities with equality tests, propagates coalescible SSA names equivalent to PHI values incoming from each edge, propagates and removes degenerate PHIs, and performs jump threading.

The only transformation that has any significant effect on the debug experience, given that VTA, SFN and LVu mask the effects of the others, is jump threading. See the effects of (gimple) jump threading under --tree-vrp.

--inline-functions-called-once: pass_ipa_inline

Integrate functions only required by their single caller.

This option works as an enabler for certain cases of inlining, in that, if this option is disabled, or optimization is disabled, for a function or for any of its callers, and no other flag or attribute mandates or enables inlining, then the possibility of inlining into all callers and not emitting an out-of-line copy will not even be considered. Oddly, the "called once"/"single caller" bit seems to be a left-over artifact of earlier implementations: there doesn't seem to be any test involving the caller count in the inlining code paths activated by this flag.

Inline substitution, per se, is not usually a significant source of debug information degradation: any piece of debug information that could be represented in the out of line function can be and is equally represented for each inlined copy. Potential loss arises out of debug-lossy optimizations, when performing transformations that are enabled or strengthened by the additional information available when analyzing both the caller and the callee in a single context. For example, the inline expansion of a function within a loop that is unrolled may face significant ambiguity as to how many inlined copies of the function there are, and how far scopes in each copy extend, especially if instructions of different iterations are shuffled together by e.g. modulo scheduling.

Another situation in which inlining may affect the debug experience significantly is that of heavy use of abstraction calls. As large numbers of nearly empty, abstraction-only functions are inlined, the density of code vs debug annotations becomes low, and the risk of hitting upper limits on debug annotations counts grows. When they are hit, such annotations as debug markers and binds may be dropped, removing the compiler's ability to mask the effects of optimizations on debugging. The loss of markers removes the linearity of single-stepping and the robustness of the relationship between source locations in the program and observable effects that they bring. The loss of debug binds takes with it much of the possibility of observing variables not held in stable memory locations. Such degradation, that takes debug information back to the days in which the debugging of optimized programs was reasonably held to be unreasonably difficult, may sometimes be avoided at the expense of significant compile time and memory, using such parameters as "max-debug-marker-count", "max-vartrack-size", "max-vartrack-expr-depth", and "max-vartrack-reverse-op-size".

--ssa-backprop: pass_backprop

Enable backward propagation of use properties at the SSA level.

This flag is enabled by default, but the pass is only activated at -O1 or higher.

It detects numeric variables whose sign does not matter, and optimizes away operations that affect only their sign. Debug binds referencing modified SSA DEFs are adjusted when possible, but since some cases involve function calls and those do not belong in debug binds, some binds may be lost, and others, especially after PHI nodes, may be bound to expressions that have their signs reversed, which may be confusing.
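
A hypothetical example of a value whose sign does not matter is given below; whether GCC applies the transformation to this exact code is an assumption.

```c
double backprop_example(double x)
{
    double negated = -x;          /* only the sign differs from x */
    return negated * negated;     /* squaring makes the sign irrelevant */
}
```

If the negation is optimized away, a debug bind for negated may end up referring to x itself, so the debugger could display the value with its sign reversed.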

--tree-phiprop: pass_phiprop

Enable hoisting loads from conditional pointers.

This pass, enabled by default but activated only at -O1 or higher, turns phi nodes whose incoming args all take the address of a scalar value, and are later dereferenced, into phi nodes that take the scalar values directly. The pass makes sure that the loaded memory values cannot change between the load points, original and optimized, but this transformation might affect debugging if it involves modifying any of the affected memory variables, as the values may have already been loaded. It may also cause a variable that was addressable to become non-addressable and promoted to an SSA register. Debug binds would only be assigned at the time of this promotion, which may be too late to capture assignments that might have already been moved or optimized out. As a result, such variables, promoted to non-addressable, will have worse location tracking than scalar variables that never have their address taken, but no worse than if they had remained addressable all the way.
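
The pattern this pass looks for can be sketched as follows; the function name is hypothetical.

```c
int phiprop_example(int cond, int a, int b)
{
    int *p = cond ? &a : &b;   /* PHI node selecting between two addresses */
    return *p;                 /* load through the selected address */
}
```

After the transformation, the function would behave as if it had been written as return cond ? a : b, with the loads of a and b hoisted into the arms of the conditional, at which point neither variable needs to be addressable any more.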

--tree-pta: pass_build_alias, pass_build_ealias, and TODO_rebuild_alias

Perform function-local points-to analysis on trees.

This just computes more refined alias sets; it doesn't make any transformations, so whatever effects it might have on the debugging experience are indirect.

--stdarg-opt: pass_stdarg

Optimize amount of stdarg registers saved to stack at start of function.

The code enabled by this flag estimates the maximum sizes of general-purpose and floating-point register areas used in a stdarg variable argument list function, so as to limit the number of registers that need to be saved. This does not affect debugging.

--tree-builtin-call-dce: pass_call_cdce

Enable conditional dead code elimination for builtin calls.

Although this flag is enabled even at -Og, the pass is omitted from the set of passes activated at -Og, so it is only run at -O1 or higher.

This pass replaces builtin calls with simpler operations, and/or guards the operation by conditions that decide whether or not to execute the call, replaced or not. This may be slightly confusing when setting breakpoints at the omitted calls, or attempting to single-step into them.

--tree-cselim: pass_cselim

Transform condition stores into unconditional ones.

This flag is enabled by default when there is a conditional move instruction, but the pass is only activated at -O1 or higher.

The pass moves gimple stores in conditional blocks to subsequent join blocks, introducing PHI nodes to select the value to be stored. Addressable variables rely on var-tracking (MEM annotations) rather than var-tracking-at-assignments debug binds, so moving stores causes observable changes in the debug experience: if a variable that should be modified by a store is inspected after the expected store point, but before the replacement store is executed, an outdated value will be found.

I wonder if it might be possible to insert debug binds to temporarily override the location of variables that live in memory most of their lifetime, so that such deferred writes could be reflected in location lists, and observed immediately through such a bind, in spite of the deferred execution of the store.

As in --hoist-adjacent-loads, the moves could leave the conditional blocks empty, which could make it impossible to set breakpoints at lines within them or to single-step into them, as SFNs get dropped along with the removed blocks. Unlike the combined stores from if/then/else structures, sunk stores from else-less then blocks (or from else blocks with empty then blocks) retain their location information, so one might be able to stop at them even when the conditional block to be executed does not include that line. This can all get confusing, and it could be alleviated with conditional binds and markers.
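
The store motion described for this pass can be approximated by hand as below. Both function names and the temporary tmp are hypothetical, and the second function is only a sketch of the transformed shape.

```c
/* Source form: a store that only happens when cond is set. */
void cselim_source(int *p, int cond, int v)
{
    *p = 0;            /* earlier store: shows that writing *p cannot trap */
    if (cond)
        *p = v;        /* conditional store */
}

/* Approximation of the transformed form: one unconditional store. */
void cselim_transformed(int *p, int cond, int v)
{
    *p = 0;
    int tmp = cond ? v : *p;   /* stands for the PHI selecting the value */
    *p = tmp;                  /* single store in the join block */
}
```

In the second form the conditional block is empty, so there is no place left to stop at the line of the original conditional store in a way that depends on cond.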

--ssa-phiopt: pass_phiopt

Optimize conditional patterns using SSA PHI nodes.

This pass performs various transformations (see --hoist-adjacent-loads for more) that may drop small or empty conditional blocks, combining a test and a conditional assignment (represented as a PHI node) into a flag-store, an abs, min, or max expr. If a temporary is needed, it may be cloned from the phi result, but that will then be placed in one of the operands of the original PHI node, so any debug binds referencing the original result remain correctly unchanged. The potential negative impact on the debug experience of these transformations is limited to the removal of a conditional block, with diminished ability to step into the block or set breakpoints in it, and the potential of an early (temporary) overwrite of the location of the variable that will eventually hold the join value, which might make the variable impossible to inspect or modify after such overwrite. The 3-way min-max cases do not change this picture much, except for the possibility of loss of visibility of the result of the intermediate assignment, as bind and marker are removed along with the conditional block.

Another situation in which a conditional block may be eliminated is that in which both edges out of the condition yield the same value for the PHI (e.g. x != a ? a : x simplifies to a). Such simple cases of value unification have just the usual impact of removing a conditional block, but more elaborate cases, with multiple assignments computing the result of the conditional block, have the assignments, but not markers or binds, moved out of the conditional block, with the usual consequences of difficulty of stepping into the removed block, or of inspecting the results of computations whose debug binds were dropped, before the debug binds at a subsequent join point, if any.

Yet another transformation is factoring a conversion out of a PHI node. If both incoming edges perform the same conversion, or if one is a constant and moving the conversion after the join is still found potentially profitable for enabling other optimizations, a new PHI is introduced with type and values prior to the conversion, the original conversions are removed, a new conversion stmt is introduced at the top of the join block, storing in the original PHI result, and finally the original PHI def is removed. This transformation does not remove any block, the original conversions can be propagated into any debug binds, and the new conversion (without location information) is inserted before the debug bind of the original PHI node. The final removal of the original PHI node does not reset debug binds, because we skip propagation into binds upon PHI node removal, and the conversion assignment becomes the new definition. The moved conversions can still be inspected, thanks to SFN and VTA, and the converted value is bound to the variable that takes that value at the join point too, so this transformation does not affect the debug experience.
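
Two hypothetical functions with the kind of conditional assignment that this pass may turn into min and abs expressions are sketched below.

```c
int min_branchy(int a, int b)
{
    int x;
    if (a < b)
        x = a;      /* conditional block that may be dropped */
    else
        x = b;
    return x;       /* may become a single min expression */
}

int abs_branchy(int a)
{
    int x = a;
    if (a < 0)
        x = -a;     /* conditional block that may be dropped */
    return x;       /* may become a single abs expression */
}
```

Once the conditional blocks are gone, breakpoints on the x = a or x = -a lines can no longer tell which path the source would have taken.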

--tree-reassoc: pass_reassoc

Enable reassociation on tree level.

Although this flag is enabled by default, the pass is omitted from the set of passes activated at -Og, so it is only run at -O1 or higher.

This pass rearranges multiple stmts that perform the same operation, say addition, ordering operands by rank and issuing multiple operations in parallel when that's advantageous. This ends up removing nearly all of the original stmts and issuing new ones, using new SSA names. Debug binds retain the original operations, and markers allow them to be inspected when single-stepping. The reassociation might insert extraneous calls, however, e.g. turning repeated multiplies into powi calls; this might be slightly confusing if stepping into calls. Range tests in conditional branches may end up simplified, making the branches unconditional, and rendering some blocks unreachable, which prevents setting breakpoints in them.
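
A simple illustration of reassociation is sketched below, using unsigned arithmetic so that both forms are well defined for all inputs. The function names and the temporaries left and right are hypothetical.

```c
/* Source form: a serial chain of additions. */
unsigned sum4_chain(unsigned a, unsigned b, unsigned c, unsigned d)
{
    return ((a + b) + c) + d;
}

/* Approximation of a reassociated form: two independent additions that
   can be issued in parallel, followed by a final one. */
unsigned sum4_tree(unsigned a, unsigned b, unsigned c, unsigned d)
{
    unsigned left = a + b;
    unsigned right = c + d;
    return left + right;
}
```

The intermediate value (a + b) + c is never computed in the second form, which is why debug binds need to retain the original operations for it to remain inspectable.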

--tree-loop-optimize: pass_fix_loops, pass_tree_loop, and pass_tree_no_loop

Enable loop optimizations on tree level.

This flag is enabled by default, but it is only activated when optimization at -O1 or higher is enabled.

When activated, this flag enables a pass that detects loops and gathers information about them. If the flag is activated and loops are found in a function, then various loop passes are run over that function; otherwise, only the pass enabled by --tree-slp-vectorize is.

--tree-scev-cprop: pass_scev_cprop

Enable copy propagation of scalar-evolution information.

This flag is enabled by default, but it is only activated when --tree-loop-optimize is activated.

If scalar evolution determines that a PHI node is invariant, replace uses thereof, including those in debug binds, by the invariant. This has no effect on debugging.

It also computes, through scalar evolution, the final value of variables modified in loops, dropping the PHI node in favor of a computation based on values known before the loop is entered. This may affect debugging when the removal of the PHI node resets a debug bind referencing it, but the bind could be preserved, since a new, equivalent definition will be introduced.
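
Final value replacement can be illustrated with the hypothetical pair of functions below: a loop as written, and a hand-written closed form of the value it leaves in x.

```c
/* Source form: x is advanced by 2 on each of n iterations. */
int final_value_loop(int x, int n)
{
    for (int i = 0; i < n; i++)
        x += 2;
    return x;
}

/* Approximation of the result: the final value is computed from values
   known before the loop is entered. */
int final_value_closed_form(int x, int n)
{
    return n > 0 ? x + 2 * n : x;
}
```

The value of x after the loop is the same in both forms; what may be lost is the bind that tied x to the PHI node at the loop exit.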

--tree-loop-ivcanon: pass_iv_canon, pass_complete_unroll, and pass_complete_unrolli

Create canonical induction variables in loops.

This flag is enabled by default, but it is only activated when --tree-loop-optimize is activated.

This pass estimates the number of iterations of each loop, identifies exit edges and removes those whose conditions are never met, based on gathered information about the maximum number of iterations. It attempts complete loop unrolling and completes if that succeeds. Otherwise, if the loop meets certain conditions, a countdown induction variable is introduced and the loop exit test is replaced so as to compare this variable with zero.

The only transformations that minimally impact debugging are the removal of loop exits, which may render some unreachable blocks unavailable for setting breakpoints (that would never be hit), and loop unrolling, which uses the same machinery and has the same effects on debugging as loop peeling (see --peel-loops).
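The countdown induction variable can be pictured at the source level as follows. This is only a sketch with hypothetical names; the pass works on gimple, and the variable called left here is an artificial one.

```c
int
sum_array (const int *a, unsigned n)
{
  int s = 0;
  for (unsigned i = 0; i < n; i++)
    s += a[i];
  return s;
}

/* Sketch of the canonical form: an extra counter runs down to zero
   and drives the exit test, while i keeps indexing the array.  */
int
sum_array_canon (const int *a, unsigned n)
{
  int s = 0;
  unsigned i = 0;
  for (unsigned left = n; left != 0; left--)
    s += a[i++];
  return s;
}
```

Since i remains in place, nothing about its debug binds needs to change at this point.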

--ivopts: pass_iv_optimize

Optimize induction variables on trees.

This flag is enabled by default, but it is only activated when --tree-loop-optimize is activated.

For each loop, after detecting base and general induction variables and selecting the optimal set, any new, artificial induction variables are created and added to the loop. Then, uses of induction variables not chosen for the optimal set are rewritten in terms of the optimal set, adjusting their original assignments or inserting new assignments instead of phi nodes. Finally, assignments to induction variables set to be removed are propagated into debug binds, if needed, and then discarded.

Alas, propagation into debug binds may lose plenty of useful information: PHI nodes cannot be propagated into binds, and regular assignments are not removed in an order that ensures that, say, if a definition of A is used in a definition of B and both are to be removed, we get a chance to propagate B and then A into debug binds that referenced only B. If we happen to remove A first, uses of B in debug binds end up having to be reset, losing relevant location information.
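To make the loss concrete, here is a source-level sketch, with hypothetical names, of an index IV being replaced by a pointer IV. After the rewrite, i is not computed anywhere, so a debugger can only show it if a debug bind expresses it in terms of the surviving IV, e.g. as p - a.

```c
void
scale (int *a, int n, int k)
{
  for (int i = 0; i < n; i++)
    a[i] = a[i] * k;     /* i is the user-visible IV */
}

/* Sketch of the loop after IV selection: the pointer p is the only
   IV left, and the exit test compares it with a precomputed end.  */
void
scale_ivopts (int *a, int n, int k)
{
  if (n <= 0)
    return;
  for (int *p = a, *end = a + n; p != end; p++)
    *p = *p * k;
}
```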

--inline-atomics: pass_fold_builtins

Inline __atomic operations when a lock free instruction sequence is available.

This flag is enabled by default, but the transformations described herein, part of the fold builtins pass, are only activated at -O1 or higher.

Various atomic operations are turned into atomic bit test and set, complement or reset. The transformation may invalidate user variables used only in compares with zero.
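A minimal sketch of a candidate for this transformation, using C11 atomics; the helper name is hypothetical. My assumption is that, on a target with a suitable instruction (x86's lock bts is the one I have in mind), the fetch-or plus mask test may become a single atomic bit test and set.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper: set bit BIT in *WORD and report whether it
   was already set.  */
bool
test_and_set_bit (atomic_uint *word, unsigned bit)
{
  unsigned mask = 1u << bit;
  unsigned old = atomic_fetch_or (word, mask);  /* full previous value */
  bool was_set = (old & mask) != 0;             /* old is only compared with zero */
  return was_set;
}
```

After the transformation only the tested bit is available, so old is the kind of user variable whose value may no longer be recoverable.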

--if-conversion: pass_rtl_ifcvt, and pass_if_after_combine

Perform conversion of conditional jumps to branchless equivalents.

Various situations in this RTL pass remove tests, conditional branches and basic blocks. This can make for very surprising single-stepping into the blocks guarded by the conditions, as lines that would not be expected to run given the condition actually get to run, or vice-versa. SFNs don't help: they just reinforce whatever block execution is taken, or get dropped altogether.

Aside from the confusing single-stepping, the block removal might (but likely doesn't) cause GCC to lose track of debug bindings. In theory, at confluence points (when entering SSA), we introduce additional debug binds that allow GCC to recover from the loss of bindings in the separate branches. These should allow GCC to get back in sync with the result of the if-converted assignments at the confluence point, so at least after the confluence point, the bindings should have been recovered: if-converted sets will be inserted before the confluence-recovering debug bind.

These transformations usually apply to a single assignment in each conditional block, but there is support for turning multiple assignments in a then block into multiple assignments from IF_THEN_ELSE (cond, then_value, orig_value) too. There aren't further debugging complications in this case, but the blocks can be much longer, breaking users' expectations of single stepping for longer. SFN might make all of this worse, in that the statement markers in the conditional blocks are actually dropped, so you don't get to step into the blocks any more.

Support for conditional markers and binds could alleviate the effects of these transformations.
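A source-level sketch of the single-assignment case, with made-up names. The real transformation happens on RTL; the second function only mimics the IF_THEN_ELSE shape.

```c
int
clamp_low (int x, int lo)
{
  int r = x;
  if (x < lo)
    r = lo;      /* guarded assignment in the then block */
  return r;
}

/* Sketch of the if-converted form: no branch and no separate block,
   so the assignment line is reached whether or not x < lo holds.  */
int
clamp_low_ifcvt (int x, int lo)
{
  int r = x < lo ? lo : x;   /* IF_THEN_ELSE (cond, then_value, orig_value) */
  return r;
}
```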

--move-loop-invariants: pass_rtl_move_loop_invariants

Move loop invariant computations out of loops.

This pass identifies SET insns that are invariant within a loop, and moves them to the loop preheader, possibly using a new pseudo to hold the invariant, or replaces them with a copy from the pseudo holding an equivalent invariant. Debug binds remain in place and need not be adjusted, as the transformations ensure the values are available in the original pseudos at the points right after the original SETs, where the binds will tend to be.

The only risk I can see to debuggability is that moved insns, and insns leading to equivalences that may end up dead and removed at later passes, may leave lines of code without any insns standing for them. The use of SFN and LVu information in debuggers, enabling them to stop at and inspect the state even at such lines, removes this potential problem.
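As an illustration only (the RTL pass moves insns, not C statements, and the names here are hypothetical), the effect resembles the following. The line computing k ends up with no insn inside the loop standing for it.

```c
void
fill (unsigned *a, unsigned n, unsigned x, unsigned y)
{
  for (unsigned i = 0; i < n; i++)
    {
      unsigned k = x * y;   /* invariant: same value on every iteration */
      a[i] = k + i;
    }
}

/* Sketch after the move: the invariant is computed once in the
   preheader, possibly into a new pseudo (k_inv here).  */
void
fill_hoisted (unsigned *a, unsigned n, unsigned x, unsigned y)
{
  unsigned k_inv = x * y;
  for (unsigned i = 0; i < n; i++)
    a[i] = k_inv + i;
}
```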

--branch-count-reg: pass_rtl_doloop

Replace add, compare, branch with branch on count register.

This pass replaces the conditional branch at the end of a loop with a single decrement-counter-and-conditionally-loop sequence, when the loop iteration count can be computed. The original loop counter is not removed by this pass, so this pass by itself does not affect debug information. However, the original loop counter may become unused, and then be optimized away, and then it is unlikely that the generic adjustments to debug bind statements will be able to realize it can be computed from the newly-introduced loop counter. There is room for improvement, adjusting the debug binds of the original loop counter in terms of the new related IV. This might require some additional infrastructure that could likely be generalized and used for IVs in general.

--if-conversion2: pass_if_after_reload

Perform conversion of conditional jumps to conditional execution.

This pass turns insns in then and else blocks into COND_EXEC, enabled by the if condition (then) or its negation (else), removing the conditional branch, the branches at the end of the conditional blocks, and bringing it all into a single basic block.

It does not modify or remove debug insns, so single-stepping will enter and execute both blocks, though the side effects of insns whose condition is not active will not be executed. In general, insns that modify a variable will be followed by a debug insn that binds the variable to the location holding its modified value.

Although debug insns don't have conditional binds, the location of a variable often (but not always) remains the same across modification. In the cases it doesn't, only the bind at the confluence of the conditional blocks will get the variable location and value back in sync.

In addition to the post-confluence point, a variable modified within a block turned into conditionally-executed insns can also be correctly inspected right after an (active) assignment to it, i.e., the conditional assignment that would have been executed should the conditional blocks have remained separate. SFN and LVu technology help make sure there will be a usable inspection point with the correct bindings at that point.

At other points in the combined block, variables potentially modified in it may be regarded as bound to a stale or unused location holding an unrelated or uninitialized value, corresponding to what would have been assigned to the variable in the other block. This can get confusing if one does not realize that the block that is apparently being executed was not the one corresponding to the guarding condition.

All of these caveats of conditional execution only apply in the somewhat unusual cases in which the location of the variable actually changes. Because of control flow confluence and variable value unification at that point (regardless of the debug bind at the confluence point), it will most often be the case that the variable lives at the same register or memory location throughout the conditionally executed blocks, so the degradation of the debugging experience by this pass, although possible, should be rare.

Debug binds and markers cannot currently be marked as conditional; making that possible could further alleviate the impact of this transformation.

-Os: optimize=2 + size

Perform optimizations that tend to reduce the code size.

This option sets the optimization level to 2, in a mode that assigns higher priority to reducing code size.

Optimization at level 2 or higher extends tests on whether memory references may overlap with affine combinations analysis. This may infer non-aliasing in cases lower optimization levels wouldn't, enabling further optimizations, but nothing with effects on debugging that couldn't be had in other more obvious cases of non-aliasing.


Optimization level 2 or higher enables a pass that completely unrolls inner loops that iterate just a few times. Unrolling uses the same machinery that performs loop peeling (see --peel-loops) and, by itself, does not affect debugging.


An early rematerialization pass runs at optimization level 2 or higher. It rematerializes pseudos whose live ranges cross calls by copying the reaching definition insns between calls and uses. The pseudo may then be regarded as dead before the call, which might reset binds after the new death points, even when they could be adjusted so as to refer to the definition that will be used for rematerialization. In some cases, however, the expression may be lost entirely, but even when it is preserved, it might be too complex to be recognized as unchanged when the pseudo is rematerialized, so locations or values based on the pseudo might be lost.


Optimizing for size changes the default register allocation region setting back to the one used when not optimizing.


--expensive-optimizations

Perform a number of minor, expensive optimizations.


Gimple jump threading is one of the significant transformations enabled by --expensive-optimizations; see the effects of jump threading on debugging under --tree-vrp.


The bswap gimple pass, also enabled by expensive optimizations, recognizes shifts and rotates equivalent to byte-swap transformations, and replaces them with a byte-swap builtin. Any user-visible intermediate computations should have debug bind statements that will ultimately be adjusted and preserved even if the computations themselves are dropped, but some stmt moving, replacing, and inserting-then-removing, might actually mess up debug bind tracking of the final value.
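A minimal sketch of a pattern the bswap pass is meant to recognize. The function names are hypothetical; __builtin_bswap32 is the GCC builtin the replacement would use. The intermediates b0 to b3 are the user-visible computations that may be dropped.

```c
#include <stdint.h>

uint32_t
swap32 (uint32_t x)
{
  uint32_t b0 = (x & 0x000000ffu) << 24;  /* user-visible intermediates */
  uint32_t b1 = (x & 0x0000ff00u) << 8;
  uint32_t b2 = (x & 0x00ff0000u) >> 8;
  uint32_t b3 = (x & 0xff000000u) >> 24;
  return b0 | b1 | b2 | b3;
}

/* What the whole computation amounts to after recognition.  */
uint32_t
swap32_builtin (uint32_t x)
{
  return __builtin_bswap32 (x);
}
```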


Another expensive optimizations pass is widening_mul. It recognizes various opportunities for math optimizations, such as fusing multiply and add, testing overflows on adds or subtracts, and combining divide and modulus into a single operation. Final assignment stmts are replaced and stmts performing no longer needed computations are removed in a way that doesn't harm debugging.
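Two of the patterns mentioned above, sketched with hypothetical names. Whether a single divmod operation is used depends on the target; the code only shows the source shapes involved.

```c
#include <stdbool.h>

struct qr { unsigned quot, rem; };

struct qr
div_and_mod (unsigned a, unsigned b)
{
  struct qr r;
  r.quot = a / b;   /* same operands as the next stmt: */
  r.rem = a % b;    /* candidates for one combined divmod */
  return r;
}

bool
add_overflows (unsigned a, unsigned b)
{
  unsigned sum = a + b;
  return sum < a;   /* wraparound test recognizable as an overflow check */
}
```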

pass_strength_reduction, pass_expand, pass_combine, pass_cse, pass_ira, pass_reload, and pass_postreload_cse

Some of the changes brought about by --expensive-optimizations are additional canonicalization of addresses when comparing base addresses in alias analysis, searching for alternate base addresses in gimple strength reduction, loop iteration count estimation even for loops with multiple exits, taking conflict counts into account when ordering SSA names for coalescing, combination of temporary slots for automatic variables, reuse of wider-mode ANDs and MEMs for CSE, simplifications and cheap extensions in combine, slightly more elaborate selection of register class preferences and attempts to decrease the number of live ranges in the integrated register allocator, removal of some unneeded reloads, and additional post-reload combine and CSE subpasses. None of these changes affects debugging in ways that could not similarly arise without this flag.


Another of the expensive optimizations is the compgotos RTL pass, which duplicates each small-enough block ending in computed jumps and merges the copies with predecessors that have it as their single successor, with no effects on debugging.

--strict-aliasing: strict_aliasing

Assume strict aliasing rules apply.

This flag limits the cases in which pointer accesses may alias, but that does not enable any kind of transformation with impact on debugging that could be incurred otherwise, using pointers known not to alias through other means.

--vect-cost-model=cheap: pass_vectorize, and pass_slp_vectorize

Use the cheap cost model for vectorization.

This affects --tree-loop-vectorize and --tree-slp-vectorize decisions, but not the kinds of transformations they make.

--tree-vrp: pass_early_vrp, and pass_vrp

Perform Value Range Propagation on trees.

This flag activates two different passes: early vrp and vrp proper. Early vrp is simpler in that it is not iterative, going through basic blocks once in dominance order rather than using the SSA propagation engine.

Once the range assigned to an SSA name is narrowed down to a single constant, subsequent statements referencing the name can be propagated into and possibly folded, and the definition may be removed. Conditional statements may be simplified, removing edges and basic blocks. Expressions in other statements may also be simplified based on ranges.

Such simplifications, in themselves, do not significantly affect the debugging experience. Removed definitions, if mentioned in debug binds, will be propagated into them and preserved there, with markers and views enabling them to be single-stepped and inspected; otherwise simplified statements remain in place with the same outputs, and don't require any debug information changes. Simplified conditions may cause entire blocks to become unreachable and be removed, which would stop placing breakpoints at them, but such breakpoints wouldn't be reached anyway.


At the end of VRP proper, (gimple) jump threading takes place, using value ranges to simplify conditional stmts to tell whether outgoing edges of threadable blocks can be determined from incoming edges.

Gimple jump threading duplicates a block when arriving at it through a certain incoming edge implies exiting it through a certain outgoing edge. This duplication, in itself, does not affect the debug experience: the copied block carries as much debug information as the original block. During threading, however, there are blocks that are not copied, namely forwarding blocks. From a codegen perspective, all they seem to do is to jump to another block. From a debug experience perspective, however, they may contain plenty of bind statements and markers, and those are not duplicated: binds are consolidated so that only the latest bind to each variable is copied, and markers are dropped entirely. This arrangement, intended to reinforce binds after newly-introduced confluences, drops debug binds that would not be observable before the introduction of markers and views. With markers and views, dropping the blocks in favor of bind consolidation amounts to significant loss. Effects need to be assessed, as forwarding blocks and leading/trailing debug stmts may end up removed by CFG cleanup. Better means to preserve them when consolidating forwarding blocks guarded by optimized-out conditions may be needed: conditional markers and binds are a possibility to explore.
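A small source-level sketch of a threadable test, with hypothetical names. Arriving at the second test through the then path implies flag is nonzero, and arriving through the fall-through path implies it is zero, so both incoming edges can bypass the test.

```c
int
threaded (int a, int b)
{
  int flag = 0;
  if (a > 0)
    flag = 1;
  if (flag)      /* outcome is implied by the incoming edge */
    b += 10;
  return b;
}

/* Sketch after threading: the second test is gone, and with it any
   markers or binds that sat in the bypassed block.  */
int
threaded_after (int a, int b)
{
  if (a > 0)
    b += 10;
  return b;
}
```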

--tree-dce(aggressive): pass_cd_dce

See --tree-dce. At optimization level 2 or higher (i.e., starting at -Os), the second tree dead code elimination pass is run in aggressive mode, which takes control dependencies into account, enabling additional conditional branches to be eliminated. This does not, however, fundamentally change the kinds of effects these passes have on debugging.

--ipa-sra: pass_early_ipa_sra

Perform interprocedural reduction of aggregates.

This pass modifies the argument list of a function that takes aggregates as arguments, splitting them into scalars, and adjusting the callers. The impact on debugging could possibly be no different from that of --tree-sra, but the parameter transformations do not retain any traces of the original parameters that could have variable location information generated in a way that reconstructed the original object, or even that tracked each replacement scalar parameter separately. This would require infrastructure to somehow retain the original parameters and describe how they map to the replacement parameters.
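A sketch of the parameter rewrite, using made-up names (the _isra suffix here is just a label for the illustration). The second function stands for what the callee might look like once the aggregate parameter is replaced by the one scalar it uses.

```c
struct point { int x, y; };

/* Hypothetical callee: only p->x is ever read.  */
int
get_x_scaled (const struct point *p, int k)
{
  return p->x * k;
}

/* Sketch of the replacement: p no longer exists as a parameter, so
   there is nothing left to attach its location information to.  */
int
get_x_scaled_isra (int p_x, int k)
{
  return p_x * k;
}

int
caller (int k)
{
  struct point pt = { 3, 4 };
  return get_x_scaled (&pt, k) + get_x_scaled_isra (pt.x, k);
}
```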

--optimize-sibling-calls: pass_tail_recursion, and pass_tail_calls

Optimize sibling and tail recursive calls.

This enables two separate passes. One attempts to turn tail recursion into loops, the other marks non-recursive tail calls as such, so that the expander emits them as jumps rather than calls.

Neither transformation affects debugging within an activation of a function, but they do affect debugging in that call stacks may be missing expected frames, stepping over a tail call would require additional logic in the debugger and the call would not return to the expected caller, and setting a breakpoint at the entry point of a recursively tail-called function may miss the recursive tail-calls.
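A minimal sketch of tail recursion turned into a loop, with hypothetical names. In the loop form there is a single activation, so a backtrace shows one frame no matter how many recursive calls the source implies, and a breakpoint at the function entry is hit only once.

```c
unsigned
gcd_rec (unsigned a, unsigned b)
{
  if (b == 0)
    return a;
  return gcd_rec (b, a % b);   /* tail-recursive call */
}

/* Sketch of the result: the call becomes a jump back to the top.  */
unsigned
gcd_loop (unsigned a, unsigned b)
{
  while (b != 0)
    {
      unsigned t = a % b;
      a = b;
      b = t;
    }
  return a;
}
```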

--tree-switch-conversion: pass_convert_switch

Perform conversions of switch initializations.

This activates switch statement lowering alternatives that may be more efficient than the jump tables or decision trees that are otherwise used.

One of the lowering possibilities uses the switch value as a shift count, and then uses bit tests instead of multiple equality tests. No visible effects on the debug experience are expected from this.

Another turns a switch statement with all cases containing assignments of constants to the same variables into arrays of the constants and assignments to the variables from indexed elements of the arrays. This collapses the code for all (in-range) cases into a single block, losing any debug annotations they might contain. This ultimately prevents stepping into the switch statement or breaking at any of the cases. Optimized-out assignments that might have been preserved in such annotations will be lost altogether. As for assignments that are handled by this transformation, even though debug binds in the cases are lost, binds introduced by VTA after the post-switch PHI nodes will enable the variables to be inspected afterwards.
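The array-based lowering can be sketched as follows; names and values are made up. In the converted form there are no per-case blocks left to break at or step into, but w can still be inspected after the switch.

```c
int
weight (int code)
{
  int w;
  switch (code)
    {
    case 0: w = 10; break;
    case 1: w = 20; break;
    case 2: w = 40; break;
    case 3: w = 80; break;
    default: w = -1; break;
    }
  return w;
}

/* Sketch of the converted form: the constants move into an array and
   all in-range cases collapse into a single indexed load.  */
int
weight_converted (int code)
{
  static const int table[4] = { 10, 20, 40, 80 };
  int w;
  if (code >= 0 && code <= 3)
    w = table[code];
  else
    w = -1;
  return w;
}
```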

--partial-inlining: pass_split_functions

Perform partial inlining.

This flag enables splitting of functions, so that a part will be inlined while another part remains as a separate out of line function.

In theory, this shouldn't be a problem for debugging: the inlined part is represented as an inlined function, the part that remains out of line (or that is further split) is represented as an out of line function. Alas, it's not that simple: the out of line portion should be recognized as a part of a function, with an enclosing context taken from the inlined portion. There is no standardized representation that could enable debuggers to recognize this relationship, so at the very least there is going to be confusion as to stack frames, incoming arguments, and available variables from split contexts.

If the partial function is output as an optimized version of the original function (it is), a debugger might also set breakpoints at its entry point as if they were entry points for the entire function.

We have a debug info extension proposal to enable at least the entry point of the out of line part to not be regarded as an entry point for the entire function, which alleviates the breakpoint setting problem, but we may still need more annotations to allow a debugger to represent a single virtual call frame when the inlined portion activates the out of line one, with the entire set of enclosing variables and whatnot.

Without that, this flag can make debugging very difficult.
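A rough sketch of a function shape that lends itself to splitting, with hypothetical names; the _part and _head names are labels for this illustration only. The cheap early exit is the part likely to be inlined, and the loop is the part left out of line.

```c
#include <stddef.h>

int
checked_sum (const int *a, int n)
{
  if (a == NULL || n <= 0)   /* cheap test: inlining candidate */
    return 0;
  int s = 0;                 /* heavier tail: split-out candidate */
  for (int i = 0; i < n; i++)
    s += a[i];
  return s;
}

/* Sketch of the out-of-line part: it has its own frame, yet it is
   logically a fragment of checked_sum.  */
int
checked_sum_part (const int *a, int n)
{
  int s = 0;
  for (int i = 0; i < n; i++)
    s += a[i];
  return s;
}

/* Sketch of the part that would be inlined into callers.  */
int
checked_sum_head (const int *a, int n)
{
  if (a == NULL || n <= 0)
    return 0;
  return checked_sum_part (a, n);
}
```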

--ipa-icf: pass_ipa_icf

Perform Identical Code Folding for functions and read-only variables.

This pass identifies read-only variables with identical representation, and functions with equivalent executable code, and outputs only one copy of each. This is a disaster for debugging the discarded functions: line number and variable location information is dropped for all but the selected function in each equivalence group. It is even more confusing because the wrong function seems to be called when stepping into a dropped one, and unexpected breakpoint hits may occur.
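A trivial sketch of two folding candidates, with made-up names: unrelated at the source level, but with equivalent executable code. If only one body is emitted, stepping into one of them may appear to land in the other.

```c
/* Hypothetical functions from unrelated parts of a program.  */
int
rect_area (int w, int h)
{
  return w * h;
}

int
scale_value (int value, int factor)
{
  return value * factor;
}
```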

There is some room for improvement here, but it is hardly trivial. We should generate debug information for all copies, but we don't want to compile them all the way to the end and then attempt to unify labels and whatnot to output location lists for each variant, and multiple line number tables. Unifying the functions combining and turning all debug annotations, including source locations, into conditionals that identify each of the unified copies could enable us to compile them normally, and then emit a single line number table (augmented with conditionals) and location information for each of the separate copies. Debug information consumers may then be able to identify the copies using return addresses and call-graph debug information, the same machinery used to determine entry-values of parameters.

--devirtualize: pass_ipa_devirt

Try to convert virtual calls to direct ones.

It replaces indirect calls with direct calls, possibly enabling folding, inlining and whatnot. The replacement of calls in itself does not affect debugging, but the enabled transformations might.

--devirtualize-speculatively: pass_ipa_devirt

Perform speculative devirtualization.

This is somewhat like --devirtualize, but the direct call is guarded by a test that confirms the selected target of the call is the correct one, and the indirect call remains as an alternative. Nothing there would affect debugging.

--ipa-cp: pass_ipa_cp

Perform interprocedural constant propagation.

This pass collects plenty of information about opportunities for propagating constants from callers to callees, cloning functions and replacing parameters with the constants or other known properties. This may make room for many other optimizations, including resolution of indirect calls to direct ones.

Cloning and substitution do not significantly impact the debug experience: the clones refer back to the original function as their abstract origin, and the substituted parameters, even if eliminated from the cloned function's ABI, are noted as bound to the constant in the debug info for the concrete function.
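A sketch of cloning for a constant argument, with hypothetical names (scaled_constprop is only a label for the clone in this illustration).

```c
unsigned
scaled (unsigned value, unsigned factor)
{
  return value * factor + factor;
}

/* Sketch of a clone specialized for factor == 8.  The parameter is
   gone from the clone's interface, but debug info can still describe
   factor as the constant 8.  */
unsigned
scaled_constprop (unsigned value)
{
  return value * 8u + 8u;
}

unsigned
use_scaled (unsigned v)
{
  return scaled (v, 8);   /* a call site that could use the clone */
}
```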

One potentially confusing situation that arises out of cloning is to set a breakpoint at a code address, and then be surprised that it is not hit at other activations of the function that do not use the same clone. Since this also comes up with such traditional transformations as inlining and loop unrolling, it probably won't be too surprising.

--ipa-bit-cp: pass_ipa_cp, and pass_ccp

Perform interprocedural bitwise constant propagation.

This flag extends --ipa-cp so that it also gathers information about which bits are known to be zero in values passed from one function to another. This creates additional opportunities for folding, --tree-ccp, etc.

--ipa-vrp: pass_ipa_cp

Perform IPA Value Range Propagation.

This flag extends --ipa-cp so that it also gathers range information in values passed from one function to another. This creates additional opportunities for folding, --tree-vrp, etc.

--inline-small-functions: pass_ipa_inline

Integrate functions into their callers when code size is known not to grow.

Like --inline-functions-called-once, this flag is an enabler for inlining, in that if it's not active, various cases of early inlining (and splitting for --partial-inlining) are suppressed.

--indirect-inlining: pass_ipa_inline

Perform indirect inlining.

Like other inline flags, this flag is an enabler: if it's not active, it stops the compiler short of attempting to resolve indirect edges (e.g., indirect or virtual calls) to direct edges.

--inline-functions: pass_ipa_inline

Integrate functions not declared "inline" into their callers when profitable.

Like other inline flags, this flag is an enabler: if it's not active, it stops the compiler from considering inlining functions not explicitly declared inline. See --inline-functions-called-once for an analysis of the impact of inlining on debugging.

--hoist-adjacent-loads: pass_phiopt

Enable hoisting adjacent loads to encourage generating conditional move instructions.

This flag modifies the ssa-phiopt pass so that, when the then block and the else block each load one of two adjacent fields of the same struct into (different SSA names joined into) the same variable, both loads are moved before the conditional branch.

A debug bind will likely follow each of the original loads, so the moves won't change the ability to inspect the destination variable after each load. However, the early overwriting of the variable can make its previous value unavailable sooner than expected.

The moves could leave the conditional blocks empty, especially if a conditional move ends up being used, which could make it impossible to set breakpoints at lines within them or to single-step into them, as SFNs get dropped along with the removed blocks. The moved loads retain their location information, however, so one might be able to stop at them even when the conditional block to be executed does not include that line. This can all get confusing, but I don't see ways to improve that.
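A source-level sketch of the pattern, with hypothetical names. After the loads are hoisted, both conditional blocks are empty and the selection becomes a conditional-move candidate.

```c
struct pair { int first, second; };

int
pick (const struct pair *p, int use_first)
{
  int v;
  if (use_first)
    v = p->first;    /* load in the then block */
  else
    v = p->second;   /* adjacent field, loaded in the else block */
  return v;
}

/* Sketch after hoisting: both loads run before the branch.  */
int
pick_hoisted (const struct pair *p, int use_first)
{
  int f = p->first;
  int s = p->second;
  return use_first ? f : s;
}
```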

--isolate-erroneous-paths-dereference: pass_isolate_erroneous_paths

Turn undefined behavior into traps.

This pass detects dereferences of null pointers and replaces them with trap statements. When the dereference involves a PHI node, the incoming edge that carries the null value is redirected to a copy of the block, and the copy gets the trap statement instead.

This affects debugging mostly in minor ways. A chunk of code that follows an unconditional null dereference may become unavailable for breakpoints as the trap enables it to be completely optimized away. When a block is copied for the case of conditional null dereferences, references to the copied labels by name may not be resolved to the corresponding locations in the copied blocks. In extreme cases, in which all remaining incoming edges bring a null value, the original block may end up unreachable and optimized away, potentially making the label unavailable even while copies thereof remain.
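A sketch of the PHI case, with hypothetical names; __builtin_trap is the GCC builtin that stands in for the trap statement. Only the non-null path is meaningful to execute.

```c
int
deref_or_trap (int use_null, int *q)
{
  int *p = use_null ? (int *) 0 : q;   /* PHI: one incoming edge carries null */
  return *p;
}

/* Sketch after isolation: the null edge goes to a copy of the block
   that traps; the original block keeps the remaining edge.  */
int
deref_isolated (int use_null, int *q)
{
  if (use_null)
    __builtin_trap ();
  return *q;
}
```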

When an indirect call is replaced with a trap, say because the callee address is null, debugger users may be surprised at not being able to step into the called function, even if they modify the pointer so that it is not null, because the call was turned into a trap. Such types of debugging sessions, involving debugging-time modification of pointers that at compile-time could be determined to evaluate to null, may become impossible to carry out after these transformations.

This flag, as well as --isolate-erroneous-paths-attribute and -Wnull-dereference (though a warning flag should not enable optimizations), enables turning divide by zero into a trap (unless --non-call-exceptions is enabled), with the same logic and consequences as the above, and addresses of local automatic variables returned from functions into NULL, with no effects on debugging.

The flag --isolate-erroneous-paths-attribute uses the same logic and machinery as this option, but it recognizes cases in which a null pointer is passed to a function in an argument marked (with an attribute) as requiring a nonnull pointer, or returned from a function that is marked as returning a nonnull pointer, and replaces the erroneous call or return with a trap. The effects of these transformations on debugging are of essentially the same kind.

--tree-pre: pass_pre

Enable SSA-PRE optimization on trees.

When an expression is computed redundantly in a block and some of its predecessors, make it fully redundant by inserting it in other predecessors, and then remove the redundant computation.

In theory, the insertions have no effect on debugging, but SSA coalescing may cause them to overwrite a variable earlier than expected, making it unavailable for inspection until the expected assignment point. The removals are preserved in debug binds, so as long as the computations are not optimized out, they will be representable, and with SFN and LVu, the binds will be available for inspection at the expected spots.
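A source-level sketch of a partially redundant expression and of the insertion that makes it fully redundant; names are hypothetical, and t stands for a compiler temporary.

```c
unsigned
pre_before (unsigned a, unsigned b, int c)
{
  unsigned x = 0;
  if (c)
    x = a * b;         /* computed on this path only */
  unsigned y = a * b;  /* partially redundant: available when c != 0 */
  return x + y;
}

/* Sketch after PRE: the expression is inserted in the other
   predecessor, and the computation at the join is removed.  */
unsigned
pre_after (unsigned a, unsigned b, int c)
{
  unsigned x = 0, t;
  if (c)
    {
      t = a * b;
      x = t;
    }
  else
    t = a * b;         /* inserted computation */
  unsigned y = t;      /* y remains describable through a debug bind */
  return x + y;
}
```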

--code-hoisting: pass_pre

Enable code hoisting.

When equivalent expressions are computed in multiple blocks, move them to a dominating block, and then remove the redundant computations.

The considerations that apply to --tree-pre also apply to this flag.

--tree-tail-merge: pass_pre

Enable tail merging on trees.

This option is conceptually similar to --crossjumping, but it works on the gimple SSA representation, rather than on RTL, as a subpass at the end of SSA-PRE (see --code-hoisting). Despite the name, it only merges entire basic blocks that share a common successor or predecessor.

Considerations that are also similar apply: the combined blocks may refer to different source fragments, they may have different debug annotations that are correctly ignored when comparing blocks, but that are dropped altogether from one of each pair of merged blocks.

I envision a possibility of preserving the annotations with the introduction of conditionals, though, unlike the case of jump threading, it is not immediately obvious how to identify a condition that might be available at run time and that could be used to tell which set of annotations to activate, so as to enable a debugger to show one source fragment or another as active.

--store-merging: pass_store_merging

Merge adjacent stores.

This combines multiple stores to adjacent or overlapping memory locations in a single basic block into fewer wider stores. This is done in gimple, before automatic variables are assigned to specific stack slots, so it is unlikely to combine effects in more than one user variable: it might combine accesses into a single array or structure, i.e., larger addressable objects committed to memory early in compilation.

These are objects that are not tracked or affected by VTA, so debug binds are unlikely to be affected. However, the postponement of merged stores may affect values visible at inspection points derived from statement boundaries (SFN).
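A sketch of adjacent stores that could be merged, with hypothetical names. The memcpy in the second function merely stands in for a single wider store; it is not what the pass emits.

```c
#include <stdint.h>
#include <string.h>

struct hdr { uint8_t b[4]; };

void
init_bytes (struct hdr *h)
{
  h->b[0] = 0x11;   /* four adjacent byte stores in one block */
  h->b[1] = 0x22;
  h->b[2] = 0x33;
  h->b[3] = 0x44;
}

/* Sketch after merging: all four bytes are written at once, at the
   position of the last original store.  */
void
init_merged (struct hdr *h)
{
  static const uint8_t image[4] = { 0x11, 0x22, 0x33, 0x44 };
  memcpy (h->b, image, sizeof image);
}
```

Stopping at the statement boundary after the first assignment in the merged form would show b[0] still unmodified, which is the postponement effect described above.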

--thread-jumps: pass_jump

Perform (RTL) jump threading optimizations.

(Jump threading passes or subpasses in gimple/SSA are enabled by --expensive-optimizations, by --tree-dominator-opts, and by --tree-vrp)

If a block is found to have no side effects, and if its being entered through a certain edge E1 implies it will always be left through an edge E2, this cleanup pass redirects edge E1 to the destination of E2, bypassing the block altogether. This removes from the expected flow any of the markers and bindings that were to be found in the bypassed block. This may be confusing not only when single-stepping a program, for an unexpected jump over a reasonably large piece of code might take place, but also after the bypassed block, as the skipped bindings may not be integrated in the subsequent views.

--gcse: pass_rtl_pre, pass_rtl_hoist, and pass_rtl_cprop

Perform global common subexpression elimination.

The PRE and hoist passes on RTL introduce new pseudos to hold redundant/hoisted expressions, and new insns to compute them as needed to make exprs fully redundant, and then replace the redundant set insns with copies from the new pseudos. Since the values still end up in the REGs, debug binds referencing them are unchanged and remain valid. Register allocation might be able to optimize away these copies, but with SFN and LVu, it should still be possible to stop after assignments, and inspect the assigned values. The only expected negative effect on the debugging experience is that of early overwriting of variables, should the new pseudos be assigned to the same location as the dead variables whose future values they hold.

Another pass enabled by this flag is a constant/copy propagation RTL pass. As pseudos are replaced with constants or other pseudos, this may simplify and remove conditional branches and get unreachable basic blocks removed, which may then prevent breakpoints from being set at the source code ranges corresponding to the removed blocks. Trapping insns may also be turned into unconditional traps, making the subsequent code unreachable with similar consequences. Insns may become dead as the pseudos they set are replaced; this might cause debug binds referencing them to be reset, if the setting expression cannot be preserved by propagating into the debug bind or by creating a debug temporary. This may result in loss of debug location/value information.

With --gcse-lm, PRE may pull loads out of loops, replacing stores with copies to the pseudo, immediately followed by newly-inserted stores of the pseudo. This may impact debugging in that variables that live in memory will not be loaded again within the loop, so if the debugger is used to modify the value of the variable, that may fail to affect the program.
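The effect of load motion can be approximated in C as follows. This is a hand-written sketch, not compiler output; the global counter and the function names are hypothetical, and the temporary t stands in for the pseudo.

```c
/* Hand-written illustration only; GCC does this on RTL. */

int counter;  /* hypothetical variable that lives in memory */

/* Before: each iteration loads 'counter' from memory and stores it back. */
void lm_before(int n)
{
    for (int i = 0; i < n; i++)
        counter = counter + i;
}

/* After: a single load ahead of the loop; inside the loop, the store is
   replaced with a copy to the temporary, followed by a store of it. */
void lm_after(int n)
{
    int t = counter;          /* the only load */
    for (int i = 0; i < n; i++) {
        t = t + i;            /* copy to the temporary */
        counter = t;          /* newly-inserted store of the temporary */
    }
}
```

If a debugger changed counter in memory in the middle of the loop, lm_before would pick the new value up in the next iteration, whereas lm_after would overwrite it with the value carried in t.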

With --ira-hoist-pressure, hoist changes the weighting of decisions on whether or not to hoist computations to dominating blocks, but that doesn't cause different kinds of transformations to be done, so the kinds of effects on the debugging experience remain unchanged.

--rerun-cse-after-loop: pass_cse2, pass_cse_after_global_opts, and pass_cse

Add a common subexpression elimination pass after loop optimizations.

We run an RTL common subexpression elimination pass when optimization is enabled, and another after RTL global optimizations (--gcse cprop, hoist, and PRE, and --gcse-sm: store motion, never enabled implicitly), if they ran and made any changes; this flag adds another such pass after RTL loop optimizations.

CSE scans blocks linearly, detecting equivalent expressions stored in different pseudos, and replacing uses of later-set pseudos with uses of the earlier-set equivalent ones. This may render the later sets trivially dead, and they are ultimately removed if so.
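In source terms, the replacement looks roughly like the sketch below. It is a hand-written illustration under the assumption that a and b would each live in their own pseudo; the function names are hypothetical.

```c
/* Hand-written illustration only; GCC applies CSE to RTL insns. */

/* Before: 'a' and 'b' are set from equivalent expressions. */
int cse_before(int x, int y, int *out)
{
    int a = x + y;
    int b = x + y;   /* later-set equivalent of 'a' */
    *out = a;
    return b * 2;
}

/* After: uses of 'b' are replaced with uses of 'a'; the set of 'b' is
   then trivially dead and gets removed. */
int cse_after(int x, int y, int *out)
{
    int a = x + y;
    *out = a;
    return a * 2;
}
```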

The register replacements per se do not affect the debug experience; the dead insn removal might, but debug binds will have been replaced as well, so the main issues are the potential early overwrite making a variable unavailable for inspection, and the removal of insns at inspection points, which SFN and LVu, with debugger support, make up for.

Register replacement might make it evident that a conditional branch is always or never taken, turning it into an unconditional edge, and then entire blocks might become unreachable. This might prevent breakpoints from being set within such blocks, but since the condition that led to them never held, they would never be reached anyway.

CSE can also combine condition code-setting insns when one block that performs a compare flows into another that performs the same compare, but this has no effect on the debug experience.

--cse-follow-jumps: pass_cse2, pass_cse_after_global_opts, and pass_cse

When running CSE, follow jumps to their targets.

This flag extends the CSE pass (see --rerun-cse-after-loop) so that registers set in one block can be used in substitutions in subsequent blocks that have no other predecessors than those in the path from the setting point. This does not change the effects CSE may have on the debug experience; it just extends such effects across separate blocks.

--dce(ud): pass_ud_rtl_dce

Use the RTL dead code elimination pass.

This flag is enabled by default, but the ud_dce pass described herein is only activated when optimizing at level 2 or higher.

This pass relies on use-def chains to mark all defs of each use. Then, it removes all unmarked insns, resetting debug binds that refer to defs in any removed insns. It would be possible to preserve the defs in debug temps for use in the binds, instead of resetting them, and then the loss of debug locations would be avoided, but as it is, this pass causes variables to lose their bindings.

--caller-saves: pass_ira

Save registers around function calls.

Without this flag, pseudos that live across function calls will not be assigned to call-clobbered registers. With it, they may end up in such registers, and then they will be saved in a stack slot as needed before calls, and restored as needed before other uses. In case a debug bind references the register at a point in which the register might be clobbered, it is adjusted to refer to the stack slot. Since VTA notices the saves and restores and realizes the register and the stack slot hold the same value, and regards call-clobbered registers as such at calls, we end up with variable locations that reflect the saving and restoring. This allows variables assigned to call-clobbered registers to be inspected even while they live in stack slots.

Modifying such variables in a debug session, however, is not guaranteed to work: variable tracking does not find out which of the copies GCC regards as the primary one, if there is one; it just notices when a copy may no longer hold the current value and, at such points, seeks alternate locations holding it. So debug information may suggest modifying the memory slot will change the variable, even though the variable has already been loaded into the register and won't be reloaded from memory again, or vice-versa. The caller-save implementation might be able to overcome this by issuing notes to be used by variable-tracking to enforce the location changes.

--ipa-ra: pass_ira, and pass_final

Use caller save register across calls if possible.

This flag gathers information about which call-clobbered registers may actually be modified in each function, and allows the register allocator in their callers to select registers that it would otherwise avoid, to hold values across calls known to not modify those registers. This has no effect on the debugging experience.

--lra-remat: pass_ira

Do CFG-sensitive rematerialization in LRA.

This pass recomputes the value of spilled registers, instead of loading them back from memory. This makes for confusing debugging sessions, if the spilled register holds a variable that is to be modified by the debugger while it is only available in memory. The expectation that the modified value would be used in subsequent uses will not be met, and at some point after the rematerialization, the variable will seem to magically take its original value back.

This situation is not entirely uncommon in optimized debugging, considering that we only take note of one location for a variable at a time, and we don't indicate whether or not that location is a modifiable one, but it's particularly apparent and worth noting in this case. Tracking all potential locations is remarkably expensive, but we might be able to mark binding statements as modifiable locations and clear that indication when a location expression is modified. This would likely be quite useful to avoid misleading behavior, but it might also limit severely the possibilities of modifying variables in debug sessions one can try and get away with.

--crossjumping: pass_jump2

Perform cross-jumping optimization.

This pass identifies common trailing insns in predecessors of a block, or leading insns in successors of a block, splitting one of the blocks so that the other can have the equivalent insns replaced with a jump.
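The shape of the transformation for common trailing insns can be sketched in C with an explicit jump. This is a hand-written illustration with hypothetical function names; the actual pass works on RTL insns after register allocation.

```c
/* Hand-written illustration only; GCC cross-jumps RTL insns. */

/* Before: both arms end with the same computation. */
int xj_before(int c, int x)
{
    int r;
    if (c) {
        x = x + 1;
        r = x * 3 + 7;
    } else {
        x = x - 1;
        r = x * 3 + 7;
    }
    return r;
}

/* After: one arm has its trailing insns replaced with a jump into the
   other arm, which was split at the point where the common tail starts. */
int xj_after(int c, int x)
{
    int r;
    if (c) {
        x = x + 1;
        goto tail;       /* trailing insns replaced with a jump */
    }
    x = x - 1;
tail:
    r = x * 3 + 7;       /* single shared copy of the tail */
    return r;
}
```

A breakpoint on the assignment to r in the first arm of the original source has no address of its own in the second version, since only one copy of the tail remains.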

This transformation ignores debug locations, markers and binds, as needed for -g to not affect codegen, but this makes it unify insn sequences that refer to different portions of the source code, even ones that affect different variables. Users of debuggers may find themselves wondering how they ended up at a certain point of the program without hitting an earlier breakpoint, or just when they expected to be elsewhere. Markers and binds will reflect the apparent source location, even if the code was reached from a different path that had unrelated computations that happened to become the same instructions; this may seem to be less confusing, unless one realizes that the code sequence is just equivalent to that which should be running after an unrelated path in the source program. With that realization, confusion can be even more thorough, as the loss of binds and markers makes expectations about what should happen in the dropped path unlikely to be met.

All this said, the likelihood that completely unrelated computations be unified by this pass is very low. Trailing compares and jumps, perhaps preceded by code sequences performing identical computations, to the point of storing results in the same registers, will likely not be dissimilar enough as to make debugging impossible, aside from the effect of seemingly finding oneself at the wrong part of the program. Thus, even though very confusing transformations are theoretically possible, odds are that the transformation results may be recognizably similar to what would be expected, and the only real surprise be the unexpected jumps and the inability to set breakpoints.

Instead of dropping binds and markers from the range to be unified, conditional binds and markers could be introduced and used to enable a debugger to distinguish between the unified paths, and the side effects expected from each path, as suggested for --ipa-icf.

--peephole2: pass_peephole2

Enable an RTL peephole pass before sched2.

The peephole passes run close to the end of compilation, looking for sequences of insns that the backend recognizes for special treatment. The peephole2 pass, enabled by this flag, turns a sequence of insns into another sequence of insns, unlike peephole, that outputs alternate assembly code for recognized sequences within final.

These passes run so late that debug insns have already been turned into notes, and notes are skipped when recognizing sequences. Unlike peephole, however, peephole2 discards notes that appear among recognized insns, which may ultimately discard debug location and marker notes, whereas peephole will move them before or after the replacement insn sequence. Both can cause degradation of debug information, leading to missed or incorrectly-placed bindings and inspection points, so that unexpected values can be found when inspecting affected variables.

--schedule-insns2: pass_sched2, and pass_split_before_sched2

Reschedule instructions after register allocation.

This pass computes dependencies between insns, and then reorders them so as to better use hardware units, and so as to hide latencies.

The following assessment of impact is based on the standard insn scheduler used by GCC, and on the extended basic block scheduler, as opposed to the selective scheduler, which is largely incompatible with the debug insn-based technologies introduced to improve debuggability of optimized programs.

Debug insns, be they binds or statement markers, are retained in order, and binds carry their preceding insn as a dependency, in addition to any other dependencies from the bound value, but otherwise debug insns are pulled ahead of nondebug ones. Nondebug insns, however, are never regarded as dependent on debug ones, not even as anti-dependencies, so a nondebug insn that modifies an input to a debug bind resets the bind, which loses debug information. The bound value might still be available in alternate locations, or through other expressions, but no attempt is made to find out alternate representations for the binding in this pass.

Another potentially lossy situation is that of moving an insn so that it overwrites a variable before expected, which may cause the earlier value to no longer be available for inspection.

Without SFN support in debuggers, insn scheduling is the most common cause of the undesirable effect of jumping back and forth when single-stepping optimized programs. With SFN, debuggers can advance from one line to another according to the expected control flow, and, with LVu, observe side effects noted in preceding debug binds, even if insns that carry out those side effects are moved elsewhere.

--align-loops: pass_compute_alignments

Align the start of loops.

No effect on debugging.

--align-jumps: pass_compute_alignments

Align labels which are only reached by jumping.

No effect on debugging.

--align-labels: pass_compute_alignments

Align all labels.

No effect on debugging.

--align-functions: pass_compute_alignments

Align the start of functions.

No effect on debugging.

--reorder-functions: varasm

Reorder functions to improve code placement.

Decides whether to emit (or start) functions in hot or cold sections. No effect on debugging.

-O2: optimize=2

Perform optimizations that tend to make the program run faster.

This option sets the optimization level to 2, in a mode that assigns higher priority to making the code run faster.

--no-inline-functions: pass_ipa_inline

Although -O2 appears after -Os in the crescendo of optimization levels, -Os and -O3 enable --inline-functions but -O2 doesn't.

--optimize-strlen: pass_strlen

Enable string length optimizations on trees.

This pass tracks string and memory calls, as well as char stores, keeping track of string lengths, so as to optimize out builtin calls involving such lengths into constants or previously-computed values. Besides optimizing strlen(str) and strchr(str, 0) using the tracked length, it can optimize strcat to strcpy or even memcpy, and more. The transformations may involve removing redundant computations, possibly after inserting simpler call sequences, or replacing calls with assignments. Ultimately, if the return value of a call was stored in some SSA name, the transformation will also store in it. It is possible, however, that in the specific case of folding strstr(s,t)[=!]=s to strncmp(s,t,strlen(t))[=!]=0, if the result of the strstr call is stored in a user variable used only for the compare, the transformation will take place and invalidate the debug bind for that variable. There doesn't seem to be any other case in which a result that might have been stored in a user variable could be lost in these transformations.
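The strstr fold and a strcat rewrite can be pictured with the hand-written C sketch below. The function names are hypothetical, and the "after" versions show one possible equivalent shape rather than the exact call sequence GCC emits.

```c
#include <string.h>

/* Hand-written illustration only; GCC rewrites the calls in gimple. */

/* Before: the strstr result lands in the user variable 'p', which is
   used only for the compare. */
int starts_with_before(const char *s, const char *t)
{
    const char *p = strstr(s, t);
    return p == s;
}

/* After: the compare is folded into strncmp; no value is computed for
   'p' anymore, so a debug bind for it would have nothing to refer to. */
int starts_with_after(const char *s, const char *t)
{
    return strncmp(s, t, strlen(t)) == 0;
}

/* Before: strcat has to find the end of 'buf' again. */
void append_before(char *buf, const char *a, const char *b)
{
    strcpy(buf, a);
    strcat(buf, b);
}

/* After (one possible shape): the tracked length of 'buf' lets strcat
   become a strcpy at a known offset. */
void append_after(char *buf, const char *a, const char *b)
{
    size_t la = strlen(a);
    memcpy(buf, a, la + 1);
    strcpy(buf + la, b);
}
```

Stepping into strcat, or setting a breakpoint on it, finds nothing to stop at in the second version of the append function, since that call no longer exists.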

The other potential surprise for debug sessions is attempting to step into any of these calls, since different functions may be called. For the same reason, setting breakpoints on the functions, both the ones that are explicitly called, and the ones that may end up called instead, will yield surprising results.

--schedule-insns: pass_sched

Reschedule instructions before register allocation.

See the analysis under --schedule-insns2. While that pass runs after mapping pseudo registers to hardware registers or stack slots, this one runs with a virtually infinite (pseudo) register file. Pseudo registers are less likely than hardware ones to overlap and conflict, so scheduling insns before register allocation resets fewer debug binds than scheduling them after register allocation. Furthermore, the earlier scheduling reduces the amount of scheduling done later, which further helps preserve debug binds.

--reorder-blocks-algorithm=stc: pass_reorder_blocks

Set the used basic block reordering algorithm to STC.

The STC algorithm, unlike the default simple one, may duplicate blocks and rotate loops, but still without any significant effect on the debug experience.

-O3: optimize=3

Perform expensive optimizations, that might even make the program larger and slower.

This option sets the optimization level to 3.


At optimization levels 3 or higher, loop peeling and complete unrolling (see --peel-loops) are permitted to grow code size, but this by itself does not affect debugging.

Computation of the iteration count and other loop properties may be simplified using the evolutions of the loop invariants in outer loops, enabling loop transformations that might not otherwise be performed in specific cases, but whose effects on debugging are no different from those of other transformations that could be performed regardless.

--tree-loop-vectorize: pass_vectorize

Enable loop vectorization on trees.

This flag is only activated when --tree-loop-optimize is activated.

This flag enables --tree-loop-if-convert.


Along with --tree-ch, it enables the ch_vect pass.


Along with --section-anchors, it enables the increase_alignment pass, that increases (without any impact on debugging) the alignment of global arrays so that loops over them can be vectorized.

This transformation, regardless of the selected cost model, combines multiple iterations of a loop into one that uses vector operations to perform the equivalent work of the combined iterations. This is extremely confusing for debugging, not just because of the significant control flow changes, but also because debug annotations used to counter the effects of optimizations on debugging are discarded or disabled. It might be possible to aggregate and unroll the debug annotations of multiple iterations at the end of each vectorized iteration, so as to make their effects progressively visible while single-stepping over the markers.
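To give an idea of the control flow change, the sketch below writes out by hand the shape a vectorized loop takes, using plain C in place of actual vector instructions. The vectorization factor VF and the function names are hypothetical, and real vectorized code would use target vector operations instead of the inner lane loop.

```c
/* Hand-written illustration only; the lane loop stands in for a single
   vector operation. */

#define VF 4  /* hypothetical vectorization factor */

/* Before: one element per iteration. */
void add_scalar(int *dst, const int *a, const int *b, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* After: each iteration of the main loop does the work of VF source
   iterations; an epilogue handles the leftover elements. */
void add_vectorized_shape(int *dst, const int *a, const int *b, int n)
{
    int i = 0;
    for (; i + VF <= n; i += VF)
        for (int lane = 0; lane < VF; lane++)   /* one vector add */
            dst[i + lane] = a[i + lane] + b[i + lane];
    for (; i < n; i++)                          /* scalar epilogue */
        dst[i] = a[i] + b[i];
}
```

A single step over the vector operation makes the side effects of VF source iterations visible at once, which is the kind of divergence from the source that the paragraph above describes.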

--vect-cost-model=dynamic: pass_vectorize, and pass_slp_vectorize

Use the dynamic cost model for vectorization.

This affects --tree-loop-vectorize and --tree-slp-vectorize decisions, but not the kinds of transformations they make.

--ipa-cp-clone: pass_ipa_cp

Perform cloning to make Interprocedural constant propagation stronger.

This flag, when disabled, stops externally-visible functions from being versioned for constant propagation into them, disabling all transformations enabled by --ipa-cp for such functions. Conversely, enabling it does not introduce any kind of effect that isn't potentially observable when --ipa-cp is enabled, it just extends such effects to externally-visible functions.

--inline-functions: pass_ipa_inline

See --inline-functions under -Os. This is the only flag that's not in a strict crescendo of optimization flags, in that -Os and -O3 have it enabled, but -O2 that's otherwise between -Os and -O3 doesn't.

--tree-partial-pre: pass_pre

In SSA-PRE optimization on trees, enable partial-partial redundancy elimination.

The considerations that apply to SSA-PRE also apply to this flag and its effects on the SSA-PRE pass.

--unswitch-loops: pass_tree_unswitch

Perform loop unswitching.

This flag is only activated when --tree-loop-optimize is activated.

This pass hoists invariant conditionals within inner loops, using loop versioning to create two versions of the loop, one for each value of the conditional, deciding once which version of the loop to enter. It may further hoist such conditionals out of outer loops, without versioning, if the outer loops are simple enough.
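A hand-written source-level sketch of unswitching follows. The function names and the use_abs parameter are hypothetical, and the sketch drops the in-loop test altogether, whereas GCC initially keeps it in each version with a trivially true or false condition.

```c
/* Hand-written illustration only; GCC unswitches loops in gimple. */

/* Before: the loop-invariant condition is tested in every iteration. */
long unswitch_before(const int *v, int n, int use_abs)
{
    long sum = 0;
    for (int i = 0; i < n; i++) {
        if (use_abs)
            sum += v[i] < 0 ? -v[i] : v[i];
        else
            sum += v[i];
    }
    return sum;
}

/* After: the condition is decided once, selecting one of two versions
   of the loop. */
long unswitch_after(const int *v, int n, int use_abs)
{
    long sum = 0;
    if (use_abs) {
        for (int i = 0; i < n; i++)
            sum += v[i] < 0 ? -v[i] : v[i];
    } else {
        for (int i = 0; i < n; i++)
            sum += v[i];
    }
    return sum;
}
```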

One might expect the early execution of the conditional to be confusing for interactive debugging sessions, but it is actually transparent: the condition has to be so trivial to compute that it is moved without the corresponding line number information, and it is executed as if part of the loop preheader. What's more: the original test is not removed from either version of the loop, it is rather replaced with a test that trivially evaluates to true or false. Even if that ends up optimized out, a SFN marker remains for the test in both versions of the loop, so it will be possible to stop at the test point and verify the condition, whatever path is taken from it. Since each block in the original loop will remain in at least one of the loop versions, it will be possible to set breakpoints at any of the lines of the loop after this transformation, even if some of the lines may be duplicated. Single-stepping will not be surprising: guards of conditional blocks will be stopped at, and the blocks will be entered just when expected. As such, the impact of this transformation in the debug experience is extremely low.

--split-loops: pass_loop_split

Perform loop splitting.

This flag is only activated when --tree-loop-optimize is activated.

This turns a loop with conditional blocks and a controlling condition that changes value once throughout the iteration space into two loops, each with only one of the conditional blocks. It uses loop versioning to create two copies of the loop, using the controlling condition to decide which of the versions to run. Then, it connects the exit of the first loop to the entry of the second, adjusts the exit condition of the first loop to transition to the other loop at the point the condition switches, and forces the controlling conditions in each block to the known value, removing the unused conditional blocks in each copy. None of these transformations has a significant impact on debuggability.
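The end result of loop splitting can be sketched by hand in C as below. The function names and the split point m are hypothetical; the sketch shows only the final form, not the intermediate versioning steps.

```c
/* Hand-written illustration only; GCC splits loops in gimple. */

/* Before: the condition i < m changes value once over the iteration
   space. */
void split_before(int *a, int n, int m)
{
    for (int i = 0; i < n; i++) {
        if (i < m)
            a[i] = 1;
        else
            a[i] = 2;
    }
}

/* After: the first loop runs while the condition holds, then control
   flows into the second loop, which carries on from the same index. */
void split_after(int *a, int n, int m)
{
    int i = 0;
    for (; i < n && i < m; i++)   /* exit adjusted to the switch point */
        a[i] = 1;
    for (; i < n; i++)
        a[i] = 2;
}
```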

The only actual issue I see, that is probably of little significance, is that the block duplicating infrastructure does not copy bind statements for label declarations that were optimized away, so, if such a label is bound within the conditional block that is versioned and then discarded from the original loop, the label will seem to be completely gone, even though a block containing it will still be reachable in one of the loops.

--loop-unroll-and-jam: pass_loop_jam

Perform unroll-and-jam on loops.

This flag is only activated when --tree-loop-optimize is activated.

This transformation unrolls an outer loop and jams the multiple instances of the inner loop into a single loop. This changes the iteration sequence e.g. from [(0,0), (0,1), ..., (0,n), (1,0), ... (1,n), ... (m,n)] to [(0,0), (1,0), (0,1), (1,1), ... (0,n), (1,n), (2,0), (3,0), (2,1), ... (m,n)]. This can be extremely disruptive to debugging, as this sort of transformation, that effectively modifies the order in which major blocks of computation are executed, cannot be made up for with the existing infrastructure to retain debug information across optimizations.
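The change in iteration order can be made concrete with a small hand-written C sketch that records the (i,j) pairs in the order they are visited. The struct and function names are hypothetical, the bounds are exclusive rather than inclusive, the unroll factor is fixed at 2, and the outer trip count is assumed to be even.

```c
/* Hand-written illustration only; it records visiting order instead of
   doing real work. */

struct pt { int i, j; };

/* Original nest: (0,0), (0,1), ..., then (1,0), (1,1), ... */
int order_original(struct pt *out, int m, int n)
{
    int k = 0;
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++)
            out[k++] = (struct pt){ i, j };
    return k;
}

/* Outer loop unrolled by 2, with the two inner loops jammed into one:
   (0,0), (1,0), (0,1), (1,1), ...  Assumes m is even. */
int order_jammed(struct pt *out, int m, int n)
{
    int k = 0;
    for (int i = 0; i < m; i += 2)
        for (int j = 0; j < n; j++) {
            out[k++] = (struct pt){ i, j };
            out[k++] = (struct pt){ i + 1, j };
        }
    return k;
}
```

Both functions visit the same set of points; only the order differs, and that order is what a user single-stepping through the nest would observe.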

Considering the limited kinds of computations that may be performed in such loops so as to enable this sort of transformation, it seems that it might be possible to attempt to output debug information that would enable a debugger to emulate the original loop nest, but it is not evident that current debug information formats are sufficiently expressive for that, nor that it would be worth the trouble.

It might be more useful to be able to somehow represent what kind of loop transformation took place, so that users can understand what is actually going on, rather than attempting to pretend we are still running the original loop nest.

--tree-loop-distribution: pass_loop_distribution

Enable loop distribution on trees.

--tree-loop-distribute-patterns: pass_loop_distribution

Enable loop distribution for patterns transformed into a library call.

These flags are only activated when --tree-loop-optimize is activated.

Both enable the same pass, that partitions suitable inner loops each into two loops over the same iteration space, copying the loop and then removing stmts that should remain in only one of the loop bodies. The multiple iterations over different statements of a loop can be very confusing when debugging. Removed stmts cause debug binds that reference them to be reset, which makes variables available in at most one of the two iterations.
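A hand-written sketch of distribution follows, assuming the arrays do not overlap. The function names are hypothetical; the comment on the first partition notes where pattern distribution might substitute a library call, which the sketch does not do.

```c
/* Hand-written illustration only; assumes a, b and c do not overlap. */

/* Before: one loop with two independent statements. */
void dist_before(int *a, int *b, const int *c, int n)
{
    for (int i = 0; i < n; i++) {
        a[i] = 0;
        b[i] = c[i] + 1;
    }
}

/* After: two loops over the same iteration space, each keeping only
   the statements of its partition. */
void dist_after(int *a, int *b, const int *c, int n)
{
    /* a zeroing partition like this one is a candidate for being
       turned into a memset call by pattern distribution */
    for (int i = 0; i < n; i++)
        a[i] = 0;
    for (int i = 0; i < n; i++)
        b[i] = c[i] + 1;
}
```

In the second version, all of a is cleared before any element of b is written, so inspecting the arrays at a given value of i shows a state that never occurs in the original loop.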

--loop-interchange: pass_linterchange

Enable loop interchange on trees.

This flag is only activated when --tree-loop-optimize is activated.

This transformation rearranges a loop nest, attempting to swap the induction variables for each pair of loops in a nest. This changes the order in which the nest's iteration space is walked, which is confusing for debugging, and as it swaps and replaces induction variables, it resets binds to the original ones, so the iteration variables will not be visible within the loops after the transformation. This makes it very difficult to do any debugging of such loops.
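The reordering can be sketched by hand in C as below. The names and sizes are hypothetical, and the order array is instrumentation added only to expose the visiting order; a loop that really recorded its order like this would not be a candidate for interchange.

```c
/* Hand-written illustration only; 'order' exists just to record the
   sequence in which elements are visited. */

#define ROWS 3
#define COLS 4

/* Before: the inner loop walks down a column, striding across rows. */
void ic_before(int m[ROWS][COLS], int *order)
{
    int k = 0;
    for (int j = 0; j < COLS; j++)
        for (int i = 0; i < ROWS; i++) {
            m[i][j] = i + j;
            order[k++] = i * COLS + j;
        }
}

/* After: the loops are swapped, so the inner loop walks along a row. */
void ic_after(int m[ROWS][COLS], int *order)
{
    int k = 0;
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++) {
            m[i][j] = i + j;
            order[k++] = i * COLS + j;
        }
}
```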

--tree-loop-if-convert: pass_if_conversion

This pass is enabled by default when --tree-loop-vectorize is enabled, but it is only activated when --tree-loop-optimize is also activated.

It transforms multi-block loop bodies into a single basic block, possibly after versioning the loop, turning statements in conditional blocks into conditional statements. It makes debugging very hard, as it resets all debug binds in the loop, and rearranges control flow so that all conditional blocks become unconditionally executed. Conditional binds and markers might alleviate this, enabling blocks that wouldn't be executed without the optimization to be skipped during debugging.
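In source terms, if-conversion looks roughly like the hand-written sketch below. The function and temporary names are hypothetical; the point is only that statements from both conditional blocks run in every iteration, with a conditional statement selecting the result.

```c
/* Hand-written illustration only; GCC if-converts loops in gimple. */

/* Before: the loop body has conditional blocks. */
void ifc_before(int *b, const int *a, int n)
{
    for (int i = 0; i < n; i++) {
        if (a[i] > 0)
            b[i] = a[i];
        else
            b[i] = 0;
    }
}

/* After: a single basic block; both former arms are computed
   unconditionally and a conditional statement picks one. */
void ifc_after(int *b, const int *a, int n)
{
    for (int i = 0; i < n; i++) {
        int t_then = a[i];   /* from the 'then' block */
        int t_else = 0;      /* from the 'else' block */
        b[i] = a[i] > 0 ? t_then : t_else;
    }
}
```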

--predictive-commoning: pass_predcom

Run predictive commoning optimization.

This flag is only activated when --tree-loop-optimize is activated.

This pass optimizes loops by identifying and analyzing dependence chains and unrolling them the right number of times to reuse loads and stored values across iterations and remove dead stores. The removal of dead stores may confuse debugging sessions, because inspecting arrays will not show the temporarily-stored values, while removal of loads may confuse sessions that modify the array expecting modified values to be loaded and used, an expectation that may not be met if the value was already loaded from memory.
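The reuse of loaded and stored values across iterations can be sketched by hand as below. The function names are hypothetical; the sketch covers only the removal of loads, and it carries values in scalars with explicit copies where the actual pass would unroll the loop to avoid them.

```c
/* Hand-written illustration only; the real pass unrolls instead of
   copying x1 into x0 on every iteration. */

/* Before: each iteration reloads two values that earlier iterations
   have just stored. */
void predcom_before(int *a, int n)
{
    for (int i = 0; i + 2 < n; i++)
        a[i + 2] = a[i] + a[i + 1];
}

/* After: the values are carried across iterations in scalars, so the
   loop body no longer loads from the array. */
void predcom_after(int *a, int n)
{
    if (n < 3)
        return;
    int x0 = a[0], x1 = a[1];   /* loads before the loop */
    for (int i = 0; i + 2 < n; i++) {
        int x2 = x0 + x1;
        a[i + 2] = x2;
        x0 = x1;
        x1 = x2;
    }
}
```

Changing an element of the array from a debugger in the middle of the loop would affect later iterations of the first version, but not of the second, which no longer reads the array inside the loop.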

--peel-loops: pass_complete_unroll

Perform loop peeling.

This amounts to copying the blocks that make up the loop body so that they can be run linearly before entering the remaining loop. Such block duplication does not in itself cause any harm to the debugging experience, but the linearization of initial iterations of the loop can make room for other optimizations that could in turn make debugging more difficult.
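A hand-written sketch of peeling a single iteration follows. The function names are hypothetical, and how many iterations GCC peels depends on its own heuristics, which the sketch does not model.

```c
/* Hand-written illustration only; one iteration is peeled. */

/* Before: a plain summation loop. */
long sum_before(const int *v, int n)
{
    long s = 0;
    for (int i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* After: a copy of the loop body runs linearly, under its own guard,
   before the remaining loop is entered. */
long sum_peeled(const int *v, int n)
{
    long s = 0;
    int i = 0;
    if (i < n) {          /* peeled copy of the first iteration */
        s += v[i];
        i++;
    }
    for (; i < n; i++)    /* remaining iterations */
        s += v[i];
    return s;
}
```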

--tree-slp-vectorize: pass_slp_vectorize

Enable basic block vectorization (SLP) on trees.

This pass detects opportunities to use vector operations, instead of multiple operations on adjacent memory, in linear code. Although this pass does not reset debug binds, unlike the loop vectorizer, that hardly matters: the combined operations most often involve memory references, and those do not involve debug binds. So, as they are recombined, the timing of effects diverges from that implied by debug markers, which makes debugging very confusing.

--split-paths: pass_split_paths

Split paths leading to loop backedges.

This flag is only activated when --tree-loop-optimize is activated.

This pass duplicates a basic block that dominates the loop latch, if it ends in a conditional that may exit the loop, and it is the block that closes a simple diamond in the control flow graph. This has no effect on debugging, aside from the need for breakpoints in the duplicate block covering more than one code address.

--gcse-after-reload: pass_gcse2

Perform global common subexpression elimination after register allocation has finished.

Although the implementation of this pass is not the same as that of --gcse PRE or hoist, and this pass's focus is exclusively on eliminating loads, the insertion and deletion of loads uses the same logic and thus has the same effects on debugging. Since pseudos cannot be introduced after reload, it has to reuse registers for loads and copies. This is done without regard to debug binds, but the registers must not be live for them to be so reused, which implies they couldn't be used in debug binds. So, the impact of that should be limited to early unavailability of variables that happened to be available at such registers, or computable with expressions involving them.

-Ofast: optimize=3 + fast

Perform expensive optimizations, and also unsafe math transformations that could make standard-compliant programs misbehave.

This option sets the optimization level to 3, while also enabling the --fast-math option.

--fast-math: fast_math

This flag enables multiple options that disable various aspects of floating-point strict correctness. Several of them may allow simplifications that would otherwise not take place, from folding to removal of exception handling regions that could only catch floating-point exceptions. Such simplifications, though enabled by this flag, are not of kinds that could not possibly arise in the absence of such flags. Its impact on the debugging experience is thus regarded as very low.

--reciprocal-math: pass_cse_reciprocals

Allow optimization for floating-point division which may change the result of the operation due to rounding.

This optimization substitutes floating-point division by a SSA_NAME with multiplication by the reciprocal. Squared divisors are also detected and factored. The reciprocal of the SSA_NAME and of its square, when needed, are inserted after the definition or before a division. Divisions are turned into multiplications in place, so there is no effect on debugging.
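The substitution can be pictured with the hand-written C sketch below. The function names are hypothetical, and the temporary r stands in for the inserted reciprocal.

```c
/* Hand-written illustration only; GCC does this on gimple SSA names. */

/* Before: two divisions by the same value. */
void div_before(double x, double y, double d, double *qx, double *qy)
{
    *qx = x / d;
    *qy = y / d;
}

/* After: the reciprocal is computed once, and each division is turned
   into a multiplication in place. */
void div_after(double x, double y, double d, double *qx, double *qy)
{
    double r = 1.0 / d;   /* inserted reciprocal */
    *qx = x * r;
    *qy = y * r;
}
```

With a divisor that is a power of two the two versions agree exactly; for other divisors the results may differ in the last bits, which is the rounding change the flag description warns about.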


Analyzed optimizations are so diverse that it is hardly possible to summarize the various forms of impact on debug information of passes that have any. The good news is that the findings are probably not surprising for anyone familiar with the internal behavior of the passes, and of the techniques used to mask the effects of optimization on debugging. There are, however, a few findings that I consider surprising, in a positive or negative way. A number of highlighted issues can be fixed without much effort; others require far more elaborate work, while others yet may border on the unfixable.

I was surprised, throughout the analysis, by how seamless the introduction of VTA turned out to be, especially in gimple. Very few passes required additional logic to adjust debug binds: in nearly every case, the decision was between disregarding debug binds or adjusting them just like non-debug stmts or insns. This was favored by logic that detected and coped with debug uses of dead pseudos in RTL, and that dealt with adjustments to debug binds, sometimes inserting debug temps, when moving or removing assignments in gimple and RTL. Reviewing all these passes, I realized there may be room for improvement when moving SSA defs to dominating blocks: some means to signal, or detect internally, that such a move does not require adjustments would avoid some unnecessary forward propagation or introduction of debug temps, which both carry a risk of loss of debug information. Cases in which SSA defs are removed before new, equivalent defs are inserted at nearly the same point (e.g., replacing a PHI node with an assign) can also be improved.

The option -Wnull-dereference enables the isolate-paths pass, that may have codegen effects (e.g. changing returns of addresses of local automatic variables to null), even if both --isolate-erroneous-paths-* flags, that are supposed to enable codegen changes in this pass, are disabled.

Another case that is not too hard to fix is the lack of adjustment of debug binds under --auto-inc-dec.

Although -Og is supposed to avoid harming debugging, it enables --delayed-branch, that moves insns without regard to preserving correctness of previously-computed variable locations, and other potential harmful effects on branches and calls. It should probably not be enabled at -Og.

Other very late optimization passes that may corrupt variable locations are --peephole, also at -Og, and --peephole2, at -Os. They run after variable tracking, so adjusting debug binds so as to recompute locations is not much of an option. Adjusting notes might be possible, at significant effort, but --peephole2 may actually drop notes that appear between peepholed insns, and it is very hard to argue that doing something else would be uniformly superior. These passes are limited to some target architectures, but their effect on affected architectures could be very significant.

Other passes that may break variable location information are those that move or remove memory stores. Addressable variables are not subject to debug binds, so such changes actually make their effects observable at unexpected points, or not at all. Flags --tree-dse and --tree-sink enable such optimizations, both implied by -Og. Flags --tree-loop-vectorize and --tree-slp-vectorize, both enabled at -O3, may bring about similar effects on variables in memory, but there is hardly any expectation of retaining significant debuggability after these.

Still, it might be worth exploring possibilities of extending VTA-like tracking to non-scalar variables. Besides the above, and the late tracking of addressable variables that become non-addressable and then scalars due to optimizations, it might help mask optimization effects of --split-wide-types, --tree-sra, and --ipa-sra, that introduce scalars too late to ensure debug binds are created at the correct points.

Furthermore, whatever support there is to track split-out components separately, so as to be able to describe the aggregate location member-wise, seems to not be up to the task. The effects of IPA SRA on debugging are even worse, as dismembered params end up not represented at all. It is not clear that there are means to express such an apparently dropped parm as a composition of actual parms: some extensions might be required to even start fixing IPA SRA.

Several optimizations that reorganize the control flow graph may drop debug markers and binds. Gimple jump threading, for example, won't duplicate forwarding blocks, discarding all debug stmts in them. In some cases, it wouldn't be hard to retain them in predecessor or successor blocks, but in others, some way to mark such stmts as conditional might be the only way to preserve them. Conditional binds can be handled with some effort in var-tracking and existing location expressions and lists, but conditional markers would require some extension to line number tables to enable debug information consumers to decide e.g. whether or not a breakpoint at a line was hit when reaching a conditional marker for that line. This could become a very large project, but with significant expected benefits. Such an extension could benefit many other passes: (RTL) --thread-jumps, --if-conversion, --if-conversion2, --ssa-phiopt, --crossjumping, --tree-tail-merge, and even such loop optimizations as --tree-loop-if-convert.
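
A minimal sketch of source code in which jump threading could apply is given below. The function classify and the variable seen are hypothetical, and the precondition is an assumed contract added for illustration. After optimization, the join point between the two tests may hold little more than debug stmts, which is where markers and binds are at risk.

```c
#include <assert.h>

int
classify (int n)
{
  int small;
  int result = 0;
  int seen;

  assert (n >= 0);  /* Assumed contract: non-negative input only.  */
  small = n < 10;

  if (small)
    result = 1;

  /* Join point.  The value of seen is never used, so after
     optimization this assignment may survive only as a debug bind.  */
  seen = result;
  (void) seen;

  /* The outcome of this test is already known on each incoming path,
     so jump threading may route each path straight to its arm.  */
  if (small)
    result += 10;
  else
    result += 20;

  return result;
}
```

Once each path bypasses the join point, a bind for seen or a marker for its line would only be correct if attached to both threaded paths, or made conditional as suggested above.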

I was a bit surprised to find out that a number of loop optimizations did not harm debugging. I expected loop unrolling would be harmless, but --split-loops, --unswitch-loops, and --peel-loops were also found to not affect debug information, unlike transformations that modify the order in which points in the iteration space are visited, such as --loop-unroll-and-jam and --tree-loop-vectorize.

Another somewhat surprising effect of induction variable optimizations on loops, particularly --branch-count-reg, was the risk of losing bindings for user-defined induction variables. Even if they can be expressed in terms of remaining basic induction variables, if the user-defined induction variable is no longer needed, there is no effort to adjust debug bind insns accordingly. There is room for improvement without much effort.
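
To make the induction variable case concrete, here is a small sketch with a hypothetical function sum_walk. The counter i only controls the trip count, so induction variable optimizations may replace it with a down-counter or an end-pointer comparison. This is an illustration under assumed optimizer behavior, not a reproduction of a test case from this study.

```c
#include <assert.h>
#include <stddef.h>

long
sum_walk (const int *v, int n)
{
  long total = 0;
  const int *p = v;

  assert (v != NULL || n <= 0);

  /* i only counts iterations.  The optimizer may replace it with a
     down-counter or an end-pointer test, yet at the top of each
     iteration i == p - v still holds.  */
  for (int i = 0; i < n; i++)
    total += *p++;

  return total;
}
```

Even if i is eliminated, its value remains computable from p and v, so a debug bind expressed in those terms could in principle keep i visible in the debugger.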

Partial inlining brings a significant challenge to debug information representation: although a function fragment can be linked back to the original abstract function and set some variables up to take locations and values from the caller, expressing that the concrete subprogram is a fragment that does not contain an entry point for the function requires extensions. It would take further extensions to express how inlined subroutines combine with this fragment to form the entire abstract subprogram, and even to support multiple splits of the same subprogram. Similar mechanisms could also represent OpenMP function fragments.
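
The shape of function that partial inlining tends to target can be sketched as follows. The names lookup and find_seven are hypothetical. The assumption is that the cheap early exit could be inlined into callers while the loop is split out as a separate fragment. Whether a particular compiler makes that split is not guaranteed.

```c
#include <assert.h>
#include <stddef.h>

int
lookup (const int *table, int n, int key)
{
  /* Cheap early exit: the part a partial inliner may copy into
     callers.  */
  if (n <= 0)
    return -1;

  assert (table != NULL);

  /* Heavier remainder: may be split out as a fragment that has no
     entry point of its own for lookup.  */
  for (int i = 0; i < n; i++)
    if (table[i] == key)
      return i;

  return -1;
}

int
find_seven (const int *table, int n)
{
  return lookup (table, n, 7);
}
```

In such a split, the fragment holding the loop starts in the middle of lookup, which is the situation that current debug information conventions cannot describe without extensions.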

Identical code folding (--ipa-icf) is another challenging case for debugging: a single executable code sequence may be used to represent multiple unrelated functions, each requiring a separate set of debug annotations. One potential way to address this is to combine debug notes (markers and binds) from all functions that share the same executable code, making all the notes conditional on DWARF procedures that can determine which of the combined functions is active, from e.g. caller points, or some other means to tell them apart. Ideally the symbolic information of each such function could be kept separate and guarded by the same conditionals, so that only scopes and variables of the activated function are considered available. This will require further extensions to debug information representation conventions.
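
As a sketch of the situation, the two hypothetical functions below have bodies that could compile to identical code and thus be candidates for folding. The caller grow_box, also hypothetical, shows call sites of the kind that could tell the two apart. It is an assumption that any given build folds them.

```c
#include <assert.h>
#include <stddef.h>

int
double_width (int width)
{
  return width * 2;
}

int
double_height (int height)
{
  return height * 2;
}

void
grow_box (int *width, int *height)
{
  assert (width != NULL && height != NULL);

  *width = double_width (*width);    /* Call site naming double_width.  */
  *height = double_height (*height); /* Possibly the same code address.  */
}
```

If both functions share one code sequence, a debugger stopped inside it has to decide whether the live parameter is width or height, which is what the conditional notes proposed above would resolve.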


based on GCC 8.1.1 (gcc-8-branch@259831 68fc0ec2c57b0519bd7e1f9e013f37f112d65a3d)


2019-03-12 v1.0.1

HTML formatting changes for WordPress: removed in-paragraph line breaks, split A NAME tags.

2018-10-11 v1.0

SFNs are only available in C and C++ so far. Improved wording and fixed spelling. Moved --caller-saves to the right place. Split --peephole out of --peephole2. Reorganized cse order and links. Mentioned the -O* crescendo sooner. Named passes after options, and before paragraphs. Added more anchors, reorganized earlier ones, and added in-text links. Checked that all anchors are used, and that no links are dangling.

2018-10-02 v0.9 DRAFT

Introduced section structure and section names, pass names next to flags and a pass list as a TOC. Added some more info on how to tell whether a pass is run. Highlighted the case of addressable variables becoming scalars as benefitting from binds on non-scalars. Added ChangeLog.

2018-09-04 v0.8 DRAFT

First published draft.