Alexandre Oliva - 2018-10-02 v0.90 DRAFT (*)

-g-Ology, or gOlogy, stands for the study of how optimization levels (selected by -O flags) affect the quality of debugging information (enabled by -g flags). This report assesses the theoretical and practical impact of various optimizations available in the GNU Compiler Collection version 8 on the debugging experience of applications compiled by it. The goal is to assess the quality of the debug information generated by GCC with optimization enabled, document the effects of optimization passes, and identify and document problems and opportunities to improve it.

GCC offers various optimization levels, from -O0 to -O3, plus -Og, -Os and -Ofast, and over a hundred independently-controllable optimization flags. Each of the optimization levels enables a subset of the optimization flags; enabling debugging information generation, on the other hand, is not supposed to have any effect whatsoever on the executable code. This report focuses on the flags that are enabled by the -O* options, and their effects on (extended) DWARF debug information generated by GCC.

This report is structured as follows. The introduction outlines how GCC gets from source code to output assembly code and debug information, the major internal representation forms used throughout compilation, and several techniques used by GCC to keep track of the mapping from internal representations and output executable code to corresponding source code concepts. Then, the bulk of the report goes through each of the -O flags and, for each of them, through the optimization passes that are enabled or affected by it, describing the general behavior of the pass and what effects it may have on debug information. The final section highlights and consolidates the most relevant findings.

== Introduction

In GCC, language front ends parse a translation unit and deliver to the so-called middle end a number of functions (procedures, methods, subprograms) to compile, in a form that, although language-independent, closely resembles a parse tree. Each function then goes through a number of passes, some of which are only executed when certain optimization flags are enabled, or other conditions are met.

The tree form is turned into gimple form, in which each function amounts to a set of basic blocks in a control flow graph, each containing a sequence of stmts represented as tuples. A stmt may be a label definition, a simple assignment, a function call, a conditional or unconditional branch, an asm statement, debug binds or markers, or other less common forms. Scalar variables are versioned and converted to static single assignment (SSA) form, in which each reference to a variable takes a version that links it back to a single definition of that variable version. Additional definitions, called PHI nodes, may be introduced at confluence basic blocks, indicating which version is to be taken when arriving from each incoming block. This is the form in which most of the optimization passes in GCC take place.
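For illustration, here is a function in which a variable is assigned in two different blocks, along with a sketch of its SSA form. The dump below is only an approximation of what -fdump-tree-ssa might print; exact SSA names and block numbers vary with GCC version and flags.

    int f (int x)
    {
      int y;
      if (x > 0)
        y = x;
      else
        y = -x;
      return y * 2;
    }

    /* Approximate gimple SSA form:
         if (x_1(D) > 0) goto <bb 3>; else goto <bb 4>;
       <bb 3>:  y_3 = x_1(D);  goto <bb 5>;
       <bb 4>:  y_4 = -x_1(D);
       <bb 5>:  # y_5 = PHI <y_3(3), y_4(4)>   -- PHI node at the confluence block
         _6 = y_5 * 2;
         return _6;  */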
Each function is then expanded to the register transfer language (RTL) form, in which basic blocks are formed by a sequence of insns, each one corresponding to a machine instruction defined in the target back end, or to machine-independent forms such as debug binds and markers, notes and other forms not relevant for this report. Each insn may contain zero or more computations represented as SETs (one of which may set PC to indicate a branch), a CALL, an ASM, and indicators that additional registers or memory can be used or unpredictably modified. Scalar variables are initially assigned to pseudo-registers, and many RTL optimization passes operate in this form. Register allocation will then map each remaining pseudo-register to a hardware register (if optimizing) or a stack slot, adding spills and reloads as needed to satisfy the requirements of each hardware instruction. A few RTL passes run after register allocation, and at the end assembly code is output for each insn, while outputting debug information that is to be interspersed with the assembly code, and gathering debug information that is consolidated and output afterwards.

=== Preserving debug information

There was a time when debugging required disabling optimizations. Debug information formats back then could only assign a single location to each variable, and optimizing out the frame pointer would remove the base reference for all stack-based variables.

GCC has long had the notion that enabling debug information should not cause any changes to executable code. To that end, each stmt and insn carries source location information, i.e., file and line (and, more recently, column) numbers and lexical blocks, even when debug information is not enabled. Without optimization, single-stepping in a debugger thus proceeds in the natural order of execution, and all variables are assigned stable memory locations, which makes for a single location per variable throughout its lifetime.

Optimizations introduce complications: they combine, simplify and remove computations, modify the order of execution, reuse registers and stack slots, duplicate portions of code, introduce alternate induction variables and modify the iteration order in loop nests. Compilers and debug information formats have evolved over time so as to enable optimized programs to be represented and debugged, with varying levels of success.

For example, automatic variables in optimized programs may live in a register for some time, in another register at another time, and in a stack slot at other times. DWARF debug information supports location lists, which may indicate a different location for a variable over different, possibly-overlapping executable code ranges. Memory references in gimple and RTL forms carry symbolic expressions used for alias analysis, and also to build location lists; SSA versions, RTL pseudo-registers and hardware registers also carry symbolic references to the variables they refer to. The variable tracking pass identifies, using such symbolic references, situations in which the location of a variable varies throughout its lifetime, and arranges for location lists to be output accordingly. As location expressions gained the ability to represent value expressions, it became possible to indicate that in a certain range a variable holds a known constant value, or that its value is not available directly, but can be computed from other locations.
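The following sketch shows what a location list for a variable might look like; the ranges, registers and offsets are made up for illustration, and real output (e.g. from readelf --debug-dump=loc) varies with target, GCC version and surrounding code.

    extern void consume (long);

    long tally (const long *v, long n)
    {
      long sum = 0;
      for (long i = 0; i < n; i++)
        sum += v[i];
      consume (sum);
      return sum;
    }

    /* A hypothetical DWARF location list for "sum":
         [0x00, 0x08): DW_OP_lit0, DW_OP_stack_value  -- known constant, no location
         [0x08, 0x1f): DW_OP_reg0                     -- live in a register in the loop
         [0x1f, 0x2c): DW_OP_fbreg -24                -- spilled to the stack around the call  */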
Variable tracking at assignments (VTA) extended variable tracking, introducing debug binds early in compilation that associate scalar source variables with the location holding their value, arranging for the location/value expressions to be adjusted throughout the compilation (even if computations are removed or moved past the binds, so that the bound value expressions remain accurate) while preserving their natural execution order, and using such binds to generate location lists.

Although each stmt and insn carries source location information, as they're shuffled by optimization, single-stepping may seem to go back to earlier statements, and it becomes impossible to tell when the effects of a statement are complete. Statement frontier notes (SFN) were introduced as additional debug notes, emitted in the stmt stream to mark the beginning of logical statements, thus after any debug binds associated with previous statements take effect. Their natural execution order is retained by the compiler, so the markers can be used to output source location information marked as recommended stop points (the is_stmt flag in DWARF line number tables), avoiding bouncing and making for predictable observability of side effects.

Given optimization, it is not uncommon for no executable code to remain between the inspection points of several neighboring statements. This was a problem because, although multiple source locations can be associated with a single address in the line number table, ranges in location lists could only name addresses of executable instructions. Location view (LVu) numbering was introduced to identify each of the entries in the line number table that refer to the same code address, so that they can be referenced unambiguously in location lists. The representation of such extended location lists requires extensions proposed for DWARF v6, and at the time of this writing there aren't any debuggers that support them. Still, since the information is available and we expect debuggers to catch up eventually, the analyses that follow assume the disambiguation given by LVu is effective in masking the optimization effects it was created to overcome.

Despite all this effort, it is not realistic to expect the debug experience of a program without optimization to be matched by that of a program optimized even by optimizations regarded as not affecting debugging. For example, a variable assigned to an exclusive stack slot will be available throughout a function, but optimization may assign it to a register during its limited live range, and then it won't be possible to inspect it elsewhere. Setting breakpoints based on addresses of executable code may not work as effectively in optimized programs, because the same spot of the program may have been duplicated by optimization, and then the breakpoint may not hit where expected. Having the value of a variable available in a given location, say its stack slot, does not guarantee it is possible to modify it there: it could have just been loaded into a register, which may then be modified by the program and stored back in the stack slot. This might happen even without optimization, but the windows for this possibility are narrower.
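The window is easy to see even in unoptimized code that updates a memory-resident variable; a sketch:

    int counter;    /* lives at a stable memory location */

    int bump (void)
    {
      /* Even at -O0, this typically compiles to a load of counter into a
         register, an add, and a store back.  A debugger write to counter
         that lands between the load and the store is silently overwritten
         when the stale, incremented register value is stored back.  */
      counter = counter + 1;
      return counter;
    }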
Furthermore, folding that logically follows from reasoning about what is known about a variable at compile time may no longer be applicable if the variable is modified in the debugger; if a block was removed because the condition guarding it was provably false at compile time, changing a variable so that the condition would evaluate to true will not bring back the code that was optimized out. So, inspecting variables in optimized programs is more likely to yield "optimized out", because optimizations may expose dead ranges that go unnoticed at -O0, and modifying variables may conflict with decisions the optimizers have already made. As for breakpoints, using source locations rather than code addresses is less likely to yield surprising results.

== Optimizations

In this section, each optimization level is detailed, enumerating the flags incrementally enabled by it over the previous level, and detailing the effects on debugging brought about by each of the optimization levels and flags.

Determining when a pass is run is an involved process. Each pass has a gate function, which decides whether to run the pass based on optimization levels and flags. The default_options_table array in gcc/opts.c arranges for flags to be enabled depending on the optimization level, but some flags are enabled by default through their initializer in e.g. gcc/common.opt. Some are also forcibly enabled or disabled depending on other conditions. However, even if the gate condition of a pass is enabled, it might not run if any enclosing pass group fails its own gate condition.

The following outline depicts the optimization passes GCC goes through while compiling a function, in the order they might run; the information is extracted from gcc/passes.def. Indentation indicates grouping of the indented passes within the previous less-indented pass group. Parameters for the pass are indicated between parentheses after the pass name.
all_lowering_passes:
  pass_warn_unused_result
  pass_diagnose_omp_blocks
  pass_diagnose_tm_blocks
  pass_lower_omp
  pass_lower_cf
  pass_lower_tm
  pass_refactor_eh
  pass_lower_eh ->#pass_lower_eh+
  pass_build_cfg
  pass_warn_function_return
  pass_expand_omp ->#pass_expand_omp+
  pass_sprintf_length (!fold_return_value)
  pass_walloca (strict_mode)
  pass_build_cgraph_edges
all_small_ipa_passes:
  pass_ipa_free_lang_data
  pass_ipa_function_and_variable_visibility
  pass_ipa_chkp_versioning
  pass_ipa_chkp_early_produce_thunks
  pass_build_ssa_passes:
    pass_fixup_cfg
    pass_build_ssa
    pass_warn_nonnull_compare
    pass_ubsan
    pass_early_warn_uninitialized
    pass_nothrow
    pass_rebuild_cgraph_edges
  pass_chkp_instrumentation_passes:
    pass_fixup_cfg
    pass_chkp
    pass_rebuild_cgraph_edges
  pass_local_optimization_passes:
    pass_fixup_cfg
    pass_rebuild_cgraph_edges
    pass_local_fn_summary
    pass_early_inline
    pass_all_early_optimizations:
      pass_remove_cgraph_callee_edges
      pass_object_sizes (insert_min_max)
      pass_ccp (!nonzero) ->#pass_ccp ->#pass_ccp+ ->#pass_ccp++
      pass_forwprop ->#pass_forwprop
      pass_early_thread_jumps
      pass_sra_early ->#pass_sra_early
      pass_build_ealias ->#pass_build_ealias
      pass_fre ->#pass_fre
      pass_early_vrp ->#pass_early_vrp
      pass_merge_phi ->#pass_merge_phi
      pass_dse ->#pass_dse
      pass_cd_dce ->#pass_cd_dce ->#pass_cd_dce+
      pass_early_ipa_sra ->#pass_early_ipa_sra
      pass_tail_recursion ->#pass_tail_recursion
      pass_convert_switch ->#pass_convert_switch
      pass_cleanup_eh
      pass_profile ->#pass_profile+
      pass_local_pure_const ->#pass_local_pure_const
      pass_split_functions ->#pass_split_functions
      pass_strip_predict_hints
    pass_release_ssa_names
    pass_rebuild_cgraph_edges
    pass_local_fn_summary
  pass_ipa_oacc:
    pass_ipa_pta
    pass_ipa_oacc_kernels:
      pass_oacc_kernels:
        pass_ch ->#pass_ch
        pass_fre ->#pass_fre
        pass_lim ->#pass_lim
        pass_dominator (!may_peel_loop_headers) ->#pass_dominator
        pass_dce ->#pass_dce
        pass_parallelize_loops (oacc_kernels)
        pass_expand_omp_ssa ->#pass_expand_omp_ssa+
        pass_rebuild_cgraph_edges
  pass_target_clone
  pass_ipa_chkp_produce_thunks
  pass_ipa_auto_profile
  pass_ipa_tree_profile:
    pass_feedback_split_functions
  pass_ipa_free_fn_summary (small)
  pass_ipa_increase_alignment
  pass_ipa_tm
  pass_ipa_lower_emutls
all_regular_ipa_passes:
  pass_ipa_whole_program_visibility
  pass_ipa_profile ->#pass_ipa_profile
  pass_ipa_icf ->#pass_ipa_icf
  pass_ipa_devirt ->#pass_ipa_devirt ->#pass_ipa_devirt+
  pass_ipa_cp ->#pass_ipa_cp ->#pass_ipa_cp+ ->#pass_ipa_cp++ ->#pass_ipa_cp+++
  pass_ipa_cdtor_merge
  pass_ipa_hsa
  pass_ipa_fn_summary
  pass_ipa_inline ->#pass_ipa_inline+ ->#pass_ipa_inline++ ->#pass_ipa_inline+++ ->#pass_ipa_inline++++ ->#pass_ipa_inline+++++ ->#pass_ipa_inline++++++ ->#pass_ipa_inline+++++++
  pass_ipa_pure_const ->#pass_ipa_pure_const
  pass_ipa_free_fn_summary (!small)
  pass_ipa_reference ->#pass_ipa_reference
  pass_ipa_comdats
all_late_ipa_passes:
  pass_materialize_all_clones
  pass_ipa_pta
  pass_omp_simd_clone
all_passes:
  pass_fixup_cfg
  pass_lower_eh_dispatch
  pass_oacc_device_lower
  pass_omp_device_lower
  pass_omp_target_link
  pass_all_optimizations:
    pass_remove_cgraph_callee_edges
    pass_strip_predict_hints
    pass_ccp (nonzero) ->#pass_ccp ->#pass_ccp+ ->#pass_ccp++
    pass_post_ipa_warn
    pass_complete_unrolli ->#pass_complete_unrolli ->#pass_complete_unrolli+
    pass_backprop ->#pass_backprop
    pass_phiprop ->#pass_phiprop
    pass_forwprop ->#pass_forwprop
    pass_object_sizes (!insert_min_max)
    pass_build_alias ->#pass_build_alias
    pass_return_slot ->#pass_return_slot
    pass_fre ->#pass_fre
    pass_merge_phi ->#pass_merge_phi
    pass_thread_jumps ->#pass_thread_jumps
    pass_vrp (warn_array_bounds) ->#pass_vrp
    pass_chkp_opt
    pass_dce ->#pass_dce
    pass_stdarg ->#pass_stdarg
    pass_call_cdce ->#pass_call_cdce
    pass_cselim ->#pass_cselim
    pass_copy_prop ->#pass_copy_prop
    pass_tree_ifcombine ->#pass_tree_ifcombine
    pass_merge_phi ->#pass_merge_phi
    pass_phiopt ->#pass_phiopt ->#pass_phiopt+
    pass_tail_recursion ->#pass_tail_recursion
    pass_ch ->#pass_ch
    pass_lower_complex
    pass_sra ->#pass_sra
    pass_thread_jumps ->#pass_thread_jumps
    pass_dominator (may_peel_loop_headers) ->#pass_dominator
    pass_isolate_erroneous_paths ->#pass_isolate_erroneous_paths
    pass_phi_only_cprop ->#pass_phi_only_cprop
    pass_dse ->#pass_dse
    pass_reassoc (insert_powi) ->#pass_reassoc
    pass_dce ->#pass_dce
    pass_forwprop ->#pass_forwprop
    pass_phiopt ->#pass_phiopt ->#pass_phiopt+
    pass_ccp (nonzero) ->#pass_ccp ->#pass_ccp+ ->#pass_ccp++
    pass_cse_sincos ->#pass_cse_sincos
    pass_optimize_bswap ->#pass_optimize_bswap
    pass_laddress ->#pass_laddress
    pass_lim ->#pass_lim
    pass_walloca (!strict_mode)
    pass_pre ->#pass_pre ->#pass_pre+ ->#pass_pre++ ->#pass_pre+++
    pass_sink_code ->#pass_sink_code
    pass_sancov
    pass_asan
    pass_tsan
    pass_dce ->#pass_dce
    pass_fix_loops ->#pass_fix_loops
    pass_tree_loop: ->#pass_tree_loop
      pass_tree_loop_init
      pass_tree_unswitch ->#pass_tree_unswitch
      pass_scev_cprop ->#pass_scev_cprop
      pass_loop_split ->#pass_loop_split
      pass_loop_jam ->#pass_loop_jam
      pass_cd_dce ->#pass_cd_dce ->#pass_cd_dce+
      pass_iv_canon ->#pass_iv_canon
      pass_loop_distribution ->#pass_loop_distribution ->#pass_loop_distribution+
      pass_linterchange ->#pass_linterchange
      pass_copy_prop ->#pass_copy_prop
      pass_graphite:
        pass_graphite_transforms
        pass_lim ->#pass_lim
        pass_copy_prop ->#pass_copy_prop
        pass_dce ->#pass_dce
      pass_parallelize_loops (!oacc_kernels)
      pass_expand_omp_ssa ->#pass_expand_omp_ssa+
      pass_ch_vect ->#pass_ch_vect
      pass_if_conversion ->#pass_if_conversion
      pass_vectorize: ->#pass_vectorize+ ->#pass_vectorize ->#pass_vectorize+
        pass_dce ->#pass_dce
      pass_predcom ->#pass_predcom
      pass_complete_unroll ->#pass_complete_unroll ->#pass_complete_unroll+ ->#pass_complete_unroll++
      pass_slp_vectorize ->#pass_slp_vectorize ->#pass_slp_vectorize+ ->#pass_slp_vectorize++
      pass_loop_prefetch
      pass_iv_optimize ->#pass_iv_optimize
      pass_lim ->#pass_lim
      pass_tree_loop_done
    pass_tree_no_loop: ->#pass_tree_no_loop
      pass_slp_vectorize ->#pass_slp_vectorize ->#pass_slp_vectorize+ ->#pass_slp_vectorize++
    pass_simduid_cleanup
    pass_lower_vector_ssa ->#pass_lower_vector_ssa+
    pass_cse_reciprocals ->#pass_cse_reciprocals
    pass_sprintf_length (fold_return_value)
    pass_reassoc (!insert_powi) ->#pass_reassoc
    pass_strength_reduction ->#pass_strength_reduction ->#pass_strength_reduction+
    pass_split_paths ->#pass_split_paths
    pass_tracer
    pass_thread_jumps ->#pass_thread_jumps
    pass_dominator (!may_peel_loop_headers) ->#pass_dominator
    pass_strlen ->#pass_strlen
    pass_thread_jumps ->#pass_thread_jumps
    pass_vrp (!warn_array_bounds) ->#pass_vrp
    pass_warn_restrict
    pass_phi_only_cprop ->#pass_phi_only_cprop
    pass_dse ->#pass_dse
    pass_cd_dce ->#pass_cd_dce ->#pass_cd_dce+
    pass_forwprop ->#pass_forwprop
    pass_phiopt ->#pass_phiopt ->#pass_phiopt+
    pass_fold_builtins ->#pass_fold_builtins+ ->#pass_fold_builtins++
    pass_optimize_widening_mul ->#pass_optimize_widening_mul
    pass_store_merging ->#pass_store_merging
    pass_tail_calls ->#pass_tail_calls
    pass_dce ->#pass_dce
    pass_split_crit_edges
    pass_late_warn_uninitialized
    pass_uncprop ->#pass_uncprop
    pass_local_pure_const ->#pass_local_pure_const
  pass_all_optimizations_g:
    pass_remove_cgraph_callee_edges
    pass_strip_predict_hints
    pass_lower_complex
    pass_lower_vector_ssa ->#pass_lower_vector_ssa+
    pass_ccp (nonzero) ->#pass_ccp ->#pass_ccp+ ->#pass_ccp++
    pass_post_ipa_warn
    pass_object_sizes
    pass_fold_builtins ->#pass_fold_builtins+ ->#pass_fold_builtins++
    pass_sprintf_length (fold_return_value)
    pass_copy_prop ->#pass_copy_prop
    pass_dce ->#pass_dce
    pass_sancov
    pass_asan
    pass_tsan
    pass_split_crit_edges
    pass_late_warn_uninitialized
    pass_uncprop ->#pass_uncprop
    pass_local_pure_const ->#pass_local_pure_const
  pass_tm_init:
    pass_tm_mark
    pass_tm_memopt
    pass_tm_edges
  pass_simduid_cleanup
  pass_vtable_verify
  pass_lower_vaarg
  pass_lower_vector ->#pass_lower_vector+
  pass_lower_complex_O0
  pass_sancov_O0
  pass_lower_switch
  pass_asan_O0
  pass_tsan_O0
  pass_sanopt
  pass_cleanup_eh
  pass_lower_resx
  pass_nrv
  pass_cleanup_cfg_post_optimizing
  pass_warn_function_noreturn
  pass_gen_hsail
  pass_expand ->#pass_expand+ ->#pass_expand++ ->#pass_expand+++ ->#pass_expand++++ ->#pass_expand+++++
  pass_rest_of_compilation:
    pass_instantiate_virtual_regs
    pass_into_cfg_layout_mode
    pass_jump ->#pass_jump+ ->#pass_jump++
    pass_lower_subreg ->#pass_lower_subreg
    pass_df_initialize_opt ->#pass_df_initialize_opt+
    pass_cse ->#pass_cse ->#pass_cse+
    pass_rtl_fwprop ->#pass_rtl_fwprop
    pass_rtl_cprop ->#pass_rtl_cprop
    pass_rtl_pre ->#pass_rtl_pre
    pass_rtl_hoist ->#pass_rtl_hoist
    pass_rtl_cprop ->#pass_rtl_cprop
    pass_rtl_store_motion
    pass_cse_after_global_opts ->#pass_cse_after_global_opts+
    pass_rtl_ifcvt ->#pass_rtl_ifcvt
    pass_reginfo_init
    pass_loop2:
      pass_rtl_loop_init
      pass_rtl_move_loop_invariants ->#pass_rtl_move_loop_invariants ->#pass_rtl_move_loop_invariants+
      pass_rtl_unroll_loops
      pass_rtl_doloop ->#pass_rtl_doloop
      pass_rtl_loop_done
    pass_web
    pass_rtl_cprop ->#pass_rtl_cprop
    pass_cse2 ->#pass_cse2
    pass_rtl_dse1 ->#pass_rtl_dse1
    pass_rtl_fwprop_addr ->#pass_rtl_fwprop_addr
    pass_inc_dec ->#pass_inc_dec
    pass_initialize_regs ->#pass_initialize_regs
    pass_ud_rtl_dce ->#pass_ud_rtl_dce
    pass_combine ->#pass_combine ->#pass_combine+
    pass_if_after_combine ->#pass_if_after_combine
    pass_partition_blocks
    pass_outof_cfg_layout_mode
    pass_split_all_insns
    pass_lower_subreg2 ->#pass_lower_subreg2
    pass_df_initialize_no_opt
    pass_stack_ptr_mod
    pass_mode_switching
    pass_match_asm_constraints
    pass_sms
    pass_live_range_shrinkage
    pass_sched ->#pass_sched
    pass_early_remat ->#pass_early_remat
    pass_ira ->#pass_ira+ ->#pass_ira++ ->#pass_ira+++ ->#pass_ira++++ ->#pass_ira+++++ ->#pass_ira++++++ ->#pass_ira+++++++
    pass_reload ->#pass_reload+ ->#pass_reload++
    pass_postreload:
      pass_postreload_cse ->#pass_postreload_cse
      pass_gcse2 ->#pass_gcse2
      pass_split_after_reload ->#pass_split_after_reload
      pass_ree
      pass_compare_elim_after_reload ->#pass_compare_elim_after_reload
      pass_branch_target_load_optimize1
      pass_thread_prologue_and_epilogue ->#pass_thread_prologue_and_epilogue+ ->#pass_thread_prologue_and_epilogue++
      pass_rtl_dse2 ->#pass_rtl_dse2
      pass_stack_adjustments ->#pass_stack_adjustments
      pass_jump2 ->#pass_jump2+
      pass_duplicate_computed_gotos ->#pass_duplicate_computed_gotos
      pass_sched_fusion
      pass_peephole2 ->#pass_peephole2
      pass_if_after_reload ->#pass_if_after_reload
      pass_regrename
      pass_cprop_hardreg ->#pass_cprop_hardreg
      pass_fast_rtl_dce ->#pass_fast_rtl_dce
      pass_reorder_blocks ->#pass_reorder_blocks ->#pass_reorder_blocks+
      pass_branch_target_load_optimize2
      pass_leaf_regs
      pass_split_before_sched2 ->#pass_split_before_sched2
      pass_sched2 ->#pass_sched2
      pass_stack_regs:
        pass_split_before_regstack ->#pass_split_before_sched2
        pass_stack_regs_run
    pass_late_compilation:
      pass_compute_alignments ->#pass_compute_alignments+ ->#pass_compute_alignments++ ->#pass_compute_alignments+++ ->#pass_compute_alignments++++
      pass_variable_tracking ->#pass_variable_tracking
      pass_free_cfg
      pass_machine_reorg
      pass_cleanup_barriers
      pass_delay_slots ->#pass_delay_slots
      pass_split_for_shorten_branches
      pass_convert_to_eh_region_ranges
      pass_shorten_branches ->#pass_shorten_branches
      pass_set_nothrow_function_flags
      pass_dwarf2_frame
      pass_final ->#pass_final+
    pass_df_finish
  pass_clean_state

-O0: optimize=0

Disable optimization.

This flag sets the optimization level to 0. This is the base level, the gold standard for the debugging experience, against which other levels are compared. All automatic variables and parameters are allocated to memory, being loaded and, if modified, stored back at every use. All branches and labels are preserved, and no blocks are duplicated. Functions are not inlined, except for mandatory inlines, e.g., functions marked with attribute always_inline. Source locations from branches or returns that are preserved only in CFG edges are materialized as NOPs.

-Og: optimize=1 + debug

Perform only very fast optimizations with low impact on debugging.

This flag sets the optimization level to 1, but tempers it with an option for better debugging that disables a number of optimizations, even some that would otherwise be enabled at optimization level 1.

#build+

Optimization enables the selection of the local dynamic TLS model to access thread-local variables known to be defined in the dynamic module being compiled. Without it, the global dynamic TLS model is used instead, but this change has no effect on debugging.

Type conversions attempt to substitute conversions to float of the results of standard calls that return double with calls to variants that return float. Likewise, conversions to integral types of the results of standard calls that return double (e.g. round, logb) are converted to calls that return integral types (lround, ilogb). These only affect debugging inasmuch as the behavior of the substituted functions is to be inspected.

#gimplify+

Optimization brings small changes in the processing of nested functions, enabling frame structs and static chains to be optimized away without impact on debugging, and in the representation of variable-length arrays in nested functions, which may lose some details about the types.

#pass_expand_omp+ #pass_expand_omp_ssa+

Some OpenMP primitives may also be simplified when optimization is enabled. These are internal implementation details, so they shouldn't affect debugging.

#pass_lower_eh+

Gimple EH lowering decisions change with optimization, but finally regions may be duplicated either way, and with the same minor effects on debugging: different code addresses for the same source code lines. Critical edges are also split to ease optimizations, and later unsplit if they remain.

#pass_ipa_inline+

Optimization slightly affects the way variables and parameters are remapped when inlining, but the effects of these changes on debug information are masked away.

#TODO_cleanup_cfg

When optimizing, various passes run cleanups of the control flow graph. This may delete unreachable blocks and trivially dead insns like unused sets or copies to self. In gimple mode, the removal of unreachable blocks may propagate SSA defs to uses, but it is hard to imagine that any uses thereof will be reachable, so there should be no impact on debugging. Removed blocks may be missed during debugging, though: breakpoints can't be set in removed blocks.
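As a sketch of the breakpoint issue (hypothetical example; whether the block is deleted, and by which cleanup, depends on target and flags):

    extern void report_tiny_int (void);

    int scale (int x)
    {
      if (sizeof (int) < 2)        /* provably false, folded at compile time */
        {
          report_tiny_int ();      /* unreachable: the whole block may be
                                      deleted, leaving no instruction for a
                                      breakpoint on this line to land on */
          return 0;
        }
      return x * 10;
    }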
Cleanup may renumber basic blocks, detect forwarder blocks, remove unused labels and fallthrough forwarder blocks, merge blocks with unconditional fallthrough, replace jumps to returns or jumps with copies of the targets, simplify conditional jumps and remove single-destination jumps. The removal of fallthrough forwarder blocks may discard debug binds and markers, which could make single-stepping to, or breaking at, the source locations represented by the removed markers impossible. Binds might also be lost, though at least in gimple there will often be redundant binds at confluence points shortly thereafter. A similar negative effect arises when a jump is replaced with a return or another jump, bypassing any debug markers and binds at the original target's block.

When optimizing, NOPs that would materialize CFG edge source locations are not inserted, and the extra steps that preserve source locations during gimplification of jumps and labels are not taken. If the corresponding debug markers are also dropped, this may remove the possibility of stopping at some goto.

#TODO_remove_unused_locals

Optimization enables unused local variables and lexical blocks to be released early; it may cause variables and scopes that cannot ever be entered to be omitted altogether from debug information.

#pass_return_slot

Optimization enables the named return value pass, which detects functions that return aggregate types in memory, always returning the same local variable, and unifies that variable with the result, using the name and source location of the variable, and mapping all uses of the variable to the result. This may have an effect on debugging if the variable happens to be taken from an inlined function: in this case, the source name and location mapping is skipped, because it would introduce a name not present in the original function, but the variable is still remapped to the return declaration, so the source location of the variable's declaration is lost.

#pass_cse_sincos

Optimization enables a pass that combines calls to sin, cos and cexpi with the same SSA operand into a single dominating cexpi call, taking the real or imaginary part of the result at each former sin or cos call. This pass also attempts to simplify pow, powi and cabs calls. None of these affect debugging, aside from the ability to step into any of the affected math functions.

#pass_fold_builtins+

With optimization, a pass simplifies memcpy to memset if the copied-from range is known to be all zeros, some stdarg calls to simple pointer operations if va_list is a simple pointer type, and other similar transformations that do not affect debugging, aside from stepping into or breaking at the simplified functions.

#pass_lower_vector+ #pass_lower_vector_ssa+

Optimization enables attempts to optimize divide and modulus operations on vectors of integral types into combinations of vector multiply, shift, and add. It also enables attempts to optimize the initialization of vectors so as to avoid piecewise initialization. None of these affect debugging.

#pass_expand+

Enabling optimization changes defer_stack_allocation behavior, but its effect on debugging is limited to narrowing the live ranges of dead values. It also enables reordering of operations in expand, so that those requiring more operands are performed first. This reordering does not involve memory-modifying operations, and debug binds cover the affected cases, so it does not affect debugging.
Expand also introduces plenty of pseudos when optimizing, which allows replacement of common subexpressions and whatnot. Conversely, gimplification introduces more temporaries when not optimizing, and it attempts to reuse temporaries when optimizing. The effects on debugging are limited to variations in variable location assignments.

#pass_jump+ #pass_thread_prologue_and_epilogue+

The jump and pro_and_epilogue RTL passes run cleanup_cfg with CLEANUP_EXPENSIVE, given optimize. This performs some more expensive block merging, and simplification of conditional jumps around jumps. The merging has no effect on debugging (indeed, it could reduce the loss of debug markers and binds if done on forwarder blocks), whereas the simplification might drop markers and binds along with the jumps, with impact on debugging similar to that of the other jump simplifications.

#pass_df_initialize_opt+

Several RTL optimization passes also use dataflow analysis to update notes about unused register definitions, as well as death points of registers. Debug binds that reference registers after their death points or unused sets are detected during this analysis, and debug temporaries are introduced next to the death points to preserve the equivalent expressions for use in the debug binds. This generally improves the debugging experience, enabling bind expressions to resort to the equivalences to express the values bound to user variables even if the register is reused for another purpose and no longer holds the value.

#pass_cse

The first CSE pass is enabled when optimizing. The effects of this pass are described under --rerun-cse-after-loop. A third CSE pass may be activated with --rerun-cse-after-global-opts.

#pass_rtl_move_loop_invariants+

Depending on the selected register allocation model, optimization changes register pressure cost estimates in the RTL loop analyzers, but that's not something that changes the kinds of optimizations made there, or the kinds of impacts on debugging they may have.

#pass_initialize_regs

Optimization enables the init-regs pass, which adds zero-initialization for pseudos before uninitialized uses, without effects on debugging.

#pass_combine

Optimization enables combine, a pass that performs arithmetic substitution of single-use pseudo-set insns into others. After successful substitution, insns become useless and are removed, but if their values are still used in debug binds, the binds are updated accordingly, and markers ensure the bind effects are still visible. Therefore, this pass has no effect on debugging.

#pass_ira+

Optimization also changes the default register allocation region setting, without effects on debugging.

#pass_reload+

Optimization enables reload inheritance and the removal of redundant reload stores, without effects on debugging.

#pass_split_after_reload #pass_split_before_regstack

Additional insn splitting passes are enabled after reload when optimizing, without any effects on debugging; any impact would have been brought about by later splitting passes anyway.

#pass_fast_rtl_dce+

Several RTL optimization passes run a fast dead code elimination subpass at the end of the live registers dataflow analysis, as long as --dce is enabled; see --dce (fast) for details.

#pass_variable_tracking

Optimization enables variable tracking, debug binds and markers, to try to mask the effects of optimizations on debugging. They are not needed without optimization.
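For a feel of what variable tracking buys, consider this function compiled with, say, -O1 -g (illustrative; register choices and live ranges vary by target):

    int dot3 (const int *a, const int *b)
    {
      int t = a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
      return t > 0 ? t : 0;
    }

    /* t will typically live only in registers; variable tracking emits a
       location list placing t in whichever register holds it, so "print t"
       works in a debugger wherever the value is live.  Compiling with
       -fno-var-tracking instead would tend to make t report as
       <optimized out>.  */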
#pass_shorten_branches+

When optimizing, insn lengths are estimated with multiple passes that grow lengths as needed, which may result in shorter variants, without effects on debugging.

#pass_final+

Final may discard redundant compares when optimizing. It also links back single-use labels to the jumps to them, for use in machine-specific transformations such as SH's constant pool placement.

--tree-ccp: #pass_ccp

Enable SSA-CCP optimization on trees.

Conditional constant propagation attempts to determine the value of conditions that control conditional branches. It may simplify (fold) some calls and assigns into constant assignments, and turn conditional branches into unconditional ones, possibly dropping blocks that become unreachable. The most significant effect on the debugging experience is that setting breakpoints at certain source code ranges may become impossible as the blocks containing them are dropped. The extra folding might leave additional lines unrepresented by any instructions, but SFN provides markers to stand for them, and VTA and LVu ensure the effects of the optimized-away code can be inspected even without remaining instructions, so the overall impact of this pass on the debugging information is likely negligible.

--tree-fre: #pass_fre

Enable Full Redundancy Elimination (FRE) on trees.

This pass uses value numbering to identify and remove redundant SSA computations, replacing them with previously-computed results, while also propagating copies, removing dead computations, folding computations, and resolving conditional branches and indirect calls. Changes are only relevant for debugging sessions that would modify variables to create situations that wouldn't normally arise at runtime. The substitutions and folding have no effect on debugging, unless variables are changed in the debugger so as to break the equivalences. Stmt removals are masked by debug binds, markers and views. Resolving conditional branches may remove entire blocks if they aren't reachable to begin with, but the consequent inability to set breakpoints on them could be surprising, especially if the debugging session were to change variables so as to force the execution of the unreachable block. Resolving indirect calls to direct ones might also surprise attempts to modify pointers in a debug session, in order to cause a different function to be called.

--tree-dse: #pass_dse

Enable dead store elimination.

This pass removes stores and mem* calls that modify memory that is overwritten without intervening reads. Addressable variables, which might be modified by such removed stmts, are not tracked by debug binds, so debugging sessions might be confusing as the expected effects of stores are not visible.

--guess-branch-probability: #pass_profile+

Enable guessing of branch probabilities.

No effect on debugging per se.

--tree-ch: #pass_ch #pass_ch_vect

Enable loop header copying on trees.

This flag is only activated when --tree-loop-optimize is enabled. The pass copies loop headers, turning the copies into entry tests. Debug binds in the copied blocks are also copied to the post-loop block, modeling the binds introduced after PHI nodes when entering SSA. With those additional bindings, duplicating the header blocks does not impact debugging significantly within the copied blocks or after them. One possibly confusing consequence is that setting a breakpoint at the current program counter, while single-stepping the loop entry test, will not break at subsequent iterations, and vice-versa. This is unlikely to be surprising, and setting breakpoints by line overcomes this effect. User labels, which would not be present in the copy, could make for further confusion, but if they provide for additional edges into the loop header, they will actually stop the transformation from taking place. When --tree-loop-vectorize is enabled, another ch_vect pass is activated, which differs from the regular ch pass only in deciding which loops are to undergo such header copying, so both passes have essentially the same effects on debugging.
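A sketch of the transformation (conceptual; the actual gimple uses separate blocks and PHI nodes):

    void zero (int *a, int n)
    {
      for (int i = 0; i < n; i++)   /* the header test originally runs
                                       before every iteration */
        a[i] = 0;
    }

    /* After loop header copying, roughly:
       if (0 < n)          -- copied header, now a one-time entry test
         {
           int i = 0;
           do
             { a[i] = 0; i++; }
           while (i < n);  -- original test, now at the bottom of the loop
         }  */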
--tree-dce: #pass_dce #pass_cd_dce

Enable SSA dead code elimination optimization on trees.

This may remove assignments, branches and even some calls that are deemed unused/dead. Dead assignments are propagated into debug stmts before removal, so the removal itself does not affect debugging. Dead branches may cause entire blocks to be removed, making any expectation of stepping through or setting breakpoints at such blocks during debugging impossible to meet. Pure or const calls, as well as malloc and free pairs, that are deemed dead may be removed, frustrating expectations of stepping into them during debugging.

--ipa-profile: #pass_ipa_profile

Perform interprocedural profile propagation.

This pass propagates execution frequencies from callers to callees. Also, upon identifying the target of an indirect call from execution profiles, it introduces a speculative direct call that can then be inlined or otherwise optimized. None of this affects debugging.

--ipa-pure-const: #pass_ipa_pure_const #pass_local_pure_const

Discover pure and const functions.

Detect and mark functions according to whether or not they have side effects, loop, or throw, and propagate the information to decide about callers. This, by itself, has no effect on debugging, but it may enable the elision of calls, to functions that are not explicitly marked as pure or const, that would return the same value without any other side effects, and this elision may be slightly confusing for debugging, as such functions may be called (and hit breakpoints) fewer times than expected, and stepping into elided calls will not be possible.

--ipa-reference: #pass_ipa_reference

Discover read-only and non-addressable static variables.

This pass analyses how static variables are used by functions, and propagates the gathered information to callers, so that it can be used in later optimizations. There aren't any effects on debugging.

--tree-copy-prop: #pass_copy_prop

Enable copy propagation on trees.

This pass identifies and simplifies expressions based on copy-related SSA names. This may unify multiple variables into a single location, in the ranges in which they hold equivalent values, making it impossible to modify them independently in the debugger. The identification of such equivalences may also resolve conditional branches to unconditional ones, removing entire basic blocks and the possibility of overriding the conditions in the debugger.

--tree-sink: #pass_sink_code

Enable SSA code sinking on trees.

This pass moves statements down the control flow, closer to their uses, when that may be profitable, and removes them when they are unused. As a DEF is removed from a position that dominates a debug bind, the bind is adjusted, masking the effects on debugging, at least as far as scalars are concerned. Addressable variables are not subject to value tracking in debug binds, and so the delaying of stores may actually be observable during debugging.
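A sketch of sinking, and of why scalars fare better than addressable memory (consume is a hypothetical external function; whether sinking triggers depends on cost estimates):

    extern void consume (int);

    void report (int x, int y, int all)
    {
      int prod = x * y;   /* the multiply may be sunk into the branch
                             below, its only use */
      if (all)
        consume (prod);
      /* When all == 0, no instruction ever computes prod, yet its debug
         bind still records the value expression x * y, so a debugger can
         usually print it.  A store to an addressable variable sunk the
         same way would leave stale memory observable instead.  */
    }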
--tree-slsr: #pass_strength_reduction

Perform straight-line strength reduction.

This pass replaces computations involving multiplications with ones involving additions, in some cases introducing additional temporaries. In the end, trackable variables end up getting the same values, just computed in a different way, so this does not affect debugging.

--tree-coalesce-vars: #pass_expand++

Enable SSA coalescing of user variables.

This flag allows the compiler to assign to a single pseudo-register SSA versions originally created for different user variables. With the aid of debug binds, this has very little effect on debugging: the impact is limited to the early loss of values expected to be about to be overwritten, e.g. when an earlier value of a variable is already dead, and the location holding it is overwritten by a value computed for a temporary or for another variable, before being copied to the former variable. Between the computation point and the binding point, attempting to inspect the variable may indicate it is optimized out at that point, which is perfectly accurate, if undesirable from a debugging perspective.

--tree-ter: #pass_expand+++

Replace temporary expressions in the SSA->normal pass.

This substitutes singly-used SSA defs into their single (non-debug) uses, for expand to have larger expressions to select insns from. Debug binds may end up with more complex expressions than needed, bound before the actual computation of the larger expression takes place, but this does not affect debugging.

--defer-pop: #pass_expand++++

Defer popping function args from stack until later.

No effect on debugging.

--split-wide-types: #pass_lower_subreg #pass_lower_subreg2

Split wide types into independent registers.

This flag enables two RTL lowering passes that explode wide-mode pseudos into multiple word-mode ones. In many cases this modifies insns in place, but it occasionally emits multiple insns to replace a single one. In no such case does it affect debugging. Such splitting may be performed on user variables, and although we can represent variable locations with independent locations for different fragments, such wide variables do not always get debug binds at assignments for tracking throughout compilation. Location inference from DECLs associated with REGs and MEMs is used for fragments of such variables instead, which does correctly identify locations, but not necessarily at points of the program that reflect the recommended inspection points. This may cause debugging sessions to observe changes to such variables too early or too late, which can make debugging confusing. Adding debug binds for the fragments, and arranging for GCC to aggregate them back, might yield more accurate information, but since this would be done at such a late stage, it is possible that the binds would be introduced at points that do not satisfy the usual expectation that side effects take place between the markers immediately before and after the assignment. There are also issues with dismembered aggregates, mentioned under --tree-sra, that would likely affect such split variables as well.

--forward-propagate: #pass_rtl_fwprop #pass_rtl_fwprop_addr

Perform a forward propagation pass on RTL.

These RTL passes replace uses of a pseudo with its single reaching definition. This in itself has no impact on debugging. If a pseudo is propagated into all uses, it will become unused, but then it will have been substituted into debug binds as well and, if not, the unused def might end up preserved as a debug temp.
There is a possibility that, by propagating a pseudo, it becomes dead earlier, and then, after register allocation, debug binds that referenced it while it was still set end up finding the register reused for other purposes earlier than without this transformation. Since the propagation found the source of the definition to be available all the way to the propagation point, and the equivalence between the propagated pseudo and its definition is noted by the variable tracking machinery at the definition point, it is very likely that an alternate expression for the register value will be found.

--dse: #pass_rtl_dse1 #pass_rtl_dse2

Use the RTL dead store elimination pass.

This flag is enabled by default, but it's only activated when optimizing. The passes enabled by it remove stores to memory that are overwritten without intervening reads, that store the same value as the previous store, or that write a value to the stack that is not read before the function returns. Since this affects addressable variables, global or local, debug binds do not apply, and so the effects of removing these stores are going to be noticeable in debugging, except for the redundant stores.

--auto-inc-dec: #pass_inc_dec

Generate auto-inc/dec instructions.

The flag is enabled by default, but it's only activated when optimizing, and when the target architecture supports auto-increment or auto-decrement addressing modes. It detects insns that add or subtract a constant or pseudo from a pseudo before or after the pseudo or a copy thereof is used in a memory reference, and it attempts to turn the memory address into a pre- or post-inc, -dec or -mod addressing mode. This may cause one of the pseudos to change earlier or later than expected, and although this is only done when the pseudo is not otherwise used between the original and modified modification insns, debug binds between them are not adjusted, so they will bind to the wrong value, and when the pseudo is modified even that incorrect location may be lost.

--ira-share-save-slots: #pass_ira++

Share slots for saving different hard registers.

The flag is enabled by default, but it's only activated when optimizing. It allows registers whose lifetimes do not overlap to be saved in the same slot across calls. This could shorten the apparent live range of variables, making them unavailable at spots where they would otherwise be available.

--omit-frame-pointer: #pass_ira+++

When possible do not generate stack frames.

This flag attempts to avoid reserving and using a register as a frame pointer, using stack pointer-relative addresses as needed. A frame pointer register used to be essential for debugging, but call frame information obviated it: it is now irrelevant for this purpose, and this optimization has no effect on debugging.

--compare-elim: #pass_compare_elim_after_reload

Perform comparison elimination after register allocation has finished.

This pass removes redundant compare insns, relying on insns that set flags as side effects instead. It has no effect on debugging.

--shrink-wrap: #pass_thread_prologue_and_epilogue++

Emit function prologues only before parts of the function that need it, rather than at the top of the function.

This pass attempts to insert the prologue sequence at a later point than the entry point, which may involve duplicating some blocks and moving non-prologue early insns down to other blocks. The moved insns are simple enough that debug binds can be adjusted to mask the moves, so this does not affect debugging. Block duplication has little to no impact on debugging, though breakpoints set based on code addresses, rather than on logical locations, may notice the difference. The later prologue may confuse debuggers that assume the end of the prologue, noted in debug information, marks the beginning of user code: such debuggers will likely be significantly affected by this optimization.
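A sketch of a shrink-wrapping opportunity (conceptual; whether it fires depends on the target and on register usage):

    extern int work (int *);

    int maybe_work (int n)
    {
      if (n <= 0)
        return 0;            /* fast path: may run with no prologue at all,
                                no frame setup, no saved registers */
      int buf[8];
      for (int i = 0; i < 8; i++)
        buf[i] = n + i;
      return work (buf);     /* the prologue is emitted only on this path,
                                just before the frame is actually needed */
    }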
--combine-stack-adjustments: #pass_stack_adjustments

Look for opportunities to reduce stack adjustments and stack references.

This flag consolidates consecutive stack allocations, consecutive stack deallocations, or deallocations followed by allocations, within single blocks, adjusting stack pointer-relative addresses as needed. It has no effect on debugging.

--cprop-registers: #pass_cprop_hardreg

Perform a register copy-propagation optimization pass.

AFAICT this only replaces (pseudos assigned to) hard regs in SET_SRCs with earlier-defined equivalent values, and removes noop moves. Substitutions are made in debug bind insns too. So, aside from noop moves that stood for source lines on their own in non-SFN settings, this shouldn't affect the debugging experience in any way.

--dce (fast): #pass_fast_rtl_dce

Use the RTL dead code elimination pass.

This flag is enabled by default, but the fast rtl_dce pass is only activated when optimizing. Insns are regarded as dead if they only set registers and none of them are live. Dead sets used in debug binds are preserved in debug temps, so this does not affect debugging.

--reorder-blocks: #pass_reorder_blocks

Reorder basic blocks to improve code placement.

The reorder blocks pass attempts to increase the number of fallthrough edges by moving basic blocks. This may remove the possibility of breaking at explicit goto statements.

--delayed-branch: #pass_delay_slots

Attempt to fill delay slots of branch instructions.

This pass moves insns about, attempting to fill delay slots on arches that support them, most often those of calls, branches, jumps and returns. It runs after var-tracking, and it may move insns across debug bind notes that would be affected by them, potentially confusing location information. It may create opportunities for jumps to jumps to be redirected to the ultimate jump target, which may invalidate breakpoints that could have been set at the bypassed jumps. On a few arches, calls followed by jumps may have their delay slots filled with insns that modify the register holding the return address for the call, which may confuse debuggers as to the point of the call, including the recovery of entry-point values from the caller frame and location information. Conditional markers might enable the CFG simplifications without invalidating breakpoints, but failing that, it would probably be wise to disable this pass and the return address adjustments at -Og.

--merge-constants: #varasm+

Attempt to merge identical constants across compilation units.

With this flag, constant pool entries and other constants are emitted in mergeable sections, so that the linker can detect and remove duplicates, as long as they do not amount to objects that may have their addresses taken and compared; with --merge-all-constants, even such read-only objects are merged. This may affect debugging inasmuch as the address/identity of the unified objects matters; since so-unified objects are usually string literals and initializers, rather than user-visible variables, this should seldom if ever affect debugging.
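For instance (illustrative; whether merging actually happens depends on the assembler and linker):

    /* a.c */
    const char *tag_a (void) { return "gOlogy"; }

    /* b.c */
    const char *tag_b (void) { return "gOlogy"; }

    /* With -fmerge-constants (the default when optimizing), each literal
       goes to a mergeable section and the linker may keep a single copy,
       so tag_a () == tag_b () can hold even across translation units.
       Only code that relies on the addresses being distinct would
       notice.  */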
-O1: optimize=1

Perform only very fast optimizations.

This option sets the optimization level to 1.

With -O0 or -Og, the maximum vectorization factor for OpenMP is limited to 1. At -O1 or higher, target-specific vector sizes are used instead.

#pass_merge_phi

Basic blocks containing only PHI nodes, debug binds and markers may be dropped altogether by the mergephi pass. Dropping markers could make some statements impossible to stop at when stepping, and dropping binds makes their side effects not visible, so that earlier binds seem to remain in effect. It might be possible to move the binds and markers into the destination block so as to keep them as conditionals.

#pass_tree_ifcombine

Pairs of tests guarding conditional blocks in && or || arrangements may be combined into a single test by the ifcombine pass. The block holding the second test becomes unconditional, so any markers and binds in it will take effect even when they shouldn't. Further optimizations are enabled if the then block is a forwarder to the else block, or vice-versa (a forwarder block is empty except for PHI nodes, debug binds and markers). These may further confuse debugging, by changing the situations in which the forwarder's binds and markers take effect. Conditional binds and markers may alleviate these problems.

#pass_laddress

The laddress pass lowers address-taking operations that are not invariant, so as to expose the computations involving offsets and array indexing to the optimizers. It has no effect on debugging.

--tree-bit-ccp: #pass_ccp+

Enable SSA-BIT-CCP optimization on trees.

This flag modifies slightly the behavior of the SSA tree-ccp pass, so that it keeps track of individual bits in SSA registers, rather than just entire registers. This allows some further simplifications, especially of conditional branches based on individual bits. It does not introduce any new kind of impact on the debugging experience, but it may make further blocks unreachable, and thus unavailable for breakpointing, and further assignments reduced to the reuse of constants without additional code.

--tree-forwprop: #pass_forwprop

Enable forward propagation on trees.

This pass, enabled by default but activated only at -O1 or higher, is run up to 3 times on each function. It substitutes expressions assigned to SSA names into uses thereof, folding statements in place. This doesn't affect debugging, but other transformations made by these passes do. Loads of complex types whose real or imaginary parts are used separately are broken up into separate component loads, but debug binds referencing the complex value loaded from memory are reset, degrading debug information: the bind stmt might be adjusted instead. Stores of complex values are also split up, without effect on debugging. Expressions taking the address of variables, and possibly adding offsets to them, may be substituted into indirections, enabling variables to become non-addressable and be turned into SSA form, as in --tree-phiprop. The conditions in conditional branches may be folded to constants, which changes the control flow graph and can render entire blocks unreachable. Likewise, simplifications in switch expressions may rule out some case targets. It may combine memcpy and memset calls to neighboring ranges into a single memcpy, which may affect debugging if the pointer returned by the memset call is referenced in debug binds. Additional specialized transformations involve bit rotations, permutations, bitfield refs and vector constructors, but none of these affect debugging.

--tree-sra: #pass_sra_early #pass_sra

Perform scalar replacement of aggregates.
This flag enables passes that turn members of aggregates that would normally live in memory into stand-alone scalars that can be optimized like registers. The original aggregate object may in some cases be fully taken apart, but when it is still used as a whole, the scalar is "spilled" back in place and "reloaded" as needed. After assignments to the scalars introduced by these passes, as well as after spills and reloads, debug binds are introduced so that var-tracking can keep track of the fragments of the aggregate, so this pass should be transparent as far as debug information is concerned. Unfortunately, there are problems or limitations in the var-tracking pass that cause us to not use the annotations for the scalarized members, at least in cases in which the aggregate as a whole is small enough to be regarded as an SSA register. Some investigation of var-tracking is needed to determine how to use at least the conflicting notes that apply to both the whole aggregate and the scalarized member, but this may turn out to show significant shortcomings in VTA (variable tracking at assignments) and require some work to make use of the available annotations, so as to bring the debug information quality of (fully- and?) partially-scalarized aggregates in line with that of scalars. Another notable limitation introduced by this pass is that dismembered aggregates can no longer be used in inferior calls that expect references or pointers.

--tree-loop-im: #pass_lim

Enable loop invariant motion on trees.

Although this flag is enabled by default, the pass is omitted from the set of passes activated at -Og, so it is only run at -O1 or higher. This pass moves invariants out of loops, and performs store motion. Floating-point divides and shifts for bit tests may have invariant divisors and shifted bits rearranged for hoisting, without impact on debugging. Access to memory at an invariant address may be turned into an SSA scalar, with a load at the loop entry and a store at the loop exit; such early loads and delayed stores may be confusing for debugging. Invariant computations are moved to the edge into the loop from the preheader, after being removed from their original position. The removal triggers propagation into debug binds, which preserves the bind equivalences but drops the actual location, and becomes more fragile. With a bit of additional effort, it would be possible to keep the binds unchanged. Still, this movement should have little to no impact on debugging.

--tree-dominator-opts: #pass_dominator #pass_phi_only_cprop #pass_uncprop

Enable dominator optimizations.

Although this flag is enabled even at -Og, the passes controlled by it are omitted from the set of passes activated at -Og, so they are only run at -O1 or higher. It propagates constants and copies into uses, folds expressions, attempts to resolve conditionals, eliminates redundant computations and redundant stores, replaces inequalities with equality tests, propagates coalescible SSA names equivalent to PHI values incoming from each edge, propagates and removes degenerate PHIs, and performs jump threading. The only transformation that has any significant effect on the debug experience, given that VTA, SFN and LVu mask the effects of the others, is jump threading. See the effects of (gimple) jump threading under --tree-vrp.
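A sketch of a conditional being resolved from dominating information (illustrative; whether the dominator pass itself or a related pass such as VRP performs the folding varies):

    int clamp (int x)
    {
      if (x < 0)
        x = 0;
      if (x < 0)      /* dominated by the test above: x is known to be
                         non-negative here on every path, so this condition
                         may be resolved to false and the guarded block
                         removed, taking its breakpoint targets with it */
        return -1;
      return x;
    }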
--inline-functions-called-once: #pass_ipa_inline++ Integrate functions only required by their single caller. This option works as an enabler for certain cases of inlining: if this option is disabled, or optimization is disabled for a function or for any of its callers, and no other flag or attribute mandates or enables inlining, then the possibility of inlining into all callers and not emitting an out-of-line copy will not even be considered. Oddly, the "called once"/"single caller" bit seems to be a leftover artifact of earlier implementations: there doesn't seem to be any test involving the caller count in the inlining code paths activated by this flag.

Inline substitution, per se, is not usually a significant source of debug information degradation: any piece of debug information that could be represented in the out-of-line function can be, and is, equally represented for each inlined copy. Potential loss arises out of debug-lossy optimizations, when performing transformations that are enabled or strengthened by the additional information available when analyzing both the caller and the callee in a single context. For example, the inline expansion of a function within a loop that is unrolled may face significant ambiguity as to how many inlined copies of the function there are, and how far scopes in each copy extend, especially if instructions of different iterations are shuffled together by e.g. modulo scheduling.

Another situation in which inlining may affect the debug experience significantly is that of heavy use of abstraction calls. As large numbers of nearly empty, abstraction-only functions are inlined, the ratio of code to debug annotations becomes low, and the risk of hitting upper limits on debug annotation counts grows. When they are hit, such annotations as debug markers and binds may be dropped, removing the compiler's ability to mask the effects of optimizations on debugging. The loss of markers removes the linearity of single-stepping and the robustness of the relationship between source locations in the program and the observable effects that they bring about. The loss of debug binds takes with it much of the possibility of observing variables not held in stable memory locations. Such degradation, which takes debug information back to the days in which the debugging of optimized programs was reasonably held to be unreasonably difficult, may sometimes be avoided at the expense of significant compile time and memory, using such parameters as "max-debug-marker-count", "max-vartrack-size", "max-vartrack-expr-depth", and "max-vartrack-reverse-op-size".

--ssa-backprop: #pass_backprop Enable backward propagation of use properties at the SSA level. This flag is enabled by default, but the pass is only activated at -O1 or higher. It detects numeric variables whose sign does not matter, and optimizes away operations that affect only their sign. Debug binds referencing modified SSA DEFs are adjusted when possible, but since some cases involve function calls, and those do not belong in debug binds, some binds may be lost, and others, especially after PHI nodes, may be bound to expressions that have their signs reversed, which may be confusing.
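A minimal sketch of a sign-insensitive use, matching the description above (hypothetical code with invented names):

    /* Only the magnitude of t matters in t * t, so backprop may
       optimize away the negation, which affects only t's sign.  */
    double backprop_demo (double x)
    {
      double t = -x;   /* the negation may be removed...              */
      return t * t;    /* ...since (-x) * (-x) == x * x; a debug bind
                          for t may then show a value with its sign
                          reversed, or be lost.                       */
    }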
--tree-phiprop: #pass_phiprop Enable hoisting loads from conditional pointers. This pass, enabled by default but activated only at -O1 or higher, replaces PHI nodes whose incoming args all take the address of a scalar value, and are later dereferenced, with PHI nodes that take the scalar values directly. The pass makes sure that the loaded memory values cannot change between the load points, original and optimized, but this transformation might affect debugging if it involves modifying any of the affected memory variables, as the values may have already been loaded. It may also cause a variable that was addressable to become non-addressable and promoted to an SSA register. Debug binds would only be assigned at the time of this promotion, which may be too late to capture assignments that might have already been moved or optimized out. As a result, such variables, promoted to non-addressable, will have worse location tracking than scalar variables that never have their address taken, but no worse than if they had remained addressable all the way.

--tree-pta: #pass_build_alias #pass_build_ealias #TODO_rebuild_alias Perform function-local points-to analysis on trees. This just computes more refined alias sets; it doesn't make any transformations, so whatever effects it might have on the debugging experience are indirect.

--stdarg-opt: #pass_stdarg Optimize the amount of stdarg registers saved to the stack at the start of a function. The code enabled by this flag estimates the maximum sizes of the general-purpose and floating-point register save areas used in a stdarg variable argument list function, so as to limit the number of registers that need to be saved. This does not affect debugging.

--tree-builtin-call-dce: #pass_call_cdce Enable conditional dead code elimination for builtin calls. Although this flag is enabled even at -Og, the pass is omitted from the set of passes activated at -Og, so it is only run at -O1 or higher. This pass replaces builtin calls with simpler operations, and/or guards the operation with conditions that decide whether or not to execute the call, replaced or not. This may be slightly confusing when setting breakpoints at the omitted calls, or attempting to single-step into them.

--tree-cselim: #pass_cselim Transform conditional stores into unconditional ones. This flag is enabled by default when there is a conditional move instruction, but the pass is only activated at -O1 or higher. The pass moves gimple stores in conditional blocks to subsequent join blocks, introducing PHI nodes to select the value to be stored. Addressable variables rely on var-tracking (MEM annotations) rather than var-tracking-at-assignments debug binds, so moving stores causes observable changes in the debug experience: if a variable that should be modified by a store is inspected after the expected store point, but before the replacement store is executed, an outdated value will be found. I wonder if it might be possible to insert debug binds to temporarily override the location of variables that live in memory most of their lifetime, so that such deferred writes could be reflected in location lists, and observed immediately through such a bind, in spite of the deferred execution of the store. As in --hoist-adjacent-loads, the moves could leave the conditional blocks empty, which could make it impossible to set breakpoints at lines within them or to single-step into them, as SFNs get dropped along with the removed blocks. Unlike the combined stores from if/then/else structures, sunk stores from else-less then blocks (or from else blocks with empty then blocks) retain their location information, so one might be able to stop at them even when the conditional block to be executed does not include that line. This can all get confusing, and it could be alleviated with conditional binds and markers.
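A minimal source-level sketch of the store sinking described above (hypothetical code with invented names; GCC only performs this when the unconditional access is known to be safe):

    /* Before cselim: the store happens only when c is nonzero.  */
    void cselim_before (int *p, int c)
    {
      if (c)
        *p = 1;           /* conditional store                       */
    }

    /* After cselim, roughly: the store is sunk to the join point,
       with a PHI selecting the value to be stored.  Inspecting *p
       between the original store point and the sunk store sees an
       outdated value.  */
    void cselim_after (int *p, int c)
    {
      int tmp = c ? 1 : *p;
      *p = tmp;           /* unconditional store at the join point   */
    }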
--ssa-phiopt: #pass_phiopt Optimize conditional patterns using SSA PHI nodes. This pass performs various transformations (see --hoist-adjacent-loads for more) that may drop small or empty conditional blocks, combining a test and a conditional assignment (represented as a PHI node) into a flag store, or an abs, min or max expression. If a temporary is needed, it may be cloned from the PHI result, but that will then be placed in one of the operands of the original PHI node, so any debug binds referencing the original result correctly remain unchanged. The potential negative impact of these transformations on the debug experience is limited to the removal of a conditional block, with a diminished ability to step into the block or set breakpoints in it, and the potential for an early (temporary) overwrite of the location of the variable that will eventually hold the join value, which might make the variable impossible to inspect or modify after such an overwrite. The 3-way min-max cases do not change this picture much, except for the possibility of losing visibility of the result of the intermediate assignment, as bind and marker are removed along with the conditional block.

Another situation in which a conditional block may be eliminated is that in which both edges out of the condition yield the same value for the PHI (e.g. x != a ? a : x simplifies to a). Such simple cases of value unification have just the usual impact of removing a conditional block, but more elaborate cases, with multiple assignments computing the result of the conditional block, have the assignments, but not markers or binds, moved out of the conditional block, with the usual consequences: difficulty stepping into the removed block, and inability to inspect the results of computations whose debug binds were dropped, before the debug binds at a subsequent join point, if any.

Yet another transformation is factoring a conversion out of a PHI node. If both incoming edges perform the same conversion, or if one is a constant and moving the conversion after the join is still found potentially profitable for enabling other optimizations, a new PHI is introduced with type and values prior to the conversion, the original conversions are removed, a new conversion stmt is introduced at the top of the join block, storing in the original PHI result, and finally the original PHI def is removed. This transformation does not remove any block; the original conversions can be propagated into any debug binds, and the new conversion (without location information) is inserted before the debug bind of the original PHI node. The final removal of the original PHI node does not reset debug binds, because we skip propagation into binds upon PHI node removal, and the conversion assignment becomes the new definition. The moved conversions can still be inspected, thanks to SFN and VTA, and the converted value is bound to the variable that takes that value at the join point too, so this transformation does not affect the debug experience.
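A minimal sketch of the min/max case (hypothetical code with invented names):

    /* phiopt may collapse the conditional blocks and the PHI node
       for m into a single MAX_EXPR.  */
    int phiopt_demo (int a, int b)
    {
      int m;
      if (a > b)
        m = a;       /* the conditional block may disappear...       */
      else
        m = b;
      return m;      /* ...leaving, in effect, 'm = MAX (a, b);',
                        with no block left to step into or break on  */
    }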
--tree-reassoc: #pass_reassoc Enable reassociation on the tree level. Although this flag is enabled by default, the pass is omitted from the set of passes activated at -Og, so it is only run at -O1 or higher. This pass rearranges multiple stmts that perform the same operation, say addition, ordering operands by rank and issuing multiple operations in parallel when that's advantageous. This ends up removing nearly all of the original stmts and issuing new ones, using new SSA names. Debug binds retain the original operations, and markers allow them to be inspected when single-stepping. The reassociation might insert extraneous calls, however, e.g. turning repeated multiplies into powi calls; this might be slightly confusing if stepping into calls. Range tests in conditional branches may end up simplified, making the branches unconditional, and rendering some blocks unreachable, which prevents setting breakpoints in them.

--tree-loop-optimize: #pass_fix_loops #pass_tree_loop #pass_tree_no_loop Enable loop optimizations on the tree level. This flag is enabled by default, but it is only activated when optimization at -O1 or higher is enabled. When activated, this flag enables a pass that detects loops and gathers information about them. If the flag is activated and loops are found in a function, then various loop passes are run over that function; otherwise, only the pass enabled by --tree-slp-vectorize will run.

--tree-scev-cprop: #pass_scev_cprop Enable copy propagation of scalar-evolution information. This flag is enabled by default, but it is only activated when --tree-loop-optimize is activated. If scalar evolution determines that a PHI node is invariant, uses thereof, including those in debug binds, are replaced by the invariant. This has no effect on debugging. It also computes, through scalar evolution, the final value of variables modified in loops, dropping the PHI node in favor of a computation based on values known before the loop is entered. This may affect debugging when the removal of the PHI node resets a debug bind referencing it, but the bind could be preserved, since a new, equivalent definition will be introduced.

--tree-loop-ivcanon: #pass_iv_canon #pass_complete_unroll+ #pass_complete_unrolli+ Create canonical induction variables in loops. This flag is enabled by default, but it is only activated when --tree-loop-optimize is activated. This pass estimates the number of iterations of each loop, identifies exit edges and removes those whose conditions are never met, based on the gathered information about the maximum number of iterations. It attempts complete loop unrolling, and stops there if that succeeds. Otherwise, if the loop meets certain conditions, a countdown induction variable is introduced and the loop exit test is replaced so as to compare this variable with zero. The only transformations that minimally impact debugging are the removal of loop exits, which may render some unreachable blocks unavailable for setting breakpoints (that would never be hit), and loop unrolling, which uses the same machinery and has the same effects on debugging as loop peeling (see --peel-loops).

--ivopts: #pass_iv_optimize Optimize induction variables on trees. This flag is enabled by default, but it is only activated when --tree-loop-optimize is activated. For each loop, after detecting base and general induction variables and selecting the optimal set, new, artificial induction variables are created and added to the loop. Then, uses of induction variables not chosen for the optimal set are rewritten in terms of the optimal set, adjusting their original assignments or inserting new assignments instead of PHI nodes. Finally, assignments to induction variables set to be removed are propagated into debug binds, if needed, and then discarded.
Alas, propagation into debug binds may lose plenty of useful information: PHI nodes cannot be propagated into binds, and regular assignments are not removed in an order that would ensure that, say, if a definition of A is used in a definition of B and both are to be removed, we get a chance to propagate B and then A into debug binds that referenced only B. If we happen to remove A first, uses of B in debug binds end up having to be reset, losing relevant location information.

--inline-atomics: #pass_fold_builtins++ Inline __atomic operations when a lock-free instruction sequence is available. This flag is enabled by default, but the transformations described herein, part of the fold builtins pass, are only activated at -O1 or higher. Various atomic operations are turned into atomic bit test-and-set, complement or reset. The transformation may invalidate the bindings of user variables used only in compares with zero.

--if-conversion: #pass_rtl_ifcvt #pass_if_after_combine Perform conversion of conditional jumps to branchless equivalents. Various situations in this pass remove tests, conditional branches and basic blocks. This can make for very surprising single-stepping into the blocks guarded by the conditions, as lines that would not be expected to run given the condition actually get to run, or vice-versa. SFNs don't help: they just reinforce whichever block execution is taken, or get dropped altogether. Aside from the confusing single-stepping, the block removal might (but likely doesn't) cause GCC to lose track of debug bindings. In theory, at confluence points (when entering SSA), we introduce additional debug binds that allow GCC to recover from the loss of bindings in the separate branches. These should allow GCC to get back in sync with the result of the if-converted assignments at the confluence point, so at least after the confluence point the bindings should have been recovered: if-converted sets will be inserted before the confluence-recovering debug bind. These transformations usually apply to a single assignment in each conditional block, but there is support for turning multiple assignments in a then block into multiple assignments from IF_THEN_ELSE (cond, then_value, orig_value) too. There are no further debugging complications in this case, but the blocks can be much longer, defeating users' expectations of single-stepping for longer stretches. SFN might make all of this worse, in that the statement markers in the conditional blocks are actually dropped, so you don't get to step into the blocks any more. Support for conditional markers and binds could alleviate the effects of these transformations.

--move-loop-invariants: #pass_rtl_move_loop_invariants Move loop invariant computations out of loops. This pass identifies SET insns that are invariant within a loop, and moves them to the loop preheader, possibly using a new pseudo to hold the invariant, or replaces them with a copy from the pseudo holding an equivalent invariant. Debug binds remain in place and need not be adjusted, as the transformations ensure the values are available in the original pseudos at the points right after the original SETs, where the binds will tend to be. The only risk I can see to debuggability is that moved insns, and insns leading to equivalences that may end up dead and removed in later passes, may leave lines of code without any insns standing for them. The use of SFN and LVu information in debuggers, enabling them to stop at and inspect the state even at such lines, removes this potential problem.
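A minimal sketch of loop invariant motion (hypothetical code with invented names):

    /* n * 4 is invariant within the loop, so its SET may be moved
       to the loop preheader, possibly into a new pseudo.  */
    void move_invariant_demo (int *a, int n)
    {
      for (int i = 0; i < n; i++)
        a[i] = n * 4;   /* 'n * 4' may be computed once, before the
                           loop; the line keeps its markers, so SFN
                           still offers a stopping point here        */
    }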
--branch-count-reg: #pass_rtl_doloop Replace add, compare, branch sequences with branches on a count register. This pass replaces the conditional branch at the end of a loop with a single decrement-counter-and-conditionally-loop sequence, when the loop iteration count can be computed. The original loop counter is not removed by this pass, so the pass by itself does not affect debug information. However, the original loop counter may become unused, and then be optimized away, and then it is unlikely that the generic adjustments to debug bind statements will be able to realize that it can be computed from the newly-introduced loop counter. There is room for improvement here, adjusting the debug binds of the original loop counter in terms of the new, related IV. This might require some additional infrastructure that could likely be generalized and used for IVs in general.

--if-conversion2: #pass_if_after_reload Perform conversion of conditional jumps to conditional execution. This pass turns insns in then and else blocks into COND_EXECs, enabled by the if condition (then) or its negation (else), removing the conditional branch and the branches at the end of the conditional blocks, and bringing it all into a single basic block. It does not modify or remove debug insns, so single-stepping will enter and execute both blocks, though the side effects of insns whose condition is not active will not take place. In general, insns that modify a variable will be followed by a debug insn that binds the variable to the location holding its modified value. Although debug insns don't have conditional binds, the location of a variable often (but not always) remains the same across the modification. In the cases it doesn't, only the bind at the confluence of the conditional blocks will get the variable location and value back in sync. In addition to the post-confluence point, a variable modified within a block turned into conditionally-executed insns can also be correctly inspected right after an (active) assignment to it, i.e., the conditional assignment that would have been executed had the conditional blocks remained separate. SFN and LVu technology help make sure there will be a usable inspection point with the correct bindings at that point. At other points in the combined block, variables potentially modified in it may be regarded as bound to a stale or unused location holding an unrelated or uninitialized value, corresponding to what would have been assigned to the variable in the other block. This can get confusing if one does not realize that the block that is apparently being executed was not the one corresponding to the guarding condition. All of these caveats of conditional execution only apply in the somewhat unusual cases in which the location of the variable actually changes. Because of control flow confluence and variable value unification at that point (regardless of the debug bind at the confluence point), it will most often be the case that the variable lives in the same register or memory location throughout the conditionally executed blocks, so the degradation of the debugging experience by this pass, although possible, should be rare. Debug binds and markers cannot currently be marked as conditional; making that possible could further alleviate the impact of this transformation.
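A minimal source-level sketch of conversion to conditional execution (hypothetical code with invented names; the actual transformation happens on RTL):

    /* After if-conversion2, both assignments become COND_EXEC insns
       in a single block, one enabled by c and the other by !c.  */
    int condexec_demo (int c, int x)
    {
      if (c)
        x = 1;     /* single-stepping appears to enter both blocks...  */
      else
        x = 2;     /* ...but only the active insn's side effect
                      actually takes place                             */
      return x;
    }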
-Os: optimize=2 + size Perform optimizations that tend to reduce the code size. This option sets the optimization level to 2, in a mode that assigns higher priority to reducing code size.

Optimization at level 2 or higher extends the tests of whether memory references may overlap with an analysis of affine combinations. This may infer non-aliasing in cases lower optimization levels wouldn't, enabling further optimizations, but nothing with effects on debugging that couldn't be had in other, more obvious cases of non-aliasing.

#pass_complete_unrolli Optimization level 2 or higher enables a pass that completely unrolls inner loops that iterate just a few times. Unrolling uses the same machinery that performs loop peeling (see --peel-loops) and, by itself, does not affect debugging.

#pass_early_remat An early rematerialization pass runs at optimization level 2 or higher. It rematerializes pseudos whose live ranges cross calls, by copying the reaching definition insns to between the calls and the uses. The pseudo may then be regarded as dead before the call, which might reset binds after the new death points, even when they could be adjusted so as to refer to the definition that will be used for rematerialization. In some cases the expression may be lost entirely, but even when it is preserved, it might be too complex to be recognized as unchanged when the pseudo is rematerialized, so locations or values based on the pseudo might be lost.

#pass_ira++++ Optimizing for size changes the default register allocation region setting back to the one used when not optimizing.

--expensive-optimizations: Perform a number of minor, expensive optimizations.

#pass_thread_jumps Gimple jump threading is one of the significant transformations enabled by this flag; see the effects of jump threading on debugging under --tree-vrp.

#pass_optimize_bswap The bswap gimple pass, also enabled by expensive optimizations, recognizes shifts and rotates equivalent to byte-swap transformations, and replaces them with a byte-swap builtin. Any user-visible intermediate computations should have debug bind statements that will ultimately be adjusted and preserved even if the computations themselves are dropped, but some stmt moving, replacing, and inserting-then-removing might actually mess up debug bind tracking of the final value.

#pass_optimize_widening_mul Another expensive optimizations pass is widening_mul. It recognizes various opportunities for math optimizations, such as fusing multiply and add, testing overflows on adds or subtracts, and combining divide and modulus into a single operation. Final assignment stmts are replaced, and stmts performing no-longer-needed computations are removed, in a way that doesn't harm debugging.

#pass_strength_reduction+ #pass_expand+++++ #pass_combine+ #pass_cse+ #pass_ira+++++ #pass_reload++ #pass_postreload_cse Some of the changes brought about by this flag are additional canonicalization of addresses when comparing base addresses in alias analysis, searching for alternate base addresses in gimple strength reduction, loop iteration count estimation even for loops with multiple exits, taking conflict counts into account when ordering SSA names for coalescing, combination of temporary slots for automatic variables, reuse of wider-mode ANDs and MEMs for CSE, simplifications and cheap extensions in combine, slightly more elaborate selection of register class preferences and attempts to decrease the number of live ranges in the integrated register allocator, removal of some unneeded reloads, and additional post-reload combine and CSE subpasses. None of these modify the passes in ways that impact debugging differently from transformations already performed without this flag.
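A minimal sketch of the multiply-add fusion performed by widening_mul (hypothetical code with invented names):

    /* On targets with a fused multiply-add, the multiply feeding
       the add may be fused into a single operation; the debug bind
       for t can be adjusted to retain the original product.  */
    double fma_demo (double a, double b, double c)
    {
      double t = a * b;   /* t's defining stmt may be removed...     */
      return t + c;       /* ...replaced by one fused multiply-add   */
    }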
#pass_duplicate_computed_gotos Another of the expensive optimizations is the compgotos RTL pass, which duplicates each small-enough block ending in a computed jump and merges the copies with predecessors that have it as their single successor, with no effects on debugging.

--strict-aliasing: Assume strict aliasing rules apply. This flag limits the cases in which pointer accesses may alias, but that does not enable any kind of transformation with impact on debugging that could not be incurred otherwise, using pointers known not to alias through other means.

--caller-saves: #flag_ira+ Save registers around function calls. Without this flag, pseudos that live across function calls will not be assigned to call-clobbered registers. With it, they may end up in such registers, and then they will be saved in a stack slot as needed before calls, and restored as needed before other uses. In case a debug bind references the register at a point in which the register might be clobbered, it is adjusted to refer to the stack slot. Since VTA notices the saves and restores and realizes the register and the stack slot hold the same value, and regards call-clobbered registers as such at calls, we end up with variable locations that reflect the saving and restoring. This allows variables assigned to call-clobbered registers to be inspected even while they live in stack slots. Modifying such variables in a debug session, however, is not guaranteed to work: variable tracking does not find out which of the copies GCC regards as the primary one, if there is one; it just notices when a copy may no longer hold the current value and, at such points, seeks alternate locations holding it. So debug information may suggest that modifying the memory slot will change the variable, even though the variable has already been loaded into the register and won't be reloaded from memory again, or vice-versa. The caller-save implementation might be able to overcome this by issuing notes to be used by variable tracking to enforce the location changes.

--vect-cost-model=cheap: #pass_vectorize+ #pass_slp_vectorize+ Use the cheap cost model for vectorization. This affects --tree-loop-vectorize and --tree-slp-vectorize decisions, but not the kinds of transformations they make.

--tree-vrp: #pass_early_vrp #pass_vrp Perform Value Range Propagation on trees. This flag activates two different passes: early vrp and vrp proper. Early vrp is simpler in that it is not iterative, going through basic blocks once in dominance order rather than using the SSA propagation engine. Once the range assigned to an SSA name is narrowed down to a single constant, subsequent statements referencing the name can be propagated into and possibly folded, and the definition may be removed. Conditional statements may be simplified, removing edges and basic blocks. Expressions in other statements may also be simplified based on ranges. Such simplifications, in themselves, do not significantly affect the debugging experience. Removed definitions, if mentioned in debug binds, will be propagated into them and preserved there, with markers and views enabling them to be single-stepped and inspected; otherwise, simplified statements remain in place with the same outputs, and don't require any debug information changes. Simplified conditions may cause entire blocks to become unreachable and be removed, which prevents placing breakpoints at them, but such breakpoints wouldn't be reached anyway.
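A minimal sketch of range-based folding (hypothetical code with invented names):

    /* Within the guarded block, x is known to be positive, so vrp
       can fold the inner test away.  */
    int vrp_demo (int x)
    {
      if (x > 0)
        {
          if (x != 0)   /* always true here: folded away, and the    */
            return 1;   /* implicit else edge is removed             */
        }
      return 0;
    }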
At the end of VRP proper, (gimple) jump threading takes place, using value ranges to simplify conditional stmts, so as to tell whether the outgoing edge of a threadable block can be determined from an incoming edge. Gimple jump threading duplicates a block when arriving at it through a certain incoming edge implies exiting it through a certain outgoing edge. This duplication, in itself, does not affect the debug experience: the copied block carries as much debug information as the original block. During threading, however, there are blocks that are not copied, namely forwarding blocks. From a codegen perspective, all they seem to do is jump to another block. From a debug experience perspective, however, they may contain plenty of bind statements and markers, and those are not duplicated: binds are consolidated so that only the latest bind to each variable is copied, and markers are dropped entirely. This arrangement, intended to reinforce binds after newly-introduced confluences, drops debug binds that would not have been observable before the introduction of markers and views; with markers and views, dropping the blocks in favor of bind consolidation amounts to significant loss. The effects need to be assessed, as forwarding blocks and leading/trailing debug stmts may end up removed by CFG cleanup. Better means to preserve them when consolidating forwarding blocks guarded by optimized-out conditions may be needed: conditional markers and binds are a possibility to explore.

--tree-dce (aggressive): #pass_cd_dce+ See --tree-dce. At optimization level 2 or higher (i.e., starting at -Os), the second tree dead code elimination pass is run in aggressive mode, which takes control dependences into account, enabling additional conditional branches to be eliminated. This does not, however, fundamentally change the kinds of effects these passes have on debugging.

--ipa-sra: #pass_early_ipa_sra Perform interprocedural reduction of aggregates. This pass modifies the argument list of a function that takes aggregates as arguments, splitting them into scalars, and adjusting the callers. The impact on debugging could possibly be no different from that of --tree-sra, but the parameter transformations do not retain any traces of the original parameters that could have variable location information generated in a way that reconstructed the original object, or even that tracked each replacement scalar parameter separately. This would require infrastructure to somehow retain the original parameters and describe how they map to the replacement parameters.

--optimize-sibling-calls: #pass_tail_recursion #pass_tail_calls Optimize sibling and tail recursive calls. This enables two separate passes. One attempts to turn tail recursion into loops; the other marks non-recursive tail calls as such, so that the expander emits them as jumps rather than calls. Neither transformation affects debugging within an activation of a function, but they do affect debugging in that call stacks may be missing expected frames, stepping over a tail call would require additional logic in the debugger and the call would not return to the expected caller, and setting a breakpoint at the entry point of a recursively tail-called function may miss the recursive tail calls.
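A minimal sketch of a sibling call (hypothetical code with invented names):

    /* The call to g is in tail position, so it may be emitted as a
       jump; tailcall_demo's frame is gone by the time g runs, so
       backtraces from within g will not show tailcall_demo.  */
    extern int g (int);

    int tailcall_demo (int x)
    {
      return g (x + 1);   /* emitted as a jump rather than a call    */
    }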
--tree-switch-conversion: #pass_convert_switch Perform conversions of switch initializations. This activates switch statement lowering alternatives that may be more efficient than the jump tables or decision trees that are otherwise used. One of the lowering possibilities uses the switch value as a shift count, and then uses bit tests instead of multiple equality tests. No visible effects on the debug experience are expected from this. Another turns a switch statement whose cases all contain assignments of constants to the same variables into arrays of the constants, and assignments to the variables from indexed elements of the arrays. This collapses the code for all (in-range) cases into a single block, losing any debug annotations they might contain. This ultimately prevents stepping into the switch statement or breaking at any of the cases. Optimized-out assignments that might have been preserved in such annotations will be lost altogether. As for assignments that are handled by this transformation, even though debug binds in the cases are lost, binds introduced by VTA after the post-switch PHI nodes will enable the variables to be inspected afterwards.

--partial-inlining: #pass_split_functions Perform partial inlining. This flag enables the splitting of functions, so that one part will be inlined while another part remains as a separate out-of-line function. In theory, this shouldn't be a problem for debugging: the inlined part is represented as an inlined function, and the part that remains out of line (or that is further split) is represented as an out-of-line function. Alas, it's not that simple: the out-of-line portion should be recognized as a part of a function, with an enclosing context taken from the inlined portion. There is no standardized representation that could enable debuggers to recognize this relationship, so at the very least there is going to be confusion as to stack frames, incoming arguments, and available variables from split contexts. If the partial function is output as an optimized version of the original function (it is), a debugger might also set breakpoints at its entry point as if it were an entry point for the entire function. We have a debug info extension proposal to enable at least the entry point of the out-of-line part not to be regarded as an entry point for the entire function, which alleviates the breakpoint setting problem, but we may still need more annotations to allow a debugger to represent a single virtual call frame when the inlined portion activates the out-of-line one, with the entire set of enclosing variables and whatnot. Without that, this flag can make debugging very difficult.

--ipa-icf: #pass_ipa_icf Perform Identical Code Folding for functions and read-only variables. This pass identifies read-only variables with identical representation, and functions with equivalent executable code, and outputs only one copy of each. This is a disaster for debugging the discarded copies: line number and variable location information is dropped for all but the selected function in each equivalence group. It is even more confusing because the wrong function seems to be called when stepping into a dropped copy, and unexpected breakpoint hits may occur. There is some room for improvement here, but it is hardly trivial. We should generate debug information for all copies, but we don't want to compile them all the way to the end and then attempt to unify labels and whatnot to output location lists for each variant, and multiple line number tables.
Unifying the functions while combining and turning all debug annotations, including source locations, into conditionals that identify each of the unified copies could enable us to compile them normally, and then emit a single line number table (augmented with conditionals) and location information for each of the separate copies. Debug information consumers may then be able to identify the copies using return addresses and call-graph debug information, the same machinery used to determine entry values of parameters.

--devirtualize: #pass_ipa_devirt Try to convert virtual calls to direct ones. This replaces indirect calls with direct calls, possibly enabling folding, inlining and whatnot. The replacement of calls in itself does not affect debugging, but the enabled transformations might.

--devirtualize-speculatively: #pass_ipa_devirt+ Perform speculative devirtualization. This is somewhat like --devirtualize, but the direct call is guarded by a test that confirms the selected target of the call is the correct one, and the indirect call remains as an alternative. Nothing there would affect debugging.

--ipa-cp: #pass_ipa_cp Perform interprocedural constant propagation. This pass collects plenty of information about opportunities for propagating constants from callers to callees, cloning functions and replacing parameters with the constants or other known properties. This may make room for many other optimizations, including the resolution of indirect calls to direct ones. Cloning and substitution do not significantly impact the debug experience: the clones refer back to the original function as their abstract origin, and the substituted parameters, even if eliminated from the cloned function's ABI, are noted as bound to the constant in the debug info for the concrete function. One potentially confusing situation that arises out of cloning is setting a breakpoint at a code address, and then being surprised that it is not hit at other activations of the function that do not use the same clone. Since this also comes up with such traditional transformations as inlining and loop unrolling, it probably won't be too surprising.

--ipa-bit-cp: #pass_ipa_cp+ #pass_ccp++ Perform interprocedural bitwise constant propagation. This flag extends --ipa-cp so that it also gathers information about which bits are known to be zero in values passed from one function to another. This creates additional opportunities for folding, --tree-ccp, etc.

--ipa-vrp: #pass_ipa_cp++ Perform IPA Value Range Propagation. This flag extends --ipa-cp so that it also gathers range information on values passed from one function to another. This creates additional opportunities for folding, --tree-vrp, etc.

--inline-small-functions: #pass_ipa_inline+++ Integrate functions into their callers when code size is known not to grow. Like --inline-functions-called-once, this flag is an enabler for inlining, in that if it's not active, various cases of early inlining (and splitting for --partial-inlining, see above) are suppressed.

--indirect-inlining: #pass_ipa_inline++++ Perform indirect inlining. Like other inline flags, this flag is an enabler: if it's not active, it stops the compiler short of attempting to resolve indirect edges (e.g., indirect or virtual calls) to direct edges.

--inline-functions: #pass_ipa_inline+++++ Integrate functions not declared "inline" into their callers when profitable.
Like other inline flags, this flag is an enabler: if it's not active, it stops the compiler from considering inlining functions not explicitly declared inline. See --inline-functions-called-once for an analysis of the impact of inlining on debugging.

--hoist-adjacent-loads: #pass_phiopt+ Enable hoisting adjacent loads to encourage generating conditional move instructions. This flag modifies the ssa-phiopt pass so as to move loads of adjacent fields of the same struct, one in the then block and the other in the else block, to before the conditional branch, loading into (different SSA names joined into) the same variable. A debug bind will likely follow each of the original loads, so the moves won't change the ability to inspect the destination variable after each load. However, the early overwriting of the variable can make its previous value unavailable sooner than expected. The moves could leave the conditional blocks empty, especially if a conditional move ends up being used, which could make it impossible to set breakpoints at lines within them or to single-step into them, as SFNs get dropped along with the removed blocks. The moved loads retain their location information, however, so one might be able to stop at them even when the conditional block to be executed does not include that line. This can all get confusing, but I don't see ways to improve it.

--isolate-erroneous-paths-dereference: #pass_isolate_erroneous_paths Turn undefined behavior into traps. This pass detects dereferences of null pointers and replaces them with trap statements. When the dereference involves a PHI node, the incoming edge that carries the null value is redirected to a copy of the block, and the copy gets the trap statement instead. This affects debugging mostly in minor ways. A chunk of code that follows an unconditional null dereference may become unavailable for breakpoints, as the trap enables it to be completely optimized away. When a block is copied for the case of conditional null dereferences, references to the copied labels by name may not be resolved to the corresponding locations in the copied blocks. In extreme cases, in which all incoming edges bring a null value, the original block may end up unreachable and optimized away, potentially making the label unavailable even while copies thereof remain. When an indirect call is replaced with a trap, say because the callee address is null, debugger users may be surprised at not being allowed to step into the called function, even if they modify the pointer so that it is not null, because the call was turned into a trap. Such types of debugging sessions, involving debug-time modification of pointers that at compile time could be determined to evaluate to null, may become impossible to carry out after these transformations. This flag, as well as --isolate-erroneous-paths-attribute and -Wnull-dereference (FIXME: huh?!? a warning flag enabling optimizations?!?), enables turning division by zero into a trap (unless --non-call-exceptions is enabled), with the same logic and consequences as the above, and turning addresses of local automatic variables returned from functions into NULL, with no effects on debugging.
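A minimal sketch of path isolation (hypothetical code with invented names):

    /* Dereferencing p when it is known to be null is undefined
       behavior, so the dereference may be replaced with a trap and
       the rest of that path optimized away.  */
    int isolate_demo (int *p)
    {
      if (p == 0)
        return *p;   /* may become a __builtin_trap (); the return
                        vanishes, so no breakpoint can be set on it  */
      return 1;
    }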
The flag --isolate-erroneous-paths-attribute uses the same logic and machinery as this option, but it recognizes cases in which a null pointer is passed to a function in an argument marked (with an attribute) as requiring a nonnull pointer, or returned from a function that is marked as returning a nonnull pointer, and replaces the erroneous call or return with a trap. The effects of these transformations on debugging are of essentially the same kind.

--tree-pre: #pass_pre Enable SSA-PRE optimization on trees. When an expression is computed redundantly in a block and in some of its predecessors, this pass makes it fully redundant by inserting the computation in the other predecessors, and then removes the redundant computation. In theory, the insertions have no effect on debugging, but SSA coalescing may cause them to overwrite a variable earlier than expected, making it unavailable for inspection until the expected assignment point. The removals are preserved in debug binds, so as long as the computations are not optimized out, they will be representable, and with SFN and LVu the binds will be available for inspection at the expected spots.

--code-hoisting: #pass_pre+ Enable code hoisting. When equivalent expressions are computed in multiple blocks, this moves them to a dominating block, and then removes the redundant computations. The considerations that apply to SSA-PRE also apply to this flag.

--tree-tail-merge: #pass_pre++ Enable tail merging on trees. This option is conceptually similar to --crossjumping, but it works on the gimple SSA representation, rather than on RTL, and, despite the name, it only merges entire basic blocks that share a common successor or predecessor. Similar considerations apply: the combined blocks may refer to different source fragments; they may have different debug annotations that are correctly ignored when comparing blocks, but that are dropped altogether from one of each pair of merged blocks. I envision a possibility of preserving the annotations with the introduction of conditionals, though, unlike the case of jump threading, it is not immediately obvious how to identify a condition that might be available at run time and that could be used to tell which set of annotations to activate, so as to enable a debugger to show one source fragment or another as active.

--store-merging: #pass_store_merging Merge adjacent stores. This combines multiple stores to adjacent or overlapping memory locations in a single basic block into fewer, wider stores. This is done in gimple, before automatic variables are assigned to specific stack slots, so it is unlikely to combine effects on more than one user variable: it might combine accesses to a single array or structure, i.e., larger addressable objects committed to memory early in compilation. These are objects that are not tracked or affected by VTA, so debug binds are unlikely to be affected. However, the postponement of merged stores may affect values visible at inspection points derived from statement boundaries (SFN).
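A minimal sketch of store merging (hypothetical code with invented names):

    /* The four adjacent byte stores may be merged into a single
       wider store at the end of the sequence, so the intermediate
       states may never be observable at the SFN-derived inspection
       points in between.  */
    struct quad { unsigned char a, b, c, d; };

    void store_merge_demo (struct quad *p)
    {
      p->a = 1;
      p->b = 2;
      p->c = 3;
      p->d = 4;   /* possibly one 32-bit store for all four bytes    */
    }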
--thread-jumps: #pass_jump++ Perform (RTL) jump threading optimizations. (Jump threading passes or subpasses in gimple/SSA are enabled by --expensive-optimizations, by --tree-dominator-opts, and by --tree-vrp.) If a block is found to have no side effects, and its being entered through a certain edge E1 implies it will always be left through an edge E2, this cleanup pass redirects edge E1 to the destination of E2, bypassing the block altogether. This removes from the expected flow any of the markers and bindings that were to be found in the bypassed block. This may be confusing not only when single-stepping a program, as an unexpected jump over a reasonably large piece of code might take place, but also after the bypassed block, as the skipped bindings may not be integrated into the subsequent views.

--gcse: #pass_rtl_pre #pass_rtl_hoist #pass_rtl_cprop Perform global common subexpression elimination. The PRE and hoist passes on RTL introduce new pseudos to hold redundant/hoisted expressions, insert new insns to compute them as needed to make the expressions fully redundant, and replace the redundant set insns with copies from the new pseudos. Since the values still end up in the original REGs, debug binds referencing them are unchanged and remain valid. Register allocation might be able to optimize away these copies, but with SFN and LVu, it should still be possible to stop after assignments and inspect the assigned values. The only expected negative effect on the debugging experience is that of early overwriting of variables, should the new pseudos be assigned to the same location as the dead variables whose future values they hold. Another pass enabled by this flag is a constant/copy propagation RTL pass. As pseudos are replaced with constants or other pseudos, this may simplify and remove conditional branches and get unreachable basic blocks removed, which may then prevent breakpoints from being set at the source code ranges corresponding to the removed blocks. Trapping insns may also be turned into unconditional traps, making the subsequent code unreachable, with similar consequences. Insns may become dead as the pseudos they set are replaced; this might cause debug binds referencing them to be reset, if the setting expression cannot be preserved by propagating into the debug bind or by creating a debug temporary. This may result in loss of debug location/value information. With --gcse-lm, PRE may pull loads out of loops, replacing stores with copies to the pseudo, immediately followed by newly-inserted stores of the pseudo. This may impact debugging in that variables that live in memory will not be loaded again within the loop, so if the debugger is used to modify the value of the variable, that may fail to affect the program. With --ira-hoist-pressure, hoist changes the weighting of decisions on whether or not to hoist computations to dominating blocks, but that doesn't cause different kinds of transformations to be done, so the kinds of effects on the debugging experience remain unchanged.

--cse-follow-jumps: #pass_cse_after_global_opts+ When running CSE, follow jumps to their targets. This flag extends the CSE pass (see --rerun-cse-after-loop) so that registers set in one block can be used in substitutions in subsequent blocks that have no predecessors other than those in the path from the setting point. This does not change the effects CSE may have on the debug experience; it just extends such effects across separate blocks.

--rerun-cse-after-loop: #pass_cse2 Add a common subexpression elimination pass after loop optimizations. We run an RTL common subexpression elimination pass when optimization is enabled; this flag adds another such pass after the RTL loop optimizations. CSE scans blocks linearly, detecting equivalent expressions stored in different pseudos, and replacing uses of later-set pseudos with uses of the earlier-set equivalent ones. This may render the later sets trivially dead, and they are ultimately removed if so.
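A minimal sketch of such a substitution (hypothetical code with invented names):

    /* The second computation of a + b is equivalent to the first,
       so uses of y may be replaced with x; the set of y then becomes
       trivially dead and is removed, and y's debug bind is adjusted
       to refer to x's location.  */
    int cse_demo (int a, int b)
    {
      int x = a + b;
      int y = a + b;   /* uses of y below become uses of x           */
      return x * y;    /* in effect, 'return x * x;'                 */
    }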
The register replacements per se do not affect the debug experience; the dead insn removal might, but debug binds will have been replaced as well, so the main issues are the potential early overwrite making a variable unavailable for inspection, and the removal of insns at inspection points, which is made up for by SFN and LVu with debugger support. Register replacement might make it evident that a conditional branch is always or never taken, turning it into an unconditional edge, and then entire blocks might become unreachable. This might prevent breakpoints from being set within such blocks, but since the condition that led to them never held, they would never be reached anyway. CSE can also combine condition-code-setting insns when one block that performs a compare flows into another that performs the same compare, but this has no effect on the debug experience.

--dce (ud): #pass_ud_rtl_dce Use the RTL dead code elimination pass. This flag is enabled by default, but the ud_dce pass described herein is only activated when optimizing at level 2 or higher. This pass relies on use-def chains to mark all defs of each use. Then, it removes all unmarked insns, resetting debug binds that refer to defs in any removed insns. It would be possible to preserve the defs in debug temps for use in the binds, instead of resetting them, and then the loss of debug locations would be avoided; but as it is, this pass causes variables to lose their bindings.

--ipa-ra: #pass_ira++++++ Use caller-saved registers across calls if possible. This flag gathers information about which call-clobbered registers may actually be modified in each function, and allows the register allocator to select registers that it would otherwise avoid, to hold values across calls known not to modify those registers. This has no effect on the debugging experience.

--lra-remat: #pass_ira+++++++ Do CFG-sensitive rematerialization in LRA. This pass recomputes the value of spilled registers, instead of loading them back from memory. This makes for confusing debugging sessions if the spilled register holds a variable that is to be modified by the debugger while it is available only in memory. The expectation that the modified value will be used in subsequent uses will not be met, and at some point after the rematerialization, the variable will seem to magically take its original value back. This situation is not entirely uncommon in optimized debugging, considering that we only take note of one location for a variable at a time, and we don't indicate whether or not that location is a modifiable one, but it's particularly apparent and worth noting in this case. Tracking all potential locations is remarkably expensive, but we might be able to mark binding statements as modifiable locations and clear that modifiable indication if the expressions in them are modified. This would likely be quite useful to avoid misleading behavior, but it might also severely limit the modification of variables that one can try and get away with in debug sessions.

--crossjumping: #pass_jump2+ Perform cross-jumping optimization. This pass identifies common trailing insns in predecessors of a block, or leading insns in successors of a block, splitting one of the blocks so that the other can have the equivalent insns replaced with a jump.
This transformation ignores debug locations, markers and binds, as needed for -g not to affect codegen, but this makes it unify insn sequences that refer to different portions of the source code, and even ones that affect different variables. Users of debuggers may find themselves wondering how they ended up at a certain point of the program without hitting an earlier breakpoint, or just when they expected to be elsewhere. Markers and binds will reflect the apparent source location, even if the code was reached from a different path whose unrelated computations happened to become the same instructions; this may seem to be less confusing, unless one realizes that the code sequence is just equivalent to the one that should be running after an unrelated path in the source program. With that realization, confusion can be even more thorough, as the loss of binds and markers makes expectations about what should happen in the dropped path unlikely to be met. All this said, the likelihood that completely unrelated computations will be unified by this pass is very low. Trailing compares and jumps, perhaps preceded by code sequences performing identical computations, to the point of storing results in the same registers, will likely not be dissimilar enough to make debugging impossible, aside from the effect of seemingly finding oneself at the wrong part of the program. Thus, even though very confusing transformations are theoretically possible, odds are that the transformation results will be recognizably similar to what would be expected, and the only real surprises will be the unexpected jumps and the inability to set breakpoints. Instead of dropping binds and markers from the range to be unified, conditional binds and markers could be introduced and used to enable a debugger to distinguish between the unified paths, and the side effects expected from each path.

--peephole2: #pass_peephole2 Enable an RTL peephole pass before sched2. The peephole passes run close to the end of compilation, looking for sequences of insns that the back end recognizes for special treatment. The peephole2 pass, enabled by this flag, turns a sequence of insns into another sequence of insns, unlike peephole, which outputs alternate assembly code for recognized sequences. These passes run so late that debug insns have already been turned into notes, and notes are skipped when recognizing sequences. Unlike peephole, however, peephole2 discards notes that appear among recognized insns, which may ultimately discard debug location and marker notes, whereas peephole will move them before or after the replacement insn sequence. Both can cause degradation of debug information, leading to missed or incorrectly-placed bindings and inspection points, so that unexpected values can be found when inspecting affected variables.

--schedule-insns2: #pass_sched2 #pass_split_before_sched2 Reschedule instructions after register allocation. This pass computes dependencies between insns, and then reorders them so as to better use hardware units, and so as to hide latencies. The following assessment of impact is based on the standard insn scheduler used by GCC, and on the extended basic block scheduler, as opposed to the selective scheduler, which is largely incompatible with the debug insn-based technologies introduced to improve the debuggability of optimized programs.
Debug insns, be they binds or statement markers, are retained in order, and binds carry their preceding insn as a dependency, in addition to any other dependencies from the bound value; but otherwise debug insns are pulled ahead of nondebug ones. Nondebug insns, however, are never regarded as dependent on debug ones, not even as anti-dependencies, so a nondebug insn that modifies an input to a debug bind resets the bind, which loses debug information. The bound value might still be available in alternate locations, or through other expressions, but no attempt is made to find alternate representations for the binding in this pass. Another potentially lossy situation is that of moving an insn so that it overwrites a variable before expected, which may cause the earlier value to no longer be available for inspection. Without SFN support in debuggers, insn scheduling is the most common cause of the undesirable effect of jumping back and forth when single-stepping optimized programs. With SFN, debuggers can advance from one line to another according to the expected control flow, and, with LVu, observe side effects noted in preceding debug binds, even if the insns that carry out those side effects are moved elsewhere.

--align-loops: #pass_compute_alignments+ Align the start of loops. No effect on debugging.

--align-jumps: #pass_compute_alignments++ Align labels which are only reached by jumping. No effect on debugging.

--align-labels: #pass_compute_alignments+++ Align all labels. No effect on debugging.

--align-functions: #pass_compute_alignments++++ Align the start of functions. No effect on debugging.

--reorder-functions: #varasm+ Reorder functions to improve code placement. This decides whether to emit (or start) functions in hot or cold sections. No effect on debugging.

-O2: optimize=2 Perform optimizations that tend to make the program run faster. This option sets the optimization level to 2, in a mode that assigns higher priority to making the code run faster.

--no-inline-functions: #pass_ipa_inline++++++ Although -O2 appears after -Os in the crescendo of optimization levels, -Os and -O3 enable --inline-functions (see above) but -O2 doesn't.

--optimize-strlen: #pass_strlen Enable string length optimizations on trees. This pass tracks string and memory calls, as well as char stores, keeping track of string lengths, so as to optimize builtin calls involving such lengths into constants or previously-computed values. Besides strlen(str) and strchr(str, 0) to length, it can optimize strcat to strcpy or even memcpy, and more. The transformations may involve removing redundant computations, possibly after inserting simpler call sequences, or replacing calls with assignments. Ultimately, if the return value of a call was stored in some SSA name, the transformation will also store in it. It is possible, however, that in the specific case of folding strstr(s,t)[=!]=s to strncmp(s,t,strlen(t))[=!]=0, if the result of the strstr call is stored in a user variable used only for the compare, the transformation will take place and invalidate the debug bind for that variable. There doesn't seem to be any other case in which a result that might have been stored in a user variable could be lost in these transformations. The other potential surprise for debug sessions is attempting to step into any of these calls, since different functions may be called. For the same reason, setting breakpoints on the functions, both the ones that are explicitly called and the ones that may end up called instead, will yield surprising results.
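A minimal sketch of such a call replacement (hypothetical code with invented names; the exact replacement is an assumption, hedged below):

    /* The length of the copied string is known, so the strcat call
       may be rewritten, e.g. into a memcpy at a known offset;
       stepping into or breaking on strcat then yields surprises.  */
    #include <string.h>

    void strlen_demo (char *buf)
    {
      strcpy (buf, "foo");   /* the pass records strlen (buf) == 3   */
      strcat (buf, "bar");   /* may become, roughly,
                                memcpy (buf + 3, "bar", 4)           */
    }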
--schedule-insns: #pass_sched Reschedule instructions before register allocation. See the analysis under --schedule-insns2. While that pass runs after mapping pseudo registers to hardware registers or stack slots, this one runs with a virtually infinite (pseudo) register file. Pseudo registers are less likely than hardware ones to overlap and conflict, so scheduling insns before register allocation resets fewer debug binds than scheduling them after register allocation. Furthermore, the earlier scheduling reduces the amount of scheduling done later, which further helps preserve debug binds.

--reorder-blocks-algorithm=stc: #pass_reorder_blocks+ Set the basic block reordering algorithm to STC (software trace cache). The STC algorithm, unlike the default simple one, may duplicate blocks and rotate loops, but still without any significant effect on the debug experience.

-O3: optimize=3 Perform expensive optimizations that might even make the program larger and slower. This option sets the optimization level to 3.

#pass_complete_unroll At optimization levels 3 or higher, loop peeling and complete unrolling (see --peel-loops) are permitted to grow code size, but this by itself does not affect debugging. Computation of the iteration count and other loop properties may be simplified using the evolutions of the loop invariants in outer loops, enabling loop transformations that might not otherwise be performed in specific cases, but whose effects on debugging are no different from those of other transformations that could be performed regardless.

--tree-loop-vectorize: #pass_vectorize Enable loop vectorization on trees. This flag is only activated when --tree-loop-optimize is activated. This flag enables --tree-loop-if-convert (see below). Along with --tree-ch, it enables the ch_vect pass (see under --tree-ch above). Along with --section-anchors, it enables the increase_alignment pass, which increases (without any impact on debugging) the alignment of global arrays so that loops over them can be vectorized. This transformation, regardless of the selected cost model, combines multiple iterations of a loop into one that uses vector operations to perform the equivalent work of the combined iterations. This is extremely confusing for debugging, not just because of the significant control flow changes, but also because the debug annotations used to counter the effects of optimizations on debugging are discarded or disabled. It might be possible to aggregate and unroll the debug annotations of multiple iterations at the end of each vectorized iteration, so as to make their effects progressively visible while single-stepping over the markers.
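As a hedged illustration (hypothetical function; the vectorization factor depends on the target), each iteration of the vectorized loop below might perform the work of, say, four source iterations with vector insns, so single-stepping stops once per four iterations and the intermediate values of i are never bound:

    void scale (float *a, float k, int n)
    {
      for (int i = 0; i < n; i++)
        a[i] = a[i] * k;  /* several elements may be multiplied per step;
                             binds for i and markers for the body are not
                             replicated per source iteration */
    }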
--vect-cost-model=dynamic: #pass_vectorize+ #pass_slp_vectorize++ Use the dynamic cost model for vectorization. This affects --tree-loop-vectorize and --tree-slp-vectorize decisions, but not the kinds of transformations they make.

--ipa-cp-clone: #pass_ipa_cp+++ Perform cloning to make interprocedural constant propagation stronger. This flag, when disabled, stops externally-visible functions from being versioned for propagation into them, disabling all transformations enabled by --ipa-cp for such functions. Enabling it does not introduce any kind of effect that isn't potentially observable when --ipa-cp is enabled.

--inline-functions: #pass_ipa_inline+++++++ See above, under -Os. This is the only flag that's not in a strict crescendo of optimization flags, in that -Os and -O3 enable it, but -O2, which otherwise sits between -Os and -O3, doesn't.

--tree-partial-pre: #pass_pre+++ In SSA-PRE optimization on trees, enable partial-partial redundancy elimination. The considerations that apply to PRE also apply to this flag and its effects on the PRE pass.

--unswitch-loops: #pass_tree_unswitch Perform loop unswitching. This flag is only activated when --tree-loop-optimize is activated. This pass hoists invariant conditionals out of inner loops, using loop versioning to create two versions of the loop, one for each value of the conditional, deciding once which version of the loop to enter. It may further hoist such conditionals out of outer loops, without versioning, if the outer loops are simple enough. One might expect the early execution of the conditional to be confusing for interactive debugging sessions, but it is actually transparent: the condition has to be so trivial to compute that it is moved without the corresponding line number information, and it is executed as if part of the loop preheader. What's more: the original test is not removed from either version of the loop; it is rather replaced with a test that trivially evaluates to true or false. Even if that ends up optimized out, an SFN marker remains for the test in both versions of the loop, so it will be possible to stop at the test point and verify the condition, whatever path is taken from it. Since each block in the original will remain in at least one of the loop versions, it will be possible to set breakpoints at any of the lines of the loop after this transformation, even if some of the lines may be duplicated. Single-stepping will not be surprising: guards of conditional blocks will be stopped at, and the blocks will be entered just when expected. As such, the impact of this transformation on the debug experience is extremely low.
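A hedged sketch of unswitching (names hypothetical): the test on flag is invariant in the loop below, so the loop may be versioned, with the decision taken once before entering either version.

    void fill (int *a, int n, int flag)
    {
      for (int i = 0; i < n; i++)
        if (flag)      /* invariant: decided once, before the loop */
          a[i] = 0;
        else
          a[i] = i;
    }

    /* conceptually:
       if (flag)
         for (int i = 0; i < n; i++) a[i] = 0;   -- test trivially true
       else
         for (int i = 0; i < n; i++) a[i] = i;   -- test trivially false
       with the test point, or at least its SFN marker, remaining in
       each version of the loop.  */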
--split-loops: #pass_loop_split Perform loop splitting. This flag is only activated when --tree-loop-optimize is activated. This turns a loop with conditional blocks and a controlling condition that changes value once throughout the iteration space into two loops, each with only one of the conditional blocks. It uses loop versioning to create two copies of the loop, using the controlling condition to decide which of the versions to run. Then, it connects the exit of the first loop to the entry of the second, adjusts the exit condition of the first loop to transition to the other loop at the point the condition switches, and forces the controlling conditions in each block to the known value, removing the unused conditional blocks in each copy. None of these transformations has a significant impact on debuggability. The only actual issue I see, which is probably of little significance, is that the block duplicating infrastructure does not copy bind statements for label declarations that were optimized away, so, if such a label is bound within the conditional block that is versioned and then discarded from the original loop, the label will seem to be completely gone, even though a block containing it will still be reachable in one of the loops.

--loop-unroll-and-jam: #pass_loop_jam Perform unroll-and-jam on loops. This flag is only activated when --tree-loop-optimize is activated. This transformation unrolls an outer loop and jams the multiple instances of the inner loop into a single loop. This changes the iteration sequence, e.g., from [(0,0), (0,1), ..., (0,n), (1,0), ..., (1,n), ..., (m,n)] to [(0,0), (1,0), (0,1), (1,1), ..., (0,n), (1,n), (2,0), (3,0), (2,1), ..., (m,n)]. This can be extremely disruptive to debugging, as this sort of transformation, which effectively modifies the order in which major blocks of computation are executed, cannot be made up for with the existing infrastructure to retain debug information across optimizations. Considering the limited kinds of computations that may be performed in such loops so as to enable this sort of transformation, it seems that it might be possible to attempt to output debug information that would enable a debugger to emulate the original loop nest, but it is not evident that current debug information formats are sufficiently expressive for that, nor that it would be worth the trouble. It might be more useful to be able to somehow represent what kind of loop transformation took place, so that users can understand what is actually going on, rather than attempting to pretend we are still running the original loop nest.
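A hedged sketch of unroll-and-jam with an unroll factor of 2 (names hypothetical; the actual factor and legality analysis belong to the pass, and a remainder loop handles odd trip counts):

    void sum_rows (int m, int n, int a[m][n], int c[n])
    {
      /* original nest: visits (i,j) row by row */
      for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++)
          c[j] += a[i][j];
    }

    /* conceptually becomes (for even m):
       for (int i = 0; i < m; i += 2)
         for (int j = 0; j < n; j++)
           {
             c[j] += a[i][j];       -- iteration (i,j)
             c[j] += a[i + 1][j];   -- iteration (i+1,j), jammed in
           }                                                        */

Stopping in the inner body now observes two interleaved values of i per pass over j, an order that no source-level loop produces.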
--tree-loop-distribution: #pass_loop_distribution Enable loop distribution on trees.

--tree-loop-distribute-patterns: #pass_loop_distribution+ Enable loop distribution for patterns transformed into a library call. These flags are only activated when --tree-loop-optimize is activated. Both enable the same pass, which partitions suitable inner loops each into two loops over the same iteration space, copying the loop and then removing stmts that should remain in only one of the loop bodies. The multiple passes over different statements of a loop can be very confusing when debugging. Removed stmts cause debug binds that reference them to be reset, which makes variables available in at most one of the two resulting loops.

--loop-interchange: #pass_linterchange Enable loop interchange on trees. This flag is only activated when --tree-loop-optimize is activated. This transformation rearranges a loop nest, attempting to swap the induction variables for each pair of loops in a nest. This changes the order in which the nest's iteration space is walked, which is confusing for debugging, and as it swaps and replaces induction variables, it resets binds to the original ones, so the iteration variables will not be visible within the loops after the transformation. This makes it very difficult to do any debugging of such loops.

--tree-loop-if-convert: #pass_if_conversion This pass is enabled by default when --tree-loop-vectorize is enabled, but it is only activated when --tree-loop-optimize is also activated. It transforms multi-block loop bodies into a single basic block, possibly after versioning the loop, turning statements in conditional blocks into conditional statements. It makes debugging very hard, as it resets all debug binds in the loop, and rearranges control flow so that all conditional blocks become unconditionally executed. Conditional binds and markers might alleviate this, enabling blocks that wouldn't be executed without the optimization to be skipped during debugging.

--predictive-commoning: #pass_predcom Run predictive commoning optimization. This flag is only activated when --tree-loop-optimize is activated. This pass optimizes loops by identifying and analyzing dependence chains, and unrolling the loop the right number of times to reuse loads and stored values across iterations and remove dead stores. The removal of dead stores may confuse debugging sessions, because inspecting arrays will not show the temporarily-stored values, while removal of loads may confuse sessions that modify the array expecting modified values to be loaded and used, an expectation that may not be met if the value was already loaded from memory.

--peel-loops: #pass_complete_unroll++ Perform loop peeling. This amounts to copying the blocks that make up the loop body so that they can be run linearly before entering the remaining loop. Such block duplication does not in itself cause any harm to the debugging experience, but the linearization of initial iterations of the loop can make room for other optimizations that could in turn make debugging more difficult.

--tree-slp-vectorize: #pass_slp_vectorize Enable basic block vectorization (SLP) on trees. This pass detects opportunities to use vector operations, instead of multiple operations on adjacent memory, in linear code. Although this pass does not reset debug binds, unlike the loop vectorizer, that hardly matters: the combined operations most often involve memory references, and those do not involve debug binds. So, as they are recombined, the timing of effects diverges from that implied by debug markers, which makes debugging very confusing.

--split-paths: #pass_split_paths Split paths leading to loop backedges. This flag is only activated when --tree-loop-optimize is activated. This pass duplicates a basic block that dominates the loop latch, if it ends in a conditional that may exit the loop, and it is the block that closes a simple diamond in the control flow graph. This has no effect on debugging, aside from the need for breakpoints in the duplicated block to cover more than one code address.

--gcse-after-reload: #pass_gcse2 Perform global common subexpression elimination after register allocation has finished. Although the implementation of this pass is not the same as that of gcse PRE or hoist, and this pass's focus is exclusively on eliminating loads, the insertion and deletion of loads uses the same logic and thus has the same effects on debugging. Since pseudos cannot be introduced after reload, it has to reuse registers for loads and copies. This is done without regard to debug binds, but a register must not be live for it to be reused in this way, which implies it couldn't have been in use in debug binds. So, the impact of that should be limited to early unavailability of variables that happened to be available in such registers, or in expressions involving them.

-Ofast: optimize=3 + fast Perform expensive optimizations, and also unsafe math transformations that could make standard-compliant programs misbehave. This option sets the optimization level to 3, while also enabling the --fast-math option.

--fast-math: This flag enables multiple options that disable various aspects of floating-point strict correctness. Several of them may allow simplifications that would otherwise not take place, from folding to removal of exception handling regions that could only catch floating-point exceptions. Such simplifications, though enabled by this flag, are not of kinds that could not possibly arise in the absence of such flags. Its impact on the debugging experience is thus regarded as very low.

--reciprocal-math: #pass_cse_reciprocals Allow optimizations for floating-point division that may change the result of the operation due to rounding. This optimization substitutes floating-point division by an SSA_NAME with multiplication by the reciprocal. Squared divisors are also detected and factored. The reciprocal of the SSA_NAME and of its square, when needed, are inserted after the definition or before a division. Divisions are turned into multiplications in place, so there is no effect on debugging.
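A minimal sketch of the substitution (names hypothetical; the exact placement of the reciprocal is the pass's choice):

    double f (double a, double b, double d)
    {
      double t = 1.0 / d;  /* reciprocal inserted once, after d is set */
      double x = a * t;    /* was a / d */
      double y = b * t;    /* was b / d */
      return x + y;
    }

Since each division becomes a multiplication at the same point, binds and markers are unaffected; only the rounding of the results may differ.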
== Highlights

The analyzed optimizations are so diverse that it is hardly possible to summarize the various forms of impact on debug information of the passes that have any. The good news is that the findings are probably not surprising for anyone familiar with the internal behavior of the passes, and with the techniques used to mask the effects of optimization on debugging. There are, however, a few findings that I consider surprising, in a positive or negative way. A number of highlighted issues can be fixed without much effort; others require far more elaborate work, while others yet may border on the unfixable.

I was surprised, throughout the analysis, by how seamless the introduction of VTA turned out to be, especially in gimple. Very few passes required additional logic to adjust debug binds: in nearly every case, the decision was between disregarding debug binds or adjusting them just like nondebug stmts or insns. This was favored by logic that detected and coped with debug uses of dead pseudos in RTL, and that dealt with adjustments to debug binds, sometimes inserting debug temps, when moving or removing assignments in gimple and RTL.

Reviewing all these passes, I realized there may be room for improvement when moving SSA defs to dominating blocks: some means to signal, or detect internally, that such a move does not require adjustments would avoid some unnecessary forward propagation or introduction of debug temps, both of which carry a risk of loss of debug information. Cases in which SSA defs are removed before new, equivalent defs are inserted at nearly the same point (e.g., replacing a PHI node with an assign) can also be improved.

The option -Wnull-dereference enables the isolate-paths pass, which may have codegen effects (e.g. changing returns of addresses of local automatic variables to null), even if both --isolate-erroneous-paths-* flags, which are supposed to enable codegen changes in this pass, are disabled. Another case that is not too hard to fix is the lack of adjustment of debug binds under --auto-inc-dec.

Although -Og is supposed to avoid harming debugging, it enables --delayed-branch, which moves insns without regard to preserving the correctness of previously-computed variable locations, and has other potentially harmful effects on branches and calls. It should probably not be enabled at -Og.

Besides --delayed-branch, other very late optimization passes that may corrupt variable locations are the --peephole* ones. They run after variable tracking, so adjusting debug binds so as to recompute locations is not much of an option. Adjusting notes might be possible, at significant effort, but --peephole2 may actually drop notes that appear between peepholed insns, and it is very hard to argue that doing something else would be uniformly superior. These passes are limited to some target architectures, but their effect on affected architectures could be very significant.

Other passes that may break variable location information are those that move or remove memory stores. Addressable variables are not subject to debug binds, so such changes actually make their effects observable at unexpected points, or not at all. Flags --tree-dse and --tree-sink enable such optimizations, both implied by -Og.
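A hedged sketch of the problem with removed stores (types and names hypothetical): --tree-dse may delete the first store below as dead, so a debugger stopped between the two assignments never observes the intermediate value, with no location list to warn about it.

    struct point { int x, y; };
    extern void use (struct point *);

    void f (void)
    {
      struct point v;  /* addressable aggregate: not tracked by binds */
      v.x = 1;         /* dead store: may be removed by --tree-dse */
      v.x = 2;
      use (&v);
    }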
Flags --tree-loop-vectorize and --tree-slp-vectorize, both enabled at -O3, may bring about similar effects on variables in memory, but there is hardly any expectation of retaining significant debuggability after these. Still, it might be worth exploring possibilities of extending VTA-like tracking to non-scalar variables. Besides the above, and the late tracking of addressable variables that become non-addressable and then scalars due to optimizations, it might help mask the optimization effects of --split-wide-types, --tree-sra, and --ipa-sra, which introduce scalars too late to ensure debug binds are introduced at the correct points. Furthermore, whatever support there is to track split-out components separately, so as to be able to describe the aggregate location member-wise, seems not to be up to the task. The effects of --ipa-sra on debugging are even worse, as dismembered params end up not represented at all. It is not clear that there are means to express such an apparently dropped parm as a composition of actual parms: some extensions might be required to even start fixing --ipa-sra.

Several optimizations that reorganize the control flow graph may drop debug markers and binds. Gimple jump threading, for example, won't duplicate forwarding blocks, discarding all debug stmts in them. In some cases, it wouldn't be hard to retain them in predecessor or successor blocks, but in others, some way to mark such stmts as conditional might be the only way to preserve them. Conditional binds can be handled with some effort in var-tracking and existing location expressions and lists, but conditional markers would require some extension to line number tables to enable debug information consumers to decide e.g. whether or not a breakpoint at a line was hit when reaching a conditional marker for that line. This could become a very large project, but with significant expected benefits. Such an extension could benefit many other passes: (RTL) --thread-jumps, --if-conv*, --ssa-phiopt, --crossjumping/--tree-tail-merge, and even such loop optimizations as --tree-loop-if-convert.

I was a bit surprised to find out that a number of loop optimizations did not harm debugging. It was expected that loop unrolling would be harmless, but --split-loops, --unswitch-loops, and --peel-loops were also found not to affect debug information, unlike transformations that modify the order in which points in the iteration space are visited, such as --loop-unroll-and-jam and --tree-loop-vectorize.

Another somewhat surprising effect of induction variable optimizations on loops, particularly --branch-count-reg, was the risk of losing bindings for user-defined induction variables. Even if they can be expressed in terms of remaining basic induction variables, if the user-defined induction variable is no longer needed, there is no effort to adjust debug binds accordingly. There is room for improvement without much effort.

Partial inlining brings a significant challenge to debug information representation: although a function fragment can be linked back to the original abstract function, and some variables can be set up to take locations and values from the caller, expressing that the concrete subprogram is a fragment that does not contain an entry point for the function requires extensions. It would take further extensions to express how inlined subroutines combine with this fragment to form the entire abstract subprogram, and even to support multiple splits of the same subprogram.
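As a hedged illustration of partial inlining (all names hypothetical), the hot header below may be inlined into callers while the cold remainder is split out as a fragment, conventionally named along the lines of f.part.0, that has no entry point corresponding to a call to f:

    struct cache { int last_key, last_val; };
    extern int lookup_slow (struct cache *, int);

    int f (struct cache *c, int key)
    {
      if (c->last_key == key)        /* hot path: inlined into callers */
        return c->last_val;
      return lookup_slow (c, key);   /* cold path: split into f.part.0 */
    }

Describing the fragment as part of f, without pretending it is a complete instance of f, is what current debug information cannot quite express.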
Identical code folding (--ipa-icf) is another challenging case for debugging: a single executable code sequence may be used to represent multiple unrelated functions, each requiring a separate set of debug annotations. One potential way to address this is to combine the debug notes from all functions that share the same executable code, making all the notes conditional on DWARF procedures that can determine which of the combined functions is active, e.g. from callers or some other means to tell them apart. Ideally, the symbolic information of each such function could be kept separate and guarded by the same conditionals, so that only the scopes and variables of the activated function are considered available. This will require further extensions to debug information.

== (*) Todo

Look for passes that lose or corrupt debug information, and that could be improved, that still need to be brought to the highlights, for completeness. Check for links to other flags/passes in the text, as well as for textual references such as above/below that may need adjusting after reordering passes and flags to match execution order.

== ChangeLog

=== 2018-10-02 v0.90 DRAFT

Introduced section structure and section names, pass names next to flags, and a pass list as a TOC. Added some more info on how to tell whether a pass is run. Highlighted the case of addressable variables becoming scalars as benefiting from binds on non-scalars. Added ChangeLog.

=== 2018-09-04 v0.8 DRAFT

First published draft.

----
based on GCC 8.1.1 (gcc-8-branch@259831 68fc0ec2c57b0519bd7e1f9e013f37f112d65a3d)