Using weakrefs to avoid weakening strong references

Alexandre Oliva
2007-02-08

Introduction
============

Consider a header file that defines inline functions. These functions would like to use (or merely test for a definition of) a certain symbol (a function, a variable, or anything else) if it is defined in the final program or in one of the libraries it links with. They also have alternate code paths for when the symbol is not defined, so the header does not want to force the symbol to be defined. This is the case of the gthr-* headers in GCC, which libstdc++ uses and exposes to users, creating a number of problems.

Such a header has traditionally been impossible to implement without declaring the symbol as weak, which has the effect that any references to the symbol in the user's code will also be regarded as weak. This has two negative side effects:

- if the symbol is defined in a static library that is linked into the program, the object file containing the definition may not be linked in, because all references to it are weak, even those that should have been strong;

- if the user accidentally fails to link in the library providing the referenced symbol, she won't get an error message, and the code that assumed strong references is likely to crash.

Existing solutions
==================

One way to avoid this problem is to move the direct reference to the symbol from the inline function into a function in a separate library, or even to move the entire function there. The library references the symbol weakly, without affecting user code. This most likely costs some performance, since the inline function now has to make an out-of-line call, and it may require a new library to be linked in, which an all-inline header file (say, one holding C++ template definitions) would rather avoid.

Another way to avoid the problem is to create a variable in a separate library, initialized with a weak reference to the symbol, and to access the variable in the inline function.
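The variable-based workaround can be sketched in C as follows. This is a minimal illustration, not code taken from GCC or libstdc++: it assumes GCC on an ELF target, where an undefined weak symbol resolves to a null address, and the names optional_feature, optional_feature_ptr and use_optional are hypothetical. The library part and the header part are shown in a single file for brevity.

```c
/* --- Hypothetical separate library --- */

/* The only weak declaration lives here, so it cannot weaken references
   to optional_feature made elsewhere in the user's program. */
extern int optional_feature(int) __attribute__((weak));

/* Exported variable holding the symbol's address, or a null pointer if
   no definition of optional_feature was linked in. */
int (*const optional_feature_ptr)(int) = optional_feature;

/* --- Hypothetical header --- */

/* The header sees only the variable, never the symbol itself. */
extern int (*const optional_feature_ptr)(int);

static inline int use_optional(int x) {
  /* Call through the variable if the symbol exists; otherwise take
     the alternate code path. */
  return optional_feature_ptr ? optional_feature_ptr(x) : -1;
}
```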
This still has a small impact on performance and may require a new library, but the most serious problem is that it makes a variable part of the interface of a library, which is generally regarded as poor practice.

Weakrefs
========

The idea for addressing the problem is to enable the compiler to distinguish references that are intended to be weak from those that are meant to be strong, and to combine them the same way the linker would combine an object file holding a weak undefined symbol with another object file that references or defines a symbol with the same name. The goal is to let people write code as if they had combined two such object files into a single translation unit.

The idea of a weak alias may immediately come to mind, but this is not what we are looking for. A weak alias is a definition that is itself weak (i.e., it yields to other definitions) and that holds the same value as another definition in the same translation unit. This other definition can be strong or weak, but it must be a definition. A weak alias cannot reference an undefined symbol, weak or strong.

What we need, in contrast, is some means to define an alias that doesn't, by itself, cause an external definition of the symbol to be brought in. If the symbol is referenced directly elsewhere, however, then it must be defined. This is similar to the notion of weak references in the garbage collection literature, in which a strong reference stops an object from being garbage-collected, but a weak reference does not.

I've decided to name this kind of alias a weakref. I could have introduced means in the compiler to create such weakrefs and handled them entirely within the compiler, as long as it could see the entire translation unit before deciding whether or not to issue a .weak directive for the referenced symbol.
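For comparison with the weakref idea, a weak alias in GCC syntax looks roughly like the sketch below. It assumes GCC on an ELF target, and default_answer, answer, broken and never_defined are made-up names. The alias is itself a weak definition, and its target must be defined in the same translation unit.

```c
/* A strong definition in this translation unit. */
int default_answer(void) { return 42; }

/* A weak alias: itself a (weak) definition sharing default_answer's
   address.  A strong definition of 'answer' in another object file
   would take precedence over it at link time. */
int answer(void) __attribute__((weak, alias("default_answer")));

/* A weak alias cannot name an undefined symbol: GCC rejects the
   following declaration because never_defined has no definition in
   this translation unit, so it is left commented out.

   int broken(void) __attribute__((weak, alias("never_defined")));
*/
```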
Instead of handling weakrefs entirely within the compiler, however, I decided to implement them in the assembler and have the compiler use that, since the notion can be useful in the assembler as well, especially for large or complex preprocessed assembly sources. It may also be useful for compilers that combine multiple translation units into a single assembly output file.

Assembler implementation
------------------------

The following syntax was chosen for assembly code:

  .weakref <alias>, <target>

The semantics are as follows:

- if <target> is referenced or defined, then .weakref has no effect whatsoever on its symbol;

- if <target> is never referenced or defined other than in .weakref directives, but <alias> is, then <target> is marked as weak undefined in the symbol table;

- multiple aliases may be weakrefs to the same target, and the effect is equivalent to having a single weakref;

- if <alias> is redefined, it ceases to refer to <target>, and loses the .weakref status;

- uses of <alias> are implicitly turned into uses of the last definition of <target>;

- <alias> itself is never added to the symbol table, since all uses are resolved locally.

Compiler implementation
-----------------------

The following syntax is to be used in C sources:

  static <declaration> __attribute((weakref("<target>")));

<declaration> may be a function or variable declaration. The notation is heavily based on the alias notation, and it actually uses the alias machinery underneath, so almost all of the same restrictions apply. The one exception is that, while the alias attribute must reference a defined symbol, weakref must reference a declared, but not necessarily defined, symbol. Both use the assembly name of the target, which may differ from its name in the source file (for example, because of C++ name mangling).

weakref implicitly marks <declaration> (but not <target>) as weak. It is actually implemented in terms of a no-argument weakref attribute, which still implies weak, combined with an alias attribute.
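The compiler syntax can be sketched in the header scenario from the introduction. This assumes a GCC version that supports the weakref attribute, targeting ELF; optional_hook, always_here and the helper functions are hypothetical names, with optional_hook standing in for a symbol that may or may not be linked in. The second weakref uses the no-argument weakref plus alias spelling.

```c
/* Hypothetical symbols: optional_hook may or may not be provided by a
   library the program links with; always_here is defined right here. */
extern int optional_hook(int);
int always_here(int x) { return 2 * x; }

/* Weakrefs are static and never reach the symbol table.  The target is
   named by its assembly name, as with the alias attribute. */
static int optional_hook_ref(int) __attribute__((weakref("optional_hook")));

/* Equivalent spelling: no-argument weakref followed by alias
   (weakref must come before alias). */
static int always_here_ref(int) __attribute__((weakref, alias("always_here")));

/* Inline helpers of the kind a header would provide.  They reference
   the targets only through the weakrefs, so they do not by themselves
   force either symbol to be defined: since optional_hook is referenced
   nowhere else in this file, it ends up as a weak undefined symbol. */
static inline int hook_or_default(int x) {
  return optional_hook_ref ? optional_hook_ref(x) : -1;
}

/* always_here is defined above, so its weakref has no effect on it and
   the call simply goes to that definition. */
static inline int doubled_or_default(int x) {
  return always_here_ref ? always_here_ref(x) : -1;
}
```

If another object file in the same program also called optional_hook directly, that reference would stay strong and the linker would still require a definition, which is exactly the property the header needs.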
Consequently,

  static <declaration> __attribute((weakref("<target>")));

is equivalent to:

  static <declaration> __attribute((weakref,alias("<target>")));

which would still be equivalent if one added the weak attribute:

  static <declaration> __attribute((weak,weakref,alias("<target>")));

If no alias attribute is associated with a weakref declaration, the effect of the weakref attribute is limited to that of the weak attribute.

The compiler should map this to .weakref in the assembler if the assembler supports it. If the assembler lacks such support, the weakref is correctly rejected, but we could arrange for the compiler to handle it internally.

Conclusion
==========

This new feature will enable a long-standing libstdc++ bug to be fixed. Some libstdc++ headers that are meant to be included by user code include gthr headers that were originally meant to be internal to libgcc. These contain numerous #pragma weak directives for thread library functions, as well as inline functions that reference them. Several of these inline functions are called from within template definitions, so refraining from including the header is not an option.

With this new feature, it will be possible to rework the header so that its inline functions reference weakrefs to the thread library symbols instead of the symbols themselves, which the user's code might also call directly. The symbols will then not be marked as weak if there are user references to them, but they will be if the only references come (indirectly) from the inline functions that use the weakrefs.

If this is implemented within the compiler, so that no assembler support is required, we can switch to this new feature on all platforms. Otherwise, platforms whose assemblers don't support it will be left with three options: introduce such support, retain the problems caused by the weak pragmas, or take the performance hit of one of the existing workarounds.

Copyright 2005, 2007 Red Hat, Inc.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
http://creativecommons.org/licenses/by-sa/3.0/

ChangeLog
=========

2011-05-17  Alexandre Oliva

	* Relicensed from OPL1.0 with further restrictions to CC BY-SA.

2007-02-17  Alexandre Oliva

	* Added license.

2007-02-08  Alexandre Oliva

	* weakrefs are static, not extern, as expected for visibility
	local to the translation unit.  GCC has been like this for a while.
	Fix a few typos.  Add copyright notice.

2005-10-10  Alexandre Oliva

	* Initial revision.