I ran into a really strange problem at work today:

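In our project, the objects compiled from a.c and b.c are first archived into a static library, c.a, which is then linked with another library, d.a, and with the object compiled from main.c to produce the final binary. a.c contains two functions, private_init() and private_read(): private_init() is an empty function, and private_read() is called from code in d.a.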

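Here is the symptom: if main.c calls neither of these two functions, the link fails with undefined reference to 'private_read'. But if main.c calls either one of them, even just the empty private_init(), the link completes normally. Stranger still, the Makefile passes the libraries as "-lc -ld"; change that to "-ld -lc" and the problem goes away.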

I puzzled over it for a long time without figuring it out. Finally I looked through _Linkers & Loaders_ by John R. Levine and found the answer:

Searching libraries

After a library is created, the linker has to be able to search it. Library search generally happens during the first linker pass, after all of the individual input files have been read. If the library or libraries have symbol directories, the linker reads in the directory, and checks each symbol in turn against the linker’s symbol table. If the symbol is used but undefined, the linker includes that symbol’s file from the library. It’s not enough to mark the file for later loading; the linker has to process the symbols in the segments in the library file just like those in an explicitly linked file. The segments go in the segment table, and the symbols, both defined and undefined are entered into the global symbol table. It’s quite common for one library routine to refer to symbols in another library routine, for example, a higher level I/O routine like printf might refer to a lower level putc or write routine.

Library symbol resolution is an iterative process. After the linker has made a pass over the symbols in the directory, if it included any files from the library during that pass, it should make another pass to resolve any symbols required by the included files, until it makes a complete pass over the directory and finds nothing else to include. Not all linkers do this; many just make a single sequential pass over the directory and miss any backwards dependencies from a file to another file earlier in the library. Tools like tsort and lorder can minimize the difficulty due to single-pass linkers, but it’s not uncommon for programmers to explicitly list the same library several times on the linker command line to force multiple passes and resolve all the symbols.

Unix linkers and many Windows linkers take an intermixed list of object files and libraries on the command line or in a control file, and process each in order, so that the programmer can control the order in which objects are loaded and libraries are searched. Although in principle this offers a great deal of flexibility and the ability to interpose private versions of library routines by listing the private versions before the library versions, in practice the ordered search provides little extra utility. Programmers invariably list all of their object files, then any application-specific libraries, then system libraries for math functions, network facilities and the like, and finally the standard system libraries.

When programmers use multiple libraries, it’s often necessary to list libraries more than once when there are circular dependencies among libraries. That is, if a routine in library A depends on a routine in library B, but another routine in library B depends on a routine in library A, neither searching A followed by B or B followed by A will find all of the required routines. The problem becomes even worse when the dependencies involve three or more libraries. Telling the linker to search A B A or B A B, or sometimes even A B C D A B C D is inelegant but solves the problem. Since there are rarely any duplicated symbols among the libraries, if the linker simply searched them all as a group as IBM’s mainframe linkers and AIX linker do, programmers would be well served.

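In short: in a traditional Unix build environment, static libraries are searched once, in order. Whenever the linker sees a function that is referenced but not yet linked, it notes it down; if that function turns up in a library or object further along the command line, the linker links in the object that contains it. Objects in a library that don't contain any still-unlinked function are simply skipped (a static library is, after all, just a bundle of objects).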

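So with "-lc" first: main.c calls neither of the two functions, so when c.a is searched nothing in a.o is needed yet, and the linker skips a.o. When it then reaches "-ld", d.a does call a function in a.o, but the linker has already passed a.o and does not go back, so private_read can't be found. This also explains the private_init() trick: once main.c calls it, private_init is undefined by the time c.a is searched, so a.o is linked in as a whole, and private_read() comes along with it. With "-ld -lc", d.a is searched first, private_read becomes undefined, and c.a then supplies a.o.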

So if library A depends on library B, the link order should be A B; if the two depend on each other, it should be A B A or B A B.

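PS: I looked into it a bit. Many modern toolchains (LLVM's lld, for one) don't have this problem, but the traditional behavior, the standard behavior, and the toolchain I'm working with all do. For the sake of portability, it's worth paying attention to link order from now on.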