Motivation
Global variables are powerful, but they risk being altered carelessly. In most cases, we can add the static modifier to a global variable so that it can only be accessed within its own file. However, there are situations where we have to share global variables across different files. In that case, we often run into an error in the linking phase: multiple definition. Some of these errors are obvious and easy to debug, while others can be really puzzling. Here is an example that I encountered.
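Suppose we have two source files, main.c and aux.c. In main.c, the global variable gvar is defined without an initializer (int gvar;) and its value is printed. In aux.c, gvar is defined again and initialized to 5.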
Then we run the command
    gcc -o test main.c aux.c
and everything goes smoothly. That is to say, the code is accepted, and gvar is not reported as a multiple definition. (This is the behavior of GCC before version 10; newer versions default to -fno-common, so -fcommon has to be added to the command to reproduce it.) Note that there is no extern modifier on gvar, and the program prints 5, which means that gvar is shared across the two files. This is quite strange, since we know that two global variables with the same name in one project normally cause a multiple definition error.
What is more, I found something even more interesting. When we change the extension of these two files,
    mv main.c main.cpp
the compiler reports a multiple definition error for gvar. How does this happen? The content of the files has not changed; we only changed the file names. An intuitive explanation is that the difference between C++ and C causes the puzzling bug, since .cpp is the file extension for C++ and .c is for C.
Strong and weak symbols
Actually, these strange phenomena are all caused by one feature of the GCC toolchain, called strong and weak symbols. Global variables are divided into three types:
- initialized to a non-zero value
- initialized to zero
- not initialized, just defined
In GCC, the first two types of global variables are called strong symbols; they are stored in the .data and .bss sections, respectively. The third type is called a weak symbol, and it is placed in the COMMON section.
There are three rules that these symbols must follow:
- only one strong symbol is allowed for a given name
- if there is one strong symbol and several weak symbols, the weak symbols are overridden by the strong symbol
- if there are several weak symbols, the linker chooses the one with the largest size (memory occupation).
Now we can clarify why the C version of the program builds without any errors. In aux.c, we define a strong symbol gvar, initialized to 5. In main.c, we only define the variable gvar without initializing it, so it is a weak symbol. When we build the program with GCC, the gvar in main.c is overridden by the gvar in aux.c according to the second rule. Therefore, the program runs smoothly and the result is 5. If we change main.c so that gvar is also initialized there (for example, int gvar = 0;), both definitions become strong symbols and the link fails with a multiple definition error as well.
Wait, there is still one puzzle left. Why does the program produce a multiple definition error when the file name is changed?
Actually, when we change the file extension from .c to .cpp, GCC compiles the program using the rules of C++ instead of C. Therefore, to answer this question, we need to look at how GCC handles strong and weak symbols differently for .cpp and .c files.
Here is my conclusion. In a C program, if you define a global variable without initializing it, GCC treats it as a weak symbol. In a C++ program, however, such a definition is a strong symbol by default. That is to say, the line int gvar; in main.cpp defines a strong symbol. Since there is another strong symbol with the same name in aux.cpp, the linker reports the error.
If you want to use a weak symbol in a C++ program, you need to declare the variable as weak explicitly. For example, if main.cpp marks its definition with GCC's weak attribute (int gvar __attribute__((weak));) while aux.cpp keeps its initialized definition of gvar, the program will behave like the C version.
To avoid bugs like this, we can use the -fno-common option provided by GCC, which makes the compiler treat uninitialized global variables as strong symbols too (this is the default since GCC 10). However, in some cases we have to use weak symbols (see the next section), so we should also develop good coding habits. There are three rules we can follow:
- eliminate all global variables (hard)
- add the static modifier to global variables and provide interfaces for access (medium)
- initialize all global variables, for example to zero (easy)
Function of strong and weak symbols
It seems that we should always prefer strong symbols to weak symbols, so why does GCC provide weak symbols? As far as I know, weak symbols are useful for library functions. For example, if the symbols in a library are weak, users can easily override some library functions for their own purposes. What's more, a program can declare some library functions as weak symbols. If the program is linked with the library, it can provide more powerful features; otherwise, it can still run without any errors. A simple example is a program that declares a pthread function as weak: if the program is not linked with the pthread library, it runs in single-thread mode; otherwise, it can run in multi-thread mode.
Manage your global variables
If you have to use global variables, here is a comfortable way to manage them. Create two files called global_var.h and global_var.c. Declare all global variables with the extern modifier in global_var.h, and define and initialize all of them in global_var.c. When you need to use the global variables in other files, such as main.c, simply include global_var.h and you will be able to access all of them.
In this way, you can easily manage your global variables. However, be sure to use global variables as little as possible.