Unique header guard generation using AWK
Something that often bothers me in C and C++ is having duplicate header guards.
For smaller projects I often use the name of the file in all caps with _HPP appended (_H for C programs) as my header guard.
For example, parser.hpp would have the following header guard:
    #ifndef PARSER_HPP
    #define PARSER_HPP
    // Omitted code
    #endif // PARSER_HPP
So that I do not have to type this every time I create a new header file, I use autoinsert.el in combination with yasnippet to have Emacs type it for me.
The issue
This is nice and dandy, but I have had cases where a library I included had the same header guard as another header file in the project. Because the guard macro is already defined by the time the second header is included, its contents are silently skipped, and I have to painstakingly figure out why a function in some source file is considered undeclared by the compiler.
Preferably, header guards are generated and unique. Recently, whilst working on a project, the codebase grew to a size where my usual formula for small projects no longer scaled well. The project had 79 headers that all had to be changed to something longer and unique.
I decided to automate this task by using AWK to change the header guards.
I decided on the following format for my header guards: <project name>_<path>_HPP.
Since the project name is usually unique, and the path to a file is usually unique within a project, I decided this format was a good fit.
    // Header guard for a project called apples with a directory called foo
    // and a header file called bar.hpp
    #define APPLES_FOO_BAR_HPP
Why not just use pragma once?
Now you might see this issue and think to yourself: why not just use #pragma once?
I really like #pragma once, but sadly it is not standardized, meaning not all compilers are guaranteed to support it.
For this project I want the source code to be able to compile on as many platforms as possible.
Furthermore, there are some cases where #pragma once can get confused (see "What are include guards and pragma once?").
Making my header guards unique by incorporating the path into the guard is not very different from what #pragma once does behind the scenes.
In this particular case I prefer header guards because they also make it instantly obvious which project, directory and file I am currently in.
This makes including files relative to the one I am working in a little easier.
I can also change the header guards of two headers so they do not conflict in cases where #pragma once would detect the files as being the same.
The idea is also to generate these header guards once, and to not have to regenerate them unless this is necessary (this is a pipe dream, I am aware).
The AWK script
So I conjured up the following AWK script to give my header guards unique names.
    #!/usr/bin/env -S gawk -i inplace -f

    # Generate header guard
    function gen_hg() {
        guard = toupper(FILENAME)
        gsub(/(\/)|(\.)/, "_", guard)
        gsub("SRC", "PROJECT", guard)
        return guard
    }

    # Replace header guard definition with unique header guard
    NR <= 2 && /^#(define|ifndef).+_HPP$/ {
        print $1, gen_hg()
        next
    }

    # Replace endif with generated header guard comment
    NR == FNR && /^#endif.+_HPP$/ {
        print $1, "//", gen_hg()
        next
    }

    # Print the other lines
    {
        print $0
    }
Shebang
#!/usr/bin/env -S gawk -i inplace -f
This shebang uses env -S to split up the arguments of the shebang.
This allows us to specify more than one flag/option on the shebang.
Without this, everything after the binary is treated as one argument (at least on Linux).
We need this functionality because we want to use the GNU AWK inplace extension to replace the contents of a header file, as AWK's default behavior is to print to standard output.
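The -f flag on the shebang tells AWK to read its program from a file.
Because it is the last thing on the shebang line, the kernel appends the path of the script itself, so gawk ends up running as gawk -i inplace -f ./header_guard.awk followed by whatever arguments we pass.
Those remaining arguments are treated as input files; without any, AWK would read from standard input.
Now we can pass files directly to our script.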
./header_guard.awk file1.hpp file2.hpp file3.hpp
The function
    # Generate header guard
    function gen_hg() {
        guard = toupper(FILENAME)
        gsub(/(\/)|(\.)/, "_", guard)
        gsub("SRC", "PROJECT", guard)
        return guard
    }
The first thing our function does is convert the FILENAME variable to upper case.
The FILENAME variable contains the path to the file that is currently being processed.
We then use the gsub function to substitute the directory delimiters (/) and the dot before the file suffix (.) with _ (we do not support Windows).
Lastly, we use gsub again to substitute the source directory with the name of our project.
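To see what these substitutions produce without touching any file, the same three steps can be applied to a path given as a plain string. This is only a sketch: guard_for is a name made up for this example, PROJECT is the placeholder from the script, and the second path is a hypothetical one chosen to show that gsub replaces every occurrence of SRC, not just the leading directory.

```shell
# Hypothetical helper: prints the guard that the substitutions above would
# produce for a given path. PROJECT is the placeholder used in the script.
guard_for() {
    printf '%s\n' "$1" | awk '{
        guard = toupper($0)
        gsub(/(\/)|(\.)/, "_", guard)   # "/" and "." become "_"
        gsub("SRC", "PROJECT", guard)   # every "SRC" becomes "PROJECT"
        print guard
    }'
}

guard_for src/parser/parser.hpp   # prints PROJECT_PARSER_PARSER_HPP
guard_for src/srcmap/loader.hpp   # prints PROJECT_PROJECTMAP_LOADER_HPP
```

If a directory or file name can contain the letters src, anchoring the second substitution (for example with sub(/^SRC/, "PROJECT", guard)) would be one way to avoid the surprise in the second call.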
Rules
    # Replace header guard definition with unique header guard
    NR <= 2 && /^#(define|ifndef).+_HPP$/ {
        print $1, gen_hg()
        next
    }
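The special NR variable holds the number of records read so far (by default a record is a line).
Here we check that it is at most two, as we only want to replace the header guard, not other macro definitions present in a header.
Note that NR keeps counting across input files, so when several headers are passed in one invocation this check only matches in the first one; FNR, the record number within the current file, is the counter to use for that case.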
Then we check whether the line is a #define or an #ifndef whose macro ends in _HPP.
If so, we print the first field, $1, which is either #define or #ifndef, followed by our newly generated header guard.
Finally, we use next to skip the remaining rules and start processing the next line.
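The difference between NR and FNR matters as soon as more than one file is passed to a single AWK invocation. A small sketch, using two throwaway files whose names are made up for this example:

```shell
# Two throwaway files with two lines each (hypothetical names).
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/one.txt"
printf 'c\nd\n' > "$tmp/two.txt"

# Print the running record number (NR) next to the per-file one (FNR).
counters=$(cd "$tmp" && awk '{ print FILENAME, NR, FNR }' one.txt two.txt)
printf '%s\n' "$counters"
rm -r "$tmp"
```

This prints "one.txt 1 1", "one.txt 2 2", "two.txt 3 1" and "two.txt 4 2": NR continues at 3 in the second file, while FNR starts again at 1.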
    # Replace endif with generated header guard comment
    NR == FNR && /^#endif.+_HPP$/ {
        print $1, "//", gen_hg()
        next
    }
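This rule is meant to update the #endif at the end of the file.
It is very similar to the previous rule, except that it compares NR to FNR.
FNR is the record number within the current file (not the total number of records), so NR == FNR holds for every line while a single file is being processed and does not single out the last line.
In practice it is the regular expression that selects the closing #endif, by requiring a trailing comment that ends in _HPP.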
    # Print the other lines
    {
        print $0
    }
This rule is reached only if the previous two rules did not trigger. In that case we are dealing with a regular line, which we print unchanged.
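Because NR keeps counting across input files, the rules as written only behave as intended for the first file of an invocation. The following is a sketch of a variant that should also cope with several files at once, not the script used in the project: it uses FNR for the first rule and relies on the pattern alone for the #endif. To keep it runnable with any POSIX awk it prints to standard output instead of using the gawk inplace extension; rewrite_guards and the two sample headers are made up for this example.

```shell
# Sketch of a multi-file-safe variant (prints instead of editing in place).
rewrite_guards() {
    awk '
    function gen_hg(    guard) {
        guard = toupper(FILENAME)
        gsub(/(\/)|(\.)/, "_", guard)
        gsub("SRC", "PROJECT", guard)
        return guard
    }
    # FNR restarts at 1 for every file, unlike NR
    FNR <= 2 && /^#(define|ifndef).+_HPP$/ { print $1, gen_hg(); next }
    # Match the closing comment by pattern alone, on any line
    /^#endif.+_HPP$/ { print $1, "//", gen_hg(); next }
    { print }
    ' "$@"
}

# Try it on two throwaway headers (hypothetical contents).
tmp=$(mktemp -d)
mkdir "$tmp/src"
printf '#ifndef A_HPP\n#define A_HPP\nint a();\n#endif // A_HPP\n' > "$tmp/src/a.hpp"
printf '#ifndef B_HPP\n#define B_HPP\nint b();\n#endif // B_HPP\n' > "$tmp/src/b.hpp"
result=$(cd "$tmp" && rewrite_guards src/a.hpp src/b.hpp)
printf '%s\n' "$result"
rm -r "$tmp"
```

Both headers come out with their own guard (PROJECT_A_HPP and PROJECT_B_HPP). Swapping awk for gawk -i inplace should turn the sketch back into an in-place edit, assuming gawk and its inplace extension are installed.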
Usage
After you mark the script as executable you can use it like this.
./header_guard.awk src/lexer/lexer.hpp src/parser/parser.hpp
To change all files recursively under a directory you could use the find command.
    # Run this from your project's root directory
    find src/ -iname "*.hpp" -exec ./tools/header_guard.awk {} \;
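Once the guards have been regenerated, it can be reassuring to check that no two headers ended up with the same macro. A minimal sketch, assuming each guard's #ifndef sits on one of the first two lines of its header (the same assumption the script makes); duplicate_guards and the sample files are made up for this example.

```shell
# Hypothetical helper: prints every guard macro that appears in an #ifndef
# on the first two lines of more than one of the given headers.
duplicate_guards() {
    awk 'FNR <= 2 && $1 == "#ifndef" {
             if (++seen[$2] == 2) print $2   # report each duplicate once
         }' "$@"
}

# Two headers sharing a guard, and one with a unique guard.
tmp=$(mktemp -d)
printf '#ifndef UTIL_HPP\n#define UTIL_HPP\n#endif // UTIL_HPP\n' > "$tmp/a.hpp"
cp "$tmp/a.hpp" "$tmp/b.hpp"
printf '#ifndef PROJECT_LEXER_HPP\n#define PROJECT_LEXER_HPP\n#endif // PROJECT_LEXER_HPP\n' > "$tmp/c.hpp"
dups=$(duplicate_guards "$tmp/a.hpp" "$tmp/b.hpp" "$tmp/c.hpp")
printf '%s\n' "$dups"
rm -r "$tmp"
```

Here the only line printed is UTIL_HPP. An empty result means every guard among the headers passed in is unique.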