Forest of UNIX

Templating using AWK

Recently I was scrounging around on the internet and reading through some posts on Hacker News.

pp.awk

I found a simple shell preprocessor called pp.awk, written in just 26 lines of AWK. The syntax of the preprocessor is nice and simple. Here is a little example snippet:

foo.pp:

hello world
#!
date
!#

Anything in between the #! and the !# will be evaluated as a shell command. Anything outside of that will be printed as is. Here is a snippet from the Usage section of the README:

pp.awk ./foo.pp
echo 'hello world'
date

pp.awk ./foo.pp | sh
hello world
Fri 28 May 2021 01:44:27 PM CEST

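As we can see, the preprocessor creates a shell script: every line outside of a preprocessing block becomes an echo statement, and every line inside a block is passed through as a command. Piping that script to sh produces the final output.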
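The transformation is small enough to sketch. The following is a hypothetical re-implementation of the idea, not the actual pp.awk source: the function name pp_sketch and the quoting strategy are assumptions made for illustration. It assumes that #! opens a block and !# closes it, each on a line of its own.

```shell
#!/bin/sh
# Hypothetical sketch of a pp.awk-style preprocessor; NOT the real pp.awk source.
pp_sketch() {
  awk -v q="'" '
    /^#!$/ { in_block = 1; next }    # "#!" opens a shell block
    /^!#$/ { in_block = 0; next }    # "!#" closes it
    in_block { print; next }         # inside a block: pass the command through
    {
      # Outside a block: wrap the line in single quotes for echo and
      # rewrite each embedded quote as: quote, backslash, quote, quote.
      n = split($0, parts, q)
      line = parts[1]
      for (i = 2; i <= n; i++) line = line q "\\" q q parts[i]
      print "echo " q line q
    }
  ' "$@"
}

# Print the generated shell script for the foo.pp example
printf '%s\n' 'hello world' '#!' 'date' '!#' | pp_sketch
```

Piping the output of pp_sketch to sh then runs the generated script, just like in the usage example above.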

Limitations

After seeing how this worked two limitations came to mind:

  1. I can't run the script standalone.
  2. This only supports shells.

Limitation 1 is an issue because I do not want to have to manually specify the file I want to process. I just want to execute the .pp file and have it process itself. I also do not want to have to pipe the result to the shell that I want to use: which shell or interpreter to use should be defined in the script itself.

Limitation 2 is an issue because I might want to use Python, Perl, Ruby, Common Lisp, JavaScript, Clojure, or AWK itself as the interpreter. I could change the script to use some other kind of print statement, but this is not really a good solution, as I would have to make a script per language.

tmplt.awk

Seeing these limitations inspired me to write an improved version, especially since I have recently been playing around with AWK.

Limitation 1

To solve the first limitation I decided to use a little shell trick that I learned from Roswell scripts. In Common Lisp, block comments start with #| and end with |#. Since the shell treats a line starting with # as a comment, you can write the following:

#!/bin/sh
#|-*- mode:lisp -*-|#
#| <- This is a Common Lisp comment block
exec ros -Q -- $0 "$@"
|#

(defun main (&rest argv)
  (declare (ignorable argv)))

When you execute this script, the shebang makes sh run the file as a shell script until it reaches the exec ros line. That line runs the same file again, this time through Roswell, the Common Lisp implementation manager. We use exec because it replaces the shell process with the specified command, so when the Common Lisp program quits there is no shell left to return to and the rest of the file is never executed as shell code. When the file is read as a Common Lisp program, the exec line does not cause an error, since it sits inside a comment block.
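The effect of exec is easy to see in isolation. This is a minimal sketch, with demo_exec as a made-up function name: a child shell prints one line, replaces itself with echo, and therefore never reaches its last line.

```shell
#!/bin/sh
# Demonstrates that exec replaces the running shell, so later lines never run.
demo_exec() {
  sh -c '
    echo "before exec"
    exec echo "now running the replacement command"
    echo "never printed"
  '
}

demo_exec
```

Only the first two messages appear, which is exactly why the Lisp source after the exec ros line is never interpreted as shell code.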

This gives us more flexibility than a shebang alone, since a shebang on most Linux systems passes everything after the interpreter path as a single argument. You could use #!/usr/bin/env -S <program name> <arguments> to split the arguments in the shebang, and this works wherever env supports the -S option. But it does not have the power and flexibility of first executing the file as a shell script and then handing it over to the real interpreter. If I wanted to, I could pipe the output of the script to another program. This solution is great for writing single-file setup scripts.

In order to get this functionality I added support for comment blocks. A comment block can be opened using #? and closed using ?#.
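As a rough illustration of what comment-block support means, here is a simplified sketch that only strips #? ... ?# blocks and prints everything else. The function name strip_comment_blocks is hypothetical, and the sketch leaves out the template handling that the full script does.

```shell
#!/bin/sh
# Simplified sketch: drop everything from a line starting with "#?"
# up to and including the next line starting with "?#".
strip_comment_blocks() {
  awk '
    /^#\?/   { skipping = 1 }                        # comment block opens
    skipping { if (/^\?#/) skipping = 0; next }      # swallow it, closer included
             { print }                               # everything else is output
  ' "$@"
}

printf '%s\n' 'before' '#?' 'exec cgi/bin/tmplt.awk "$0"' '?#' 'after' |
  strip_comment_blocks
```

The exec line is visible to the shell that first runs the file, but it never shows up in the processed output.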

About comments

As of now I have only added support for a single type of comment block. If you specifically need to print a #? at the start of a line, I have this ugly solution:

#!
echo "#?"
!#

Limitation 2

In order to get over the second limitation I added support for specifying the interpreter after you open a template block.

#|:<interpreter of choice>
<code>
|#

I also changed the template blocks so that they can be opened with either #| or #! and closed with the matching |# or !#.
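To make the header format concrete, here is a small sketch of how an opening line could be parsed. The helper name block_interpreter is an assumption for illustration. It falls back to sh when no interpreter is given, which matches the default described in the script's comments.

```shell
#!/bin/sh
# Sketch: extract the interpreter name from a block opener such as "#!:python".
block_interpreter() {
  printf '%s\n' "$1" | awk '
    {
      # The match covers the 3-character prefix ("#!:" or "#|:") plus the name,
      # so the name starts at RSTART + 3 and is RLENGTH - 3 characters long.
      if (match($0, /^#[|!]:[a-zA-Z0-9]+/))
        print substr($0, RSTART + 3, RLENGTH - 3)
      else
        print "sh"
    }'
}

block_interpreter '#!:python'   # prints: python
block_interpreter '#|:perl'     # prints: perl
block_interpreter '#!'          # prints: sh
```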

Result

After implementing these improvements I created a repository for the script, called tmplt.awk. I actually use the script for templating my CGI scripts. Here is the code for my static.sh page:

#!/usr/bin/env bash
#?
exec cgi/bin/tmplt.awk "$0"
?#
Content-type: text/html


<head>
<link rel="stylesheet" type="text/css" href="/css/default.css">
<link rel="stylesheet" type="text/css" href="/css/cgi.css">
<title> Backgrounds </title>
</head>
<body>
<div class="page">
<div class="content">
<div class="content-body">
<h1> Overview of static content </h1>
<h2> Tree view </h2>
<pre>
<code>
#!:bash
tree static/
!#
</code>
</pre>

<h2> List view </h2>
<ol>
#!:bash
while IFS='' read -r -d '' filename
do
    echo "<li>"
    link="/$filename"
    echo "<a href=\"$link\">${link}</a>"
    echo "</li>"
done < <(find static/ -type f -print0)
!#
</ol>

<div class="cgi-exit">
  <a href="/">
    Exit
  </a>
</div>

</div>
</div>
</div>
</body>

The script

If you are curious, here is the code of tmplt.awk. It is just a little over 100 lines of AWK:

#!/usr/bin/env -S awk -f

# This awk script contains the source for a simple preprocessor.
# Comment blocks are started with #? and end in ?#.
# Template blocks are started with either a #! or #| and end in |# or !#.
# Everything outside of a comment block or template block is printed regularly.
# Everything inside of a template block is evaluated by default using sh.
# You can specify the interpreter/program to run on the template block,
# in the following way: #!:<interpreter>
# Examples:
# #!:python
# #!:perl
# #!:fish


# Function to extract the | or ! indicator
function indicator(t_str)
{
  start = match(t_str, /[|!]/)

  return substr(t_str, start, 1)
}

# Function to extract the interpreter argument
function interpreter(t_str)
{
  start = match(t_str, /^#[|!]:[a-zA-Z0-9]+/)
  if(start){
    # Skip the 3-character "#!:" or "#|:" prefix and keep only the matched name
    str = substr(t_str, RSTART + 3, RLENGTH - 3)
  }else{
    # Set default interpreter to use
    str = "sh"
  }

  return str
}

BEGIN {
  # Set temporary dir
  "mktemp --directory '/tmp/tmplt.awk-XXXXXX'" | getline tmp_dir
  tmp_file = tmp_dir "/template.tmp"
}

END {
  # Remove the temporary directory when the script is done
  # Guard against an empty path in case mktemp failed
  if(tmp_dir != ""){
    system("rm -r '" tmp_dir "'")
  }
}

# Rule for detecting begin of a comment block
!tmplt_mode && /^#\?/ {
  cmnt_mode = 1
}

# Rule for detecting end of a comment block
cmnt_mode && /^\?#/ {
  cmnt_mode = 0
  next
}

# As long as we are in comment mode skip the line
cmnt_mode {
  next
}

# Rule for detecting begin of template
/^#[|!]/ {
  # If we are on the first line we do not want the shebang
  # To have us enter template mode
  if(NR != 1){
    tmplt_mode = 1
    tmplt_indicator = indicator($0)
    tmplt_interpreter = interpreter($0)
  }

  next
}

# Rule for printing when we are not in the template mode
! tmplt_mode {
  print $0
}

# Rule for detecting end of template
tmplt_mode && /^[|!]#/ {
  if(tmplt_indicator == indicator($0)){
    tmplt_mode = 0
    tmplt_indicator = 0

    # Write the block out and close the file before running the interpreter,
    # so the contents are flushed to disk. Closing also makes the next
    # redirection to tmp_file truncate the file instead of appending to it.
    print verbatim > tmp_file
    close(tmp_file)
    verbatim = ""

    system(tmplt_interpreter " " tmp_file)
  }
}

# Rule for evaluating contents of template
tmplt_mode {
  verbatim = verbatim $0 "\n"
}

The script works by writing everything in a template block to a file in a temporary directory and then executing the interpreter of choice on it. A nice feature to have in the future would be a placeholder that marks where the filename is inserted in the command. As of now the file is always the last argument (just as in a shebang).
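The same write-then-run idea can be sketched in plain shell. This is an illustration under assumptions, not part of tmplt.awk: the function name run_block is made up, and the block body is read from standard input instead of being collected by AWK.

```shell
#!/bin/sh
# Sketch: save a block to a temporary file, then run "<interpreter> <file>".
run_block() {
  interpreter=$1
  tmp_dir=$(mktemp -d) || return 1

  # Store the block body (read from stdin) in the temporary file
  cat > "$tmp_dir/template.tmp"

  # The file is passed as the last argument, just as in a shebang.
  # $interpreter is left unquoted on purpose so it may carry its own flags.
  $interpreter "$tmp_dir/template.tmp"
  status=$?

  rm -r "$tmp_dir"
  return "$status"
}

printf '%s\n' 'echo "2 + 3 = $((2 + 3))"' | run_block sh
```

Any program that accepts a script path as its final argument can be used in place of sh here.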