A guided tour of this website's custom site generation system.
Maria Nicolae,
As mentioned in my last post, this website uses a custom site generation system that I wrote in Python. In this post, I present this build system by taking you on a "guided tour". The full source code of the build system, together with the content sources at the time of writing, can be downloaded here.
First of all, to understand this build system, it helps to know what problems it is trying to solve. After all, I could have just handwritten all of the HTML for my website. For one, each page on this site has some obvious boilerplate: the header, which includes a navigation bar, and the footer, which contains a "contact me" message. This boilerplate is not only subject to change, such as when a new page is added to the navigation bar, but also differs between pages: for example, the navigation item for the current page is not a hyperlink, unlike those for the other pages. Thus, an important job of the site generator is to format this boilerplate and insert the page content into it. Additionally, some pages, such as the Blog homepage, have content that needs to be generated programmatically.
In the root directory of my website project, there is a Makefile, which is the entry point of the generator, as well as some directories:
- build, the directory in which the built files are placed, whose structure mirrors that of the website root directory,
- data, for data which is needed to generate certain page contents,
- scripts, for Python scripts containing site generation logic, and
- src, which contains content sources that I author.
The Makefile itself does not contain any site generation logic. Rather, I made my own make-style build system in Python, built the site generator on top of that, and it is that system that gets invoked by the Makefile for building and cleaning:
.PHONY: build clean
build:
	python scripts/build.py build
clean:
	python scripts/build.py clean
It might seem redundant to do this rather than just using make directly. However, it allows me to generate rules programmatically using arbitrary logic, rather than having to declare them all statically. This is important, of course, for things like the blog and the YouTube Archive.
In addition to these build and clean rules, the Makefile contains rules for copying the contents of build/ to the root directory of a web server. This includes the root of a local web server, which I use to preview edits while drafting, and the root of the VPS that the website runs on:
deploy_local: build
	rsync -a --exclude=.gitkeep build/ /var/www/localhost/html/
deploy_production: build
	rsync -a --exclude=.gitkeep --delete \
		build/ root@marianicolae.com:/srv/http/root/
To be clear about what's going on here: all site generation is done on my local machine, and the VPS that hosts my website merely receives the generated files via rsync. This is because I always preview changes on my local machine before going live, so I need to run the site generator on that machine no matter what; there is no point in making my VPS (which is less powerful than my PC) duplicate that work.
make-Style Build System in Python
The basic conceptual model of make, which I replicated in my build system, is that of rules for creating files (targets), with these rules linked together via dependencies to form a directed acyclic graph. In make, a rule consists of a target, its dependencies, and a sequence of shell commands (the recipe). A template for a rule in a Makefile is:
.PHONY: target
target: dep1 dep2
	cmd1 arg11 arg12
	cmd2 arg21 arg22
In my Python build system, implemented in scripts/build_system.py, an analogous declaration of a rule would be:
import build_system
b = build_system.Build()
r = b.add_rule('target', 'dep1', 'dep2', phony=True)
r.add_command('cmd1 arg11 arg12')
r.add_command('cmd2 arg21 arg22')
Here, a Build object is analogous to a Makefile: it contains a system of declared rules, each of which is a Rule object containing the target, dependencies, and recipe. Other methods not shown in this example are:
- Build.add_deps(target, *new_deps), for dynamically adding dependencies to existing targets,
- Build.make(target), to execute a rule, including incremental build logic based on file timestamps, like in make,
- Rule.add_command_args(*args), to add commands to recipes as individual arguments rather than as an entire command string, and
- Rule.add_function(func, *args, **kwargs), to add execution of a Python function, func(*args, **kwargs), as a recipe step.
The last of these methods allows Python scripting to be integrated into the build system more tightly than is possible in make, where such scripts have to be invoked as shell commands.
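To make this model concrete, here is a minimal sketch of how an API with this shape could be implemented. It is not the code in scripts/build_system.py. It only assumes the method names and signatures listed above, and its out-of-date test (rebuild a target if it is phony, missing, older than a dependency, or depends on a phony target) is my simplification of make's behaviour.

```python
import os
import subprocess


class Rule:
    """A target, its dependencies, and a recipe made of steps."""

    def __init__(self, target, deps, phony=False):
        self.target = target
        self.deps = list(deps)
        self.phony = phony
        self.steps = []  # zero-argument callables, run in order

    def add_command(self, command):
        # A whole command string, run through the shell like a make recipe line.
        self.steps.append(lambda: subprocess.run(command, shell=True, check=True))

    def add_command_args(self, *args):
        # A command given as separate arguments, run without a shell.
        self.steps.append(lambda: subprocess.run(list(args), check=True))

    def add_function(self, func, *args, **kwargs):
        # A Python function call as a recipe step.
        self.steps.append(lambda: func(*args, **kwargs))


class Build:
    """A collection of rules, analogous to a Makefile."""

    def __init__(self):
        self.rules = {}

    def add_rule(self, target, *deps, phony=False):
        rule = Rule(target, deps, phony)
        self.rules[target] = rule
        return rule

    def add_deps(self, target, *new_deps):
        self.rules[target].deps.extend(new_deps)

    def make(self, target):
        # Each target is visited at most once per top-level call.
        self._make(target, done=set(), visiting=set())

    def _make(self, target, done, visiting):
        if target in done:
            return
        if target in visiting:
            raise RuntimeError(f'dependency cycle involving {target!r}')
        rule = self.rules.get(target)
        if rule is None:
            # Plain source files have no rule and must already exist.
            if not os.path.exists(target):
                raise FileNotFoundError(f'no rule to make {target!r}')
            done.add(target)
            return
        visiting.add(target)
        for dep in rule.deps:
            self._make(dep, done, visiting)
        visiting.remove(target)
        if self._out_of_date(rule):
            for step in rule.steps:
                step()
        done.add(target)

    def _out_of_date(self, rule):
        if rule.phony or not os.path.exists(rule.target):
            return True
        target_time = os.path.getmtime(rule.target)
        for dep in rule.deps:
            dep_rule = self.rules.get(dep)
            if dep_rule is not None and dep_rule.phony:
                return True  # as in make, a phony prerequisite forces a rebuild
            if os.path.exists(dep) and os.path.getmtime(dep) > target_time:
                return True
        return False
```

Because every recipe step is stored as a closure, shell commands and Python function calls can be mixed freely within a single recipe, which is what makes the tighter integration described above possible.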
In practice, there is a lot of repetition in the build rules for this website. At minimum, each webpage needs to be added as a dependency of the top-level phony target, and there is also boilerplate HTML that needs to be added to every page. To handle this automatically, I wrote wrappers around the build system.
First, there is scripts/site_build.py, which handles the top-level phony targets, build and clean. These targets are initialised by a function that operates on the Build object:
build_dir = 'build'

def add_build_clean_rules(build):
    build.add_rule('build', phony=True)
    rule = build.add_rule('clean', phony=True)
    rule.add_command_args('rm', '-rf', build_dir)
    rule.add_command_args('mkdir', build_dir)
    rule.add_command_args('touch', f'{build_dir}/.gitkeep')
Additionally, this script contains a wrapper function for adding rules for webpages, which adds the webpage as a dependency of the build target and abstracts away the relationship between page URL paths and build file paths:
pages = dict()

def page_rule(build, page_title, page_path, *deps):
    fpath = page_fpath(page_path)
    rule = build.add_rule(fpath, *deps)
    build.add_deps('build', fpath)
    global pages
    pages[page_title] = page_path
    return rule
Note that this script also stores a dictionary of page titles and page URL paths; this will be important for the next layer of wrappers. The page_fpath function assigns a build file path to each page URL path:
def page_fpath(page_path):
    if page_path[-1] == '/':
        return f'{build_dir}{page_path}index.html'
    else:
        return f'{build_dir}{page_path}'
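For example, a directory-style URL path maps to an index.html file inside the corresponding build directory, while any other path maps directly onto the build directory. The function is repeated below so that the snippet runs on its own; the path '/404.html' is a hypothetical example, not a page on this site.

```python
build_dir = 'build'


def page_fpath(page_path):
    # Directory-style URL paths (ending in '/') are served from index.html.
    if page_path[-1] == '/':
        return f'{build_dir}{page_path}index.html'
    else:
        return f'{build_dir}{page_path}'


for path in ['/', '/contact/', '/blog/', '/404.html']:
    print(path, '->', page_fpath(path))
# / -> build/index.html
# /contact/ -> build/contact/index.html
# /blog/ -> build/blog/index.html
# /404.html -> build/404.html
```

Since build_dir has no trailing slash, this relies on every page URL path starting with '/'.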
The next layer of wrapper is scripts/boilerplate.py, which handles the "boilerplate" HTML (header and footer) of a page. Its API is:
- add_navigation(*page_titles), which specifies which pages are listed in the navigation bar,
- simple_rule(build, page_title, page_path, page_src, years=''), which adds a complete rule for a page consisting of HTML (from the file path page_src) wrapped in boilerplate, and
- generated_rule(build, page_title, page_path, *deps, years=''), which adds a rule for a page whose "content" HTML (the part inserted between the header and footer) is generated by a Python function.
The generated_rule function returns an object with an add_function method that wraps the Rule.add_function method. This wrapper method takes the HTML generated by the given function and wraps it in the boilerplate HTML.
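One way such a wrapper could look is sketched below. It is a hypothetical reconstruction rather than the code in scripts/boilerplate.py: the class names GeneratedRule and StubRule, the simplified stand-in for _format_boilerplate, and the choice to write the finished page to the rule's target are all assumptions. It also exposes a deps attribute, because scripts/build.py calls r.add_function(yt_archive.func, *r.deps) on the returned object; I assume here that it holds only the dependencies passed by the caller.

```python
class StubRule:
    """Hypothetical stand-in for build_system.Rule: records recipe steps."""

    def __init__(self, target):
        self.target = target
        self.steps = []

    def add_function(self, func, *args, **kwargs):
        self.steps.append(lambda: func(*args, **kwargs))


def _format_boilerplate(title, body, years):
    # Hypothetical stand-in for the template-based formatter.
    return f'<title>{title}</title>\n{body}\n<footer>{years}</footer>\n'


class GeneratedRule:
    """Wraps a rule so that recipe functions only return content HTML."""

    def __init__(self, rule, title, years, deps):
        self.rule = rule
        self.title = title
        self.years = years
        self.deps = list(deps)  # assumed: only the caller's dependencies

    def add_function(self, func, *args, **kwargs):
        def step():
            body = func(*args, **kwargs)  # content HTML only
            page = _format_boilerplate(self.title, body, self.years)
            with open(self.rule.target, 'w', encoding='utf-8') as f:
                f.write(page)
        self.rule.add_function(step)
```

With this design, a content generator never needs to know about the header, footer, or output path; it just returns a string.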
The years argument is the string for the year range in the copyright notice in the footer. I specify these year ranges manually, rather than pulling this information from file metadata, because that metadata is fragile: certain operations for copying, moving, and archiving files tend to overwrite it, and I need these dates to be accurate even when restoring backups, migrating servers, etc.
The HTML for a boilerplated webpage is generated using Python string formatting, inserting the title, navigation bar, year range, and content HTML into a template. This format-string template is stored in src/boilerplate.htmlfmt, which is automatically included as a dependency in the rules generated by simple_rule and generated_rule. Both of these functions use the same internal function to perform the formatting:
_boilerplate_fpath = 'src/boilerplate.htmlfmt'
_boilerplate = None

def _format_boilerplate(title, body, years):
    global _boilerplate
    if _boilerplate is None:
        with open(_boilerplate_fpath, 'r') as f:
            _boilerplate = f.read()
    navigation = _generate_navigation(title)
    output = _boilerplate.format(title=title, navigation=navigation,
                                 body=body, years=years)
    return output
Another internal function, _generate_navigation, is called here to generate the HTML for the navigation bar, which differs for each page. For the sake of brevity, I will not reproduce the code for this function here, but it is the function that uses the site_build.pages dictionary of page titles and URL paths.
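Since that function is not shown, here is a hypothetical sketch of how a navigation bar with the properties described at the start of this post could be built from two pieces of data: the ordered titles passed to add_navigation, and a title-to-URL-path dictionary like site_build.pages. The function name generate_navigation, the stand-in data, and the nav/ul markup with aria-current are my assumptions, not the site's actual code or HTML.

```python
from html import escape

# Hypothetical stand-ins for site_build.pages and the add_navigation() order.
pages = {'Home': '/', 'Contact': '/contact/', 'Blog': '/blog/'}
navigation_titles = ['Home', 'Blog', 'Contact']


def generate_navigation(current_title):
    """Return navigation HTML in which only the other pages are hyperlinks."""
    items = []
    for title in navigation_titles:
        label = escape(title)
        if title == current_title:
            # The current page is shown as plain text rather than a link.
            items.append(f'<li aria-current="page">{label}</li>')
        else:
            href = escape(pages[title], quote=True)
            items.append(f'<li><a href="{href}">{label}</a></li>')
    return '<nav><ul>\n' + '\n'.join(items) + '\n</ul></nav>'


print(generate_navigation('Blog'))
# <nav><ul>
# <li><a href="/">Home</a></li>
# <li aria-current="page">Blog</li>
# <li><a href="/contact/">Contact</a></li>
# </ul></nav>
```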
The top-level build script scripts/build.py, abridged to show only what we have discussed up to this point, is:
import build_system
import boilerplate
import site_build
...
import sys
b = build_system.Build()
site_build.add_build_clean_rules(b)
# build css stylesheet by copying
r = b.add_rule('build/style.css', 'src/style.css')
r.add_command_args('cp', r.deps[0], r.target)
b.add_deps('build', r.target)
boilerplate.simple_rule(b, 'Home', '/', 'src/home.html', years='2025')
boilerplate.simple_rule(b, 'Contact', '/contact/',
                        'src/contact.html', years='2025')
boilerplate.simple_rule(b, 'Honours Thesis', '/honours/',
                        'src/honours.html', years='2024-2025')
...
boilerplate.add_navigation('Home', 'Contact', ...)
b.make(sys.argv[1])
In this section, I go over the build process for those pages whose content is programmatically generated, namely the blog and the YouTube archive. The latter is simpler, so we start there.
The build rules for the YouTube archive are declared in scripts/build.py by the following code:
import yt_archive
...
r = boilerplate.generated_rule(b, 'YouTube Archive', '/youtube/',
        'src/yt_archive/page.htmlfmt', 'src/yt_archive/video.htmlfmt',
        'src/yt_archive/description.htmlfmt', 'data/video_info.json',
        years='2015-2025')
r.add_function(yt_archive.func, *r.deps)
The code that generates this page is in its own script, scripts/yt_archive.py. It takes video metadata stored in data/video_info.json and inserts it into the format-string templates in the *.htmlfmt files. The template src/yt_archive/video.htmlfmt specifies the format of each entry in the list of videos, with src/yt_archive/description.htmlfmt specifying the collapsible display of the video description. The full list of videos is then inserted into src/yt_archive/page.htmlfmt, which includes the opening text of the page.
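A minimal sketch of this nested formatting follows, taking its arguments in the same order as the rule's dependencies. It is not the code in scripts/yt_archive.py: the function name generate_archive, the JSON layout (a list of objects with title, url and description fields), and the placeholder names {title}, {url}, {description} and {videos} are assumptions. Escaping the metadata with html.escape is also my choice, so that characters like & or < in a video title cannot break the page.

```python
import json
from html import escape


def generate_archive(page_fpath, video_fpath, description_fpath, info_fpath):
    """Return the content HTML for the archive page (assumed data layout)."""
    def read(fpath):
        with open(fpath, encoding='utf-8') as f:
            return f.read()

    page_fmt = read(page_fpath)
    video_fmt = read(video_fpath)
    description_fmt = read(description_fpath)
    with open(info_fpath, encoding='utf-8') as f:
        videos = json.load(f)  # assumed: a list of dicts

    entries = []
    for video in videos:
        # The description is formatted first, then nested inside the entry.
        description = description_fmt.format(
            description=escape(video['description']))
        entries.append(video_fmt.format(
            title=escape(video['title']),
            url=escape(video['url'], quote=True),
            description=description))
    return page_fmt.format(videos='\n'.join(entries))
```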
The generation process for the blog consists of two parts. First, there's the blog homepage, which is a list of items much like the YouTube Archive page, and second, there are the pages for each individual post. In scripts/build.py, the rules for the blog are declared as:
import blog
...
blog.add_blog_rules(b, 'src/blog/blog.htmlfmt',
        'src/blog/post_listing.htmlfmt', 'src/blog/post.htmlfmt',
        'data/blog_posts.json', years='2025')
Here, data/blog_posts.json contains metadata about the blog posts, like their titles, descriptions, and creation dates, as well as the paths to their HTML source files. The blog homepage is generated by formatting src/blog/post_listing.htmlfmt for each blog post, concatenating these into the full list of blog posts, and inserting this list into src/blog/blog.htmlfmt.
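The listing step can be sketched as follows. This is a hypothetical helper, not code from scripts/blog.py: it assumes each post record has title, description, date (an ISO 8601 string such as 2025-03-02) and url fields, that the templates use matching placeholder names plus {posts}, and that posts are listed newest first.

```python
from datetime import date
from html import escape


def format_blog_listing(blog_fmt, listing_fmt, posts):
    """Return the blog homepage content HTML, newest post first."""
    # date.fromisoformat parses 'YYYY-MM-DD' and rejects malformed dates.
    ordered = sorted(posts, key=lambda p: date.fromisoformat(p['date']),
                     reverse=True)
    listings = [
        listing_fmt.format(title=escape(p['title']),
                           description=escape(p['description']),
                           date=p['date'],
                           url=escape(p['url'], quote=True))
        for p in ordered
    ]
    return blog_fmt.format(posts='\n'.join(listings))


posts = [
    {'title': 'Older post', 'description': 'First.',
     'date': '2025-01-10', 'url': '/blog/older/'},
    {'title': 'Newer post', 'description': 'Second.',
     'date': '2025-03-02', 'url': '/blog/newer/'},
]
print(format_blog_listing(
    '<h1>Blog</h1>\n{posts}',
    '<p><a href="{url}">{title}</a> ({date}): {description}</p>',
    posts))
# <h1>Blog</h1>
# <p><a href="/blog/newer/">Newer post</a> (2025-03-02): Second.</p>
# <p><a href="/blog/older/">Older post</a> (2025-01-10): First.</p>
```

Parsing the dates, rather than sorting the raw strings, means a typo in the JSON fails the build instead of silently putting a post in the wrong place.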
The function blog.add_blog_rules wraps boilerplate.generated_rule and dynamically inserts a build rule for each blog post, in addition to the rule for the blog homepage itself. The blog post rules use the file src/blog/post.htmlfmt, whose content is:
<p>Back to <a href='/blog/'>Blog</a>.</p>
<hgroup>
<h1>{title}</h1>
<p>By Maria Nicolae, {date}.</p>
</hgroup>
{post}
This creates the blog post boilerplate, consisting of a link back to the blog homepage and a heading containing the title, author, and date. Below this, the source HTML of the post itself is inserted, replacing {post} in the format string.
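One property of str.format matters for templates like this: replacement fields are parsed only in the template, not in the values substituted into it. A post whose HTML contains literal braces (for example in a code sample) is therefore inserted unchanged, whereas literal braces in a .htmlfmt template itself would have to be doubled as {{ and }}. The sketch below demonstrates both cases using the template shown above; the post HTML, title and date string are made up for illustration.

```python
POST_FMT = """<p>Back to <a href='/blog/'>Blog</a>.</p>
<hgroup>
<h1>{title}</h1>
<p>By Maria Nicolae, {date}.</p>
</hgroup>
{post}"""

# Hypothetical post body containing braces, e.g. a code sample.
post_html = '<pre>d = {"key": 1}</pre>'

page = POST_FMT.format(title='Example', date='1 June 2025', post=post_html)
print(page.splitlines()[-1])
# <pre>d = {"key": 1}</pre>

# Braces that belong to the template itself must be doubled.
print('<style>p {{ margin: 0 }}</style>{post}'.format(post='<p>Hi</p>'))
# <style>p { margin: 0 }</style><p>Hi</p>
```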
The RSS feed for this website consists of two types of feed items: blog posts, and other one-off items. In scripts/build.py, the RSS feed rules are declared by:
import rss_feed
...
rss_feed.add_rss_rule(b, 'build/rss.xml', 'src/rss/feed.xmlfmt',
        'src/rss/item.xmlfmt', 'src/rss/items.json', 'data/blog_posts.json')
Once again, the bulk of the logic is in its own script, scripts/rss_feed.py. As sources for the feed items, it uses data/blog_posts.json for blog posts and src/rss/items.json for all other feed items. Each item is formatted using src/rss/item.xmlfmt, and the concatenation of these items is then inserted into the feed template src/rss/feed.xmlfmt. The logic for generating the RSS feed includes cutoffs for both the maximum number of items (25) and the maximum length of the RSS feed items (100 kB).
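The two cutoffs could be applied as in the sketch below, which is not the code in scripts/rss_feed.py. It assumes the items are already formatted and sorted newest first, interprets the 100 kB limit as the total UTF-8 size of the formatted items (100,000 bytes), and stops at the first item that would exceed it. The hypothetical rss_date helper shows the RFC 822-style date format that RSS 2.0 expects in pubDate, produced with the standard library's email.utils.format_datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

MAX_ITEMS = 25
MAX_BYTES = 100_000  # assumed meaning of "100 kB": total size of the items


def select_items(formatted_items):
    """Keep the newest items within both cutoffs (input sorted newest first)."""
    selected = []
    total = 0
    for item in formatted_items:
        size = len(item.encode('utf-8'))
        if len(selected) == MAX_ITEMS or total + size > MAX_BYTES:
            break
        selected.append(item)
        total += size
    return selected


def rss_date(dt):
    """Format a UTC datetime as an RSS 2.0 pubDate value."""
    return format_datetime(dt, usegmt=True)


print(rss_date(datetime(2025, 6, 1, tzinfo=timezone.utc)))
# Sun, 01 Jun 2025 00:00:00 GMT
```

Stopping at the first oversized item, rather than skipping it, keeps the feed a contiguous run of the newest items.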
The site generation system I have built meets the two main goals I set out for it. The repetition and boilerplate inherent to this website, as in most website designs, have been automated. Additionally, the system gives me the control and flexibility to implement dynamic or programmatic content, such as the list of posts on the blog homepage, and extra boilerplate for specific pages, like the blog posts themselves.