epubsynth
A command-line program for generating EPUB documents.
In a Nutshell
epubsynth generates EPUBs from source files and from
metadata supplied on the command line:
epubsynth \
    --output book.epub \
    --spine titlepage.xhtml \
            chapter1.xhtml \
            chapter2.xhtml \
            chapter3.xhtml \
    --stylesheets style.css \
    --resources fig1.png \
                fig2.jpg \
    --dc-title "A Book" \
    --dc-creator "Ann Author" \
    --dc-contributor "Ann Other Author"
Furthermore, epubsynth handles templating, so
boilerplate does not need to be included in XHTML source files; the
contents of each source file are inserted into the <body> element of a
template.
This program is primarily intended to be used inside a shell script,
or as part of a build system like make, rather than
directly in an interactive shell. Accordingly, the command-line syntax
is quite verbose, and includes metadata that is meant to be the same
between invocations.
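As an illustration of scripted use, the following is a minimal sketch of a Python build script that assembles the invocation in one place. The helper build_command and the METADATA table are hypothetical names chosen for this example; only options shown in the example above are used, and nothing is executed until the list is passed to subprocess.run.

```python
import shlex

# Metadata that stays the same between invocations (values from the example above).
METADATA = {
    "--dc-title": "A Book",
    "--dc-creator": "Ann Author",
}


def build_command(output, spine, stylesheets=(), resources=()):
    """Assemble an epubsynth argument list; nothing is executed here."""
    cmd = ["epubsynth", "--output", output, "--spine", *spine]
    if stylesheets:
        cmd += ["--stylesheets", *stylesheets]
    if resources:
        cmd += ["--resources", *resources]
    for option, value in METADATA.items():
        cmd += [option, value]
    return cmd


cmd = build_command(
    "book.epub",
    ["titlepage.xhtml", "chapter1.xhtml"],
    stylesheets=["style.css"],
)
# Print the command in shell-quoted form for inspection.
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Keeping the fixed metadata in one table means each invocation only varies in its file lists.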
Installation
On Arch Linux, install epubsynth from the Arch User Repository
(AUR) package, which I maintain. Otherwise, use make install to
install and make uninstall to uninstall.
Usage
The command-line interface for epubsynth takes no
positional arguments, only options, of which --output
and --spine are required:
epubsynth --output OUTPUT --spine NAME... [OPTION...]
Note that, while epubsynth aims for correctness in any
boilerplate it generates, it does not validate input
source files for correctness to their respective formats (e.g. XHTML),
or check for broken links in these source files. For this reason, it is
recommended to use a tool like epubcheck to
validate the generated EPUB file.
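A quick structural check can also be scripted before running a full validator. The sketch below is not a substitute for epubcheck and is not part of epubsynth; the function name quick_container_check is hypothetical. It only checks the ZIP layout that the EPUB container format requires: a first entry named mimetype, stored uncompressed with the contents application/epub+zip, plus the presence of META-INF/container.xml.

```python
import io
import zipfile


def quick_container_check(epub):
    """Return a list of problems found in the ZIP layout of an EPUB."""
    problems = []
    with zipfile.ZipFile(epub) as zf:
        infos = zf.infolist()
        if not infos or infos[0].filename != "mimetype":
            problems.append("first entry is not 'mimetype'")
        else:
            if infos[0].compress_type != zipfile.ZIP_STORED:
                problems.append("'mimetype' is compressed")
            if zf.read("mimetype") != b"application/epub+zip":
                problems.append("'mimetype' has unexpected contents")
        if "META-INF/container.xml" not in zf.namelist():
            problems.append("META-INF/container.xml is missing")
    return problems


# Demonstration on a tiny in-memory archive rather than a real book.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
    zf.writestr("META-INF/container.xml", "<container/>")
buf.seek(0)
print(quick_container_check(buf))
```

For a file on disk, pass its path instead, for example quick_container_check("book.epub").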
Output and Source Files
The file path of the generated EPUB is specified using the
--output option (alias -o). An EPUB file is a
container of content files (in fact, a ZIP archive), and these content
files are specified using the options:
--spine NAME... (alias -s): The sequence
of XHTML and/or SVG files that define the reading order (required).
--stylesheets NAME...: The CSS stylesheets to be
included.
--toc=toc.xhtml: The table of contents file. This must
be XHTML type: EPUB 2 table of contents files (NCX type) are not
supported.
--resources NAME...: All other resources to be
included, that are not better categorised into the preceding
options.
The NAMEs of the content files are the names/paths
inside the EPUB container, which do not necessarily match the paths to
the corresponding source files in the file system. The source paths can
be configured using the option --source-dir=. (aliases
--src-dir and -d). For example,
--source-dir src --resources fig1.png inserts
src/fig1.png in the file system as fig1.png
in the container. Additionally, the names mimetype,
package.opf, and any name beginning with
META-INF are reserved in the EPUB container, and therefore
cannot be used.
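The mapping from container names to source paths, together with the reserved-name rule, can be sketched as follows. This is an illustration of the behaviour described above, not epubsynth's actual code; the names is_reserved and source_path are hypothetical.

```python
from pathlib import PurePosixPath

# Names listed above as reserved inside the EPUB container.
RESERVED_NAMES = {"mimetype", "package.opf"}
RESERVED_PREFIX = "META-INF"


def is_reserved(name):
    """True if a container name collides with a name listed as reserved."""
    return name in RESERVED_NAMES or name.startswith(RESERVED_PREFIX)


def source_path(name, source_dir="."):
    """Map a container NAME to the source path it would be read from."""
    if is_reserved(name):
        raise ValueError(f"reserved container name: {name}")
    return PurePosixPath(source_dir) / name


print(source_path("fig1.png", "src"))
```

Rejecting reserved names up front, before any file is read, keeps the error close to the offending option.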
XHTML files specified in --spine, by default, have their
contents inserted into the <body> element of a
template; see Templating to learn how to
configure this behaviour.
Finally, note that --toc has a default value, and
therefore cannot be blank. Unlike for all other input file options,
$SOURCE_DIR/$TOC need not exist in the file system, and if
it does not, the table of contents will be automatically generated (see
Table of Contents). If it does
exist, its contents will, by default, be inserted into the
<nav epub:type='toc'> element of a template; see Templating to learn how to configure this
behaviour.
Metadata
Metadata in EPUB documents follows the Dublin
Core vocabulary. These metadata can be set using command line
options. The following options correspond to fields that are mandatory
in EPUB:
--dc-title=Untitled: Document title.
--dc-identifier=urn:uuid:$RANDOM_UUID: Document
identifier, either a URL or a URN (RFC 8141). Defaults
to a random UUID, but because this field is expected to be consistent
between document revisions, it is strongly recommended that you set this
explicitly. To this end, epubsynth will emit a warning if
this option is not set.
--dc-language=en: Document language, as an RFC 5646
identifier.
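One way to obtain an identifier that stays the same between revisions is to derive it from a URL you control, then pass the result to --dc-identifier in your build script. The sketch below uses Python's standard uuid module; the function name stable_identifier and the seed URL are assumptions made for illustration.

```python
import uuid


def stable_identifier(seed_url):
    """Derive a repeatable urn:uuid identifier from a URL you control."""
    # uuid5 is deterministic: the same seed always yields the same UUID.
    return uuid.uuid5(uuid.NAMESPACE_URL, seed_url).urn


# Hypothetical seed URL; use one that identifies your own book.
identifier = stable_identifier("https://example.com/books/a-book")
print(identifier)
```

Alternatively, generate a random UUID once (for example with uuid.uuid4()) and store it alongside the sources.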
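Options are also provided for the optional metadata fields, such as
--dc-creator and --dc-contributor. Together with the options above,
these cover all of the Dublin Core /elements/1.1/
namespace, except for dc:format, which is always set to
application/epub+zip in EPUBs generated by
epubsynth, and cannot be changed.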
Table of Contents
The table of contents for the EPUB document can be set in one of two
ways: automatic generation and manual authoring.
Automatic Generation
If the file referred to by --toc does not exist inside
--source-dir, a table of contents will be automatically
generated. The option --toc-headings can be used to specify
a list of headings, synchronised with the list of files in
--spine, for the table of contents. For example,
--spine titlepage.xhtml chapter1.xhtml chapter2.xhtml \
--toc-headings "Title Page" "Chapter 1" "Chapter 2"
will result in the table of contents
- Title Page
- Chapter 1
- Chapter 2
Files in --spine can be omitted from the table of
contents by setting their corresponding value in
--toc-headings to the empty string "", as well
as by making --toc-headings shorter than
--spine. For example,
--spine titlepage.xhtml chapter1.xhtml chapter2.xhtml backpage.xhtml \
--toc-headings "" "Chapter 1" "Chapter 2"
will result in the table of contents
- Chapter 1
- Chapter 2
If --toc-headings is not set, the file names in
--spine themselves are used as the headings. For
example,
--spine chapter1.xhtml chapter2.xhtml
without --toc-headings will result in the table of
contents
- chapter1.xhtml
- chapter2.xhtml
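The pairing rules above can be summarised in a short sketch. This is a model of the documented behaviour, not epubsynth's source; the function name toc_entries is hypothetical, and what happens when --toc-headings is longer than --spine is not specified here (this sketch ignores the extra headings).

```python
def toc_entries(spine, headings=None):
    """Return (heading, file) pairs for an automatically generated TOC."""
    if headings is None:
        # No --toc-headings: the file names themselves are the headings.
        return [(name, name) for name in spine]
    # zip() stops at the shorter list, so trailing spine files are omitted;
    # empty headings are filtered out explicitly.
    return [(heading, name) for heading, name in zip(headings, spine) if heading]


spine = ["titlepage.xhtml", "chapter1.xhtml", "chapter2.xhtml", "backpage.xhtml"]
for heading, name in toc_entries(spine, ["", "Chapter 1", "Chapter 2"]):
    print(f"- {heading} ({name})")
```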
Manual Authoring
If the file referred to by --toc exists inside
--source-dir, it will be inserted into the EPUB container
as the table of contents (after templating, as mentioned above and detailed in Templating). For example, the source file
<ol>
<li><a href='chapter1.xhtml'>First Chapter</a></li>
<li><a href='chapter2.xhtml'>Another Chapter</a>
<ol>
<li><a href='chapter2.xhtml#section1'>Some Section</a></li>
</ol>
</li>
</ol>
will result in the table of contents
- First Chapter
- Another Chapter
  - Some Section
See the
relevant section of the EPUB 3 specification for the requirements this
file must meet.
Templating
There are two circumstances in which the contents of an input file
are inserted into a template before being added to the EPUB container:
XHTML files in --spine and the file in --toc
(the latter regardless of whether the table of contents is automatically
generated or manually authored).
For an XHTML file in --spine, the templates
--template-xhtml and --template-html are used,
which default to
<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE html>
<html xmlns='http://www.w3.org/1999/xhtml'>
{html}
</html>
and
<head>
<title>{dc_title}</title>
{stylelinks}
</head>
<body>
{file}
</body>
respectively. These are Python
format strings for which the fields are
html: the contents of --template-html
after substitution,
dc_title: the (escaped) value of
--dc-title,
stylelinks: a concatenation of
<link> tags to each of the CSS stylesheets listed in
--stylesheets, and
file: the contents of the input file.
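To make the nesting of the two templates concrete, here is a minimal sketch of the substitution using Python's str.format with the default templates quoted above. The function render_spine_file is a hypothetical name, and the exact <link> markup produced for stylelinks is an assumption; epubsynth's real output may differ.

```python
from html import escape

TEMPLATE_XHTML = """<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE html>
<html xmlns='http://www.w3.org/1999/xhtml'>
{html}
</html>"""

TEMPLATE_HTML = """<head>
<title>{dc_title}</title>
{stylelinks}
</head>
<body>
{file}
</body>"""


def render_spine_file(file, dc_title, stylesheets):
    """Fill the inner template first, then place it in the outer one."""
    # Assumed <link> markup; the exact tags epubsynth emits may differ.
    stylelinks = "\n".join(
        f"<link rel='stylesheet' href='{escape(name)}'/>" for name in stylesheets
    )
    fields = {"dc_title": escape(dc_title), "stylelinks": stylelinks, "file": file}
    html = TEMPLATE_HTML.format(**fields)
    return TEMPLATE_XHTML.format(html=html, **fields)


page = render_spine_file("<p>Hello</p>", "Q & A", ["style.css"])
print(page)
```

Because substituted values are not parsed again, braces inside the input file itself do not need escaping.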
For the --toc file, the templates
--template-toc and --template-toc-nav are
used, which default to
<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE html>
<html xmlns='http://www.w3.org/1999/xhtml'
xmlns:epub='http://www.idpf.org/2007/ops'>
<head>
<title>{dc_title}</title>
</head>
<body>
{nav}
</body>
</html>
and
<nav epub:type='toc'>
{file}
</nav>
respectively. The fields are nav, the contents of
--template-toc-nav after substitution, as well as
dc_title and file with the same meaning as
before.
As an example, to disable all templating, one would set
--template-xhtml "{file}" --template-toc "{file}".
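One consequence of the templates being Python format strings is that literal braces in a custom template must be doubled. The sketch below shows this with a hypothetical custom value for --template-html that contains inline CSS; it assumes the template is expanded with str.format, as the description above suggests.

```python
# Hypothetical custom template: literal CSS braces are written as {{ and }}.
template = (
    "<head>\n"
    "<title>{dc_title}</title>\n"
    "<style>body {{ margin: 0 }}</style>\n"
    "</head>\n"
    "<body>\n{file}\n</body>"
)
rendered = template.format(dc_title="A Book", file="<p>Hi</p>")
print(rendered)

# With single braces, the CSS block is parsed as a field and formatting fails.
try:
    "<style>body { margin: 0 }</style>{file}".format(file="")
    failed = False
except (KeyError, ValueError) as exc:
    failed = True
    print("formatting failed:", type(exc).__name__)
```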