# epubsynth

A command-line program for generating EPUB documents.

## In a Nutshell

`epubsynth` generates EPUBs from source files and from metadata provided on the command line:

```
epubsynth \
    --output book.epub \
    --spine titlepage.xhtml \
            chapter1.xhtml \
            chapter2.xhtml \
            chapter3.xhtml \
    --stylesheets style.css \
    --resources fig1.png \
                fig2.jpg \
    --dc-title "A Book" \
    --dc-creator "Ann Author" \
    --dc-contributor "Ann Other Author"
```
Furthermore, `epubsynth` handles templating so that boilerplate does not need to be included in XHTML source files; their contents are inserted into the `<body>` tag.

This program is primarily intended to be used inside a shell script, or as part of a build system like `make`, rather than directly in an interactive shell. Accordingly, the command-line syntax is quite verbose, and includes metadata that is meant to be the same between invocations.

## Installation

To install and uninstall `epubsynth`, use `make install` and `make uninstall` respectively.

## Usage

The command-line interface for `epubsynth` takes no positional arguments, only options, of which `--output` and `--spine` are required:
```
epubsynth --output OUTPUT --spine NAME... [OPTION...]
```

Note that, while `epubsynth` aims for correctness in any boilerplate *it* generates, it does *not* validate that input source files conform to their respective formats (e.g. XHTML), nor does it check for broken links in these source files. For this reason, it is recommended to use a tool like [`epubcheck`](https://github.com/w3c/epubcheck) to validate the generated EPUB file.

### Output and Source Files

The file path of the generated EPUB is specified using the `--output` option (alias `-o`). An EPUB file is a container of content files (in fact, a ZIP archive), and these content files are specified using the options:

* `--spine NAME...` (alias `-s`): The sequence of XHTML and/or SVG files that define the reading order (required).
* `--stylesheets NAME...`: The CSS stylesheets to be included.
* `--toc=toc.xhtml`: The table of contents file. This must be XHTML type: EPUB 2 table of contents files (NCX type) are not supported.
* `--resources NAME...`: All other resources to be included, that are not better categorised into the preceding options.

The `NAME`s of the content files are the names/paths inside the EPUB container, which do not necessarily match the paths to the corresponding source files in the file system. The directory in which source files are looked up can be configured using the option `--source-dir=.` (aliases `--src-dir` and `-d`). For example, `--source-dir src --resources fig1.png` inserts `src/fig1.png` in the file system as `fig1.png` in the container. Additionally, the names `mimetype`, `package.opf`, and any name beginning with `META-INF` are reserved in the EPUB container, and therefore cannot be used.
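
The mapping from container names to source paths, together with the reserved-name rule, can be sketched in Python as follows. This is an illustration only, not `epubsynth`'s actual implementation: the function names `is_reserved` and `source_path` are hypothetical, and the `META-INF` check is a literal reading of "any name beginning with `META-INF`".

```python
from pathlib import PurePosixPath

# Names reserved in the EPUB container, per the rule described above.
RESERVED_NAMES = {"mimetype", "package.opf"}


def is_reserved(name: str) -> bool:
    """Return True if `name` may not be used for a content file."""
    return name in RESERVED_NAMES or name.startswith("META-INF")


def source_path(source_dir: str, name: str) -> str:
    """Map a container name to the file-system path it would be read from."""
    if is_reserved(name):
        raise ValueError(f"reserved container name: {name}")
    return str(PurePosixPath(source_dir) / name)


print(source_path("src", "fig1.png"))  # src/fig1.png
```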

XHTML files specified in `--spine`, by default, have their contents inserted into the `<body>` element of a template; see [Templating](#templating) to learn how to configure this behaviour.

Finally, note that `--toc` has a default value, and therefore cannot be blank. Unlike for all other input file options, `$SOURCE_DIR/$TOC` need not exist in the file system, and if it does not, the table of contents will be automatically generated (see [Table of Contents](#table-of-contents)). If it *does* exist, its contents will, by default, be inserted into the `<nav epub:type='toc'>` element of a template; see [Templating](#templating) to learn how to configure this behaviour.

### Document Metadata

Metadata in EPUB documents follows the [Dublin Core](https://www.dublincore.org/specifications/dublin-core/dcmi-terms/) vocabulary. These metadata can be set using command line options. The following options correspond to fields that are mandatory in EPUB:

* `--dc-title=Untitled`: Document title.
* `--dc-identifier=urn:uuid:$RANDOM_UUID`: Document identifier, either a URL or an [RFC 8141](https://www.rfc-editor.org/rfc/rfc8141) URN. Defaults to a random UUID, but because this field is expected to be consistent between document revisions, it is strongly recommended that you set this explicitly. To this end, `epubsynth` will emit a warning if this option is not set.
* `--dc-language=en`: Document language, as an [RFC 5646](https://www.rfc-editor.org/rfc/rfc5646) identifier.
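
One way to obtain an identifier that stays the same between revisions is to derive a name-based UUID from a fixed seed, rather than generating a random one for each build. The sketch below is a suggestion, not something `epubsynth` does itself; the helper name `stable_identifier` and the seed URL are hypothetical placeholders.

```python
import uuid


def stable_identifier(seed: str) -> str:
    """Derive a deterministic urn:uuid identifier from a seed string."""
    # uuid5 hashes the seed within a namespace, so the same seed always
    # yields the same UUID.
    return f"urn:uuid:{uuid.uuid5(uuid.NAMESPACE_URL, seed)}"


# The seed is a made-up example; use any string that is unique to the book.
print(stable_identifier("https://example.com/books/a-book"))
```

The printed value can then be passed to `--dc-identifier` from a build script.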

Options for optional metadata are:

* `--dc-creator NAME`: Document author.
* `--dc-contributor NAME...`: List of non-author contributors.
* `--dc-coverage COVERAGE`: String describing the "[spatial or temporal topic of the resource, spatial applicability of the resource, or jurisdiction under which the resource is relevant.](https://www.dublincore.org/specifications/dublin-core/dcmi-terms/elements11/coverage/)"
* `--dc-date DATE`: ISO 8601 string for the document date. Mutually exclusive with `--dc-date-now`.
* `--dc-date-now`: Set the document's `dc:date` to the current date and time. Mutually exclusive with `--dc-date`.
* `--dc-description DESCRIPTION`: Document description.
* `--dc-publisher PUBLISHER`: Document publisher.
* `--dc-relation RELATION...`: List of URIs (or other formal identifiers) of related resources. See [the relevant specification](https://www.dublincore.org/specifications/dublin-core/dcmi-terms/elements11/relation/).
* `--dc-rights RIGHTS`: Statement about intellectual property rights associated with the document.
* `--dc-source SOURCE`: A source from which this document is derived, ideally formally identified. See [the relevant specification](https://www.dublincore.org/specifications/dublin-core/dcmi-terms/elements11/source/).
* `--dc-subject SUBJECT`: Document subject/topic.
* `--dc-type TYPE`: The "[nature or genre of the resource.](https://www.dublincore.org/specifications/dublin-core/dcmi-terms/elements11/type/)"

These cover all of the [`/elements/1.1/`](https://www.dublincore.org/specifications/dublin-core/dcmi-terms/#section-3) namespace, except for `dc:format`, which is always set to `application/epub+zip` in EPUBs generated by `epubsynth`, and cannot be changed.

### Table of Contents

The table of contents for the EPUB document can be set in one of two ways: automatic generation and manual authoring.

#### Automatic Generation

If the file referred to by `--toc` does not exist inside `--source-dir`, a table of contents will be automatically generated. The option `--toc-headings` can be used to specify a list of headings, synchronised with the list of files in `--spine`, for the table of contents. For example,
```
--spine        titlepage.xhtml chapter1.xhtml chapter2.xhtml \
--toc-headings "Title Page"    "Chapter 1"    "Chapter 2"
```
will result in the table of contents

* Title Page
* Chapter 1
* Chapter 2

Files in `--spine` can be omitted from the table of contents by setting their corresponding value in `--toc-headings` to the empty string `""`, as well as by making `--toc-headings` shorter than `--spine`. For example,
```
--spine        titlepage.xhtml chapter1.xhtml chapter2.xhtml backpage.xhtml \
--toc-headings ""              "Chapter 1"    "Chapter 2"
```
will result in the table of contents

* Chapter 1
* Chapter 2

If `--toc-headings` is not set, the file names in `--spine` themselves are used as the headings. For example,
```
--spine chapter1.xhtml chapter2.xhtml
```
without `--toc-headings` will result in the table of contents

* chapter1.xhtml
* chapter2.xhtml
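
The three rules above (headings paired with spine files in order, empty or missing headings omitted, file names used when no headings are given) can be summarised in a short Python sketch. This is not `epubsynth`'s actual code; the function name `toc_entries` is hypothetical, and silently ignoring surplus headings is an assumption, since that case is not described here.

```python
def toc_entries(spine, headings=None):
    """Pair spine files with headings, following the rules described above."""
    if headings is None:
        # No --toc-headings given: the file names double as headings.
        return [(name, name) for name in spine]
    # zip() stops at the shorter list, so trailing spine files without a
    # heading are dropped; empty headings are filtered out explicitly.
    return [(name, heading) for name, heading in zip(spine, headings) if heading]


spine = ["titlepage.xhtml", "chapter1.xhtml", "chapter2.xhtml", "backpage.xhtml"]
print(toc_entries(spine, ["", "Chapter 1", "Chapter 2"]))
# [('chapter1.xhtml', 'Chapter 1'), ('chapter2.xhtml', 'Chapter 2')]
```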

#### Manual Authoring

If the file referred to by `--toc` exists inside `--source-dir`, it will be inserted into the EPUB container as the table of contents (after templating, as mentioned in [Output and Source Files](#output-and-source-files) and detailed in [Templating](#templating)). For example, the source file
```
<ol>
    <li><a href='chapter1.xhtml'>First Chapter</a></li>
    <li><a href='chapter2.xhtml'>Another Chapter</a>
    <ol>
        <li><a href='chapter2.xhtml#section1'>Some Section</a></li>
    </ol>
    </li>
</ol>
```
will result in the table of contents

* First Chapter
* Another Chapter
  * Some Section

See [the relevant section of the EPUB 3 specification](https://www.w3.org/TR/epub-33/#sec-nav-def-model) for what this file must adhere to.

### Templating

There are two circumstances in which the contents of an input file are inserted into a template before being added to the EPUB container: XHTML files in `--spine` and the file in `--toc` (the latter regardless of whether the table of contents is automatically generated or manually authored).

For an XHTML file in `--spine`, the templates `--template-xhtml` and `--template-html` are used, which default to
```
<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE html>
<html xmlns='http://www.w3.org/1999/xhtml'>
    {html}
</html>
```
and
```
<head>
    <title>{dc_title}</title>
    {stylelinks}
</head>
<body>
    {file}
</body>
```
respectively. These are [Python format strings](https://docs.python.org/3/library/string.html#formatstrings) for which the fields are

* `html`: the contents of `--template-html` after substitution,
* `dc_title`: the (escaped) value of `--dc-title`,
* `stylelinks`: a concatenation of `<link>` tags to each of the CSS stylesheets listed in `--stylesheets`, and
* `file`: the contents of the input file.
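
To illustrate how the two templates nest, the following Python sketch applies the default templates with `str.format`: the inner template is filled first, and its result becomes the `html` field of the outer one. This is a minimal illustration under assumptions, not `epubsynth`'s actual implementation. The function name `render` is hypothetical, and the exact attributes of the generated `<link>` tags and the use of `html.escape` for the title are guesses.

```python
from html import escape

TEMPLATE_XHTML = """<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE html>
<html xmlns='http://www.w3.org/1999/xhtml'>
    {html}
</html>
"""

TEMPLATE_HTML = """<head>
    <title>{dc_title}</title>
    {stylelinks}
</head>
<body>
    {file}
</body>"""


def render(file_contents, dc_title, stylesheets):
    """Fill the inner template, then insert the result into the outer one."""
    # Assumed <link> markup; the real tool may emit different attributes.
    stylelinks = "".join(
        f"<link rel='stylesheet' type='text/css' href='{escape(href)}'/>"
        for href in stylesheets
    )
    html = TEMPLATE_HTML.format(
        dc_title=escape(dc_title), stylelinks=stylelinks, file=file_contents
    )
    return TEMPLATE_XHTML.format(html=html)


print(render("<p>Once upon a time...</p>", "A Book", ["style.css"]))
```

Because the file contents are passed as a field value rather than being parsed as a format string, braces inside the source XHTML do not need special treatment in this sketch.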

For the `--toc` file, the templates `--template-toc` and `--template-toc-nav` are used, which default to
```
<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE html>
<html xmlns='http://www.w3.org/1999/xhtml'
      xmlns:epub='http://www.idpf.org/2007/ops'>
<head>
    <title>{dc_title}</title>
</head>
<body>
    {nav}
</body>
</html>
```
and
```
<nav epub:type='toc'>
    {file}
</nav>
```
respectively. The fields are `nav`, the contents of `--template-toc-nav` after substitution, as well as `dc_title` and `file` with the same meaning as before.

As an example, to disable all templating, one would set `--template-xhtml "{file}" --template-toc "{file}"`.
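
Assuming the templates are processed with Python's `str.format` (they are described above as format strings, but the exact mechanism is an assumption here), a template consisting only of `{file}` passes the input through unchanged. Under the same assumption, literal braces in a custom template, such as those in an inline CSS rule, have to be doubled so that they are not mistaken for fields. The template text below is a made-up example.

```python
content = "<p>Hello</p>"

# A template consisting only of the field returns the input unchanged.
passthrough = "{file}".format(file=content)
print(passthrough)  # <p>Hello</p>

# Literal braces in a custom template are written as {{ and }}.
template = "<style>p {{ margin: 0 }}</style>{file}"
print(template.format(file=content))
# <style>p { margin: 0 }</style><p>Hello</p>
```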
