super basic pandoc wiki-like thing
Fri 22 May 2015
Tue 02 June 2015

To make this work we need some tools:

  • make to compile the thing: especially important for automating the process
  • sed to make a super basic index (table of contents)
  • pandoc to do… well all of the heavy lifting
  • github-pandoc.css to make the pages look less dull

As you can see, there aren’t many things we actually need in order to create quite a nice-looking, wiki-like static website. It is perfect for creating and linking documents, at the expense of keeping everything very basic.

But first I’d like to start with a make tutorial because… well, I also had a hard time figuring things out and I guess it would be a good idea to put the findings somewhere…

make (a) mini-tutorial

make is a tool for automating the build process. It isn’t perfect, but it will make do (since I don’t really know any other tool to help me out with this build process :D). Things to know about:

  • it can store variables
  • it has a ridiculous amount of built-in functions… just take a look at the documentation… seriously, it basically has its own language
  • it has a target - dependency mechanism, which is why we use it in the first place, so that’s good
  • aaaand it can run shell commands (without it we would have a pretty useless tool…)
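
To make those bullet points a bit more concrete, here is a minimal sketch in shell that writes a throwaway Makefile and runs it. Everything in it is made up for illustration (the `FILES`, `HTML` and `DIRS` variables, the `show` target and `greeting.txt`); only the file names are borrowed from my wiki. I'm assuming GNU make is installed.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing real gets touched.
demo=$(mktemp -d)
cd "$demo" || exit 1

# Write a hypothetical demo Makefile. printf emits the TAB that every
# recipe line has to start with.
{
  # variables, built with make's own functions
  printf '%s\n' 'FILES = ideas.md tech/technologies.md tech/technical_terms.md'
  printf '%s\n' 'HTML = $(FILES:%.md=%.html)'
  printf '%s\n' 'DIRS = $(sort $(dir $(FILES)))'
  # target: dependency, followed by the shell commands to run
  printf '%s\n' 'show: greeting.txt'
  printf '\t%s\n' 'echo "html: $(HTML)"'
  printf '\t%s\n' 'echo "dirs: $(DIRS)"'
  printf '\t%s\n' 'cat greeting.txt'
  printf '%s\n' 'greeting.txt:'
  printf '\t%s\n' 'echo "built by make" > $@'
  printf '%s\n' '.PHONY: show'
} > Makefile

# -s keeps make from echoing each command before running it.
out=$(make -s)
printf '%s\n' "$out"
```

The `dirs:` line should read `./ tech/`: `$(dir ...)` yields `tech/` twice and `$(sort ...)` collapses the duplicate. Because `show` depends on `greeting.txt`, make creates that file first and only then runs the `show` commands.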

Instead of repeating all of the other tutorials that you can find around the vastness of the internet, I’ll just paste the whole Makefile I use, annotated with loads of comments to make things easy to follow. But first, let’s look at the directory structure:

directory tree structure

.
└── src
    ├── ideas.md
    ├── Makefile
    ├── tech
    │   ├── technical_terms.md
    │   └── technologies.md
    └── web_platform.md

Yah… that’s pretty much it. As you can see there is no index.md or the like. We’ll see why when we peek into the Makefile… and speaking of the Makefile, here are its contents:

makefile

# start by defining variables (everything on the left of the `=` sign)
# the $(...) constructs are functions which, when put together, can build
# really complex functionality:
#
# - the two calls to `wildcard` retrieve all the files ending with `.md`
#   from the root folder as well as its direct subdirectories (`make` treats
#   `**` like a single `*`)
# - `BROOT` is set to `..` because in this instance I wanted the build
#   directory to be the parent directory
# - `$(dir ...)` returns the directory part of each file
# - `$(addprefix prefix, names...)` prepends a prefix (as you can imagine) to
#   each file
# - `$(sort list)` sorts the list (of words separated by whitespace), but
#   what's not so obvious is that it also removes duplicates
#
# so putting all of this together we have a sorted list, without any
# duplicates, of the directory names where the `*.md` files reside, each
# prefixed with `$(BROOT)/`. Here `$(BROOT)` just retrieves the variable's
# value, in this case `..`, the parent directory
#
#
SRC = $(wildcard *.md) $(wildcard **/*.md)
BROOT = ..
BDIRS = $(sort $(addprefix $(BROOT)/, $(dir $(wildcard **/*.md))))

# here, the `$(SRC:%.md=%.html)` instructs `make` to replace every word of
# the form `whatever.md` with `whatever.html`. The `%` symbol is a wildcard
# matching any number of characters in a word.
#
OUT = index.html $(SRC:%.md=%.html)

# and finally these are the options used for pandoc to generate the HTML
# pages using `github-pandoc.css` for styling and `--toc` to build the
# table of contents
#
OPTS = -c ~/.pandoc/github-pandoc.css --toc

# this line (and all of the others with `:`) is of the form
# `target: [dependencies]`, where `dependencies` are in brackets to signify
# that they are optional. This basically instructs `make` to build all of
# the dependencies of the `all` target (`directories`, `index`, `$(OUT)`)
#
all: directories index $(OUT)

# this in turn depends on the `$(BDIRS)` variable, which (as you remember) is
# the sorted list of directories
#
directories: $(BDIRS)

# this rule is executed for each directory in `$(BDIRS)`. As you can see it
# doesn't define any dependencies, so it just has to execute the instructions
# inside of it. NOTE that the lines to be executed have to start with
# a TAB character. So this here instructs `make` to create the build
# directories
# the `$(@D)` goes like so:
# - `$@` will be replaced by whatever is on the left side of
#   `target: [dependencies]`. The `D` modifier instructs `make` to replace it
#   only with the directory part, minus the trailing slash. So, in this
#   example it will be expanded to `../tech` (because `tech` is the only
#   subdirectory in `src`)
#
$(BDIRS):
	mkdir -p $(@D)

# next this is executed because the `all` target also depends on `index`.
# This is a bit tricky and I won't go into detail about what sed does here,
# explaining every bit. I'll just give the general overview. Here, everything
# in `$()` is executed by `make` (those are functions), just like before.
# Again, going from the inside out, we have:
# - `filter-out`, a function of the form `$(filter-out pattern...,text)`.
#   Every word in `text` that matches `pattern` is removed, that's basically
#   it. So in this case, the pattern is `$@.html`, thus it says to `make`:
#   "guy, take the left part of `target: [dependencies]` (so `index`),
#   append the `.html` suffix to it, and then, guy... remove `index.html`
#   from the list of words stored in the variable `OUT`"
# - `$(patsubst ...)` then takes each of those words from the `filter(ed)-out`
#   list and replaces `whatever.html` (`%.html`) with `whatever` (`%`).
#   So we effectively remove the file extension in this step
# - `$(sort ...)` just sorts the list as usual
# - as a bonus, the first `sed` command replaces every word in the list of
#   words with `- [word](word.html)\n` and also prepends the string
#   `# table of contents\n\n` at the very beginning, while the second one
#   strips the leading spaces. Then we redirect, using the usual shell
#   `>` redirection operator, all of this output to `$@.md`.
#   Now we have a super basic `index.md` file with a table of contents
#   pointing to the other (yet to be created) HTML files (in sorted order)
#
index:
	echo '$(sort $(patsubst %.html,%,$(filter-out $@.html, $(OUT))))' | \
	  sed 's/\b\([a-z0-9_\/]\+\)\b/- [\1](\1.html)\n/g;1s/^/# table of contents\n\n/' | \
	  sed 's/^ *//g' > $@.md

# next these lines are executed because of `all`'s dependency on `$(OUT)`
# (which includes all of our file paths with the extension changed from `.md`
# to `.html`). The `index.html` file depends on `index.md` and luckily we have
# just created that file in the previous step... phew! Now this is really
# basic stuff, I'm not even going to go into detail here. I just want to point
# out this other strange thing that we haven't seen before: `$<`. This `$<`
# construct represents the first dependency on the right side of
# `target: [dependencies]`, which makes it kind of the opposite of `$@`.
# Now that you know all of this you should have no trouble figuring out what
# this block does. Let me break it down for you just in case. In sequence this
# is happening:
# 1. `pandoc` generates an HTML file using the `~/.pandoc/github-pandoc.css`
#    stylesheet file. Simple, right?
# 2. move the newly generated HTML file into the `$(BROOT)` directory
# 3. finally remove the temporary `index.md` file which we don't need any more
#
index.html: index.md
	pandoc -c ~/.pandoc/github-pandoc.css -o $@ $<
	mv $@ $(BROOT)/$@
	rm $<

# these lines are basically the same thing as the previous, except they
# execute for every individual `.md` file, without the removal process of
# course...
#
%.html: %.md
	pandoc $(OPTS) -o $@ $<
	mv $@ $(BROOT)/$@

# the `clean` target does exactly what it says: it will remove the
# generated HTML files and the build directories
#
clean:
	rm -f $(BROOT)/*.html
	rm -rf $(BDIRS)

# this last part tells `make` not to do any fancy business when invoked
# with `make target` (where `target` is one of `all`, `directories`,
# `index`, `clean`) and there are files around with the same name as this
# `target`
#
.PHONY: all directories index clean

That basically wraps it up for this process. By running make on the command line, we are able to generate a very simple, static HTML wiki-like structure.
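
Since I skipped over what the two sed commands in the `index` target actually do, here is a small sketch that runs the same pipeline on a hard-coded word list. The `words` and `index` variables are made up for this example, and the list stands in for what make would echo for the directory tree shown earlier. Note that `\b`, `\+` and `\n` in the replacement are GNU sed extensions, so I'm assuming GNU sed here.

```shell
#!/bin/sh
# Stand-in for the sorted, extension-less list that make would echo.
words='ideas tech/technical_terms tech/technologies web_platform'

# First sed: turn every word into a markdown list item followed by a
# newline, then put the heading in front of the whole thing.
# Second sed: strip the space left over in front of each item.
index=$(echo "$words" |
  sed 's/\b\([a-z0-9_\/]\+\)\b/- [\1](\1.html)\n/g;1s/^/# table of contents\n\n/' |
  sed 's/^ *//g')

printf '%s\n' "$index"
```

With GNU sed this prints the `# table of contents` heading, an empty line, and then one `- [name](name.html)` item per word, for example `- [tech/technologies](tech/technologies.html)`.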

the pandoc utility

This is a super handy command line application to have around. It can convert between a huge number of markup and document formats! Just take a look at its extensive, well-organized documentation. It has a bunch of plug-ins and it’s really simple to use.

For those of you who don’t know, there is this other project called scholdoc which I’m really looking forward to. It is targeted towards academic writing. I hope this project makes it.

For the purpose of this demonstration, I used pandoc in a very simple manner. The only options passed to it are:

  • -c stylesheet.css which tells it to include the stylesheet.css file in the HTML. The style I’m using here is something I came across by chance: github-pandoc.css. I think it does a pretty decent job considering what I was looking for
  • --toc as you can guess, this tells pandoc to generate a table of contents for us
  • -o outfile.html and finally the no-brainer: output the result in outfile.html
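
Put together, a single invocation with these options could look like the sketch below. The `work` directory, `page.md` and `page.html` are placeholders I made up, and `stylesheet.css` does not need to exist because pandoc only links to it. The `-s` (standalone) flag is my own addition and is not in the Makefile above: the pandoc manual says `--toc` only has an effect in standalone output, and I haven't checked whether `-c` implies it in every version. The script skips the conversion if pandoc isn't installed.

```shell
#!/bin/sh
# Throwaway directory with a tiny two-heading markdown page.
work=$(mktemp -d)
printf '# First heading\n\nSome text.\n\n## Second heading\n\nMore text.\n' > "$work/page.md"

if command -v pandoc >/dev/null 2>&1; then
  # -c links the stylesheet, --toc adds the table of contents,
  # -o names the output file; -s is an assumption (see above).
  pandoc -s -c stylesheet.css --toc -o "$work/page.html" "$work/page.md"
  status="converted"
else
  status="pandoc not installed"
fi
echo "$status"
```

Opening `page.html` in a browser should then show the table of contents above the text, unstyled unless you drop a real `stylesheet.css` next to it.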

summary

If all you want is a very quick way of building a super simple structure of interlinked HTML documents, this can very well be a good start. For the time being this will do, but for more involved solutions maybe a static website/blog generator will fulfill your needs? If you’re interested in that, maybe take a look at pelican? Obviously you’ll have some setting up to do, but I think it can be pretty handy to have around.

Anyway, for the time being I’ll stick with the super simple solution for building the static HTML files. If I find a no-setup, just-run-it-and-give-me-a-wiki utility, I’ll be sure to update this.