Matthew’s Blog - Changing to Quarto

The fastpages framework for this blog has been excellent. There have been a few problems with it though:

Occasionally there can be problems rendering a post
The SVG images generated by things like graphviz get a bunch of encoded html added around them (github issue)
There can be random div tags scattered around

I’ve been idly watching the fastpages repo to see if these would get fixed. The SVG issue has a suggested solution to clean the notebook and re-execute. Since some of the posts take hours or even days to run, I’m less keen on that. I still think it’s an excellent framework that has helped me a lot.

Quarto

There was recently a Hacker News post about some work that Jeremy Howard et al. have been doing. In that post Jeremy said that he had been using Quarto recently. I checked it out and it appears to be very slick.

I’m trying it out as a replacement framework for this website. It already seems promising - the SVG images now work as expected:

import graphviz

graphviz.Source("""
digraph G {
    hello -> world
}
""")

It still renders math nicely \(\sum_{i} \frac{i}{3}\).

Pandas dataframes now have scrollbars:

import pandas as pd

pd.DataFrame([{f"column {number}": number for number in range(20)}])

	column 0	column 1	column 2	column 3	column 4	column 5	column 6	column 7	column 8	column 9	column 10	column 11	column 12	column 13	column 14	column 15	column 16	column 17	column 18	column 19
0	0	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18	19

Finally it has sidebar support so I can add inline citations (Howard and Ruder 2018).

Howard, Jeremy, and Sebastian Ruder. 2018. “Universal Language Model Fine-Tuning for Text Classification.” https://arxiv.org/abs/1801.06146.

Problems

The new framework isn’t perfect. I’ve customized fastpages and the way I use notebooks quite a bit, which will always make the old behavior hard to match.

Code Blocks

I’ve been using a simple function to load code into notebooks from external files. It returns the code as a renderable Code object. This means that it looks identical to the code in the notebook.

This has been really helpful for sharing code between multiple posts. Unfortunately this does not play well with Quarto. The code will not render as if it were a cell, which at the very least means losing syntax highlighting. Furthermore I cannot collapse the output.

As the amount of code in each post has grown it may well become more appropriate to link the code directly. This would likely involve creating a repo for each individual project that I work on and linking to the code as it was when I worked on the post. This feels a little brittle and makes it harder to view.

from importlib import import_module
from pathlib import Path
from typing import Any, Dict

from IPython.display import Code, display

PROJECT_ROOT = Path(".").resolve().parents[3]

def jupyter_import(context: Dict[str, Any], module_name: str, *names: str) -> None:
    module = import_module(module_name)
    for name in names:
        context[name] = getattr(module, name)
    path = PROJECT_ROOT / (module_name.replace(".", "/") + ".py")
    display(Code(filename=path))

from importlib import import_module
from pathlib import Path
from typing import Any, Dict

from IPython.display import Code, display

PROJECT_ROOT = Path(__file__).resolve().parents[1]

def jupyter_import(context: Dict[str, Any], module_name: str, *names: str) -> None:
    module = import_module(module_name)
    for name in names:
        context[name] = getattr(module, name)
    path = PROJECT_ROOT / (module_name.replace(".", "/") + ".py")
    display(Code(filename=path))

One of the problems here is that the cell directives that I used to control the rendering have changed. I’m using #| echo: false above to hide the actual code, but the rendered code is still poor.

Fixing this while keeping the same approach may involve javascript code markup, as the cells are significantly processed by Quarto as part of rendering the page. There is also a post processing step that can execute arbitrary python over the notebook. Either way, I need a way to share code between notebooks while keeping each notebook standalone.

I’ve attempted to fix this by using highlight.js to apply syntax highlighting to the cell output. Unfortunately it doesn’t seem to be possible to provide custom javascript files. I’ve opened a ticket about that.

The ticket received a response which suggested writing a lua filter. I’ve done that now and instead of preserving my jupyter_import approach I’ve moved to just handling the plain python imports. That means the following code:

from blog.jupyter_import import jupyter_import

becomes:

# from src/main/python/blog/jupyter_import.py
from importlib import import_module
from pathlib import Path
from typing import Any, Dict

from IPython.display import Code, display

PROJECT_ROOT = Path(__file__).resolve().parents[1]

def jupyter_import(context: Dict[str, Any], module_name: str, *names: str) -> None:
    module = import_module(module_name)
    for name in names:
        context[name] = getattr(module, name)
    path = PROJECT_ROOT / (module_name.replace(".", "/") + ".py")
    display(Code(filename=path))
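
The actual filter is written in Lua and is not reproduced here. As a rough illustration of the transformation it performs, here is a minimal Python sketch. The names SOURCE_ROOT, module_path and inline_import are hypothetical, and the source root is an assumption taken from the comment in the rendered output above. It only handles simple single-line from-imports.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: local modules live under this root, as suggested by the
# "# from src/main/python/..." comment in the rendered output.
SOURCE_ROOT = Path("src/main/python")

# Matches simple statements like "from blog.jupyter_import import jupyter_import".
IMPORT_PATTERN = re.compile(r"^from\s+([\w.]+)\s+import\s+.+$")


def module_path(import_line: str, root: Path = SOURCE_ROOT) -> Optional[Path]:
    """Return the file a from-import refers to, or None for any other line."""
    match = IMPORT_PATTERN.match(import_line.strip())
    if match is None:
        return None
    return root / (match.group(1).replace(".", "/") + ".py")


def inline_import(import_line: str, root: Path = SOURCE_ROOT) -> str:
    """Replace an import of a local module with a comment plus the module source."""
    path = module_path(import_line, root)
    if path is None or not path.is_file():
        # Standard library and third-party imports are left untouched.
        return import_line
    return f"# from {path.as_posix()}\n{path.read_text()}"
```

Anything that does not resolve to a file under the source root passes through unchanged, so only the shared project code gets expanded in the rendered post.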

Page Title

The title of this page is ../../../../blog - Changing.... That’s not great.

Ideally I would be able to strip that prefix entirely. I don’t really know where it is coming from though. On the plus side this doesn’t affect the published site.

This is fixed now - it was a setting in the _quarto.yml file. I’ve removed the title of the site; however, removing it has left no way to navigate to the home page. This might be a good thing for me to fix in Quarto itself.

Images

I’ve had to restructure the notebooks to maintain the URLs as much as possible. This has made it hard to keep the images. I’ve taken to moving all of the images into the folder for the day that the post was made.

This has been done now, but it was quite a bit of work.

Post Metadata

The metadata format for each post has changed. Since I now have over 100 posts, it’s quite a lot of work to fix that. I ended up writing a python script to directly change the notebooks:

from pathlib import Path
import json

def translate(notebook: Path) -> None:
    data = json.loads(notebook.read_text())

    first_cell = data["cells"][0]
    if first_cell["cell_type"] == "raw":
        return

    source = first_cell["source"]
    title, description, *_ = source
    title = title[1:].strip()
    description = description[1:].strip()
    date_parts = [part.name for part in notebook.parents[:3]][::-1]
    date = "-".join(date_parts)
    alias = "/" + ("/".join(date_parts + [notebook.stem + ".html"]))
    source = ["---\n", f"title: {title}\n", f'date: "{date}"\n', f"description: {description}\n", "aliases:\n", f' - "{alias}"\n', "---"]
    first_cell["source"] = source
    first_cell["cell_type"] = "raw"

    notebook.write_text(json.dumps(data, indent=1))

It’s very hacky; however, it gets the job done.
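
To make the path handling in that script concrete, here is a small worked sketch of the same date and alias derivation. The helper name front_matter and the example path are hypothetical, and it assumes notebooks are stored as year/month/day/name.ipynb. It uses list(notebook.parents)[:3] so that it does not depend on slicing support for Path.parents.

```python
from pathlib import Path
from typing import List


def front_matter(notebook: Path, title: str, description: str) -> List[str]:
    """Build the raw-cell lines for a notebook stored at <year>/<month>/<day>/<name>.ipynb."""
    # The three nearest parent folders are day, month and year, in that order,
    # so reversing their names gives year, month, day.
    date_parts = [parent.name for parent in list(notebook.parents)[:3]][::-1]
    date = "-".join(date_parts)
    # The alias is a dated path without any leading folder, ending in .html.
    alias = "/" + "/".join(date_parts + [notebook.stem + ".html"])
    return [
        "---\n",
        f"title: {title}\n",
        f'date: "{date}"\n',
        f"description: {description}\n",
        "aliases:\n",
        f' - "{alias}"\n',
        "---",
    ]


# Hypothetical example path, not a real post.
lines = front_matter(Path("posts/2022/01/31/example-post.ipynb"), "Example", "A short description")
print("".join(lines))
```

One caveat with writing the title and description unquoted: a value containing a colon followed by a space would not parse as plain YAML, so such posts would need their values quoted by hand.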

Bibliography

When I use references I have to include the following in the post metadata:

bibliography: ../../../references.bib

Unfortunately I was not able to find a way to make this a site-wide setting.

URLs

It’s hard to keep the original URL structure. I’ve had to resort to adding redirects for all the old posts. It would be nice to be able to alter the published path of a folder, as this change has added a posts segment to the URL.

Conclusion

Even though I’ve listed a lot of problems with this, and it has taken quite a while to fix, I am very happy with this change. The blog looks really slick now, and I’m hopeful that plotly will work better now.