import graphviz

graphviz.Source("""
digraph G {
    hello -> world
}
""")
August 23, 2022
The fastpages framework for this blog has been excellent. There have been a few problems with it though:
Occasionally there can be problems rendering a post
The SVG images generated by tools like graphviz get a bunch of encoded HTML added around them (GitHub issue)
There can be random div tags scattered around
I’ve been idly watching the fastpages repo to see if these would get fixed. The suggested solution for the SVG issue is to clean the notebook and re-execute it. Since it can take hours or even days to run one of the posts, I’m less keen on that. I still think it’s an excellent framework that has helped me a lot.
There was recently a Hacker News post about some work that Jeremy Howard et al. have been doing. In that post Jeremy said that he had been using Quarto recently. I checked it out and it appears to be very slick.
I’m trying it out as a replacement framework for this website. It already seems promising - the SVG images now work as expected:
It still renders math nicely \(\sum_{i} \frac{i}{3}\).
Pandas dataframes now have scrollbars:
 | column 0 | column 1 | column 2 | column 3 | column 4 | column 5 | column 6 | column 7 | column 8 | column 9 | column 10 | column 11 | column 12 | column 13 | column 14 | column 15 | column 16 | column 17 | column 18 | column 19 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 |
Finally, it has sidebar support so I can add inline citations (Howard and Ruder 2018).
The new framework isn’t perfect. I’ve customized fastpages and the way I use notebooks quite a bit, which will always make it hard to match.
I’ve been using a simple function to load code into notebooks from external files. It displays the code as a renderable Code object. This means that it looks identical to the code in the notebook.
This has been really helpful for sharing code between multiple posts. Unfortunately, this does not play well with Quarto. The code will not render as if it were a cell, which at the very least means losing syntax highlighting. Furthermore, I cannot collapse the output.
As the amount of code in each post has grown it may well become more appropriate to link the code directly. This would likely involve creating a repo for each individual project that I work on and linking to the code as it was when I worked on the post. This feels a little brittle and makes it harder to view.
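To make "linking to the code as it was" concrete, here is a minimal sketch of building a link pinned to a specific commit. It assumes the project is hosted on GitHub, which serves a file at a fixed revision under /blob/<commit>/<path>. The repository name and commit hash are made up for illustration.

```python
def permalink(repo_url: str, commit: str, path: str) -> str:
    """Build a link to a file as it existed at a specific commit."""
    # GitHub serves a file at a fixed revision under /blob/<commit>/<path>.
    return f"{repo_url.rstrip('/')}/blob/{commit}/{path.lstrip('/')}"


# Hypothetical repository and commit hash, for illustration only.
link = permalink(
    "https://github.com/example-user/example-project",
    "0123456789abcdef0123456789abcdef01234567",
    "src/main/python/blog/jupyter_import.py",
)
print(link)
```

Pinning to a commit rather than a branch keeps the link stable when the code changes later, but it does mean every post has to record the commit it was written against.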
from importlib import import_module
from pathlib import Path
from typing import Any, Dict

from IPython.display import Code, display

PROJECT_ROOT = Path(".").resolve().parents[3]


def jupyter_import(context: Dict[str, Any], module_name: str, *names: str) -> None:
    module = import_module(module_name)
    for name in names:
        context[name] = getattr(module, name)
    path = PROJECT_ROOT / (module_name.replace(".", "/") + ".py")
    display(Code(filename=path))
vs
from importlib import import_module
from pathlib import Path
from typing import Any, Dict

from IPython.display import Code, display

PROJECT_ROOT = Path(__file__).resolve().parents[1]


def jupyter_import(context: Dict[str, Any], module_name: str, *names: str) -> None:
    module = import_module(module_name)
    for name in names:
        context[name] = getattr(module, name)
    path = PROJECT_ROOT / (module_name.replace(".", "/") + ".py")
    display(Code(filename=path))
One of the problems here is that the cell directives that I used to control the rendering have changed. I’m using #| echo: false above to hide the actual code, but the rendered code is still poor.
Fixing this while keeping the same approach may involve JavaScript code markup, as the cells are significantly processed by Quarto as part of rendering the page. There is also a post-processing step that can execute arbitrary Python over the notebook. Either way, I need a way to share code between notebooks while keeping each notebook standalone.
I’ve attempted to fix this by using highlight.js to apply syntax highlighting to the cell output. Unfortunately, it doesn’t seem to be possible to provide custom JavaScript files. I’ve opened a ticket about that.
The ticket received a response suggesting that I write a Lua filter. I’ve done that now and, instead of preserving my jupyter_import approach, I’ve moved to just handling the plain Python imports. That means the following code:
from blog.jupyter_import import jupyter_import
becomes:
# from src/main/python/blog/jupyter_import.py
from importlib import import_module
from pathlib import Path
from typing import Any, Dict

from IPython.display import Code, display

PROJECT_ROOT = Path(__file__).resolve().parents[1]


def jupyter_import(context: Dict[str, Any], module_name: str, *names: str) -> None:
    module = import_module(module_name)
    for name in names:
        context[name] = getattr(module, name)
    path = PROJECT_ROOT / (module_name.replace(".", "/") + ".py")
    display(Code(filename=path))
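The mapping from an import line to the file that gets inlined can be sketched in Python. This is not the Lua filter itself, only an illustration of the lookup it has to perform. The source root is taken from the comment in the output above, and the function name source_path_for is made up for this example.

```python
import ast
from pathlib import PurePosixPath
from typing import Optional

# Assumed source root, taken from the "# from ..." comment in the output above.
SOURCE_ROOT = PurePosixPath("src/main/python")


def source_path_for(import_line: str) -> Optional[PurePosixPath]:
    """Map a `from package.module import name` line to the module's file."""
    body = ast.parse(import_line).body
    if len(body) != 1 or not isinstance(body[0], ast.ImportFrom):
        return None  # not a single `from ... import ...` statement
    statement = body[0]
    if statement.level != 0 or statement.module is None:
        return None  # relative imports are not handled in this sketch
    return SOURCE_ROOT / (statement.module.replace(".", "/") + ".py")


path = source_path_for("from blog.jupyter_import import jupyter_import")
print(f"# from {path}")
```

A real filter would also need to check that the file exists under the source root, so that imports from the standard library or installed packages are left alone.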
The title of this page is ../../../../blog - Changing... which is not great.
Ideally I would be able to strip that prefix entirely. I don’t really know where it is coming from though. On the plus side this doesn’t affect the published site.
This is fixed now: it was a setting in the _quarto.yml file. I’ve removed the title of the site; however, removing it has left no way to navigate to the home page. This might be a good thing for me to fix in Quarto itself.
I’ve had to restructure the notebooks to maintain the URLs as much as possible. This has made it hard to keep the images. I’ve taken to moving all of the images into the folder for the day that the post was made.
This has been done now, but it was quite a bit of work.
The metadata format for each post has changed. Since I now have over 100 posts, it’s quite a lot of work to fix that. I ended up writing a Python script to directly change the notebooks:
from pathlib import Path
import json


def translate(notebook: Path) -> None:
    data = json.loads(notebook.read_text())
    first_cell = data["cells"][0]
    # A raw first cell means the notebook has already been translated.
    if first_cell["cell_type"] == "raw":
        return
    # The first two lines hold the title and description; drop the leading marker character.
    source = first_cell["source"]
    title, description, *_ = source
    title = title[1:].strip()
    description = description[1:].strip()
    # The three nearest parent folders name the date, nearest first (list() keeps older Pythons happy).
    date_parts = [part.name for part in list(notebook.parents)[:3]][::-1]
    date = "-".join(date_parts)
    alias = "/" + ("/".join(date_parts + [notebook.stem + ".html"]))
    source = [
        "---\n",
        f"title: {title}\n",
        f'date: "{date}"\n',
        f"description: {description}\n",
        "aliases:\n",
        f' - "{alias}"\n',
        "---",
    ]
    first_cell["source"] = source
    first_cell["cell_type"] = "raw"
    notebook.write_text(json.dumps(data, indent=1))
It’s very hacky however it gets the job done.
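As a worked example of how the script derives the date and the alias from a notebook's location, here is the path handling on its own. The notebook path is hypothetical and assumes a year/month/day folder layout.

```python
from pathlib import PurePosixPath

# Hypothetical notebook location, assuming <year>/<month>/<day>/<name>.ipynb
notebook = PurePosixPath("2022/08/23/example-post.ipynb")

# The three nearest parent folders are day, month, year; reverse to year first.
date_parts = [part.name for part in list(notebook.parents)[:3]][::-1]
date = "-".join(date_parts)
alias = "/" + "/".join(date_parts + [notebook.stem + ".html"])

print(date)   # 2022-08-23
print(alias)  # /2022/08/23/example-post.html
```

The alias is the old URL of the post, which is what lets Quarto redirect visitors who arrive through an existing link.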
When I use references I have to include the following in the post metadata:
bibliography: ../../../references.bib
Unfortunately, I was not able to find a way to make this a site-wide setting.
It’s hard to keep the original URL structure. I’ve had to resort to adding redirects for all the old posts. It would be nice if you could alter the path of a folder, as this change has added posts to the URL.
Even though I’ve listed a lot of problems with this, and they have taken quite a while to fix, I am very happy with this change. The blog looks really slick now, and I’m hopeful that Plotly will work better too.