Publishing Jupyter Notebooks with Quarto

J.J. Allaire — Founder & CEO, Posit
JupyterCon 2023

“We argue that even though Jupyter helps users perform complex, technical work, Jupyter itself solves problems that are fundamentally human in nature. Namely, Jupyter helps humans to think and tell stories with code and data. We illustrate this by describing three dimensions of Jupyter: 1) interactive computing; 2) computational narratives; and 3) the idea that Jupyter is more than software.”

Telling the Whole Story

  • Sources, assumptions, and constraints are often as important to understand as our metrics and visualizations
  • The insights we glean from data are often contextual and have important qualifications
  • Narrative becomes a crucial part of communicating about data
  • We need tools for storytelling!

Some History

  • 1978: TeX (Donald Knuth)
  • 1984: Literate Programming (Donald Knuth)
  • 1988: Mathematica Notebooks (Stephen Wolfram)
  • 2001: IPython (Fernando Perez)
  • 2003: Emacs org-mode (Carsten Dominik)
  • 2004: Markdown (John Gruber)
  • 2005: Sage Notebook (William Stein)
  • 2006: Pandoc (John MacFarlane)
  • 2009: GitHub Flavored Markdown (Tom Preston-Werner)
  • 2011: IPython Notebook (Fernando Perez)
  • 2012: knitr (Yihui Xie)
  • 2014: Project Jupyter (Fernando Perez)

Quarto — https://quarto.org

An open-source scientific and technical publishing system that builds on standard markdown with features essential for scientific communication.

  • Ground-up rewrite of a similar system (R Markdown) that was R-specific.

  • Quarto is multi-language and designed to serve the entire scientific computing community.

  • Sponsored and developed primarily by Posit (RStudio / Tidyverse / Shiny)

Goal: Computational Documents

  • Documents that incorporate the source code required for their production.

  • Reproducibility: Integrity of output in the face of changes.

  • Automation: Make it easier than not to work reproducibly.

Goal: Scientific Markdown

  • MS Word: Highly accessible but scales poorly with document complexity.
  • LaTeX: Much harder to use but absorbs complexity well over time.
  • Quarto: Evolution of markdown to give it the accessibility of Word with the scalability of LaTeX.

Goal: Single Source Publishing

https://coko.foundation/articles/single-source-publishing.html

Write semantically and repurpose content across many media
(Web, Mobile, Print, Office, CMS, etc.)

Tools for Computational Narratives

To achieve these goals we need a tool that can:

  • Render executable content from Jupyter
  • Include code, math, diagrams, citations, crossrefs, etc.
  • Support a broad range of output formats
  • Be easily extended with new features and new formats

Variety of tools available: nbconvert, Jupyter Book, Myst-JS, Quarto

Compared to nbconvert, the next generation of tools focuses more on production-quality output, advanced features like citations and crossrefs, and complex project types like websites, blogs, and books.

Quarto Basics

How does Quarto work?

  • Computations: Jupyter
  • Markdown: Pandoc w/ many enhancements
  • Output: Documents, presentations, websites, books, blogs

Render Notebook to HTML (default options)

Render Notebook to HTML (document level options)

Render Notebook to HTML (document and cell level options)
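In an .ipynb file, Quarto reads document-level options from YAML front matter in a raw cell at the top of the notebook, and cell-level options from `#|` comment lines at the top of a code cell. The sketch below builds such a notebook using only the standard library. The title, the specific options chosen (`toc`, `code-fold`, `echo`, `warning`), and the helper name `make_notebook` are illustrative assumptions, not taken from the slides.

```python
import json


def make_notebook():
    """Return a minimal nbformat-4 notebook dict carrying Quarto options."""
    # Document-level options: YAML front matter in a leading raw cell.
    front_matter = (
        "---\n"
        'title: "Palmer Penguins"\n'
        "format:\n"
        "  html:\n"
        "    toc: true\n"
        "    code-fold: true\n"
        "---"
    )
    # Cell-level options: '#|' comment lines at the top of a code cell.
    code = (
        "#| echo: false\n"
        "#| warning: false\n"
        "print('hello')"
    )
    return {
        "nbformat": 4,
        "nbformat_minor": 4,
        "metadata": {
            "kernelspec": {
                "name": "python3",
                "display_name": "Python 3",
                "language": "python",
            }
        },
        "cells": [
            {"cell_type": "raw", "metadata": {}, "source": front_matter},
            {
                "cell_type": "code",
                "metadata": {},
                "execution_count": None,
                "outputs": [],
                "source": code,
            },
        ],
    }


if __name__ == "__main__":
    # Print the notebook JSON; redirect it to a file to get an .ipynb.
    print(json.dumps(make_notebook(), indent=1))
```

Redirecting the output to a file such as `demo.ipynb` (a hypothetical name) gives a notebook that can be passed to `quarto render`.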

Render Notebook to MS Word https://quarto.org/docs/output-formats/ms-word.html

Render Notebook to PDF https://quarto.org/docs/output-formats/pdf-basics.html

Render Notebook to Revealjs https://quarto.org/docs/presentations/revealjs/

Render Notebook to Revealjs (show code with line by line highlighting)

Quarto Projects

https://quarto.org/docs/projects/quarto-projects.html

_quarto.yml
project:
  type: website

website:
  title: "Acme"
  navbar:
    left:
      - href: index.qmd
        text: Home
      - about.qmd

format:
  html:
    theme: cosmo
    css: styles.css

  • So far our examples have been single documents or presentations

  • Quarto has a project system that enables you to produce collections of documents in various formats (websites, blogs, books, etc.)

  • _quarto.yml config file defines the behavior of projects
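As a rough sketch of what a minimal website project looks like on disk, the script below writes the `_quarto.yml` shown above together with the pages and stylesheet it references. The page contents, the `scaffold_website` helper, and the `acme` directory name are placeholders invented for illustration.

```python
import tempfile
from pathlib import Path

# The project config from the slide, reproduced verbatim.
QUARTO_YML = """\
project:
  type: website

website:
  title: "Acme"
  navbar:
    left:
      - href: index.qmd
        text: Home
      - about.qmd

format:
  html:
    theme: cosmo
    css: styles.css
"""


def scaffold_website(root: Path) -> list[Path]:
    """Create the config plus every file it references; return their paths."""
    root.mkdir(parents=True, exist_ok=True)
    files = {
        "_quarto.yml": QUARTO_YML,
        # Placeholder pages (contents are assumptions, not from the talk).
        "index.qmd": '---\ntitle: "Home"\n---\n\nWelcome.\n',
        "about.qmd": '---\ntitle: "About"\n---\n\nAbout this site.\n',
        "styles.css": "/* custom styles */\n",
    }
    created = []
    for name, text in files.items():
        path = root / name
        path.write_text(text, encoding="utf-8")
        created.append(path)
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in scaffold_website(Path(tmp) / "acme"):
            print(path.name)
```

Running `quarto render` inside a directory scaffolded this way would build the whole site rather than a single document.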

Website: Fastai Course https://course.fast.ai

_quarto.yml
project:
  type: website
  resources:
    - "www/*"

format:
  html:
    theme: cosmo
    css: styles.css
    toc: true

website:
  title: "Practical Deep Learning for Coders"
  description: "Learn Deep Learning with fastai and PyTorch, 2022"
  twitter-card: true
  open-graph: true
  reader-mode: true
  page-navigation: true
  repo-branch: master
  repo-url: https://github.com/fastai/course22
  repo-actions: [issue]
  navbar:
    search: true
    right:
      - icon: github
        href: https://github.com/fastai/course22
  sidebar:
    style: "floating"

metadata-files:
  - sidebar.yml

Blog: Aayush Agrawal https://aayushmnit.com/

_quarto.yml
project:
  type: website

website:
  title: "Aayush Agrawal"
  description: "Aayush's personal website"
  repo-url: https://github.com/aayushmnit/aayushmnit.github.io
  repo-actions: [edit, issue]
  repo-branch: main
  open-graph: true
  google-analytics: "G-7QN8N70N41"
  twitter-card:
    creator: "@aayushmnit"
    card-style: summary_large_image
  navbar:
    collapse-below: lg
    left:
      - icon: newspaper
        href: blog.qmd
        text: Blog
    right:
      - icon: github
        href: https://github.com/aayushmnit/
      - icon: rss
        href: blog.xml

format:
  html:
    theme: sandstone
    mainfont: Roboto
    css: styles.css

Book: Geocomputation with Python https://py.geocompx.org

_quarto.yml
project:
  type: book

book:
  title: "Geocomputation with Python"
  author: | 
    Michael Dorman, Anita Graser, 
    Jakub Nowosad, Robin Lovelace
  description: | 
    An introductory resource for working with geographic
    data in Python
  cover-image: https://geocompx.org/static/img/book_cover_py_tmp_small.png
  site-url: https://py.geocompx.org
  repo-url: https://github.com/geocompx/geocompy/
  repo-actions: [edit]
  sharing: [twitter, facebook, linkedin]
  chapters:
    - index.qmd
    - preface.qmd
    - 02-spatial-data.qmd
    - 03-attribute-operations.qmd
    - 04-spatial-operations.qmd
    - 05-geometry-operations.qmd
    - 06-raster-vector.qmd
    - 07-reproj.qmd

format:
  html: 
    theme: flatly
    template-partials: [toc.html,title-block.html]
    code-overflow: wrap   

Books

https://quarto.org/docs/books/

  • Inherit features of Quarto websites (navigation, search, mobile, etc.)

  • Support cross references across chapters

  • Produce multiple book formats from a single source: HTML, PDF (LaTeX), MS Word, ePub, Asciidoc

Output Formats

Pandoc

Universal document converter

  • Created in 2006 by John MacFarlane (who is also the author of the CommonMark spec and CommonMark reference implementations in JavaScript, C, and Haskell)
  • CommonMark + many extensions for technical writing
  • Supports dozens of output formats (just about any format you can name)
  • Highly extensible (custom readers, custom writers, AST filters)

Pandoc Formats

Documents

  • HTML
  • PDF
  • MS Word
  • Open Office
  • ePub

Presentations

  • Revealjs
  • PowerPoint
  • Beamer

Markdown

  • CommonMark
  • GitHub (GFM)
  • Markua

Pandoc Formats (cont.)

Wikis

  • MediaWiki
  • DokuWiki
  • ZimWiki
  • Jira Wiki
  • XWiki

Other

  • JATS
  • ConTeXt
  • reST
  • Asciidoc
  • Org-mode
  • Textile
  • DocBook
  • InDesign
  • GNU Texinfo
  • FictionBook

Custom Formats

Because Quarto and Pandoc are based on a semantic AST, we can also publish to any content management system we need to. For example:

  • Hugo (Goldmark Markdown)
  • Docusaurus (MDX Markdown)
  • Confluence (Confluence XML)
  • O’Reilly Atlas (Asciidoc)

Hugo: Goldmark Markdown https://quarto.org/docs/output-formats/hugo.html

Docusaurus: MDX Markdown https://quarto.org/docs/output-formats/docusaurus.html

Confluence: Confluence XML https://quarto.org/docs/prerelease/1.3/confluence.html

O’Reilly Atlas https://quarto.org/docs/prerelease/1.3/asciidoc-books.html

Books can be rendered to asciidoc, which is fully compatible with the production requirements of O’Reilly Atlas (used for Print, ePub, and Web books)

Notebooks Now

https://data.agu.org/notebooks-now/


  • Collaboration among participants in the open-science community, scientific publishers, and the developers of Jupyter Book, Myst-JS, and Quarto to create a standard for including notebooks in scientific publications.

  • Aim is to define a standard for scholarly articles that include notebooks, enabling them to be considered as part of peer review and included in archives.

Workflow

Render and Preview

Render to output formats:

# ipynb notebook
quarto render notebook.ipynb
quarto render notebook.ipynb --to docx

# plain text qmd
quarto render notebook.qmd 
quarto render notebook.qmd --to pdf

Live preview server (re-render on save):

# ipynb notebook
quarto preview notebook.ipynb
quarto preview notebook.ipynb --to docx

# plain text qmd
quarto preview notebook.qmd
quarto preview notebook.qmd --to pdf
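When the same source needs to be rendered to several formats, the commands above can be driven from a script. This is a minimal sketch that shells out to the `quarto` CLI using only the `render` subcommand and `--to` flag shown above. The helper names `render_command` and `render_all` are hypothetical.

```python
import shutil
import subprocess


def render_command(source, to=None):
    """Build the argument list for `quarto render`, optionally with `--to`."""
    cmd = ["quarto", "render", source]
    if to is not None:
        cmd += ["--to", to]
    return cmd


def render_all(source, formats):
    """Render `source` once per format; return the commands that were run."""
    if shutil.which("quarto") is None:
        raise RuntimeError("quarto executable not found on PATH")
    commands = [render_command(source, fmt) for fmt in formats]
    for cmd in commands:
        # check=True raises CalledProcessError if a render fails.
        subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Only prints the command; call render_all(...) to actually render.
    print(" ".join(render_command("notebook.ipynb", "docx")))
```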

Plain Text Notebooks w/.qmd Files

penguins.qmd
---
title: "Palmer Penguins"
author: Norah Jones
date: March 12, 2023  
format: html
jupyter: python3
---

```{python}
#| echo: false

import pandas as pd
df = pd.read_csv("palmer-penguins.csv")
df = df[["species", "island", "year",
         "bill_length_mm", "bill_depth_mm"]]
```

## Exploring the Data

See @fig-bill-sizes for an exploration of bill sizes.

```{python}
#| label: fig-bill-sizes
#| fig-cap: Bill Sizes by Species

import matplotlib.pyplot as plt
import seaborn as sns
g = sns.FacetGrid(df, hue="species", height=3)
g.map(plt.scatter, "bill_length_mm", "bill_depth_mm") \
  .add_legend()
```

  • Editable with any text editor (extensions for VS Code, Neovim, and Emacs)

  • Cells always run in the same order

  • Integrates well with version control

  • Cache output with Jupyter Cache or Quarto freezer

  • Lots of pros and cons vis-à-vis the traditional .ipynb format and editors; use the right tool for each job

Quarto VS Code Extension

  • Render with integrated preview
  • Syntax highlighting for markdown and embedded languages
  • Completion for embedded languages (e.g. Python, R, Julia, LaTeX)
  • Completion for YAML options
  • Commands and key-bindings for running cells and selected line(s)
  • Live preview for diagrams

Rendering Pipeline

Notebook workflow (no execution occurs by default):

Plain text workflow (.qmd => .ipynb then execute cells):
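To make the first step of the plain text workflow concrete, here is a deliberately simplified sketch of splitting a .qmd file into notebook-style cells: front matter becomes a raw cell, executable fences become code cells, and everything else becomes markdown. It illustrates the idea only and is not Quarto's actual converter; the function name `split_qmd` and the regular expressions are assumptions.

```python
import re

TICKS = "`" * 3  # code fence marker, built this way to keep this listing fence-safe

# Simplified executable cell: {language} fence, body, closing fence.
CELL = re.compile(
    r"^" + TICKS + r"\{(\w+)\}[ \t]*\n(.*?)^" + TICKS + r"[ \t]*$",
    re.MULTILINE | re.DOTALL,
)
# Leading YAML block delimited by lines containing only '---'.
FRONT_MATTER = re.compile(r"\A---\n.*?\n---[ \t]*\n", re.DOTALL)


def split_qmd(text):
    """Split .qmd text into (kind, source) cells in document order."""
    cells, pos = [], 0
    front = FRONT_MATTER.match(text)
    if front:
        cells.append(("raw", front.group(0).strip()))
        pos = front.end()
    for match in CELL.finditer(text, pos):
        prose = text[pos:match.start()].strip()
        if prose:
            cells.append(("markdown", prose))
        cells.append(("code", match.group(2).rstrip("\n")))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        cells.append(("markdown", tail))
    return cells


if __name__ == "__main__":
    demo = "\n".join([
        "---", 'title: "Palmer Penguins"', "---", "",
        TICKS + "{python}", "#| echo: false", "import pandas as pd", TICKS, "",
        "## Exploring the Data",
    ])
    for kind, source in split_qmd(demo):
        print(kind, "|", source.splitlines()[0])
```

Because cells are derived from the file top to bottom, they always run in the same order, which is one of the properties of the plain text format listed earlier.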

Extending Quarto

Extensions

https://quarto.org/docs/extensions/

  • Filters & Shortcodes

  • Custom Formats

  • Revealjs Plugins

  • Project Types

  • Project Templates

Filters

https://quarto.org/docs/extensions/filters.html

  • Filters transform the document AST before final rendering

  • Can be used to modify, remove, or generate content

  • Can include target format specific logic / output

  • Example: Use the panflute library to increase the level of headings in a document.

Terminal
$ quarto render nb.ipynb --filter headers.py
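
headers.py
from panflute import Header, run_filter

def increase_header_level(elem, doc):
    # Push each heading one level deeper; level 6 is the deepest allowed.
    if isinstance(elem, Header) and elem.level < 6:
        elem.level += 1

def main(doc=None):
    # run_filter reads the JSON AST from stdin and writes the result to stdout.
    return run_filter(increase_header_level, doc=doc)

if __name__ == "__main__":
    main()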

What Can Filters Do?

  • Embedded languages (e.g. PlantUML, GraphViz)
  • Macro substitution (environment variables, config files, etc.)
  • Cross references and citations
  • Image conversion and filtering
  • Advanced formatting (e.g. callouts)

Quarto includes dozens of filters that implement its core functionality, but the system is open so you can add whatever features you require.

Filter Examples

  • lightbox: Create lightbox treatments for images in your HTML documents.
  • molstar: Shortcode to embed proteins and trajectories with Mol*.
  • social-share: Add buttons to share articles on various social media platforms.
  • latex-environment: Output divs as custom LaTeX environments.
  • qrcode: Shortcode to embed QR codes using qrcodejs.
  • code-visibility: Filter code and stream output included within a document.
  • authors-block: Add author-related header block when rendering docx-documents.

Writing Filters

  • pandocfilters: Python library from the creator of Pandoc
  • panflute: Python library with improved API and more batteries included
  • Lua Filters: Pandoc includes an embedded Lua interpreter for fast, zero-dependency filters
  • JSON Filters: Write filters in any language via JSON representation over stdin/stdout
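To show what the JSON option involves without any library, the sketch below reimplements the heading example as a bare JSON filter using only the Python standard library. It relies on Pandoc's JSON encoding of the AST, in which a heading is a node of the form `{"t": "Header", "c": [level, attr, inlines]}`. The function names are assumptions, and this is a sketch rather than a replacement for pandocfilters or panflute.

```python
import json
import sys


def walk(node, action):
    """Apply `action` to every dict in the AST, depth first."""
    if isinstance(node, dict):
        action(node)
        for value in node.values():
            walk(value, action)
    elif isinstance(node, list):
        for item in node:
            walk(item, action)


def increase_header_level(node):
    """Push Header nodes one level deeper, capped at level 6."""
    if node.get("t") == "Header" and node["c"][0] < 6:
        node["c"][0] += 1


def transform(doc):
    """Rewrite the document's block list in place and return the document."""
    walk(doc["blocks"], increase_header_level)
    return doc


# Pandoc passes the target format as the first argument to a JSON filter,
# so only read stdin when the script is invoked that way.
if __name__ == "__main__" and len(sys.argv) > 1:
    json.dump(transform(json.load(sys.stdin)), sys.stdout)
```

Saved under a hypothetical name such as `headers_json.py`, it could be passed with `--filter` in the same way as the panflute version.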

Integration w/ nbdev

https://nbdev.fast.ai

  • Interactively develop Python packages within Jupyter, including embedded tests/docs, CI, pypi and conda publishing

  • Version 2 of nbdev uses Quarto to produce documentation websites

Thank You!

Slides: https://jjallaire.quarto.pub/jupytercon-2023/

Resources

Questions?