Reproducible Manuscripts with Quarto

Bioconductor 2023

J.J. Allaire — CEO Posit, PBC

Overview

  • What is Quarto?

  • Notebooks, Markdown, and Scientific Publishing

  • Introducing Quarto Manuscripts

What is Quarto?

Quarto is the next generation of R Markdown

Quarto Overview

https://quarto.org

Quarto is an open-source scientific and technical publishing system that builds on standard markdown with features essential for scientific communication.

  • Computations: Python, R, Julia, Observable JS
  • Markdown: Pandoc w/ many enhancements
  • Output: Documents, presentations, websites, books, blogs

Literate programming system in the tradition of Org-mode, Sweave, Weave.jl, R Markdown, iPyPublish, Jupyter Book, etc.

Origins

  • Open source project sponsored by Posit, PBC.
  • 10 years of experience with R Markdown convinced us that the core ideas were sound.
  • The number of languages and runtimes used for scientific discourse is very broad.
  • Quarto is a ground-up re-imagining of R Markdown that is fundamentally multi-language and multi-engine.

Goal: Computational Documents

  • Documents that incorporate the source code required for their production
  • Notebook and plain text flavors
  • Automation and reproducibility

Goal: Scientific Markdown

Goal: Single Source Publishing

https://coko.foundation/articles/single-source-publishing.html

Why a New System?

  • The number of languages and runtimes used for scientific discourse is very broad (and the Jupyter ecosystem in particular is extraordinarily popular).
  • Quarto is at its core multi-language and multi-engine (supporting Knitr, Jupyter, and Observable today and potentially other engines tomorrow).
  • R Markdown, on the other hand, is heavily tied to R, which limits the number of people it can benefit.
  • Quarto is Posit’s (formerly RStudio’s) attempt to bring R Markdown to everyone!

Quarto Engines

  • Knitr
  • Jupyter
  • Observable JS
  • Others possible…

Knitr Engine

For R, Quarto still uses Knitr under the hood. Consequently, the vast majority of existing Rmd files can be rendered unmodified.

Note that the standard syntax for chunk options has changed (old syntax still works):

```{r}
#| echo: false
#| fig-cap: "Cars"
plot(cars)
```
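The mapping from the old header-style options to the new `#|` comment lines can be sketched in a few lines of Python. This is a minimal illustration, not a complete converter: the function name `convert_chunk_options` is hypothetical, and it assumes simple `key=value` options whose values are double-quoted strings or bare tokens (no R expressions, no quoted strings containing escaped quotes).

```python
import re

# Matches key=value pairs where the value is a double-quoted string or a bare token.
OPTION = re.compile(r'([\w.]+)\s*=\s*("[^"]*"|[^,\s]+)')


def convert_chunk_options(header: str) -> list[str]:
    """Turn the options in an old-style header such as
    {r, echo=FALSE, fig.cap="Cars"} into '#|' comment lines.
    (Hypothetical helper for illustration only.)"""
    inner = header.strip().strip("{}")
    lines = []
    for key, value in OPTION.findall(inner):
        key = key.replace(".", "-")        # fig.cap -> fig-cap
        if value in ("TRUE", "FALSE"):     # R logicals -> YAML booleans
            value = value.lower()
        lines.append(f"#| {key}: {value}")
    return lines


for line in convert_chunk_options('{r, echo=FALSE, fig.cap="Cars"}'):
    print(line)
```

Applied to the old-style header `{r, echo=FALSE, fig.cap="Cars"}`, the sketch yields the two `#|` lines shown in the chunk above.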

Jupyter Engine

The Jupyter engine supports the use of Python, Julia, and any other language that has a Jupyter kernel.

The Jupyter engine supports two input file formats:

  • Traditional notebooks (.ipynb)
  • Markdown w/ chunks (.qmd)

Hello Jupyter: https://quarto.org/#hello-quarto

Jupyter Engine: ipynb

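You can also render Jupyter notebooks (.ipynb files) directly. Note that in this case no execution occurs by default; pass --execute to run the cells before rendering:

$ quarto render notebook.ipynb
$ quarto render notebook.ipynb --execute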
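Rendering without execution is possible because an .ipynb file stores each cell's saved outputs next to its source. The sketch below builds a minimal notebook by hand (the cell contents are made up for illustration) and reads the stored output back without running any code; the helper name `stored_outputs` is hypothetical.

```python
import json

# A minimal nbformat-4 notebook: one code cell whose output was saved
# the last time the cell was run in Jupyter. Contents are illustrative.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "id": "cell-1",
            "execution_count": 1,
            "metadata": {},
            "source": ["print(1 + 1)"],
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["2\n"]}
            ],
        }
    ],
}


def stored_outputs(nb: dict) -> list[str]:
    """Collect stream text already saved in the notebook, without executing it."""
    texts = []
    for cell in nb["cells"]:
        if cell["cell_type"] != "code":
            continue
        for output in cell.get("outputs", []):
            if output["output_type"] == "stream":
                texts.append("".join(output["text"]))
    return texts


# Round-trip through JSON to mimic reading the file from disk.
loaded = json.loads(json.dumps(notebook))
print(stored_outputs(loaded))
```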

Jupyter Engine: Tooling

Side-by-side preview for JupyterLab, VS Code, Emacs, etc.:

$ quarto preview notebook.ipynb

Quarto Projects

A project is a directory that produces more sophisticated output, possibly drawn from multiple input files:

  • Websites

  • Books

  • Blogs

  • Journal Articles

Quarto Journals

https://quarto.org/docs/journals/

Custom format system designed to accommodate the creation of articles for publishing in professional Journals:

  • The ability to flexibly adapt the native LaTeX templates provided by Journals for use with Pandoc.

  • The use of spans and divs to apply formatting (which enables targeting by CSS for HTML output and LaTeX macros/environments for PDF output).

  • A standardized schema for authors and affiliations so that you can express this data once and then have it automatically formatted according to the styles required for various Journals.

  • The use of Citation Style Language (CSL) to automate the formatting of citations and bibliographies according to whatever style is required by various Journals.

Example: https://quarto-journals.github.io/jss/
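The "express this data once" idea behind the author and affiliation schema can be illustrated with a small Python sketch. The data, field names, and the two formatting functions are hypothetical stand-ins, not Quarto's actual schema or journal templates; the point is only that one structured record can feed several journal styles.

```python
# Hypothetical author metadata, entered once.
authors = [
    {"name": "Ada Example", "affiliations": ["Example University"]},
    {"name": "Ben Sample", "affiliations": ["Example University", "Sample Institute"]},
]


def numbered_style(authors):
    """Style A: names with affiliation numbers, plus a numbered legend."""
    index = {}
    for author in authors:
        for aff in author["affiliations"]:
            index.setdefault(aff, len(index) + 1)  # number in order of first use
    parts = []
    for author in authors:
        marks = ",".join(str(index[aff]) for aff in author["affiliations"])
        parts.append(f"{author['name']}^{marks}")
    legend = [f"{number}. {aff}" for aff, number in index.items()]
    return ", ".join(parts), legend


def inline_style(authors):
    """Style B: affiliations written out in parentheses after each name."""
    parts = []
    for author in authors:
        affs = ", ".join(author["affiliations"])
        parts.append(f"{author['name']} ({affs})")
    return "; ".join(parts)


byline, legend = numbered_style(authors)
print(byline)
print(legend)
print(inline_style(authors))
```

Switching journals then means switching the formatting function, while the author data stays untouched.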

Notebooks, Markdown, & Scientific Publishing

Notebooks in Scientific Publishing

Note: .qmd ≈ Notebook

  • Providing notebooks as curated research outputs would greatly enhance transparency and reproducibility.

  • Unfortunately, the current peer-review and publications workflows across the sciences do not readily support notebooks as research outputs or encourage their use and curation.

What do we need?

  • An end-to-end scholarly publishing workflow that treats notebooks (from both Jupyter and Quarto) as a primary element of the scientific record.

  • A publication process that elevates transparent and reproducible work by authors, where data and software, together with narrative, are documented and shared.

  • New forms of credit extended to the wider research community, including research software engineers (RSEs).

There is more hope for this than you might imagine…

Notebooks Now

https://data.agu.org/notebooks-now/

  • Notebooks Now! Elevating notebooks into scholarly publishing (https://doi.org/10.5281/zenodo.6981363)
  • Funded through a grant from the Alfred P. Sloan Foundation to the American Geophysical Union (AGU)
  • Broad collaboration between open source communities, open science organizations, and software tool makers.

Notebooks Now: Steering Committee

Member                  Affiliation
Alberto Pepe            Wiley/Atypon/Authorea
Lorena Barba            GW University, JOSS (Editor)
Yanina Bellini          rOpenSci, R Ladies, Latin R
Chris Holdgraf          2i2c, Project Jupyter
Kenton McHenry          NCSA, U Illinois at Urbana
Fernando Perez          UC Berkeley, Project Jupyter
Alison Presmanes Hill   Voltron Data (formerly RStudio)
Karthik Ram             UC Berkeley, rOpenSci

Notebooks Now: Progress to Date

  • In-person + virtual workshop in Washington, D.C. in November 2022 (hosted by AGU)
  • Various working groups established:
    • Pre-submission (authoring)
    • Submission and metadata
    • Peer review
    • Publication, production, and post-production
  • Two initial implementations planned: Quarto and MyST (markdown dialect used in Jupyter Book)

Introducing Quarto Manuscripts

Manuscript Basics

  • New project type (manuscript) that provides a framework for writing and publishing scholarly articles.

  • Produce manuscripts in multiple formats (including LaTeX or MS Word formats required by journals), and give readers easy access to all of the formats through a website.

  • Use one or more notebooks or .qmd documents as the source of content and computations, and then publish these computations alongside the manuscript, allowing readers to dive into your code.
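As a rough sketch of what such a project might look like on disk, the Python script below scaffolds a minimal manuscript directory. It assumes a Quarto release that includes the manuscript project type; the configuration keys reflect my understanding of that project type and should be checked against the Quarto documentation. The title, author, directory name, and the `scaffold` helper are all hypothetical.

```python
from pathlib import Path
import tempfile

# Project configuration: one article source, several output formats.
# Key names are an assumption; verify against the Quarto docs.
QUARTO_YML = """\
project:
  type: manuscript

manuscript:
  article: index.qmd

format:
  html: default
  docx: default
  pdf: default
"""

# Placeholder article source with hypothetical front matter.
INDEX_QMD = """\
---
title: "A Hypothetical Manuscript"
author:
  - name: Ada Example
---

## Introduction

Narrative text, with computations drawn from notebooks or code cells.
"""


def scaffold(root: Path) -> list[str]:
    """Write the two files a minimal manuscript project needs."""
    root.mkdir(parents=True, exist_ok=True)
    (root / "_quarto.yml").write_text(QUARTO_YML)
    (root / "index.qmd").write_text(INDEX_QMD)
    return sorted(p.name for p in root.iterdir())


project_dir = Path(tempfile.mkdtemp()) / "my-manuscript"
print(scaffold(project_dir))
```

Running `quarto render` inside the generated directory would then be expected to build each listed format from the single `index.qmd` source (the PDF format additionally needs a LaTeX installation).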

Getting Started

Multiple Formats from a Single Source

Rich Front Matter

Writing Tools

Embedding Computations

Trying Out Manuscripts

Software Requirements:

Talk to Us!

Thank You!

Slides:

https://jjallaire.quarto.pub/reproducible-manuscripts-with-quarto/

Learning more:

Questions?