Hao Ye, Health Science Center Libraries, University of Florida (updated: 2021-02-16)

Intro

  • Motivations: R packages are the primary means of bundling R code, and they facilitate many use cases, including:
    • re-using code or data across projects
    • writing open source software
    • creating reproducible analyses
  • Learning Outcomes: By the end of the workshop, participants will be able to:
    • create working R packages with code and data
    • write documentation using roxygen2
    • describe additional package-related tools (e.g. pkgdown, testthat)

Creating R Packages

  • Setup: Before getting started, we want some tools that make creating packages easier.

    install.packages(c("devtools", "roxygen2", 
                       "testthat", "knitr"))
  • Creating a new R package

    devtools::create("myDemoPkg")
    myDemoPkg
    ├── .gitignore
    ├── .Rbuildignore
    ├── DESCRIPTION
    ├── myDemoPkg.Rproj
    ├── NAMESPACE
    └── R/
    
    • package names must start with a letter, can contain only letters, numbers, and periods, and cannot end with a period.
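    The naming rule can be checked before calling devtools::create(). The following is a minimal sketch using base R only; `is_valid_pkg_name` is a hypothetical helper (not part of devtools), and the pattern also enforces two constraints from R's own package-name rules: at least two characters, and no trailing period.

```r
# Sketch: check candidate package names against the naming rules.
# `is_valid_pkg_name` is a hypothetical helper, not part of devtools.
is_valid_pkg_name <- function(name) {
  # start with a letter; then letters, digits, or periods; end with a
  # letter or digit (so at least two characters and no trailing period)
  grepl("^[A-Za-z][A-Za-z0-9.]*[A-Za-z0-9]$", name)
}

is_valid_pkg_name(c("myDemoPkg", "my.pkg2", "2fast", "my_pkg", "pkg."))
# [1]  TRUE  TRUE FALSE FALSE FALSE
```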
  • What are these files?

    • .gitignore tells git to NOT track certain files
    • .Rbuildignore tells R to NOT include certain files when building the package
    • DESCRIPTION contains basic metadata about the package
    • myDemoPkg.Rproj is the RStudio project
    • NAMESPACE lists the objects that the package exports to users (and any it imports from other packages)
    • R/ contains R function definitions
  • Building the Package

    • R Console
    devtools::install()

    OR

    • RStudio Build Pane: on the 'Build' tab, click 'Install and Restart' (hammer icon). [Screenshot: the RStudio 'Build' pane with the 'Install and Restart' button circled.]
  • Fully Operational!

    • The package can be built and installed.
    • Next Steps:
      • syncing to GitHub
      • adding DESCRIPTION info
      • adding data and code
      • adding documentation

Syncing to GitHub

  • Why GitHub?
    • Cloud-based backup of your new package

      • private repos are visible only to the people you give access to
    • Supports installation from GitHub

      remotes::install_github("{username}/{repo}")
      # remotes::install_github("ha0ye/myDemoPkg")
      • private repos, too, IF you have a personal access token (PAT) set up appropriately
  • Setup (local)
    • Install Git and register a GitHub account - https://happygitwithr.com/install-intro.html
    • Create a Git repo for the project and make an initial commit
    usethis::use_git()
  • Setup (GitHub, via usethis)
    • Requires a personal access token (PAT) that is set up with permission to create repos
    usethis::use_github()
    OR
  • Setup (GitHub, manual)
    • create a new repository
      • ideally same name as your package folder
      • choose public or private
      • SKIP initialization
    • In RStudio, click “New Branch” on the “Git” pane. [Screenshot: the RStudio 'Git' pane with the 'New Branch' button (a white diamond joined to two purple rectangles) circled.]
    • Click “Add Remote”
      • use origin for the “Remote Name”
      • copy-paste the URL from GitHub (shown after creating the new repo) for “Remote URL”, e.g. git@github.com:ha0ye/myDemoPkg.git [Screenshot: the RStudio 'Add Remote' dialog with 'Remote Name' and 'Remote URL' filled in.]
    • Choose the same “Branch Name” as your Git pane shows ('Master' in the screenshot). [Screenshot: the RStudio 'New Branch' dialog, with 'Branch Name' set to 'Master', a 'Remote' menu, an 'Add Remote...' button, and 'Create' and 'Cancel' buttons.]
    • Select “Overwrite” at the warning about a local branch already existing.
  • Demo
  • Workflow
    • update the package
    • commit the changes
    • push to GitHub

Updating DESCRIPTION

  • Parts of a DESCRIPTION
    • package name
    • 1 sentence title
    • version number
    • package authors
    • 1 paragraph description
    • license
    • dependencies
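    DESCRIPTION is a plain-text file of "Field: value" pairs, so base R can write and read it. The sketch below writes one field for each part listed above to a temporary file and reads it back. Every field value is a made-up placeholder, not taken from a real package.

```r
# Sketch: write the DESCRIPTION fields to a temporary file and read them
# back. All field values below are made-up placeholders.
desc_fields <- data.frame(
  Package = "myDemoPkg",
  Title = "Demonstrate a Minimal R Package",
  Version = "0.0.0.9000",
  `Authors@R` = 'person("First", "Last", role = c("aut", "cre"))',
  Description = "A small example package with one function and one dataset.",
  License = "MIT + file LICENSE",
  Imports = "utils",
  check.names = FALSE,       # keep the "@" in the Authors@R field name
  stringsAsFactors = FALSE
)

desc_file <- file.path(tempdir(), "DESCRIPTION")
write.dcf(desc_fields, file = desc_file)

# read.dcf() returns a character matrix with one column per field
desc <- read.dcf(desc_file)
desc[1, c("Package", "Version", "License")]
```

    In practice the file created by devtools::create() is edited directly; the round trip here only illustrates the format.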
  • Editing DESCRIPTION
    • most fields can be filled out by hand (just once)
    • a license determines how other people are allowed to use your package
      • usethis has defaults built-in:
      usethis::use_mit_license()  
      • see also https://choosealicense.com/
  • Adding Dependencies
    • To use a function from any package other than base, list that package as a dependency and call the function as {pkg}::{fun}.
    • The use_package function will add a package to the dependencies in DESCRIPTION.
    usethis::use_package("utils")
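    As an illustration of the {pkg}::{fun} form, here is a minimal sketch of a package function that depends on utils. `first_cols` is a hypothetical helper invented for this example.

```r
# Sketch: call a function from another package with {pkg}::{fun}.
# `first_cols` is a hypothetical helper; utils::head() is the dependency.
first_cols <- function(df, n = 2) {
  utils::head(names(df), n)
}

first_cols(data.frame(x = 1:3, y = 5:7, z = 0))
# [1] "x" "y"
```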

Data and Code

  • Code
    • Create functions in files within the R/ folder.
    f <- function(df)
    {
      names(df)
    }
  • Exporting functions
    • NAMESPACE must list the objects that the package exports, i.e. makes available to users once the package is loaded.
    • Adding a special comment allows us to call devtools::document() to modify NAMESPACE:
    #' @export
    f <- function(df)
    {
      names(df)
    }
  • Adding datasets
    • Use usethis::use_data() to export a dataset to a file and add that file to the package:
    dat <- data.frame(x = 1:3, y = 5:7)
    usethis::use_data(dat) # no quotes
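    What use_data() does to the package can be approximated with base R: it saves the object under its own name as an .rda file in the data/ folder. The sketch below is only an approximation (use_data() also handles details such as compression and overwrite checks), and it uses a temporary folder in place of the package root so that it has no side effects.

```r
# Sketch of the effect of usethis::use_data(dat), using base R only.
# A temporary folder stands in for the package root (an assumption
# made so the example does not touch a real package).
pkg_root <- file.path(tempdir(), "myDemoPkg")
dir.create(file.path(pkg_root, "data"), recursive = TRUE, showWarnings = FALSE)

dat <- data.frame(x = 1:3, y = 5:7)
rda_file <- file.path(pkg_root, "data", "dat.rda")
save(dat, file = rda_file)

# Loading the file restores an object named "dat"
restored <- new.env()
load(rda_file, envir = restored)
identical(restored$dat, dat)
# [1] TRUE
```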

Documentation

  • roxygen2

    • R expects documentation to be written in a specific format, .Rd, and stored in the man/ folder
      • this is a pain.
    • roxygen2 adopts the idea of doxygen for R:
      • code and documentation appear together (specific comments get turned into docs)
      • easier to maintain consistency
  • Example

    #' Get the column names of a data.frame
    #' @param df A data.frame
    #' @return a character vector
    #' @export
    f <- function(df)
    {
      names(df)
    }
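    A slightly fuller sketch, adding the @examples tag so that the help page includes runnable example code. `col_classes` is a hypothetical function invented for this illustration.

```r
#' Get the class of each column of a data.frame
#'
#' @param df A data.frame
#' @return A named character vector with one entry per column
#' @examples
#' col_classes(data.frame(x = 1:3, y = c("a", "b", "c")))
#' @export
col_classes <- function(df) {
  # class() can return several values; keep only the first per column
  vapply(df, function(col) class(col)[1], character(1))
}
```

    After adding or changing these comments, re-run devtools::document() to regenerate the files in man/ and NAMESPACE.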
  • Data

    • Datasets can be documented, usually within data.R:
    #' Example data.frame
    #'
    #' @format A data frame with 3 rows and 2 variables:
    #' \describe{
    #'   \item{x}{some numbers}
    #'   \item{y}{some other numbers}
    #' }
    "dat"

Extras

  • Some other useful add-ons
    • writing tests for your functions?
      check out testthat
    • want a nice website for your package?
      check out pkgdown
    • want tests and the pkgdown website to run automatically on GitHub?
      check out GitHub Actions
    usethis::use_github_action("check-release")
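    For a sense of what a testthat test looks like, here is a minimal sketch for the f() function from the earlier slides. It assumes testthat is installed; in a package, such a test would normally live in a file under tests/testthat/ (usethis::use_testthat() sets up that folder).

```r
library(testthat)

# The function under test, repeated from the earlier slides
f <- function(df) {
  names(df)
}

test_that("f() returns the column names", {
  expect_equal(f(data.frame(x = 1:3, y = 5:7)), c("x", "y"))
  # a data.frame with no columns has an empty character vector of names
  expect_equal(f(data.frame()), character(0))
})
```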

Research Compendia

  • Research Compendia: Q: How do you share a data analysis reproducibly? A: Turn it into an R package!
    • write functions to do analysis
    • include data, or code to retrieve data
    • write up the workflow as a package vignette
      • see https://r-pkgs.org/vignettes.html
  • Benefits
    • Your project follows the common structure of an R package.
      • dependencies have to be listed
      • functions preferred over scripts
      • tests are more naturally created
    • example - https://github.com/ha0ye/portalDS

Summary

  • Summary
    • even simple packages can be handy
      • shared code for your lab
      • custom ggplot themes for yourself
    • structuring your work in packages promotes re-use and reproducibility

Thanks

  • Let me know what content you’d like to see
  • Contact me for additional questions or consultation requests!
  • Check back in on the libguide for more modules and contact info:
    • https://guides.uflib.ufl.edu/reproducibility