Hao Ye, Health Science Center Libraries, University of Florida (updated: 2022-10-03)

Intro

  • Motivations
    R packages are the primary means of bundling R code. Packages have many use cases, including:
    • re-using code or data across projects
    • writing open source software
    • creating reproducible analyses
  • Learning Outcomes
    By the end of the workshop, participants will be able to:
    • create working R packages with code and data
    • write documentation using roxygen2
    • describe additional package-related tools (e.g. pkgdown, testthat)

Creating R Packages

  • Setup
    Before getting started, install a few tools that make creating packages easier.

    install.packages(c("devtools", 
                       "usethis", 
                       "roxygen2", 
                       "knitr"))

    I recommend using RStudio as a development environment.

  • Creating a new R package

    devtools::create("myDemoPkg")
    • package names must start with a letter, can contain only letters, numbers, and periods, and cannot end with a period.
      myDemoPkg
      ├── .gitignore
      ├── .Rbuildignore
      ├── DESCRIPTION
      ├── myDemoPkg.Rproj
      ├── NAMESPACE
      └── R/
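
The naming rule can be checked before creating a package. The following is a minimal sketch in base R; `is_valid_pkg_name` is a hypothetical helper (not part of devtools or usethis), and the pattern also assumes the "Writing R Extensions" requirements that a name has at least two characters and does not end with a period.

```r
# Hypothetical helper: TRUE if `x` looks like a valid package name.
# Assumed rule: starts with an ASCII letter, contains only ASCII letters,
# digits, and periods, has at least two characters, and does not end
# with a period.
is_valid_pkg_name <- function(x) {
  grepl("^[A-Za-z][A-Za-z0-9.]*[A-Za-z0-9]$", x)
}

is_valid_pkg_name("myDemoPkg")  # TRUE
is_valid_pkg_name("my_pkg")     # FALSE: underscores are not allowed
is_valid_pkg_name("2fast")      # FALSE: must start with a letter
```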
      
  • What are these files?

    • .gitignore tells git to NOT track certain files
    • .Rbuildignore tells R to NOT include certain files when building the package
    • DESCRIPTION contains basic metadata
    • myDemoPkg.Rproj is the RStudio project
    • NAMESPACE lists the objects the package exports (and what it imports from other packages)
    • R/ contains the R code
  • Building the Package

    • R Console

      devtools::install()

      OR

    • RStudio Build Pane: click “Install and Restart”.
      [Screenshot: the 'Build' tab of the RStudio pane, with buttons for 'Install and Restart', 'Check', and 'More'; the 'Install and Restart' button is circled.]

  • Demo

  • Fully Operational!

    • Next Steps:
      • syncing to GitHub
      • adding DESCRIPTION info
      • adding data and code
      • adding documentation

Syncing to GitHub

  • Why GitHub?
    • note: other cloud services exist (e.g. GitLab); we restrict ourselves to GitHub in this course

    • Cloud-based backup of your new package

      • private repos are visible only to the people you give access to
    • Supports installation from R

      remotes::install_github("{username}/{repo}")
      # remotes::install_github("ha0ye/myDemoPkg")
  • Setup (local)
  • Setup (GitHub, via usethis)
  • Setup (GitHub, manual)
    • create a new repository
      • ideally same name as your package folder
      • choose public or private
      • SKIP initialization
    • In RStudio, click “New Branch” in the “Git” pane.
      [Screenshot: the 'Git' tab of the RStudio pane; the 'New Branch' button (a white diamond joined to two purple rectangles) is circled.]
    • Click “Add Remote”
      • use origin for the “Remote Name”
      • copy-paste the URL from GitHub (shown after creating the new repo) for “Remote URL”
        [Screenshot: the 'Add Remote' dialog, with 'Remote Name' set to 'origin' and 'Remote URL' set to 'git@github.com:ha0ye/myDemoPkg.git'.]
    • Choose the same “Branch Name” as your current local branch.
      (Select “Overwrite” at the warning about a local branch already existing.)
      [Screenshot: the 'New Branch' dialog, with 'Branch Name' set to 'Main' (matching the branch shown in the 'Git' pane), a 'Remote' selection menu, an 'Add Remote...' button, the sync-with-remote box checked, and 'Create' and 'Cancel' buttons.]
  • Demo
  • Workflow
    • work on the package
    • commit the changes
    • push to GitHub
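
For the local and usethis-based setup routes, one common approach is sketched below. It assumes the usethis package is installed, that the code is run from inside the package project, and that GitHub credentials (a personal access token) are already configured; `setup_git_and_github` is a hypothetical wrapper used only to group the two calls.

```r
# Hypothetical wrapper around two usethis helpers.
# Both calls are interactive and may ask for confirmation.
setup_git_and_github <- function(private = FALSE) {
  # Initialise a local git repository and offer to make a first commit
  usethis::use_git()
  # Create the GitHub repository, add it as the 'origin' remote, and push
  usethis::use_github(private = private)
}

# Usage, from the package project:
# setup_git_and_github(private = TRUE)
```

Defining the wrapper does nothing by itself; the repository is only created when the function is called.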

Updating DESCRIPTION

  • Parts of a DESCRIPTION
    • package name
    • 1 sentence title
    • version number
    • package authors
    • 1 paragraph description
    • license
    • dependencies
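
To make these parts concrete, here is a sketch that parses an example DESCRIPTION with base R's `read.dcf()`. All field values, including the author, are made-up placeholders; a real DESCRIPTION lives in a file at the top level of the package rather than in a character vector.

```r
# Example DESCRIPTION contents (placeholder values, one field per line)
description_lines <- c(
  'Package: myDemoPkg',
  'Title: A Small Demo Package',
  'Version: 0.0.0.9000',
  'Authors@R: person("Jane", "Doe", role = c("aut", "cre"))',
  'Description: Demonstrates the basic parts of a DESCRIPTION file.',
  'License: MIT + file LICENSE',
  'Imports: utils'
)

# DESCRIPTION uses the Debian control file (DCF) format,
# so read.dcf() returns one column per field
fields <- read.dcf(textConnection(description_lines))
colnames(fields)
fields[, "Version"]
```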
  • Editing DESCRIPTION
    • plaintext file
    • most fields only need to be filled out once
    • the license determines how other people may use your package
      • usethis has helpers for common licenses, e.g.:
      usethis::use_mit_license()
  • Adding Dependencies
    • Any packages that you use code from, other than the base package, need to be listed as dependencies.
    • Functions from those other packages need to be referred to by {pkg}::{fun}.
    • The use_package function will add a package to the dependencies in DESCRIPTION.
    usethis::use_package("utils")
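
As an illustration of the {pkg}::{fun} style, the sketch below calls `head()` from the utils package with an explicit prefix. `first_cols` is a hypothetical function name chosen for this example.

```r
# Hypothetical helper: return the first n column names of a data frame.
# head() lives in the utils package, so it is called as utils::head().
first_cols <- function(df, n = 2) {
  utils::head(names(df), n)
}

first_cols(data.frame(a = 1, b = 2, c = 3))
# [1] "a" "b"
```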

Data and Code

  • Code
    • Create functions in files within the R/ folder.
    f <- function(df)
    {
      names(df)
    }
  • Exporting functions
    • NAMESPACE needs to list the objects that the package exports, i.e. makes available to users.
    • Rather than editing NAMESPACE by hand, it is preferable to let devtools and roxygen2 generate it for us.
    • We add specially formatted comments around our code, and then call devtools::document() to generate NAMESPACE and the documentation files.
  • Updating NAMESPACE
    • Adding #' @export right before an object will include it in NAMESPACE:
    #' @export
    f <- function(df)
    {
      names(df)
    }
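
For reference, after running devtools::document() on a package whose only exported object is the function f above, the generated NAMESPACE file would be expected to look roughly like this (a file fragment, not code to run in the console):

```text
# Generated by roxygen2: do not edit by hand

export(f)
```

The header comment tells roxygen2 that it owns the file, which is one more reason not to edit NAMESPACE by hand.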
  • Adding datasets
    • Use usethis::use_data() to export a dataset to a file and add that file to the package:
    dat <- data.frame(x = 1:3, y = 5:7)
    usethis::use_data(dat) # no quotes
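
To see what ends up in the package, the sketch below mimics the core of that step with base R: the object is saved to an .rda file and loaded back. It is a simplified stand-in written to a temporary folder, not the actual usethis implementation, which writes into the package's data/ folder.

```r
dat <- data.frame(x = 1:3, y = 5:7)

# Save the object under its own name to an .rda file
# (temporary folder here; a package would use data/dat.rda)
rda_path <- file.path(tempdir(), "dat.rda")
save(dat, file = rda_path)

# Loading the file recreates an object called `dat`
loaded <- new.env()
load(rda_path, envir = loaded)
identical(loaded$dat, dat)  # TRUE
```

Once the package is installed, users would reach the dataset as myDemoPkg::dat.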

Documenting Code

  • roxygen2

    • R expects documentation to be written in a specific format, .Rd, and stored in the man/ folder
      • this is a pain.
    • roxygen2 adopts the idea of doxygen for R:
      • code and documentation appear together (specific comments get turned into docs)
      • easier to maintain consistency
  • Example

    #' Get the column names of a data.frame
    #' @param df A data.frame
    #' @return a character vector
    #' @export
    f <- function(df)
    {
      names(df)
    }
  • Data

    • Datasets can be documented, usually within the R/data.R file:
    #' Example data.frame
    #'
    #' @format A data frame with 3 rows and 2 variables:
    #' \describe{
    #'   \item{x}{some numbers}
    #'   \item{y}{some other numbers}
    #' }
    "dat"

Extras

  • Some other useful add-ons
    • writing tests for your functions?
      check out testthat
    • want a nice website for your package?
      check out pkgdown
    • want tests and the pkgdown website to run automatically on GitHub?
      check out GitHub Actions
    usethis::use_github_action("check-release")
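
As a taste of what a test looks like, here is a minimal testthat sketch for the column-name function used earlier. It assumes the testthat package is installed; in a real package the function would live in R/ and the test in a file under tests/testthat/, rather than in one script as shown here.

```r
library(testthat)

# The function under test (defined inline so the sketch is self-contained)
f <- function(df) {
  names(df)
}

# A test groups related expectations under a description
test_that("f returns the column names of a data.frame", {
  expect_equal(f(data.frame(x = 1:3, y = 5:7)), c("x", "y"))
  expect_type(f(data.frame(x = 1:3)), "character")
})
```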

Research Compendia

  • Research Compendia
    • Q: How do you share a data analysis reproducibly?
    • A: Turn it into an R package!
  • Benefits
    • Your project follows the common structure of an R package.
      • dependencies have to be listed
      • functions preferred over scripts
      • tests are more naturally created
    • example - https://github.com/ha0ye/portalDS

Summary

  • Summary
    • even simple packages can be handy
      • shared code for your lab
      • custom ggplot themes for yourself
    • structuring your work in packages promotes re-use and reproducibility
  • Thanks
    • Let me know what content you’d like to see
    • Contact me for additional questions or consultation requests!
    • Check back in on the libguide for more modules and contact info: