Natya Hans Academic Research Consulting and Services, University of Florida (updated: 2024-02-16)
from “Refactoring: Improving the Design of Existing Code” by Martin Fowler
R
What are functions?
mean()
# computes arithmetic mean of input
celsius_to_fahrenheit <- function(x) {
  9/5 * x + 32
}
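Once defined, a user-written function is called the same way as a built-in one. A minimal sketch using the two functions above (the input temperatures are made up for illustration):

```r
celsius_to_fahrenheit <- function(x) {
  9/5 * x + 32
}

# made-up temperatures in degrees Celsius
temps_c <- c(0, 37, 100)

# the arithmetic inside the function is vectorized,
# so a whole vector can be converted in one call
temps_f <- celsius_to_fahrenheit(temps_c)
print(temps_f)

# built-in functions compose with user-defined ones
print(mean(temps_f))
```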
Why write your own functions?
You can repeat code …
```r
df <- data.frame(
  a = rnorm(10),
  b = rnorm(10),
  c = rnorm(10))

# rescale all the columns to [0, 1]
# note the copy-paste slip in the df$b line: min(df$a) instead of min(df$b)
df$a <- (df$a - min(df$a)) / (max(df$a) - min(df$a))
df$b <- (df$b - min(df$b)) / (max(df$b) - min(df$a))
df$c <- (df$c - min(df$c)) / (max(df$c) - min(df$c))
```
Or define a function!
```r
rescale_to_01 <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# rescale the columns of df to [0, 1]
df$a <- rescale_to_01(df$a)
df$b <- rescale_to_01(df$b)
df$c <- rescale_to_01(df$c)

# or with dplyr
df <- df %>%
  mutate(across(c("a", "b", "c"), rescale_to_01))
```
The DRY Principle: “Don’t Repeat Yourself”
Workflow structure modified from “Reproducible research best practices @JupyterCon” (version 18) by Rachael Tatman, https://www.kaggle.com/rtatman/reproducible-research-best-practices-jupytercon
Example Code
library(tidyverse)
source("analysis_functions.R")
source("plotting_functions.R")
data_raw
Notes
Tips for writing functions
Tip 1: Naming Things
Function names should be verbs
```r
# bad
row_adder()
permutation()

# good
add_row()
permute()
```
examples from https://style.tidyverse.org/functions.html#naming
Tip 2: Plan for flexibility
plot_abundance_histogram <-
  function(data_proc, filename,
           width = 6, height = 6) {
  # {{code}}
}
data_proc and filename are required inputs; width and height are optional and default to 6.
Tip 3: How to subdivide
Notes on Tip 3
More Notes on Tip 3
```r
# bad
if (class(x)[[1]] == "numeric" || class(x)[[1]] == "integer")

# good
if (is.numeric(x))
```
examples from https://speakerdeck.com/jennybc/code-smells-and-feels?slide=36
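A brief sketch of why the second form is easier to read and maintain: is.numeric() covers both double and integer vectors in a single call. The helper name describe_input is hypothetical and exists only for this illustration.

```r
# hypothetical helper, for illustration only
describe_input <- function(x) {
  if (is.numeric(x)) {
    "numeric"
  } else {
    "not numeric"
  }
}

describe_input(1.5)   # a double
describe_input(2L)    # an integer
describe_input("a")   # a character string
```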
Tip 4: Use data structures
Most programming languages let you create data structures to store complex data. In R, you can create a list to include data, settings, and results. A function can return this list.
Developers commenting their code pic.twitter.com/jKURCVR9ds (Ricardo Ferreira, @riferrei, February 6, 2021)
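To illustrate Tip 4, here is a minimal sketch of a function that bundles data, settings, and results into one list. The function name fit_linear_model and the list fields are hypothetical, and the built-in mtcars dataset stands in for real data.

```r
# hypothetical example: return data, settings, and results together
fit_linear_model <- function(data, formula = mpg ~ wt) {
  model <- lm(formula, data = data)
  list(
    data = data,                         # the input data
    settings = list(formula = formula),  # how the analysis was run
    results = coef(model)                # the fitted coefficients
  )
}

out <- fit_linear_model(mtcars)
names(out)
out$results
```

Keeping the settings next to the results makes it easier to tell later how a particular output was produced.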
(via “Notes on Programming in C”, Rob Pike)
There is a famously bad comment style:
```c
i=i+1;           /* Add one to i */
```
and there are worse ways to do it:
```c
/**********************************
 *                                *
 *          Add one to i          *
 *                                *
 **********************************/

i=i+1;
```
Don’t laugh now, wait until you see it in real life.
Comment Dos & Don’ts
* Do use comments to describe why, not how
* Do document the inputs, outputs, and purpose of each function
* Do store notes/references in comments
* Don’t turn code off and on with comments
Comment Dos & Don’ts
Do use comments to describe why, not how
Bad:
```r
# Set lib as (1, [2/3 * n])
lib <- c(1, floor(2/3 * n))
```
Good:
```r
# set aside 2/3 of data to train model
lib <- c(1, floor(2/3 * n))
```
Comment Dos & Don’ts
Do document the inputs, outputs, and purpose of each function
```r
plot_abundance_histogram <-
  function(data_proc, filename,
           width = 6, height = 6) {
  # {{code}}
}
```
* What kind of object is data_proc, and what fields/columns are used to make the plot?
Comment Dos & Don’ts
Do store notes/references in comments
```r
# cholesky algorithm from Rasmussen &
#   Williams (2006, algorithm 2.1)
R <- chol(Sigma)
alpha <- backsolve(R, forwardsolve(t(R), y_lib - mean_y))
L_inv <- forwardsolve(t(R), diag(NROW(x_lib)))
Sigma_inv <- t(L_inv) %*% L_inv
```
Comment Dos & Don’ts
Don’t turn code off and on with comments
- you will not remember why it was commented out (was it buggy? did you want to test something?)
```r
# cat("The value of x is", x)
```
- use conditional logic instead
```r
if (DEBUG_MODE) {
  cat("The value of x is", x)
}
```
Set DEBUG_MODE at the top of the script.
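A minimal sketch of that pattern, assuming a script-level flag named DEBUG_MODE as above. The function summarize_values is hypothetical and only shows where the flag is checked.

```r
# set once at the top of the script
DEBUG_MODE <- FALSE

# hypothetical function that prints diagnostics only when the flag is on
summarize_values <- function(x) {
  if (DEBUG_MODE) {
    cat("The value of x is", x, "\n")
  }
  mean(x)
}

result <- summarize_values(c(1, 2, 3))
```

Changing the single assignment to `DEBUG_MODE <- TRUE` switches every diagnostic message on at once, and the reason the messages exist stays visible in the code.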
What are “code smells”?
Some common code smells
TMI = too much indentation
get_some_data
example from https://speakerdeck.com/jennybc/code-smells-and-feels?slide=42
Simplify the logic!
get_some_data data
example from https://speakerdeck.com/jennybc/code-smells-and-feels?slide=43
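The code from the referenced slides is not reproduced here. As a rough sketch of the same idea, nested if/else blocks can often be flattened by handling the special cases first and returning early. Both functions below are hypothetical examples written for this illustration.

```r
# deeply nested version (hypothetical example)
check_value_nested <- function(x) {
  if (!is.null(x)) {
    if (is.numeric(x)) {
      if (length(x) > 0) {
        result <- mean(x)
      } else {
        result <- NA_real_
      }
    } else {
      stop("x must be numeric")
    }
  } else {
    stop("x must not be NULL")
  }
  result
}

# flattened version: handle problem cases first, then the main path
check_value_flat <- function(x) {
  if (is.null(x)) stop("x must not be NULL")
  if (!is.numeric(x)) stop("x must be numeric")
  if (length(x) == 0) return(NA_real_)
  mean(x)
}
```

Both versions behave the same for these inputs, but in the flat version each condition sits next to its consequence and the main computation is not buried three levels deep.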
Magic Numbers
Magic numbers are values in the code whose meaning is not stated and must be inferred from context.
Example
The correlation between mpg and cyl is -0.852162
Example (improved)
var_1 <- "mpg"
var_2 <- "cyl"
cat("The correlation between", var_1,
    "and", var_2, "is",
    cor(mtcars[, var_1], mtcars[, var_2]))
The correlation between mpg and cyl is -0.852162
Example (as a function)
f <- function(df = mtcars, var_1 = "mpg",
              var_2 = "cyl") {
  cat("The correlation between", var_1,
      "and", var_2, "is",
      cor(df[, var_1], df[, var_2]))
}
f()
The correlation between mpg and cyl is -0.852162
General Strategies
Let me know what content you’d like to see
Contact me for additional questions or consultation requests!
Email: nhans@ufl.edu
Check back in on the libguide for more modules and contact info:
Original slides courtesy of Hao Ye