Hosting Shiny

Chapter 4 Examples

Shiny apps come in many shapes and forms. We cannot represent this diversity in full, so we picked a few apps that showcase common patterns and that also fit reasonably well on the pages of a printed book.

We will use 3 Shiny apps as examples. All 3 are implemented in both R and Python:

  • faithful: a “Hello Shiny!” app displaying the Old Faithful geyser waiting times data as a histogram, with a slider for adjusting the number of bins used in the histogram. This app demonstrates the very basics of reactivity, and it is very short.
  • bananas: an app that classifies the ripeness of banana fruits based on the colour composition (green, yellow, brown). This app demonstrates a more complex use case with dependencies, and it also relies on a machine learning model, so it better reflects real-world use cases.
  • lbtest: an app to test load balancing when scaling Shiny apps to multiple instances.

Let’s learn about the example apps.

4.1 Old Faithful

This is the classic “Hello Shiny!” app that you can see in R by trying shiny::runExample("01_hello"). The app displays the Old Faithful geyser waiting times data as a histogram, with a slider for adjusting the number of bins used in the histogram (Fig. 4.1). The R version of the app was originally written by the Shiny package authors (Chang et al. 2024).

The “Hello Shiny!” in R has no dependencies other than shiny. The Old Faithful app in Python has more requirements besides shiny, because the Python standard library does not have the geyser data readily available, and you need e.g. matplotlib (Hunter 2007) for the histogram. We wrote the Python version as a mirror translation of the R version, so that you can see the similarities and the differences.

Figure 4.1: The faithful example Shiny app.

In R, the data set datasets::faithful (R Core Team 2024) contains waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone National Park, Wyoming, USA. We got the Python data set from the Seaborn library seaborn.load_dataset("geyser") (Waskom 2021).
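
To make the role of the slider concrete, here is a minimal sketch of how a number of bins can be turned into a histogram. It assumes equal-width bins spanning the range of the data and uses NumPy. The function name histogram_for and the short list of waiting times are illustrative only and do not come from the app's source code.

```python
import numpy as np

# Illustrative sample of waiting times in minutes, not the full data set
waiting = np.array([79, 54, 74, 62, 85, 55, 88, 85, 51, 85, 54, 84, 78, 47, 83])


def histogram_for(values, n_bins):
    """Return counts and bin edges for n_bins equal-width bins."""
    # n_bins + 1 equally spaced edges from the minimum to the maximum value
    edges = np.linspace(values.min(), values.max(), n_bins + 1)
    counts, edges = np.histogram(values, bins=edges)
    return counts, edges


# The slider value would be passed in as n_bins
counts, edges = histogram_for(waiting, n_bins=5)
print(counts)
print(edges)
```

In a Shiny app, the plot is redrawn with a new n_bins value whenever the slider changes, which is the reactivity that this example app is meant to demonstrate.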

The source code for the different builds of the Old Faithful Shiny app is at https://github.com/h10y/faithful. You can download the GitHub repository as a zip file from GitHub, or clone the repository with git clone https://github.com/h10y/faithful.git.

4.2 Bananas

The bananas app was born out of a “stuck-in-the-house” COVID-19 project when one of the authors bought some green bananas at the store and took daily photographs of each fruit. Later, the data set was used for teaching workshops. The motivation for the app is that it follows a workflow that is fairly common in all kinds of data science projects:

  1. Have a question to answer: Is my banana ripe?
  2. Collect data: Go to the store, buy bananas, set up a ring light and take pictures every day over 3 weeks.
  3. Compile the training data: Classify colour pixels and calculate the relative proportions, score pictures according to ripeness status.
  4. Run exploratory data analysis: Let’s explore and visualize the data set.
  5. Train a classification model: Estimate the banana ripeness class and probability given the colour composition.
  6. Build a “scoring engine”: Given some colour inputs for a new fruit, tell me the probability for the ripeness classes.
  7. Build a user interface: Let a non-technical user do the data exploration and classification as part of a web application.

4.2.1 The Bananas Data Set

The data set tracks the ripening colour composition of banana fruits daily over a 3-week period. The full data set can be found in the GitHub repository and R package bananas (install.packages("bananas", repos = "https://psolymos.r-universe.dev")). The subset used in the book and the Shiny app consists of the 6 fruits that were kept at room temperature.

The table has the following fields:

  • fruit: the identifier of the fruit,
  • day: number between 0 and 20, the number of days since the first set of photographs,
  • ripeness: the ripeness class of the fruit based on Péter’s personal judgement (Under, Ripe, Very, Over),
  • green, yellow, brown: colour composition, these 3 values add up to 1 (100%).

The colour composition was determined based on colour mapping the pixel values of the banana fruits and converting the pixel based 2-dimensional area to proportions.
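
The conversion from classified pixels to proportions can be sketched as follows. This is a simplified illustration and not the code used to build the data set: it assumes that every pixel of the fruit has already been assigned one of the three colour labels, and the function name colour_composition and the sample labels are made up for this example.

```python
from collections import Counter

CLASSES = ("green", "yellow", "brown")


def colour_composition(pixel_labels):
    """Convert per-pixel colour labels into proportions that add up to 1."""
    counts = Counter(pixel_labels)
    # Only the three colour classes count towards the fruit area;
    # any other label (e.g. background) is ignored
    total = sum(counts[c] for c in CLASSES)
    if total == 0:
        raise ValueError("no fruit pixels to summarize")
    return {c: counts[c] / total for c in CLASSES}


# Hypothetical labels for a fruit covering 10 pixels
labels = ["green"] * 2 + ["yellow"] * 7 + ["brown"] * 1
composition = colour_composition(labels)
print(composition)
```

Because the three proportions always add up to 1, any two of them determine the third, which is why a ternary plot is a natural way to display this kind of data.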

The following summary presents the ripeness and the percentage values of green, yellow, brown colours.

Figure 4.2 shows the change in colour composition over the 3 weeks of the experiment. You can see that the proportion of green went down while the proportion of yellow rose to a peak around day 5. After that, yellow started decreasing while the proportion of brown started increasing.

We can also present the same information according to the ripeness classes (Fig. 4.3). You can see that the under-ripe class is characterized by a high proportion of green and the absence of brown. The ripe class is characterized by the highest proportion of yellow. Very ripe bananas have a higher proportion of brown while yellow is still the most common colour. Over-ripe bananas are mostly brown.

Figure 4.2: Colour composition of the bananas over time.

Figure 4.3: Colour composition of the bananas by ripeness class.

4.2.2 Model Training

We chose Support Vector Machines (SVM) to model a multi-level response variable (Under, Ripe, Very, Over) as a function of the green, yellow, and brown colours.

We used the e1071 package (Meyer et al. 2023) in R, and the SVM model’s prediction accuracy was 90.8%. We saved the trained model object as an R binary .rds file:

library(e1071)

# Read the bananas data
x <- read.csv("bananas.csv")
x$ripeness <- factor(x$ripeness, c("Under", "Ripe", "Very", "Over"))

# Multinomial classification with Support Vector Machines
m <- svm(ripeness ~ green + yellow + brown,
  data = x,
  probability = TRUE
)

# Two-way table to test prediction accuracy
table(x$ripeness, predict(m))
sum(diag(table(x$ripeness, predict(m)))) / nrow(x)

# Predict ripeness class
predict(m, data.frame(green = 1, yellow = 0, brown = 0), 
  probability = TRUE)
predict(m, data.frame(green = 0, yellow = 1, brown = 0), 
  probability = TRUE)
predict(m, data.frame(green = 0, yellow = 0, brown = 1), 
  probability = TRUE)
predict(m, data.frame(green = 0.1, yellow = 0.2, brown = 0.7), 
  probability = TRUE)

# Save the model object
saveRDS(m, "bananas-svm.rds")

We can fit a similar SVM model in Python using scikit-learn (sklearn) (Pedregosa et al. 2011):

import pandas as pd
from joblib import dump
from sklearn import svm

# Read the bananas data
x = pd.read_csv('bananas.csv')

# Train SVM
x.loc[x.ripeness == 'Under', 'target'] = 0
x.loc[x.ripeness == 'Ripe', 'target'] = 1
x.loc[x.ripeness == 'Very', 'target'] = 2
x.loc[x.ripeness == 'Over', 'target'] = 3
data_X = x[['green', 'yellow', 'brown']].to_numpy()
data_y = x.target.values
svm_model = svm.SVC(probability = True)
svm_model.fit(data_X, data_y)

# Predict ripeness class probabilities
svm_model.predict_proba([[1, 0, 0]])
svm_model.predict_proba([[0, 1, 0]])
svm_model.predict_proba([[0, 0, 1]])
svm_model.predict_proba([[0.1, 0.2, 0.7]])

# Write model object to file
dump(svm_model, 'bananas-svm.joblib')
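
The R script above checks the prediction accuracy with a two-way table, while the Python script does not include such a check. The same calculation can be sketched in plain Python as shown below. The observed and predicted labels are made up for illustration and are not the bananas data, and the helper names two_way_table and accuracy are our own.

```python
from collections import Counter

CLASSES = ["Under", "Ripe", "Very", "Over"]


def two_way_table(observed, predicted):
    """Cross-tabulate observed (rows) against predicted (columns) classes."""
    pairs = Counter(zip(observed, predicted))
    return [[pairs[(o, p)] for p in CLASSES] for o in CLASSES]


def accuracy(table):
    """Share of cases on the diagonal, i.e. correctly predicted."""
    correct = sum(table[i][i] for i in range(len(table)))
    total = sum(sum(row) for row in table)
    return correct / total


# Hypothetical labels; with a fitted model, predicted would come from
# the model's predictions on the training data
observed = ["Under", "Under", "Ripe", "Ripe", "Very", "Very", "Over", "Over"]
predicted = ["Under", "Under", "Ripe", "Very", "Very", "Very", "Over", "Very"]

table = two_way_table(observed, predicted)
for name, row in zip(CLASSES, table):
    print(name, row)
print(accuracy(table))
```

Note that an accuracy calculated on the training data tends to be optimistic compared to the accuracy on new data.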

4.2.3 The Shiny App

The Shiny app consists of a ternary plot showing the daily colour composition of each banana fruit, alongside the new point to be classified (in red), as shown in Figure 4.4. The three numeric inputs on the left hand side of the plot control the position of the red dot. The classification results based on these inputs are shown on the right hand side of the ternary plot. You can see probabilities of under-ripe, ripe, very ripe, and over-ripe classes, and the class with highest probability is assigned as a label.

Figure 4.4: The bananas example Shiny app.
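
The step from class probabilities to a label can be sketched as follows. This is an illustration under our own assumptions and not the app's source code: the helper names are made up, the probabilities are hypothetical stand-ins for the model output, and rescaling the inputs so that they add up to 1 is one possible way to handle arbitrary numeric inputs.

```python
CLASSES = ["Under", "Ripe", "Very", "Over"]


def normalize(green, yellow, brown):
    """Rescale three non-negative inputs so that they add up to 1."""
    total = green + yellow + brown
    if total <= 0:
        raise ValueError("at least one colour input must be positive")
    return [green / total, yellow / total, brown / total]


def assign_label(probabilities):
    """Return the class with the highest probability."""
    best = max(range(len(CLASSES)), key=lambda i: probabilities[i])
    return CLASSES[best]


# Position of the red dot, e.g. from inputs given in percentages
point = normalize(10, 20, 70)

# Hypothetical probabilities in the order of CLASSES; with the fitted
# scikit-learn model these would come from svm_model.predict_proba([point])[0]
probs = [0.02, 0.08, 0.30, 0.60]

print(point, assign_label(probs))
```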

The source code for the different builds of the Bananas Shiny app is at https://github.com/h10y/bananas. You can download the GitHub repository as a zip file from GitHub, or clone the repository with git clone https://github.com/h10y/bananas.git.

4.3 Load Balancing Test

Shiny apps can run multiple sessions in the same app instance. A common problem when scaling Shiny apps to multiple replicas is that the requests from a client might not be routed to the replica that holds the client’s session, and thus the app might randomly fail. This app is used to determine if the HTTP requests made by the client are correctly routed back to the same R or Python process for the session.

Both the Python and the R versions of the app register a dynamic route for the client to try to connect to. The JavaScript code on the client side will repeatedly hit the dynamic route. The server will send a 200 OK status code only if the client reached the correct Shiny session, where it originally came from (Fig. 4.5).

Figure 4.5: The lbtest example Shiny app.

The original Python app was written by Joe Cheng and is from the rstudio/py-shiny GitHub repository. We wrote the R version to mirror the Python version.

This app will be useful when the deployment includes load balancing between multiple replicas. For such deployments, session affinity (or sticky sessions) needs to be available. This app can be used to test such setups. If the test fails, it will stop before the counter reaches 100 and will say Failure! If the app succeeds 100 times, you’ll see Test complete. The app is not useful for testing a single instance deployment, or with Shinylive, because these setups won’t fail, but you can still try it.
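
The logic of the test can be illustrated with a small simulation. This is only a sketch of the idea and not the code of the lbtest app: the two routing functions, the session identifier, and the replica counts are hypothetical, and real load balancers implement session affinity in different ways (for example with cookies).

```python
import zlib


def round_robin(session_id, request_number, n_replicas):
    """Send each request to the next replica, ignoring the session."""
    return request_number % n_replicas


def sticky(session_id, request_number, n_replicas):
    """Send every request of a session to the same replica."""
    return zlib.crc32(session_id.encode()) % n_replicas


def run_test(route, session_id="abc123", n_replicas=3, n_requests=100):
    """Return the number of successful requests and the final message."""
    # The replica that served the first request holds the session
    home = route(session_id, 0, n_replicas)
    for i in range(1, n_requests + 1):
        if route(session_id, i, n_replicas) != home:
            # The request reached a process that does not know the session
            return i - 1, "Failure!"
    return n_requests, "Test complete"


print(run_test(round_robin))                # fails with 3 replicas
print(run_test(sticky))                     # succeeds with session affinity
print(run_test(round_robin, n_replicas=1))  # a single instance cannot fail
```

The last call shows why the test is not informative for a single instance: with only one process, every request reaches the session no matter how the traffic is routed.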

The source code for the different builds of the load balancing test Shiny app is at https://github.com/h10y/lbtest. You can download the GitHub repository as a zip file from GitHub, or clone the repository with git clone https://github.com/h10y/lbtest.git.

4.4 Summary

This is the end of Part I. We covered all the fundamentals that the rest of the book builds upon. In the next part, we’ll cover all the technical details of Shiny hosting that happens on your local machine.

We recommend downloading or cloning the example repositories mentioned in this chapter to your computer. This way you will be able to follow all the examples from the following chapters and won’t have to copy and paste the text from the book to files. Visit the GitHub organization h10y, which stands for hostingshiny (there are 10 letters between the first h and the last y): https://github.com/h10y/.

References

Chang, Winston, Joe Cheng, JJ Allaire, Carson Sievert, Barret Schloerke, Yihui Xie, Jeff Allen, Jonathan McPherson, Alan Dipert, and Barbara Borges. 2024. Shiny: Web Application Framework for R. https://CRAN.R-project.org/package=shiny.
Hunter, J. D. 2007. “Matplotlib: A 2D Graphics Environment.” Computing in Science & Engineering 9 (3): 90–95. https://doi.org/10.1109/MCSE.2007.55.
Meyer, David, Evgenia Dimitriadou, Kurt Hornik, Andreas Weingessel, and Friedrich Leisch. 2023. E1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien. https://CRAN.R-project.org/package=e1071.
Pedregosa, F., G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, et al. 2011. “Scikit-Learn: Machine Learning in Python.” Journal of Machine Learning Research 12: 2825–30.
R Core Team. 2024. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Waskom, Michael L. 2021. “Seaborn: Statistical Data Visualization.” Journal of Open Source Software 6 (60): 3021. https://doi.org/10.21105/joss.03021.