---
title: "Hands-on Exercise 4D: Funnel Plots for Fair Comparisons"
author: "Cai Jingheng"
date: "2024-01-30"
date-modified: "last-modified"
toc: true
number-sections: true
format:
  html:
    code-fold: true
    code-tools: true
warning: false
---
## **Overview**
A funnel plot is a data visualisation designed for unbiased comparison between outlets, stores or business entities. By the end of this hands-on exercise, you will have gained experience in:
- plotting funnel plots by using the **FunnelPlotR** package,
- plotting static funnel plots by using the **ggplot2** package, and
- plotting interactive funnel plots by using both the **plotly** R and **ggplot2** packages.
## **Installing and Launching R Packages**
In this exercise, five R packages will be used (**readr** and **ggplot2** are loaded as part of **tidyverse**). They are:
- **readr** for importing CSV files into R.
- **FunnelPlotR** for creating funnel plot.
- **ggplot2** for creating funnel plot manually.
- **knitr** for building static html table.
- **plotly** for creating interactive funnel plot.
```{r}
pacman::p_load(tidyverse, FunnelPlotR, plotly, knitr)
```
## **Importing Data**
In this section, the COVID-19_DKI_Jakarta data set will be used. The data was downloaded from the [Open Data Covid-19 Provinsi DKI Jakarta portal](https://riwayat-file-covid-19-dki-jakarta-jakartagis.hub.arcgis.com/). For this hands-on exercise, we are going to compare the cumulative COVID-19 cases and deaths by sub-district (i.e. kelurahan) in DKI Jakarta as at 31 July 2021.
The code chunk below imports the data into R and saves it as a tibble data frame object called *covid19*.
```{r}
# Read the CSV and convert every character column to a factor
covid19 <- read_csv("data/COVID-19_DKI_Jakarta.csv") %>%
  mutate_if(is.character, as.factor)

# Preview the first five rows
head(covid19, 5)
```
## **FunnelPlotR methods**
[**FunnelPlotR**](https://nhs-r-community.github.io/FunnelPlotR/) package uses ggplot2 to generate funnel plots. It requires a `numerator` (events of interest), a `denominator` (population to be considered) and a `group`. The key arguments for customisation are:
- `limit`: plot limits (95 or 99).
- `label_outliers`: whether to label outliers (`TRUE` or `FALSE`).
- `Poisson_limits`: to add Poisson limits to the plot.
- `OD_adjust`: to add overdispersed limits to the plot.
- `xrange` and `yrange`: the axis ranges to display; they act like a zoom function.
- Other aesthetic components such as the graph title and axis labels.
### **FunnelPlotR methods: The basic plot**
The code chunk below plots a funnel plot.
```{r}
funnel_plot(
  numerator = covid19$Positive,
  denominator = covid19$Death,
  group = covid19$`Sub-district`
)
```
```
A funnel plot object with 267 points of which 0 are outliers. Plot is adjusted for overdispersion.
```
Things to learn from the code chunk above.
- `group` in this function works differently from grouping in a scatterplot. Here, it defines the level of the points to be plotted, i.e. Sub-district, District or City. If City is chosen, there are only six data points.
- By default, the `data_type` argument is "SR" (indirectly standardised ratios).
- `limit`: Plot limits, accepted values are: 95 or 99, corresponding to 95% or 99.8% quantiles of the distribution.
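To see what a coarser `group` level means for the number of points, the base R sketch below pools counts by city on a small made-up data frame. The data values and the names `toy` and `by_city` are hypothetical, and the sketch assumes that counts are summed within each group before a point is drawn.

```r
# Hypothetical toy data: four sub-districts nested in two cities
toy <- data.frame(
  City = c("City A", "City A", "City B", "City B"),
  Sub_district = c("a1", "a2", "b1", "b2"),
  Positive = c(1000, 2000, 1500, 500),
  Death = c(15, 30, 20, 10)
)

# Sum cases and deaths within each city: one row (one plotted point) per city
by_city <- aggregate(cbind(Positive, Death) ~ City, data = toy, FUN = sum)

# Pooled fatality rate per city
by_city$rate <- by_city$Death / by_city$Positive
by_city
```

With four sub-districts in two cities, grouping by city leaves two points instead of four, which is why the choice of `group` determines how many points appear in the funnel.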
### **FunnelPlotR methods: Makeover 1**
The code chunk below plots a funnel plot.
```{r}
funnel_plot(
  numerator = covid19$Death,
  denominator = covid19$Positive,
  group = covid19$`Sub-district`,
  data_type = "PR",
  xrange = c(0, 6500),
  yrange = c(0, 0.05)
)
```
```
A funnel plot object with 267 points of which 7 are outliers. Plot is adjusted for overdispersion.
```
Things to learn from the code chunk above.

- `data_type` argument is used to change from the default "SR" to "PR" (i.e. proportions).
- `xrange` and `yrange` are used to set the range of the x-axis and y-axis.
### **FunnelPlotR methods: Makeover 2**
The code chunk below plots a funnel plot.
```{r}
funnel_plot(
  numerator = covid19$Death,
  denominator = covid19$Positive,
  group = covid19$`Sub-district`,
  data_type = "PR",
  xrange = c(0, 6500),
  yrange = c(0, 0.05),
  label = NA,
  title = "Cumulative COVID-19 Fatality Rate by Cumulative Total Number of COVID-19 Positive Cases",
  x_label = "Cumulative COVID-19 Positive Cases",
  y_label = "Cumulative Fatality Rate"
)
```
```
A funnel plot object with 267 points of which 7 are outliers. Plot is adjusted for overdispersion.
```
Things to learn from the code chunk above.
- `label = NA` removes the default outlier labels.
- `title` adds the plot title.
- `x_label` and `y_label` add or edit the x-axis and y-axis titles.
## **Funnel Plot for Fair Visual Comparison: ggplot2 methods**
In this section, you will gain hands-on experience in building funnel plots step by step with ggplot2. The aim is to strengthen your working experience of customising specialised data visualisations, such as funnel plots, in ggplot2.
### **Computing the basic derived fields**
To plot the funnel plot from scratch, we need to derive the cumulative death rate and its standard error.
```{r}
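df <- covid19 %>%
  # Cumulative fatality rate: deaths per positive case
  mutate(rate = Death / Positive) %>%
  # Binomial standard error of the rate
  mutate(rate.se = sqrt((rate * (1 - rate)) / (Positive))) %>%
  # Drop sub-districts with a zero rate (their standard error would be 0)
  filter(rate > 0)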
```
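In symbols, the two derived fields computed above can be written as follows. The notation is an assumption introduced here for illustration and is not part of the original exercise: $d_i$ is the cumulative number of deaths and $n_i$ the cumulative number of positive cases in sub-district $i$.

$$
\hat{p}_i = \frac{d_i}{n_i}, \qquad
\mathrm{SE}(\hat{p}_i) = \sqrt{\frac{\hat{p}_i \, (1 - \hat{p}_i)}{n_i}}
$$

Here $\hat{p}_i$ corresponds to the `rate` column and $\mathrm{SE}(\hat{p}_i)$ to the `rate.se` column.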
Next, *fit.mean*, the mean rate weighted by the inverse of each squared standard error, is computed by using the code chunk below.
```{r}
fit.mean <- weighted.mean(df$rate, 1/df$rate.se^2)
```
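As a rough check of what `weighted.mean()` does with these weights, the base R sketch below compares it with the explicit formula sum(w * x) / sum(w) on made-up numbers. The vectors `rate` and `n` are hypothetical and stand in for the `rate` and `Positive` columns.

```r
# Hypothetical rates and case counts for three areas
rate <- c(0.010, 0.020, 0.015)
n <- c(500, 2000, 1000)

# Binomial standard error and inverse-variance weights
rate_se <- sqrt(rate * (1 - rate) / n)
w <- 1 / rate_se^2

# Explicit weighted mean versus the built-in function
manual <- sum(w * rate) / sum(w)
builtin <- weighted.mean(rate, w)

all.equal(manual, builtin)
```

Areas with smaller standard errors receive larger weights, so they pull the overall mean towards their own rate.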
### **Calculate lower and upper limits for 95% and 99.9% CI**
The code chunk below is used to compute the lower and upper limits for the 95% and 99.9% confidence intervals.
```{r}
number.seq <- seq(1, max(df$Positive), 1)
number.ll95 <- fit.mean - 1.96 * sqrt((fit.mean*(1-fit.mean)) / (number.seq))
number.ul95 <- fit.mean + 1.96 * sqrt((fit.mean*(1-fit.mean)) / (number.seq))
number.ll999 <- fit.mean - 3.29 * sqrt((fit.mean*(1-fit.mean)) / (number.seq))
number.ul999 <- fit.mean + 3.29 * sqrt((fit.mean*(1-fit.mean)) / (number.seq))
dfCI <- data.frame(number.ll95, number.ul95, number.ll999,
                   number.ul999, number.seq, fit.mean)
```
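The multipliers 1.96 and 3.29 used above are two-sided standard normal quantiles. The base R sketch below recovers them with `qnorm()` and shows how the half-width of the 95% limit shrinks as the number of cases grows. The value of `p` and the case counts in `n` are hypothetical and chosen only for illustration.

```r
# Two-sided normal quantiles for 95% and 99.9% coverage
z95 <- qnorm(1 - 0.05 / 2)
z999 <- qnorm(1 - 0.001 / 2)
round(c(z95, z999), 2)

# Hypothetical overall rate and a few case counts
p <- 0.015
n <- c(100, 1000, 5000)

# Half-width of the 95% limit at each case count
half_width <- z95 * sqrt(p * (1 - p) / n)
half_width
```

Because the half-width is proportional to 1 / sqrt(n), the limits are wide for small counts and narrow for large ones, which produces the funnel shape.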
### **Plotting a static funnel plot**
In the code chunk below, ggplot2 functions are used to plot a static funnel plot.
```{r}
p <- ggplot(df, aes(x = Positive, y = rate)) +
  # `label` is not drawn by geom_point(); it is kept for the ggplotly() tooltip
  geom_point(aes(label = `Sub-district`),
             alpha = 0.4, color = "#3459e6") +
  # 95% limits (dashed lines)
  geom_line(data = dfCI,
            aes(x = number.seq, y = number.ll95),
            linewidth = 0.4, colour = "grey40", linetype = "dashed") +
  geom_line(data = dfCI,
            aes(x = number.seq, y = number.ul95),
            linewidth = 0.4, colour = "grey40", linetype = "dashed") +
  # 99.9% limits (solid lines)
  geom_line(data = dfCI,
            aes(x = number.seq, y = number.ll999),
            linewidth = 0.4, colour = "grey40") +
  geom_line(data = dfCI,
            aes(x = number.seq, y = number.ul999),
            linewidth = 0.4, colour = "grey40") +
  # Overall weighted mean rate
  geom_hline(yintercept = fit.mean,
             linewidth = 0.4, colour = "grey40") +
  coord_cartesian(ylim = c(0, 0.05)) +
  # Note: these labels sit at negative y values, outside ylim, so they are not drawn
  annotate("text", x = 1, y = -0.13, label = "95%", size = 3, colour = "grey40") +
  annotate("text", x = 4.5, y = -0.18, label = "99%", size = 3, colour = "grey40") +
  ggtitle("Cumulative Fatality Rate by Cumulative Number of COVID-19 Cases") +
  xlab("Cumulative Number of COVID-19 Cases") +
  ylab("Cumulative Fatality Rate") +
  # Apply the complete theme first so the theme() tweaks below are not overwritten
  theme_minimal() +
  theme(plot.title = element_text(size = 12),
        legend.position = c(0.91, 0.85),
        legend.title = element_text(size = 7),
        legend.text = element_text(size = 7),
        legend.background = element_rect(colour = "grey60", linetype = "dotted"),
        legend.key.height = unit(0.3, "cm"))
p
```
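The same limits can also be used to label points rather than only draw them. The base R sketch below is an illustration under stated assumptions: the overall rate `fit_mean` and the `toy` data frame are made up, and the band labels are not part of the original exercise.

```r
# Hypothetical overall rate and toy counts for four areas
fit_mean <- 0.015
toy <- data.frame(
  Positive = c(400, 1500, 3000, 5000),
  Death = c(16, 22, 60, 40)
)
toy$rate <- toy$Death / toy$Positive

# Standard error of the overall rate at each area's case count
se <- sqrt(fit_mean * (1 - fit_mean) / toy$Positive)

# Distance from the overall rate in standard-error units
toy$z <- (toy$rate - fit_mean) / se

# Band each point by the 95% (1.96) and 99.9% (3.29) limits
toy$flag <- cut(
  abs(toy$z),
  breaks = c(0, 1.96, 3.29, Inf),
  labels = c("within 95%", "between 95% and 99.9%", "beyond 99.9%"),
  include.lowest = TRUE
)
toy
```

Points flagged as beyond the 99.9% limits are the ones that would fall outside the outer funnel lines in the plot.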
### **Interactive Funnel Plot: plotly + ggplot2**
The funnel plot created using ggplot2 functions can be made interactive with `ggplotly()` of the **plotly** R package.
```{r}
fp_ggplotly <- ggplotly(p,
                        tooltip = c("label", "x", "y"))
fp_ggplotly
```
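An alternative way to control the tooltip is to build the text yourself with the `text` aesthetic and pass `tooltip = "text"` to `ggplotly()`. The sketch below uses a made-up data frame; the names `toy`, `p_toy` and `fp_toy` and the tooltip wording are hypothetical.

```r
library(ggplot2)
library(plotly)

# Hypothetical toy data for three areas
toy <- data.frame(
  area = c("A", "B", "C"),
  Positive = c(500, 1500, 3000),
  rate = c(0.020, 0.012, 0.016)
)

# Compose the tooltip text in the `text` aesthetic; <br> starts a new line
p_toy <- ggplot(toy, aes(x = Positive, y = rate,
                         text = paste0("Sub-district: ", area,
                                       "<br>Cases: ", Positive,
                                       "<br>Fatality rate: ", round(rate, 4)))) +
  geom_point()

# Show only the custom text when hovering over a point
fp_toy <- ggplotly(p_toy, tooltip = "text")
fp_toy
```

This keeps the tooltip wording and rounding under your control instead of relying on the default variable names and values.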
## **References**
- [FunnelPlotR](https://nhs-r-community.github.io/FunnelPlotR/) package.
- [Funnel Plots for Indirectly-standardised ratios](https://nhs-r-community.github.io/FunnelPlotR/articles/funnel_plots.html).
- [Changing funnel plot options](https://nhs-r-community.github.io/FunnelPlotR/articles/changing_funnel_plot_options.html).
- [ggplot2](https://ggplot2.tidyverse.org/) package.