htmlwidgets - plotly

The Bigger Picture #

In this document we learn how to create interactive charts with plotly. Simply put, we are learning how to turn tidy data into clear, interactive graphs. In the context of the overall workflow, this is the data visualisation step.

What is plotly? #

LinkedIn Learning 3.1

library("tidyverse")
library("plotly")
  • An htmlwidget used to make interactive graphs and charts of several types
    • Bar graphs
    • Scatterplots
    • Line plots
    • Pie charts
    • Bubble charts
    • Geoplots
    • And many more
  • The R package is a binding to the plotly.js library in JavaScript
  • Plotly code relies heavily on the pipe (%>%) operator to build charts step by step
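
Since every plotly example in this document is written with the pipe, a small sketch of what %>% does may help. The vector heights below is made up purely for illustration; the point is only that the left-hand side is passed as the first argument of the function on the right.

```r
library("dplyr")  # dplyr re-exports the %>% pipe from magrittr

heights <- c(150, 160, 170)  # made-up values for illustration

# Nested calls are read from the inside out
nested <- round(mean(sqrt(heights)), 1)

# Piped calls are read from left to right:
# the left-hand side becomes the first argument of the next function
piped <- heights %>%
  sqrt() %>%
  mean() %>%
  round(1)

# Both versions compute the same value
identical(nested, piped)
```

This is why a chart can be built as data %>% plot_ly() %>% add_trace(...) %>% layout(...): each step receives the chart produced by the previous one.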

Upgrading ggplot graphs with plotly #

LinkedIn Learning 3.2

Let’s say we have a nice graph created with ggplot2:

library("ggplot2")

load("tidy_ACORN-SAT_data/station_data.rdata")

temp_chart <- station_data %>%
  filter(Station.name == "Sydney") %>%
  ggplot(aes(x = year,
             y = average.temp)) +
  geom_point(color = "red") +
  scale_x_discrete(breaks = seq(1910, 2010, by = 10)) +
  ggtitle("Sydney's Average Yearly Temperature") +
  xlab("Year") +
  ylab("Average Temperature (°C)")

temp_chart

There is a useful command ggplotly() to automatically convert this to an interactive plotly widget:

ggplotly(temp_chart)

Try using your mouse to test the interactivity!
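
The station data file is needed to run the example above. As a minimal sketch that runs without it, the same conversion can be tried on mtcars, a dataset that ships with R. The chart itself and the name car_chart are our own illustration, not part of the course material.

```r
library("ggplot2")
library("plotly")

# mtcars ships with R, so this runs without the course data files
car_chart <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "red") +
  ggtitle("Fuel Economy by Weight") +
  xlab("Weight (1000 lbs)") +
  ylab("Miles per Gallon")

# ggplotly() takes the finished ggplot object and returns a plotly htmlwidget
interactive_chart <- ggplotly(car_chart)
class(interactive_chart)

interactive_chart
```

The returned object is an ordinary plotly widget, so it can be piped into layout() like any chart created with plot_ly().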

Creating Bar Charts #

LinkedIn Learning 3.3

Why do we use bar charts? #

  • Bar charts are one useful form of quantitative data visualisation.
  • Each bar represents one observation and its height encodes a quantitative value, so we can easily compare observations by seeing which bar is taller
  • For example, it becomes easy to visualise the heights of some different coloured aliens with a bar graph!

How do we make bar charts? #

For our examples we will use the same data from the Australian Environmental-Economic Accounts (2016), now including data from 2008-2014. The data relates to water consumption by state.

load(file = "tidy_EnvAcc_data/consumption.rdata")
consumption
## # A tibble: 48 x 3
##    State year    water_consumption
##    <chr> <chr>               <dbl>
##  1 NSW   2008–09              4555
##  2 VIC   2008–09              2951
##  3 QLD   2008–09              3341
##  4 SA    2008–09              1179
##  5 WA    2008–09              1361
##  6 TAS   2008–09               466
##  7 NT    2008–09               160
##  8 ACT   2008–09                48
##  9 NSW   2009–10              4323
## 10 VIC   2009–10              2904
## # … with 38 more rows
  • We begin by piping our data into a function, plot_ly()
consumption %>%
  plot_ly()

Already our chart is “interactive”: we can use our mouse to select specific areas of the empty graph. However, we have not yet told plotly what data to draw.

  • add_trace() is a key plotly function that configures our graphs
  • The function has literally hundreds of arguments (one can read about each of them here)
  • We can be content learning a few
consumption %>%
  group_by(year) %>%
  summarise(consumption_total = sum(water_consumption)) %>%
  plot_ly() %>%
  add_trace(x = ~year,
            y = ~consumption_total,
            type = "bar")
  • Notice that we use a tilde (“~”) when calling on columns from our data
  • Plotly requires this syntax in order to understand which names refer to columns in our data
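
To make the role of the tilde concrete, here is a small sketch comparing a formula (~column) with passing the vectors directly. The tibble toy and its values are made up for illustration; they are not the consumption data.

```r
library("dplyr")
library("plotly")

# Made-up illustrative values, not the course data
toy <- tibble(year = c("2008-09", "2009-10", "2010-11"),
              total = c(10, 12, 9))

# With the tilde: the names are looked up as columns of the piped data
with_tilde <- toy %>%
  plot_ly() %>%
  add_trace(x = ~year,
            y = ~total,
            type = "bar")

# Without the tilde: we must hand over the vectors ourselves
without_tilde <- plot_ly() %>%
  add_trace(x = toy$year,
            y = toy$total,
            type = "bar")

with_tilde
```

Both charts should show the same bars. The formula version is shorter and keeps working when the data is filtered or summarised earlier in the pipe.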

We have made a generic bar graph based on total water consumption per year, but what if we want a more advanced bar graph?

  • We can pipe our existing graph into layout()
  • This function also has many arguments (also found here!)
  • We select arguments depending on the type of graph we are creating

Now we want a stacked bar chart, showing how much water each state consumed in a given year.

  • We change the barmode layout option to “stack”
  • We add the color argument of add_trace() to colour the stack sections by a variable (here, State)
consumption %>%
  plot_ly() %>%
  add_trace(x = ~year,
            y = ~water_consumption,
            type = "bar",
            color = ~State) %>%
  layout(barmode = "stack")
  • If we wish to display proportion rather than quantity, we use the layout() argument barnorm
consumption %>%
  plot_ly() %>%
  add_trace(x = ~year,
            y = ~water_consumption,
            type = "bar",
            color = ~State) %>%
  layout(barmode = "stack",
         barnorm = "percent")

Mouse over the graph to see information!
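
To see what barnorm = "percent" does to the numbers, we can sketch the same rescaling by hand with dplyr: within each year, every value is divided by that year's total. The tibble toy uses made-up values, and the column name percent is our own.

```r
library("dplyr")

# Made-up illustrative values, not the course data
toy <- tibble(State = c("NSW", "VIC", "NSW", "VIC"),
              year = c("2008-09", "2008-09", "2009-10", "2009-10"),
              water_consumption = c(30, 10, 20, 20))

# Rescale each year's values so that every stack adds up to 100
shares <- toy %>%
  group_by(year) %>%
  mutate(percent = 100 * water_consumption / sum(water_consumption)) %>%
  ungroup()

shares
```

Letting plotly do this with barnorm is more convenient, but the manual version is useful when the percentages are also needed in a table.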

  • But what if we want a horizontal bar chart?
consumption %>%
  plot_ly() %>%
  add_trace(x = ~water_consumption,
            y = ~year,
            type = "bar",
            color = ~State) %>%
  layout(barmode = "stack",
         barnorm = "percent")

It is an exercise for the careful reader to deduce what part of the code was modified!
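
Bar traces in plotly also have an orientation attribute, which can be set to "h" to request horizontal bars explicitly rather than relying on how the x and y values are interpreted. The following is a minimal sketch on made-up data (the tibble toy is hypothetical).

```r
library("dplyr")
library("plotly")

# Made-up illustrative values, not the course data
toy <- tibble(State = c("NSW", "VIC", "QLD"),
              water_consumption = c(45, 30, 33))

horizontal <- toy %>%
  plot_ly() %>%
  add_trace(x = ~water_consumption,  # the quantity now runs along the x-axis
            y = ~State,              # the categories sit on the y-axis
            type = "bar",
            orientation = "h")       # state the bar direction explicitly

horizontal
```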

Creating Scatter Charts #

LinkedIn Learning 3.4

Why do we use scatter charts? #

  • Scatter charts are used to plot many data points for two variables
  • These charts often aim to show the relationship between these variables by demonstrating trends in the data

How do we make scatter charts? #

For our examples we will use data from the ABARES Agricultural Census of 2015-2016. The data relates to the average climate-adjusted productivity of all cropping farms between 1977 and 2015.

load("tidy_ABARES_data/farm_data.rdata")
head(farm_data, n=5)
## # A tibble: 5 x 4
##   year  Total.factor.productivity Climate.effect Climate.adjusted.TFP
##   <chr>                     <dbl>          <dbl>                <dbl>
## 1 1978                       95.9           89.7                 103.
## 2 1979                      113.           113.                  102.
## 3 1980                      112.           106.                  103.
## 4 1981                       84.2           92.5                 101.
## 5 1982                      104.           105.                  101.
  • When we plot our data, our code looks very similar to before
  • We set the type to “scatter”
  • We set the new mode argument to “markers”
    • This simply specifies that we want points on our plot
    • We could set it to “lines” or “lines+markers” if we wished to connect them
  • Here we have coloured the points by climate-adjusted TFP, however we would normally see points coloured by a categorical variable
farm_data %>%
  plot_ly() %>%
  add_trace(type = "scatter",
            x = ~Climate.effect,
            y = ~Total.factor.productivity,
            mode = "markers",
            color = ~Climate.adjusted.TFP)
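
For comparison, here is a sketch of the more common case mentioned above, where points are coloured by a categorical variable. It uses iris, a dataset that ships with R, rather than the farm data.

```r
library("dplyr")
library("plotly")

# iris ships with R; Species is a factor with three levels
species_chart <- iris %>%
  plot_ly() %>%
  add_trace(type = "scatter",
            x = ~Sepal.Length,
            y = ~Petal.Length,
            mode = "markers",
            color = ~Species)  # a categorical variable gives one colour per level

species_chart
```

With a categorical colour variable, plotly draws one trace per level and adds a legend entry for each, instead of the continuous colour bar we get for a numeric variable.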

Creating Line Charts #

Why do we use line charts? #

  • Line charts are like scatterplots, except our x-axis variable is usually time
  • We use line charts to show how one variable changes over time
  • Connecting the points makes trends and patterns over time easy to see

How do we make line charts? #

LinkedIn Learning 3.4

Using the same data as before, let’s make some line charts.

  • Once again our code looks very similar to before - this is the versatility of plotly
  • We note that line charts are simply scatter charts with a different mode
farm_data %>%
  plot_ly() %>%
  add_trace(type = "scatter",
            x = ~year,
            y = ~Total.factor.productivity,
            mode = "lines")
  • If we want to play around with colour, we can use the marker and line arguments
  • Each takes a list of options, and we can set the colour through the color option of that list
farm_data %>%
  plot_ly() %>%
  add_trace(type = "scatter",
            x = ~year,
            y = ~Total.factor.productivity,
            mode = "lines+markers",  # draw markers too, so the marker colour is visible
            marker = list(color = "Blue"),
            line = list(color = "Red"))

Creating Bubble Charts #

Why do we use bubble charts? #

  • Bubble charts are useful in similar ways to scatter charts, but they convey even more information
  • Bubble charts have an additional element - size - that can be used to visually represent a third variable
  • A common usage is to visualise data to do with countries or cities, with the size element reflecting population

How do we make bubble charts? #

LinkedIn Learning 3.4

  • Bubble charts are simply scatter plots with resized markers
  • Here we plot TFP against climate-adjusted TFP, with the climate effect as the size.
farm_data %>%
  plot_ly() %>%
  add_trace(type = "scatter",
            x = ~Climate.adjusted.TFP,
            y = ~Total.factor.productivity,
            mode = "markers",
            size = ~Climate.effect)
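
The range of bubble sizes can be adjusted with the sizes argument of plot_ly(), which takes the smallest and largest marker size. The sketch below uses made-up data (the tibble toy and its columns are hypothetical) and assumes that the sizes set in plot_ly() are inherited by the trace added afterwards.

```r
library("dplyr")
library("plotly")

# Made-up illustrative values
toy <- tibble(x = c(1, 2, 3, 4),
              y = c(2, 4, 3, 5),
              population = c(5, 20, 50, 100))

bubbles <- toy %>%
  plot_ly(sizes = c(10, 60)) %>%   # smallest and largest bubble size
  add_trace(type = "scatter",
            x = ~x,
            y = ~y,
            mode = "markers",
            size = ~population)    # the third variable controls the bubble size

bubbles
```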

Additional options #

LinkedIn Learning 3.7

  • We can add a title with the layout option title
  • We can change the names of our axes with layout() options
    • xaxis and yaxis are layout options
    • Each of them takes a list, and that list contains sub-options
    • We want a list where the sub-option title is set to whatever we like
  • We can add text that appears when we hover over our chart, using the text argument of add_trace()
    • This usually involves some variable
    • We use paste() or paste0() to insert text and a variable
    • Remember the tilde (“~”)!
farm_data %>%
  plot_ly() %>%
  add_trace(type = "scatter",
            x = ~year,
            y = ~Total.factor.productivity,
            mode = "lines",
            text = ~paste0("Climate effect is: ", Climate.effect)) %>%
  layout(title = "Total Factor Productivity over Time",
         xaxis = list(title = "Time"),
         yaxis = list(title = "Total Factor Productivity"))
  • If we hover over the data, we get our coordinates as well as what we specified
  • If we want more precise control over what is displayed, we use the hoverinfo argument
    • This allows us to say we just want the text argument active
    • Note that the argument takes a string such as “text”; several flags can be joined with “+” (for example “x+text”)
farm_data %>%
  plot_ly() %>%
  add_trace(type = "scatter",
            x = ~year,
            y = ~Total.factor.productivity,
            mode = "lines",
            text = ~paste0("Climate effect is: ", Climate.effect),
            hoverinfo = "text") %>%
  layout(title = "Total Factor Productivity over Time",
         xaxis = list(title = "Time"),
         yaxis = list(title = "Total Factor Productivity"))
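
As a small sketch of combining hover flags, the example below shows the x value together with our custom text and hides the y value. The tibble toy and its values are made up for illustration.

```r
library("dplyr")
library("plotly")

# Made-up illustrative values, not the farm data
toy <- tibble(year = c("2001", "2002", "2003"),
              productivity = c(100, 104, 98),
              effect = c(95, 101, 90))

hover_chart <- toy %>%
  plot_ly() %>%
  add_trace(type = "scatter",
            x = ~year,
            y = ~productivity,
            mode = "lines",
            text = ~paste0("Climate effect is: ", effect),
            hoverinfo = "x+text")  # show the x value and our text, not the y value

hover_chart
```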

Combining Multiple Plots #

LinkedIn Learning 3.8

We can plot several graphs which share one particular axis. In this example (using farm_data) we share the x-axis, but sharing the y-axis works in the same way.

  1. We will create a vector of the variables we want to plot
    • They must be referred to by column name only
    • The vector must be a vector of strings
variables = c("Total.factor.productivity",
              "Climate.effect",
              "Climate.adjusted.TFP")
  2. We will create a function which plots one chart with plotly, using a generic placeholder argument (e.g. var1)
    • For the axis we share, we use the shared variable (here, ~year)
    • For the other, we use as.formula(paste0("~", var1))
    • This is because we need a way of writing ~var for every variable, and then passing this into the y argument
    • We also use the name argument of add_trace() to label each trace with its variable name. plotly uses this name for the legend entry in the final plot
plot1 <- function(var1) {
  farm_data %>%
    plot_ly() %>%
    add_trace(type = "scatter",
              x = ~year,
              y = as.formula(paste0("~", var1)),
              name = paste0(var1),
              mode = "lines") %>%
    layout(xaxis = list(title = "Year"),
           yaxis = list(title = "Index"))
}
  3. We will use lapply() to create the same plot of all our variables at once
plots <- lapply(variables, plot1)
  4. We use the subplot() function with very particular arguments to force these plots onto one graph
subplot(plots,
        shareX = TRUE,
        nrows = length(plots),
        titleY = TRUE)
  • It is possible to tweak nrows to easily plot graphs next to each other, but this is visually weaker
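
For completeness, here is a sketch of the side-by-side layout, this time sharing the y-axis instead of the x-axis. The tibble toy, its columns a and b, and the helper plot_one() are made up for illustration; the structure mirrors the steps above.

```r
library("dplyr")
library("plotly")

# Made-up illustrative values
toy <- tibble(year = 2001:2005,
              a = c(1, 3, 2, 4, 5),
              b = c(5, 4, 4, 2, 1))

# One chart per variable, built the same way as plot1() above
plot_one <- function(var1) {
  toy %>%
    plot_ly() %>%
    add_trace(type = "scatter",
              x = ~year,
              y = as.formula(paste0("~", var1)),
              name = var1,
              mode = "lines")
}

plots <- lapply(c("a", "b"), plot_one)

# nrows = 1 places the charts next to each other;
# shareY = TRUE gives them a common y-axis
side_by_side <- subplot(plots,
                        nrows = 1,
                        shareY = TRUE,
                        titleX = TRUE)

side_by_side
```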

Creating Scattergeo Plots #

LinkedIn Learning 3.5

We return to our ACORN-SAT data as an example.

  • To make a map, we begin with the plot_geo() function instead of plot_ly()
  • We set x to be longitude, and y to be latitude
load("tidy_ACORN-SAT_data/station_data.rdata")

station_data %>%
  filter(year == 2000) %>%
  plot_geo() %>%
  add_trace(x = ~Longitude,
            y = ~Latitude,
            color = ~average.temp)
## No scattergeo mode specifed:
##   Setting the mode to markers
##   Read more about this attribute -> https://plotly.com/r/reference/#scatter-mode

Note that, if we wish, we may still add the size argument to our trace. There is also an opacity option, which takes a numeric value between 0 and 1.

  • There are even more options with the layout() function
    • They control what our map displays
  • We use the geo argument, and once again we must format it as a list
load("tidy_ACORN-SAT_data/station_data.rdata")

station_data %>%
  filter(year == 2000) %>%
  plot_geo() %>%
  add_trace(x = ~Longitude,
            y = ~Latitude,
            color = ~average.temp) %>%
  layout(geo = list(showlakes = TRUE,
                    showrivers = TRUE,
                    showcountries = TRUE))
## No scattergeo mode specifed:
##   Setting the mode to markers
##   Read more about this attribute -> https://plotly.com/r/reference/#scatter-mode
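
The message above appears because no mode was given. The sketch below sets mode = "markers" explicitly and also uses the lonaxis and lataxis sub-options of geo to zoom in on Australia. The three stations and their temperatures are made-up illustrative values, not ACORN-SAT data, and the axis ranges are our own rough choice.

```r
library("dplyr")
library("plotly")

# Approximate city coordinates with made-up temperatures (illustration only)
stations <- tibble(Station.name = c("Sydney", "Melbourne", "Perth"),
                   Longitude = c(151.21, 144.96, 115.86),
                   Latitude = c(-33.87, -37.81, -31.95),
                   average.temp = c(18, 15, 19))

station_map <- stations %>%
  plot_geo() %>%
  add_trace(x = ~Longitude,
            y = ~Latitude,
            color = ~average.temp,
            text = ~Station.name,   # shown when hovering over a point
            mode = "markers") %>%   # setting the mode avoids the message
  layout(geo = list(showcountries = TRUE,
                    lonaxis = list(range = c(110, 155)),   # west to east
                    lataxis = list(range = c(-45, -10))))  # south to north

station_map
```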

Creating Choropleths #

LinkedIn Learning 3.5

For our examples let us use a global surface temperature dataset obtained through Kaggle. We consider temperatures for a single date only (2000-01-01).

temperature_data <- read_csv("global_temp/GlobalLandTemperaturesByCountry.csv") %>%
  filter(dt == "2000-01-01")

Making a choropleth is relatively simple, although the result is not extensively customisable.

  1. We pipe our data into plot_geo()
  2. We specify how we determine location with the locationmode argument of add_trace()
    • locationmode can take the value “ISO-3”, “USA-states” or “country names”
    • The last option is usually the most useful
    • ISO-3 identifies countries by three-letter codes (for example, AUS for Australia)
  3. We specify which column of our data contains our region names with locations
  4. We specify which variable we wish to colour by using the z argument
temperature_data %>%
  plot_geo() %>%
  add_trace(locationmode = "country names",
            locations = ~Country,
            z = ~AverageTemperature)

What happens if we don’t specify z?

  • We’ll remove z and simply specify that color refers to temperature
temperature_data %>%
  plot_geo() %>%
  add_trace(locationmode = "country names",
            locations = ~Country,
            color = ~AverageTemperature)
## No scattergeo mode specifed:
##   Setting the mode to markers
##   Read more about this attribute -> https://plotly.com/r/reference/#scatter-mode
  • We learn that plotly makes a scattergeo plot with points centred on their respective country
  • plotly can use locationmode and locations to create scattergeo plots without longitudes and latitudes!
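
As a sketch of the “ISO-3” option, the example below identifies countries by their three-letter codes and states the trace type explicitly, so that we do not rely on plotly inferring a choropleth from z. The tibble toy and its values are made up for illustration.

```r
library("dplyr")
library("plotly")

# Three-letter country codes with made-up values (illustration only)
toy <- tibble(Code = c("AUS", "NZL", "IDN"),
              value = c(1, 2, 3))

iso_map <- toy %>%
  plot_geo() %>%
  add_trace(type = "choropleth",    # ask for a choropleth explicitly
            locationmode = "ISO-3",
            locations = ~Code,
            z = ~value)

iso_map
```

Codes are less readable than country names, but they avoid mismatches caused by spelling variants of a country's name.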

Custom Colours #

LinkedIn Learning 3.6

Specifying custom categorical colours #

  • The LinkedIn Learning tutorial specifies a lengthy method for custom categorical colours, yet a simpler and less finicky solution exists
  • We use mutate() and factor() to strictly order how we plot our variable-to-colour (here, State)
  • We then specify our colours, in order, and the colour mapping will be automatic
consumption %>%
  mutate(State = factor(State, 
                        levels = c("NSW", "VIC", "QLD", "SA", "WA", "TAS", "NT", "ACT"))) %>%
  plot_ly() %>%
  add_trace(x = ~year,
            y = ~water_consumption,
            type = "bar",
            color = ~State,
            colors = c("Red", "Yellow", "Pink", "Green", "Purple", "Orange", "Blue", "Violet"))
  • Here, we specify the levels of our State variable (which we have made a factor)
    • The order is strictly “NSW”, “VIC”, “QLD”, “SA”, “WA”, “TAS”, “NT”, “ACT”
  • Then we colour by State using the color argument of add_trace()
  • Finally we specify the colours we wish to use in a vector using the colors argument of add_trace()
    • The order is strictly “Red”, “Yellow”, “Pink”, “Green”, “Purple”, “Orange”, “Blue”, “Violet”
    • So we have “NSW” mapped to “Red”, “VIC” mapped to “Yellow”, etc.
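
Another option is to name the colour vector, so that each colour is matched to a category by name rather than by position. This is a sketch on made-up data (the tibble toy and the vector state_colours are hypothetical); the order of the named vector is deliberately different from the order of the data.

```r
library("dplyr")
library("plotly")

# Made-up illustrative values, not the course data
toy <- tibble(State = c("NSW", "VIC", "QLD"),
              year = "2008-09",
              water_consumption = c(45, 30, 33))

# Names tie each colour to a state, whatever order they are written in
state_colours <- c(QLD = "pink", NSW = "red", VIC = "yellow")

coloured <- toy %>%
  plot_ly() %>%
  add_trace(x = ~year,
            y = ~water_consumption,
            type = "bar",
            color = ~State,
            colors = state_colours)

coloured
```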

Specifying custom continuous colours #

  • We use colorRampPalette() to define a palette
  • We then pass this palette to the colors argument of add_trace(), so that it determines the colour scale
palette <- colorRampPalette(c('yellow', 'red'))(length(temperature_data$AverageTemperature))

temperature_data %>%
  plot_geo() %>%
  add_trace(locationmode = "country names",
            locations = ~Country,
            z = ~AverageTemperature,
            colors = palette)
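
To see what colorRampPalette() actually returns, we can inspect a small palette. It gives back a function, and calling that function with a number n produces n colours interpolated between the ones we supplied. The names ramp and five_colours are our own.

```r
# colorRampPalette() is part of grDevices, which is attached by default in R
ramp <- colorRampPalette(c("yellow", "red"))

# Ask for five colours running from yellow to red
five_colours <- ramp(5)

# The result is a character vector of hexadecimal colour codes
five_colours
```

The first and last entries are the colours we supplied and the ones in between are blends. According to the plotly documentation, the colors argument also accepts a short vector of colours to interpolate, so passing c("yellow", "red") directly may be enough in many cases.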