Collin K. Berke, Ph.D.

Exploring R Consortium ISC Grants

data wrangling
data visualization
tidytuesday
plotly
Tableau
A contribution to the 2024-02-20 #tidytuesday social data project
Author

Collin K. Berke, Ph.D.

Published

February 26, 2024

Photo by Markus Winkler
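library(tidyverse) # data wrangling (dplyr, tidyr, readr, stringr)
library(plotly)    # interactive charts
library(skimr)     # quick data summaries
library(tidytext)  # tokenizing text and stop words
library(here)      # project-relative file paths
library(scales)    # number formatting, e.g. comma()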

Background

I’ve never really contributed to #tidytuesday. Recently, I’ve been looking for some inspiration, so I thought contributing to this social data project would be a good start. I used this post as an opportunity to get more comfortable creating data visualizations with plotly and Tableau.

# Read the ISC grants data from the 2024-02-20 #tidytuesday repository
data_isc_grants <- 
  read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2024/2024-02-20/isc_grants.csv')

Data description

The data contains information about past projects funded by the R Consortium Infrastructure Steering Committee (ISC) Grant Program. These grants support projects that contribute to the R community. To learn more about the most recent round of funding, check out the R Consortium’s blog post announcing it.

The data includes columns such as year, group (i.e., funding cycle), title, funded (i.e., funding amount), proposed_by, summary, and website. Before creating any data visualizations, let’s do some quick exploratory analysis.

glimpse(data_isc_grants)
Rows: 85
Columns: 7
$ year        <dbl> 2023, 2023, 2023, 2023, 2023, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, …
$ group       <dbl> 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, …
$ title       <chr> "The future of DBI (extension 1)", "Secure TLS Communications for R", "volcalc…
$ funded      <dbl> 10000, 10000, 12265, 3000, 15750, 8000, 8000, 22000, 6000, 25000, 15000, 20000…
$ proposed_by <chr> "Kirill Müller", "Charlie Gao", "Kristina Riemer", "Mark Padgham", "Jon Harmon…
$ summary     <chr> "This proposal mostly focuses on the maintenance and support for {DBI}, the {D…
$ website     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
skim(data_isc_grants)
Data summary
Name                      data_isc_grants
Number of rows            85
Number of columns         7
_______________________
Column type frequency:
  character               4
  numeric                 3
________________________
Group variables           None

Variable type: character

skim_variable  n_missing  complete_rate  min   max  empty  n_unique  whitespace
title                  0           1.00    4   120      0        85           0
proposed_by            0           1.00    8    63      0        66           0
summary                0           1.00   31  2210      0        85           0
website               33           0.61   21   224      0        48           0

Variable type: numeric

skim_variable  n_missing  complete_rate      mean        sd    p0   p25    p50    p75   p100  hist
year                   0              1   2019.14      2.08  2016  2017   2019   2021   2023  ▇▆▇▅▅
group                  0              1      1.40      0.49     1     1      1      2      2  ▇▁▁▁▅
funded                 0              1  13781.14  11325.80     0  6000  10000  16000  62400  ▇▂▁▁▁

What’s the trend for grant funding?

Let’s take a look at the funding trend by funding cycle (i.e., fall and spring).

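data_by_year_grp <- data_isc_grants |>
  # Label the funding cycles: 1 = Spring, 2 = Fall
  mutate(group = case_when(
    group == 1 ~ "Spring", 
    group == 2 ~ "Fall")
  ) |>
  # Total funding awarded per year and cycle
  group_by(year, group) |>
  summarise(funded = sum(funded), .groups = "drop") |>
  arrange(group, year) |> 
  # One column per cycle (Fall, Spring), plotted as separate traces below
  pivot_wider(names_from = group, values_from = funded)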
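As an aside, dplyr's `count()` accepts a `wt` argument, which sums a column within each group instead of counting rows. The sketch below assumes a small hypothetical table (made-up amounts, not the ISC data) and should produce the same wide, one-column-per-round layout as the `group_by()` and `summarise()` steps above.

```r
library(dplyr)
library(tidyr)

# Hypothetical grant rows (made-up amounts, not the real ISC data)
toy_grants <- tibble(
  year   = c(2021, 2021, 2021, 2022, 2022),
  group  = c(1, 1, 2, 1, 2),
  funded = c(10000, 5000, 8000, 12000, 3000)
)

toy_wide <- toy_grants |>
  # Same labeling as above: 1 = Spring, 2 = Fall
  mutate(group = if_else(group == 1, "Spring", "Fall")) |>
  # count() with wt = funded sums funding within each year/round,
  # equivalent to group_by(year, group) |> summarise(funded = sum(funded))
  count(year, group, wt = funded, name = "funded") |>
  # One column per funding round
  pivot_wider(names_from = group, values_from = funded)

toy_wide
```

Which form to use is mostly taste; `count(wt = )` is shorter, while `summarise()` makes it easier to add other summaries (such as the number of grants) in the same step.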
plot_ly(
  data_by_year_grp, 
  x = ~year, 
  y = ~Fall, 
  name = "Fall", 
  type = 'scatter', 
  mode = 'lines',
  line = list(width = 5),
  # paste0() avoids the stray space paste() would add after "$"
  text = ~paste0(
    "Funding awarded: $", comma(Fall),
    "<br>Year: ", year
  ),
  hoverinfo = "text"
) |>
add_trace(
  y = ~Spring,
  name = "Spring",
  text = ~paste0(
    "Funding awarded: $", comma(Spring),
    "<br>Year: ", year
  ),
  hoverinfo = "text"
) |>
layout(
  title = list(
    text = "<b>Funding trend for R Consortium ISC grants by funding round</b>",
    xanchor = "center",
    yanchor = "top",
    font = list(family = "arial", size = 24)
  ),
  xaxis = list(title = ""),
  yaxis = list(title = "Funding amount ($US)")
)
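Another option, sketched here on hypothetical long-format totals (not the real data), is to skip `pivot_wider()` and map the funding round to `color`. plotly then draws one trace per round, so the second `add_trace()` call isn't needed.

```r
library(dplyr)
library(plotly)
library(scales)

# Hypothetical long-format totals: one row per year and funding round
toy_long <- tibble(
  year   = c(2021, 2021, 2022, 2022),
  group  = c("Spring", "Fall", "Spring", "Fall"),
  funded = c(15000, 8000, 12000, 3000)
)

# Mapping color to group splits the data into one line per round
p <- plot_ly(
  toy_long,
  x = ~year,
  y = ~funded,
  color = ~group,
  type = "scatter",
  mode = "lines",
  text = ~paste0("Funding awarded: $", comma(funded), "<br>Year: ", year),
  hoverinfo = "text"
)
```

Printing `p` in an interactive session renders the chart. The trade-off is less per-trace control than with explicit `add_trace()` calls, for example when each round needs its own line styling.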

What words are used most often within descriptions of funded projects?

Now, let’s explore which words appear most often in the summaries of funded projects, and how their cumulative mentions grow over time.

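data_word_fund_trend <- data_isc_grants |>
  # Lowercase the summaries and strip punctuation and digits
  mutate(
    summary = str_remove_all(str_to_lower(summary), "[[:punct:]]"),
    summary = str_remove_all(summary, "[0-9]")
  ) |>
  # One row per word, then drop common English stop words
  unnest_tokens(word, summary) |>
  anti_join(get_stopwords(), by = "word") |>
  # Mentions of each word per year
  group_by(year) |>
  count(word) |>
  # Running total of mentions for each word across years
  arrange(word, year) |>
  group_by(word) |>
  mutate(
    n_cume = cumsum(n)
  )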
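To see what the cleaning and tokenizing steps do, here is a minimal sketch on two made-up summaries. One assumption to flag: instead of `get_stopwords()`, which relies on the stopwords package, it filters tidytext's built-in `stop_words` data to its `"snowball"` lexicon, which I assume is a close stand-in for the default Snowball list returned by `get_stopwords()`.

```r
library(dplyr)
library(stringr)
library(tidytext)

# Two hypothetical project summaries (illustrative text, not rows from the data)
toy_summaries <- tibble(
  year = c(2022, 2023),
  summary = c(
    "Maintaining the DBI package, version 2.",
    "Secure TLS communications for the R community!"
  )
)

toy_words <- toy_summaries |>
  # Same cleaning as above: lowercase, strip punctuation and digits
  mutate(
    summary = str_remove_all(str_to_lower(summary), "[[:punct:]]"),
    summary = str_remove_all(summary, "[0-9]")
  ) |>
  # One row per word
  unnest_tokens(word, summary) |>
  # Drop stop words such as "the" and "for" (snowball lexicon, assumed
  # here to approximate get_stopwords())
  anti_join(filter(stop_words, lexicon == "snowball"), by = "word") |>
  # Mentions of each remaining word per year
  count(year, word)

toy_words
```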
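# Cutoff: the 99th percentile of cumulative mentions across all words and years
top_words <- data_word_fund_trend |>
  ungroup() |>
  summarise(top = quantile(n_cume, .99)) |>
  pull(top)

# Words whose running total reaches the cutoff in any year
data_top_words <- data_word_fund_trend |>
  filter(n_cume >= top_words) |>
  distinct(word)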

plot_ly(
  data = data_word_fund_trend, 
  x = ~year,
  y = ~n_cume,
  line = list(color = "#d3d3d3", width = 3),
  type = "scatter",
  mode = "lines",
  name = "",
  text = ~paste(
    "Word: ", word,
    "<br>Cumulative mentions: ", n_cume,
    "<br>Year: ", year
  ),
  hoverinfo = "text"
) |>
add_lines(
  data = data_word_fund_trend |> semi_join(data_top_words, by = "word"),
  x = ~year,
  y = ~n_cume,
  line = list(color = "#0C2D48", width = 3),
  type = "scatter",
  mode = "lines",
  name = ""
) |>
layout(
  title = list(
    text = "<b>Aiming for R Consortium grant funding? Consider using these words</b>",
    xanchor = "center",
    yanchor = "top",
    font = list(family = "arial", size = 24)
  ),
  xaxis = list(title = ""),
  yaxis = list(title = "Cumulative mentions"),
  showlegend = FALSE
)
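The running total (`n_cume`) and the 99th-percentile cutoff are easier to follow on a tiny hypothetical table of per-year counts (made-up numbers, shaped like the output of `count(word)` grouped by year). `quantile()` uses R's default type 7 interpolation, so the cutoff can fall between observed values.

```r
library(dplyr)

# Hypothetical per-year word counts (not the real ISC data)
toy_counts <- tibble(
  word = c("package", "package", "package", "data", "data", "users"),
  year = c(2016, 2017, 2018, 2016, 2018, 2017),
  n    = c(4, 6, 5, 2, 3, 1)
)

# Running total of mentions within each word, ordered by year
toy_cume <- toy_counts |>
  arrange(word, year) |>
  group_by(word) |>
  mutate(n_cume = cumsum(n)) |>
  ungroup()

# Cutoff: 99th percentile of all cumulative counts (type 7 by default)
cutoff <- quantile(toy_cume$n_cume, 0.99)

# A word is highlighted if any of its yearly running totals reaches the cutoff
toy_top <- toy_cume |>
  filter(n_cume >= cutoff) |>
  distinct(word)

toy_cume
toy_top
```

Here the cumulative counts are 2, 5, 4, 10, 15, and 1; the 99th percentile interpolates to 14.75, so only "package" (which reaches 15 by 2018) clears the bar. Because the filter checks every year's running total, a word only needs to cross the cutoff once to be highlighted.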

An attempt using Tableau

I also used this week’s data as an opportunity to learn more about Tableau. Here’s what I came up with.

Reuse

CC BY 4.0

Citation

BibTeX citation:
@misc{berke2024,
  author = {Berke, Collin K},
  title = {Exploring {R} {Consortium} {ISC} {Grants}},
  date = {2024-02-26},
  langid = {en}
}
For attribution, please cite this work as:
Berke, Collin K. 2024. “Exploring R Consortium ISC Grants.” February 26, 2024.