Collin K. Berke, Ph.D.
  • Home
  • About
  • Now
  • Blog
  • Today I Learned

On this page

  • Background
  • Data description
  • Data wrangling
  • Creating a Box and Whisker plot
  • An attempt using Tableau

Exploring data from the Fiscal Sponsor Directory

data wrangling
data visualization
tidytuesday
plotly
Tableau
A contribution to the 2024-03-12 #tidytuesday social data project
Author

Collin K. Berke, Ph.D.

Published

March 22, 2024

Image generated using the prompt ‘Hexagonal pattern in a grey palette’ with the Bing Image Creator

Background

I’m a little behind on this submission. My time to focus on #tidytuesday contributions has been limited recently. Nevertheless, here’s my submission for the 2024-03-12 data set.

This week’s data comes from the Fiscal Sponsor Directory. In short, this directory is a listing of groups supporting non-profits through the fiscal sponsorship of projects. I was unfamilar with this space, so I found the Fiscal Sponsor Directory’s About Us page helpful.

Why the Fiscal Sponsorship Directory this week? Well, the organizer of #tidytuesday is the R4DS Online Learning Community, a group I actively participate in. This group has been on the search for a new fiscal sponsor recently. The aim, thus, was to lean on the community to create data visualizations that may be helpful in identifying another fiscal sponsor for the group. So, below is what I came up with.

Before getting to my contribution, let’s take a moment to explore the data.

library(tidyverse)
library(here)
library(skimr)
library(tidytext)
library(plotly)
data_sponsor_dir <- read_csv(
  here(
    "blog", 
    "posts",
    "2024-03-22-tidytuesday-2024-03-12-fiscal-sponsor-directory",
    "fiscal_sponsor_directory.csv"
  )
)
Rows: 370 Columns: 12
── Column specification ────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (9): details_url, name, website, fiscal_sponsorship_fee_description, eligibility_criteria, p...
dbl (3): year_501c3, year_fiscal_sponsor, n_sponsored

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Data description

This week was tough. The data was pretty dirty, and I relied heavily on string processing for data wrangling. I think, however, I’ve come up with something that is a little more informative than just a listing of the different sponsorship groups.

glimpse(data_sponsor_dir)
Rows: 370
Columns: 12
$ details_url                        <chr> "https://fiscalsponsordirectory.org/?page_id=599", "htt…
$ name                               <chr> "1st Note Music Foundation", "50CAN, Inc.", "The Abunda…
$ website                            <chr> "www.1stnote.org", "50can.org", "abundancenc.org", "acc…
$ year_501c3                         <dbl> 2012, 2011, 2006, 2014, 2007, 1992, 2008, 2002, 1989, 1…
$ year_fiscal_sponsor                <dbl> 2012, 2016, 2007, 2017, 2013, 1997, 2009, 2018, 2004, 1…
$ n_sponsored                        <dbl> 2, 10, 20, 6, 2, 1, NA, 7, 1, 15, 130, 60, 5, 13, 20, 1…
$ fiscal_sponsorship_fee_description <chr> "We charge a 7% administrative fee for most grants and …
$ eligibility_criteria               <chr> "Type of service: Music related projects", "Aligned mis…
$ project_types                      <chr> "Arts and culture: Music Instruments to kids", "Educati…
$ services                           <chr> "Auditing: Grants", "Auditing|Bill paying|Bookkeeping/a…
$ fiscal_sponsorship_model           <chr> "Model C, Preapproved Grant Relationship", "Model A, Di…
$ description                        <chr> "1st Note Music Foundation Inc. is a nonprofit public s…
skim(data_sponsor_dir)
Data summary
Name data_sponsor_dir
Number of rows 370
Number of columns 12
_______________________
Column type frequency:
character 9
numeric 3
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
details_url 0 1.00 47 49 0 370 0
name 0 1.00 3 64 0 368 0
website 7 0.98 7 70 0 362 0
fiscal_sponsorship_fee_description 20 0.95 2 901 0 324 0
eligibility_criteria 7 0.98 21 2039 0 303 0
project_types 9 0.98 9 1244 0 324 0
services 14 0.96 9 1852 0 307 0
fiscal_sponsorship_model 86 0.77 7 499 0 64 0
description 31 0.92 70 1665 0 337 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
year_501c3 6 0.98 1997.62 17.18 1903 1985.75 2001 2012 2022 ▁▁▂▅▇
year_fiscal_sponsor 15 0.96 2005.15 13.68 1957 1998.00 2009 2016 2023 ▁▂▂▅▇
n_sponsored 13 0.96 42.68 90.87 0 4.00 12 45 850 ▇▁▁▁▁
head(data_sponsor_dir$project_types)
[1] "Arts and culture: Music Instruments to kids"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    
[2] "Education"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
[3] "Children, youth and families|Economic development|Education|Environment/sustainable growth"                                                                                                                                                                                                                                                                                                                                                                                                                                                     
[4] "Arts and culture|Children, youth and families|Education|Festivals and events|Health|GBTQ|Mental health|Open space/Parks|People or communities of color/minorities| Social services|Other: Do you have a project idea that increases access to music and recreation? Our mission is to foster connection and understanding through pleasurable experiences like music and recreation to inspire the creation of safe, inclusive, equitable communities."                                                                                         
[5] "Arts and culture|Children, youth and families: Mentor Develop Programs, Youth Empowerment Programs, Grant-funded Programs. Family Empowerment and Educational Programs.|Education: Student Success Strategy Programs; Adult Re-Entry Into Education Projects.|Festivals and events: African Diaspora History Festivals and Events|People or communities of color/minorities: Diversity Impact Programs. African-American and African Diaspora Immigrant Projects|Youth development: Youth Empowerment, Job Readiness, Early Career Development,"
[6] "Arts and culture|Children, youth and families|Disaster relief|Education|Festivals and events|People or communities of color/minorities|Women|Youth development"                                                                                                                                                                                                                                                                                                                                                                                 

Data wrangling

The first step in the data wrangling process was to clean up the string data in the project_types column. I wanted to use this as a dimension to filter out fiscal sponsor potentially relevant to the the R4DS community. Take note, I used a regular expression to remove string values after the other in the column. These free text responses would have made it harder to filter data on this dimension.

# Remove any string text after 'other' 
data_sponsor_dir <- data_sponsor_dir |>
  select(
    details_url, 
    name, 
    year_fiscal_sponsor,
    n_sponsored,
    project_types,
    website
  ) |>
  mutate(
    project_types = str_to_lower(project_types),
    project_types = str_remove(project_types, ":.*")
  ) 

The next step was to tokenize project_types’ categories into it’s own rows. I did this by using the unnest_tokens() function from the tidytext package.

# What are the unique categories?
data_sponsor_cat <- data_sponsor_dir |>
  unnest_tokens(cat, project_types, token = 'regex', pattern = "\\|") 

head(unique(data_sponsor_cat$cat), n = 10)
 [1] "arts and culture"               "education"                     
 [3] "children, youth and families"   "economic development"          
 [5] "environment/sustainable growth" "festivals and events"          
 [7] "health"                         "gbtq"                          
 [9] "mental health"                  "open space/parks"              
Note

While wrangling this data, I learned you can pass a regex pattern to tokenize when using unnest_tokens(). I found this to be useful in this case.

Outputted above are 10 example categories found in the project_types column. Looking through all of these categories, the following might be fiscal sponsors whose purpose aligns with that of the R4DS community. These include the following categories:

  • conference/event management
  • education environment/sustainable growth
  • education
  • economic development
  • organizational development

However, before I go about creating a visualization using these categories, let’s see if there’s enough data within each category to make the visualization informative.

cat_filter <- c(
  "education",
  "education environment/sustainable growth",
  "economic development",
  "conference/event management",
  "organizational development"
)

data_sponsor_cat |>
  filter(cat %in% cat_filter) |>
  count(cat, sort = TRUE)
# A tibble: 5 × 2
  cat                                          n
  <chr>                                    <int>
1 education                                  186
2 economic development                       144
3 organizational development                   6
4 conference/event management                  1
5 education environment/sustainable growth     1

Indeed, some categories don’t have enough data. Really, the only two categories worth plotting would be ‘economic development’ and ‘education’. So, let’s filter for just these two categories. Let’s also drop NA values for simplicity sake.

cat_filter <- c(
  "education",
  "economic development"
)

data_cat_filter <- data_sponsor_cat |>
  filter(cat %in% cat_filter) |>
  mutate(cat = str_to_title(cat)) |>
  arrange(name, cat) |>
  drop_na() |>
  select(-details_url)

Creating a Box and Whisker plot

Given I had a numeric variable, n_sponsored, I thought a Box and Whisker plot split by the two categories would be informative. It would certainly help identify fiscal sponsors who support many or very little projects based on the types of projects they support. Another thing I had to do was log the n_sponsored column. When I first plotted the untransformed variables, it was challenging to see the distribution of values. Logging n_sponsored made it easier to see the values. However, the hover tool provides the untransformed value for each fiscal sponsor in the data set.

Warning

There will be duplicates in this visualization, as some sponsors will support both education and economic development focused projects.

plot_ly(type = "box") |>
  add_boxplot(
    data = data_cat_filter |> filter(cat == "Education"),
    x = ~log(n_sponsored),
    y = ~cat,
    boxpoints = "all",
    name = "Education",
    color = I("#189AB4"),
    marker = list(color = "#189AB4"),
    line = list(color = "#000000"),
    text = ~paste(
      "Sponsor: ", name,
      "<br>Projects: ", n_sponsored,
      "<br>Website: ", website
    ),
    hoverinfo = "text"
  ) |>
  add_boxplot(
    data = data_cat_filter |> 
      filter(cat == "Economic Development"),
    x = ~log(n_sponsored),
    y = ~cat,
    boxpoints = "all",
    name = "Economic development",
    color = I("#191970"),
    marker = list(color = "#191970"),
    line = list(color = "#000000"),
    text = ~paste(
      "Sponsor: ", name,
      "<br>Projects: ", n_sponsored,
      "<br>Website: ", website
    ),
    hoverinfo = "text"
  ) |>
layout(
  title = "<b>Distribution of the number of projects (logged) supported by fiscal sponsors",
  yaxis = list(title = ""),
  xaxis = list(title = "Projects sponsored on log scale")
)
Warning: Can't display both discrete & non-discrete data on same axis

Not bad. The only thing I ran out of time on was related to the hover tool. I really wanted separate hovers, one for the five number summary in the box and whisker plot and one for the individual data points. Unfortunately, I wasn’t able to figure out how to do this with the time I had. Oh well, what resulted was still a useful data visualization, given where we started with the data.

So there you have it. Not the cleanest data to work with. Nonetheless, we came up with a visualization we could still learn something from.

An attempt using Tableau

To continue developing my skills and to practice using other data visualization tools, I created this same visualization using Tableau. You can check out this version of the visualization here.

Reuse

CC BY 4.0

Citation

BibTeX citation:
@misc{berke2024,
  author = {Berke, Collin K},
  title = {Exploring Data from the {Fiscal} {Sponsor} {Directory}},
  date = {2024-03-22},
  langid = {en}
}
For attribution, please cite this work as:
Berke, Collin K. 2024. “Exploring Data from the Fiscal Sponsor Directory.” March 22, 2024.