Collin K. Berke, Ph.D.


Exploring objects launched into space and gross domestic product

data wrangling
data visualization
tidytuesday
plotly
regression
A contribution to the 2024-04-23 #tidytuesday social data project
Author

Collin K. Berke, Ph.D.

Published

May 3, 2024

Image generated using the prompt ‘boxy robot throwing a satellite into outer space from earth in a pop art style’ with the Bing Image Creator

Background

3… 2… 1… blastoff 🚀. This week’s #tidytuesday dataset focuses on the number of objects launched into space each year by various entities.

This data is maintained by the United Nations Office for Outer Space Affairs and is made available via the Online Index of Objects Launched into Outer Space. Objects include things like satellites, probes, landers, crewed spacecraft, and space station flight elements launched into Earth orbit or beyond. Although this list aims to be comprehensive, it only includes launches submitted to the UN by participating nations. In addition, joint launches count as one launch for each participating country (i.e., launches may be counted more than once when examined by country). Our World in Data originally processed this data and created an annual trend for each country.

Since this data is focused on countries, my interest was piqued by the following question: what is the relationship between a country’s Gross Domestic Product (GDP), a broad indicator of a country’s economic output, and the number of objects it launches into space? To answer this question, in this post I create a scatter plot and quantify this relationship using a simple linear regression.

But first, some space banjo music

Seeing as we’re exploring objects launched into space, I felt a little music was in order. Here’s some ambient space banjo music for your listening pleasure.

Deep Space Banjo🪕 - Ambient Spacefolk Chillwave by Timber Zeal

Setup and data import

library(tidyverse)
library(wbstats)
library(here)
library(skimr)
library(janitor)
library(plotly)
library(scales)
library(psych)

First, let’s import the #tidytuesday dataset. While we’re importing, I’ll also go ahead and use janitor’s clean_names() function to clean up the dataset’s variable names in one step. Here’s the code needed to do this:

data_space_objs <- read_csv(
  here(
    "blog/posts/",
    "2024-04-25-tidytuesday-2024-05-03-space-launches",
    "outer_space_objects.csv"
  )
) |>
  clean_names()
Rows: 1175 Columns: 4
── Column specification ────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (2): Entity, Code
dbl (2): Year, num_objects

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
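The message above suggests declaring column types up front. As a minimal sketch, assuming readr 2.0 or later (which reads literal data wrapped in I()), the snippet below uses two hypothetical rows in place of outer_space_objects.csv, sets col_types explicitly so readr doesn’t print its guess, and then cleans the names with janitor.

```r
library(readr)
library(janitor)

# Two hypothetical rows standing in for outer_space_objects.csv
raw_csv <- I("Entity,Code,Year,num_objects\nCountry A,AAA,2002,1\nOrg X,,2023,1\n")

space_objs_demo <- read_csv(
  raw_csv,
  # Declaring types explicitly silences readr's column specification message
  col_types = cols(
    Entity = col_character(),
    Code = col_character(),
    Year = col_double(),
    num_objects = col_double()
  )
) |>
  # Entity -> entity, Code -> code, Year -> year
  clean_names()

space_objs_demo
```

With the real file, the same col_types argument would go inside the read_csv(here(...)) call; setting show_col_types = FALSE is the lighter-weight alternative the message mentions.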

Use wbstats package to obtain GDP

The original dataset doesn’t contain Gross Domestic Product (GDP), so I supplemented it with data from the World Bank. The World Bank makes GDP estimates available via an API, and the wbstats R package provides an intuitive interface to that API. Here’s the code I used to return data from the API using the wbstats package:

# Interested in looking at:
#   * Gross Domestic Product (GDP), current US$
# Naming the vector element renames the returned column to `gdp`
wb_variables <- c(
  "gdp" = "NY.GDP.MKTP.CD"
)

data_wb <- wb_data(
  wb_variables,
  start_date = 1957, 
  end_date = 2023
) |>
  select(
    code = iso3c, 
    year = date, 
    country, 
    gdp
  )
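If you need to confirm an indicator code like NY.GDP.MKTP.CD, wbstats can search its indicator list. This is a minimal sketch, assuming wbstats 1.0 or later, where wb_search() matches a regular expression against the indicator list cached with the package by default (so no API call should be needed) and returns indicator_id and indicator columns.

```r
library(wbstats)

# Search the cached indicator list for the "GDP (current US$)" series
gdp_indicators <- wb_search("GDP \\(current US\\$\\)")

gdp_indicators
```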

Explore the data

Now with the data available, let’s do some data exploration. Here I’ll use dplyr’s glimpse() function to get a sense of the data’s structure and column names.

glimpse(data_space_objs)
Rows: 1,175
Columns: 4
$ entity      <chr> "APSCO", "Algeria", "Algeria", "Algeria", "Algeria", "Angola", "Angola", "Arab…
$ code        <chr> NA, "DZA", "DZA", "DZA", "DZA", "AGO", "AGO", NA, NA, NA, NA, NA, NA, NA, NA, …
$ year        <dbl> 2023, 2002, 2010, 2016, 2017, 2017, 2022, 1985, 1992, 1996, 1999, 2006, 2008, …
$ num_objects <dbl> 1, 1, 1, 3, 1, 1, 1, 2, 1, 2, 1, 2, 1, 2, 1, 1, 1, 1, 1, 3, 1, 1, 2, 2, 1, 1, …
glimpse(data_wb)
Rows: 13,888
Columns: 4
$ code    <chr> "ABW", "ABW", "ABW", "ABW", "ABW", "ABW", "ABW", "ABW", "ABW", "ABW", "ABW", "ABW"…
$ year    <dbl> 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973…
$ country <chr> "Aruba", "Aruba", "Aruba", "Aruba", "Aruba", "Aruba", "Aruba", "Aruba", "Aruba", "…
$ gdp     <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…

Since these datasets share two columns, code and year, we’ll do a left join on both to attach GDP data only to the country-years in which objects were launched. While exploring the data, though, I noticed that the data_space_objs data had entities other than countries (e.g., APSCO), which have no country code. In addition, some of the World Bank data had NA values in the GDP variable. An argument could be made for applying imputation methods to address these missing values. However, to keep things simple, I’m just going to drop any rows with a missing code or gdp value. I do this using the drop_na() function from tidyr.

data_space_wb <- data_space_objs |>
  left_join(data_wb, by = c("year", "code")) |>
  drop_na(c(code, gdp))
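To see what the join and drop_na() step does row by row, here’s a small sketch on hypothetical stand-ins for data_space_objs and data_wb (the entity names, codes, and GDP values are made up). The non-country entity drops out because its code is missing, and the second Country A row drops out because its GDP is missing.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Hypothetical stand-in for data_space_objs
space_demo <- tibble(
  entity = c("Org X", "Country A", "Country A"),
  code = c(NA, "AAA", "AAA"),
  year = c(2023, 2002, 2010),
  num_objects = c(1, 1, 3)
)

# Hypothetical stand-in for data_wb
wb_demo <- tibble(
  code = c("AAA", "AAA"),
  year = c(2002, 2010),
  country = "Country A",
  gdp = c(5e10, NA)
)

joined_demo <- space_demo |>
  # Match rows on both the year and the country code
  left_join(wb_demo, by = c("year", "code")) |>
  # Remove non-country entities (no code) and rows without a GDP estimate
  drop_na(c(code, gdp))

joined_demo
```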

With data wrangling complete, we can quickly get a sense of the shape of our data with skimr’s skim() function. What becomes immediately apparent is that both the num_objects and gdp variables are skewed to the right.

skim(data_space_wb)
Data summary
Name                     data_space_wb
Number of rows           871
Number of columns        6
_______________________
Column type frequency:
  character              3
  numeric                3
________________________
Group variables          None

Variable type: character

skim_variable  n_missing  complete_rate  min  max  empty  n_unique  whitespace
entity                 0              1    4   20      0        91           0
code                   0              1    3    3      0        91           0
country                0              1    4   20      0        91           0

Variable type: numeric

skim_variable  n_missing  complete_rate          mean            sd          p0           p25           p50           p75         p100  hist
year                   0              1  2.004220e+03  1.498000e+01        1960          1995          2008  2.017000e+03  2.02200e+03  ▁▁▃▃▇
num_objects            0              1  1.375000e+01  8.800000e+01           1             1             2  4.000000e+00  1.93900e+03  ▇▁▁▁▁
gdp                    0              1  1.698805e+12  3.179395e+12  2583335724  216531050429  554212916092  1.782499e+12  2.54397e+13  ▇▁▁▁▁

We can further confirm this visually by creating some histograms. I’ll do this using base R’s hist() function.

hist(data_space_wb$num_objects)

hist(data_space_wb$gdp)
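To put a number on the right skew instead of eyeballing it, one option is the moment-based sample skewness (positive values indicate a right skew). This is a minimal sketch on simulated log-normal data, not the space launch data; it also shows how a log transform pulls the skew back toward zero. The helper name skewness() is hypothetical; psych, already loaded above, also provides a skew() function.

```r
# Hypothetical helper: moment-based sample skewness (m3 / m2^1.5)
skewness <- function(x) {
  x <- x[!is.na(x)]
  dev <- x - mean(x)
  mean(dev^3) / mean(dev^2)^1.5
}

# Simulated right-skewed values (illustration only)
set.seed(42)
sim_values <- rlnorm(1000, meanlog = 0, sdlog = 1)

skew_raw <- skewness(sim_values)
skew_log <- skewness(log(sim_values))
c(raw = skew_raw, logged = skew_log)
```

Applied to this post’s data, the same comparison would be skewness(data_space_wb$gdp) versus skewness(log(data_space_wb$gdp)).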

Scatter plot of objects and GDP

Let’s get a sense of the relationship between space objects launched and a country’s GDP. To do this, we’ll create a scatter plot using plotly.

vis_space_scatter <- plot_ly(
  data = data_space_wb
) |>
  add_trace(
    x = ~gdp,
    y = ~num_objects,
    type = "scatter",
    mode = "markers",
    marker = list(
      color = "#006cd8",
      size = 10,
      line = list(
        color = "#00008c",
        width = 2
      )
    ),
    # Build the hover label for each point
    text = ~paste0(
      "Year: ", year,
      "<br>Country: ", entity,
      "<br>Objects launched: ", comma(num_objects),
      "<br>GDP: ", comma(gdp)
    ),
    hoverinfo = "text"
  ) |>
  plotly::layout(
    title = "<b>A country's GDP is positively related to the number of space objects launched</b>",
    xaxis = list(title = "Gross Domestic Product (GDP)"),
    yaxis = list(
      title = "Objects launched into space",
      # Start the y-axis at zero (c(0, NULL) collapses to a single value in R)
      rangemode = "tozero",
      tickformat = ","
    )
  )

vis_space_scatter

Because both variables are heavily right-skewed, most points crowd into the lower-left corner of the plot, making individual values hard to see. As such, I decided to recreate the plot after log transforming both GDP and the number of objects launched into space.

plot_ly(
  data = data_space_wb
) |>
  add_trace(
    x = ~log(gdp),
    y = ~log(num_objects),
    type = "scatter",
    mode = "markers",
    marker = list(
      color = "#006cd8",
      size = 10,
      line = list(
        color = "#00008c",
        width = 2
      )
    ),
    # Hover labels still show the untransformed values
    text = ~paste0(
      "Year: ", year,
      "<br>Country: ", entity,
      "<br>Objects launched: ", comma(num_objects),
      "<br>GDP: ", comma(gdp)
    ),
    hoverinfo = "text"
  ) |>
  plotly::layout(
    title = "<b>A country's GDP is positively related to the number of space objects launched</b>",
    xaxis = list(title = "Gross Domestic Product (GDP) (logged)"),
    yaxis = list(
      title = "Objects launched into space (logged)",
      rangemode = "tozero",
      tickformat = ","
    )
  )

Log transforming these variables now allows us to more easily view the individual values for each country.
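An alternative to transforming the data itself is to let plotly draw log-scaled axes, so the points spread out the same way while the tick labels stay in the original units (dollars and object counts). This is a sketch on a few hypothetical values, not the post’s data; it relies on plotly’s axis type = "log", which requires strictly positive values (true here, since every row launched at least one object and has positive GDP).

```r
library(plotly)

# Hypothetical GDP and launch counts spanning several orders of magnitude
demo <- data.frame(
  gdp = c(3e9, 2e11, 6e11, 2e12, 2e13),
  num_objects = c(1, 1, 2, 4, 1500)
)

vis_log_axes <- plot_ly(
  demo,
  x = ~gdp,
  y = ~num_objects,
  type = "scatter",
  mode = "markers"
) |>
  plotly::layout(
    # Log-scaled axes: the data stay untransformed, only the axis scale changes
    xaxis = list(title = "Gross Domestic Product (GDP)", type = "log"),
    yaxis = list(title = "Objects launched into space", type = "log")
  )
```

Printing vis_log_axes in an interactive session renders the plot.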

Explore the correlation

Visual inspection points to a positive relationship between these two variables. We can use psych’s pairs.panels() function to quickly visualize this relationship and quantify it with a correlation coefficient.

pairs.panels(data_space_wb[c("num_objects", "gdp")])

The output provides further evidence of a positive correlation between these two variables. Now, let’s go one step further and use a simple linear regression to explore this relationship.
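pairs.panels() reports a Pearson correlation by default, which can be heavily influenced by a few extreme values like those in this data. As a hedged supplement (not part of the original analysis), a rank-based Spearman correlation only asks whether the relationship is monotonic. The sketch below uses a hypothetical, perfectly monotonic but non-linear relationship to show how the two measures differ.

```r
# Hypothetical monotonic but non-linear relationship
x <- 1:10
y <- exp(x)

# Pearson measures linear association; Spearman correlates the ranks
pearson  <- cor(x, y, method = "pearson")
spearman <- cor(x, y, method = "spearman")
c(pearson = pearson, spearman = spearman)
```

With this post’s data, the equivalent call would be cor(data_space_wb$gdp, data_space_wb$num_objects, method = "spearman"), or pairs.panels(..., method = "spearman").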

Use simple linear regression to explore launched objects and GDP

Since this is a simple linear regression, I’ll use the lm() function from the stats package to specify the model. Given the scale of the GDP values, I also set the scipen option to avoid printing the output in scientific notation.

# Set the `scipen` option to avoid printing in scientific notation 
options(scipen=999)
space_gdp_mdl <- lm(num_objects ~ gdp, data = data_space_wb)
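Another way to get readable coefficients, without changing a global option, is to rescale GDP inside the formula. This is a sketch on hypothetical values: wrapping the predictor in I(gdp / 1e9) turns the slope into “objects per $1B of GDP”, which is exactly 1e9 times the per-dollar slope, while the intercept and the fit itself are unchanged.

```r
# Hypothetical data: GDP in US dollars and objects launched
toy <- data.frame(
  gdp = c(1e11, 5e11, 1e12, 2e12, 5e12),
  num_objects = c(1, 3, 10, 25, 80)
)

mdl_dollars  <- lm(num_objects ~ gdp, data = toy)
# I() evaluates the arithmetic, so the predictor is GDP in billions
mdl_billions <- lm(num_objects ~ I(gdp / 1e9), data = toy)

coef(mdl_dollars)
coef(mdl_billions)
```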

With the space_gdp_mdl object, we can use summary() to output information about our model, which we’ll then use to interpret the results.

summary(space_gdp_mdl)

Call:
lm(formula = num_objects ~ gdp, data = data_space_wb)

Residuals:
    Min      1Q  Median      3Q     Max 
-198.06   -8.62    6.39   10.77 1570.79 

Coefficients:
                        Estimate           Std. Error t value             Pr(>|t|)    
(Intercept) -11.6127829980763799   2.8486337637951760  -4.077            0.0000499 ***
gdp           0.0000000000149303   0.0000000000007906  18.885 < 0.0000000000000002 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 74.14 on 869 degrees of freedom
Multiple R-squared:  0.291, Adjusted R-squared:  0.2902 
F-statistic: 356.6 on 1 and 869 DF,  p-value: < 0.00000000000000022

It’s interesting to see the R-squared value is .291, meaning GDP alone accounts for roughly 29% of the variance in objects launched, which is fairly large. This was somewhat unexpected, given the complexities inherent in a country’s economy, how GDP translates into funding for space projects, and the technology and infrastructure needed to launch objects into space. Indeed, I was expecting a much smaller R-squared value. It’s also important to recognize this could be a statistical artifact, as there’s a wide discrepancy between countries: most country-years involve launches in the single digits, while only a few countries launched hundreds or even thousands of objects in some years.
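One way to probe whether a handful of extreme country-years are driving the fit is to check each observation’s influence with Cook’s distance. This is a swapped-in diagnostic, not part of the original analysis, sketched on simulated data with one deliberately extreme point; the 4 / n cutoff is a common rule of thumb rather than a hard threshold.

```r
set.seed(1)
# Simulated data: 30 ordinary points plus one extreme, high-leverage observation
x <- c(1:30, 100)
y <- c(2 + 0.5 * (1:30) + rnorm(30), 400)

fit <- lm(y ~ x)
cd  <- cooks.distance(fit)

# Rule-of-thumb flag: Cook's distance above 4 / n
flagged <- which(cd > 4 / length(cd))

# Compare R-squared with and without the most influential point
most_influential <- which.max(cd)
r2_all  <- summary(fit)$r.squared
r2_trim <- summary(lm(y[-most_influential] ~ x[-most_influential]))$r.squared
c(all = r2_all, without_most_influential = r2_trim)
```

For this post, cooks.distance(space_gdp_mdl) would return one value per row of data_space_wb.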

In addition to the R-squared value, we can use the coefficients to draw further conclusions about this relationship. The inverse of the gdp coefficient (1 / 0.0000000000149303) is roughly $67B, so each additional object launched is associated with about $67B more in GDP. In other words, if a country wants to send more objects into space, the model suggests it would need to grow its GDP by about this much for each additional object. Because the intercept is negative (-11.61), the fitted line also doesn’t predict even one launched object until GDP reaches roughly $845B ((1 + 11.61) / 0.0000000000149303). Indeed, many factors go into a country’s ability to launch an object into space, and more than half of the country-years in this data have a GDP below that level (the median is about $554B). However, the results from this model still give a very general estimate of the economic output associated with these types of projects.
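The back-of-the-envelope quantities from the coefficients can be computed directly. This sketch copies the estimates from the summary(space_gdp_mdl) output above; with the fitted model in memory, b <- coef(space_gdp_mdl) would supply the same values at full precision.

```r
# Estimates copied from the summary(space_gdp_mdl) output
b0 <- -11.6127829980763799  # intercept
b1 <- 0.0000000000149303    # gdp slope: objects per US$ of GDP

# GDP increase associated with one additional object launched
gdp_per_object <- 1 / b1

# GDP at which the fitted line predicts zero objects and one object
gdp_zero_objects <- -b0 / b1
gdp_one_object   <- (1 - b0) / b1

# Express in billions of US dollars
round(c(per_object = gdp_per_object,
        zero_objects = gdp_zero_objects,
        one_object = gdp_one_object) / 1e9, 1)
```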

Now that we have the model, we can go ahead and use predict() to append model predictions to the original data set. We can then plot those values on our original scatter plot to get a better sense of what this relationship looks like. The following code will do this for us:

# Append model predictions to the data
data_space_wb$obj_pred <- predict(space_gdp_mdl, data_space_wb)

vis_space_scatter |> 
  add_trace(
    # Sort by GDP so the line is drawn left to right
    data = arrange(data_space_wb, gdp),
    x = ~gdp, 
    y = ~obj_pred, 
    type = "scatter", 
    mode = "lines", 
    showlegend = FALSE,
    line = list(width = 5),
    text = ~paste0(
      "Prediction: ", comma(obj_pred, accuracy = 0.1)
    ),
    hoverinfo = "text"
  ) |>
  plotly::layout(
    title = "<b>A country's GDP is positively related to the number of space objects launched</b>",
    xaxis = list(title = "Gross Domestic Product (GDP)"),
    yaxis = list(
      title = "Objects launched into space",
      rangemode = "tozero",
      tickformat = ","
    )
  )
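predict() can also return an uncertainty band around the fitted line. This sketch, on hypothetical values, asks for interval = "confidence", which adds lower and upper bounds for the mean prediction at each new GDP value; the new values are sorted so the band could be drawn cleanly, for example with plotly’s add_ribbons() using ymin and ymax.

```r
# Hypothetical data standing in for data_space_wb
toy <- data.frame(
  gdp = c(1e11, 5e11, 1e12, 2e12, 5e12),
  num_objects = c(1, 3, 10, 25, 80)
)
toy_mdl <- lm(num_objects ~ gdp, data = toy)

# Evenly spaced, sorted GDP values to predict at
new_gdp <- data.frame(gdp = seq(1e11, 5e12, length.out = 50))

# Returns a matrix with columns fit, lwr, and upr
pred_band <- predict(toy_mdl, newdata = new_gdp, interval = "confidence")
head(pred_band)
```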

Why is the USA so much further away from what is predicted?

Exploring the plot, I began to question why the US falls so far outside what our model predicts. My hunch is this is due to the rise of commercial space flight here in the US. In fact, here are a few references I came across that go into more detail about the booming commercial space industry. One even goes so far as to state that private space flight has led us into the fourth industrial revolution. Learn more:

  • U.S. private space launch industry is out of this world
  • How space exploration is fueling the Fourth Industrial Revolution
  • The commercial space age is here

Indeed, it’s reasonable to assume that if the US can shuttle contracts to private space companies rather than funding whole space programs to launch objects into space, then it will likely launch more objects than would be expected. In other words, the US government likely gets more bang for its buck working with commercial space companies. It’s also important to recognize that the commercial space industry makes launching objects into space more viable for companies and startups, like Varda Space Industries, which is using space vehicles to manufacture pharmaceuticals (seriously, listen to this interesting report from Marketplace).
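To quantify which country-years sit furthest from the fitted line, rather than spotting them visually, one option is to rank the model’s residuals. This is a sketch on hypothetical country-years; a positive residual means more objects were launched than the model predicts.

```r
# Hypothetical country-years standing in for data_space_wb
toy <- data.frame(
  entity = c("Country A", "Country B", "Country C", "Country D"),
  gdp = c(1, 2, 3, 4),  # hypothetical units, e.g., trillions of US$
  num_objects = c(1, 2, 10, 4)
)
toy_mdl <- lm(num_objects ~ gdp, data = toy)

# Positive residual = more objects launched than the model predicts
toy$resid <- residuals(toy_mdl)
toy[order(toy$resid, decreasing = TRUE), c("entity", "resid")]
```

With this post’s data (no rows dropped when fitting), the equivalent would be data_space_wb |> mutate(resid = residuals(space_gdp_mdl)) |> arrange(desc(resid)).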

Wrap up

In this post, we explored data from the United Nations Office for Outer Space Affairs on objects launched into space. Specifically, we found a positive relationship between a country’s gross domestic product and the number of objects it launches into space, using a scatter plot and a simple linear regression. Surprisingly, the US far and away exceeded our model’s predictions. I posited that this result is due to the rise of the commercial space flight industry here in the US and provided a few sources on the topic. I did all this while also peppering in some poorly delivered space puns, with a backdrop of some space banjo music.

I hope you enjoyed reading this post as much as I enjoyed writing it. This was a fun little data set. Check out the #tidytuesday GitHub repo for other fun data sets to explore.

Reuse

CC BY 4.0

Citation

BibTeX citation:
@misc{berke2024,
  author = {Berke, Collin K},
  title = {Exploring Objects Launched into Space and Gross Domestic
    Product},
  date = {2024-05-03},
  langid = {en}
}
For attribution, please cite this work as:
Berke, Collin K. 2024. “Exploring Objects Launched into Space and Gross Domestic Product.” May 3, 2024.