Will higher public spending for family lead to higher fertility rate?

economics
R
Media
Author

Mitsuo Shiota

Published

April 27, 2023

Modified

June 14, 2023

Code
library(tidyverse)
library(lubridate)

theme_set(theme_light())

library(OECD)
library(countrycode)

library(janitor)
library(ggrepel)

Assertions in the Economic and Fiscal Advisory Council material

The headline of Asahi Shinbun article on April 27, 2023 (Japanese) caught my eye, which is “5 trillion increase in public expenditure for family is estimated to increase population by 0.9 to 1.8 million persons in 2060.” This article let me know the estimate on page 3 titled “Possible utilization of EBPM to analyze policy effects” in the Economic and Fiscal Advisory Council material presented by 4 private-sector members on April 26, 2023 (Japanese).

It shows the figure below, and warns us that correlation is not causality in the footnote. Title is “Correlation between public expenditure for family and fertility rate in OECD countries.” X-axis is percent ratio of public expenditure for family to GDP in 2017, and Y-axis is fertility rate in 2021. Red point is Japan. I find 34 points in the figure.

Quoted figure titled correlation between public expenditure for family and fertility rate

Figure 1: Quoted figure titled correlation between public expenditure for family and fertility rate

Then, they make a causal statement, “1 percent of GDP (5 trillion yen) increase in public expenditure for family leads to 0.05 to 0.1 increase in fertility rate. This estimate is based on 30 OECD countries data since 2010”.

I have got confused. Do they think this figure, which does not show confidence interval, suggests correlation? And, even if it does, do they think they can morph correlation into causality by more samples from multiple years? So I have decided to get data from OECD.stat.

Replicate the quoted figure

Code
# OECD 38 countries
oecd_countries <- c("ISR", "LUX", "NLD", "GRC", "ISL", "LVA", "SVN", "FRA", "POL", "TUR", "ESP", "BEL", "MEX", "IRL", "CHE", "FIN", "CHL", "CAN", "KOR", "NZL", "AUS", "HUN", "USA", "GBR", "AUT", "SWE", "DNK", "SVK", "CZE", "COL", "JPN", "NOR","CRI", "LTU", "DEU", "EST", "PRT", "ITA")

# "OAVG" OECD average
family <- get_dataset(
  "FAMILY",
  filter = list(c(oecd_countries, "OAVG"), "TOTAL", c("FAM1", "FAM3")),
  start_time = 1990, end_time = 2021
  ) |> 
  clean_names() |> 
  mutate(
    obs_value = parse_number(obs_value),
    time = parse_number(time)
    )

# "OECD" OECD average
family_expenditure <- get_dataset(
  "SOCX_AGG",
  filter = list("10", "5", "0", "0", "PCT_GDP", c(oecd_countries, "OECD")),
  start_time = 1980, end_time = 2021
  ) |> 
  clean_names() |> 
  mutate(
    obs_value = if_else(obs_value == "NaN", NA, obs_value),
    obs_value = parse_number(obs_value),
    time = parse_number(time)
    )

# fertility rate vs public family expenditure; point
fer_vs_exp <- family |> 
  filter(ind == "FAM1", time == 2021) |> 
  select(cou, fertility = obs_value) |> 
  mutate(cou = if_else(cou == "OAVG", "OECD", cou)) |> 
  left_join(family_expenditure |> 
              filter(time == 2017) |> 
              select(cou = country, fam_exp = obs_value),
            by = "cou") |> 
  mutate(
    country = if_else(cou == "OECD",
                      "OECD average",
                      countrycode(cou, "iso3c", "country.name"))
  )

fer_vs_exp2 <- fer_vs_exp |>
  filter(cou != "OECD")

I plot the replicated chart below with confidence interval. There are 38 points, as this chart includes all 38 OECD countries. It is uncertain whether the slope is positive or negative.

Code
fer_vs_exp2 |> 
  ggplot(aes(fam_exp, fertility)) +
  geom_point(aes(color = cou == "JPN"), show.legend = FALSE) +
  geom_smooth(method = "lm", formula = y ~ x) +
  geom_text_repel(aes(color = cou == "JPN", label = country),
                  show.legend = FALSE) +
  scale_color_manual(values = c("gray50", "red")) +
  coord_cartesian(xlim = c(0, 4), ylim = c(0, 3)) +
  labs(x = "Ratio of public expenditure for family to GDP\n(percent, 2017)",
       y = "Fertility rate\n(percent, 2021)",
       title = "No clear correlation between public expenditure for family\nand fertility rate in one year",
       caption = "Source: OECD.stat")

Figure 2: No clear correlation in one year

It is difficult to say there is a positive or negative correlation.

Code
fer_exp_model <- lm(fertility ~ fam_exp, data = fer_vs_exp2)
summary(fer_exp_model)

Call:
lm(formula = fertility ~ fam_exp, data = fer_vs_exp2)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.70589 -0.13334 -0.02066  0.09695  1.41083 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.44241    0.14142  10.200 3.65e-12 ***
fam_exp      0.06662    0.06362   1.047    0.302    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3104 on 36 degrees of freedom
Multiple R-squared:  0.02956,   Adjusted R-squared:  0.002604 
F-statistic: 1.097 on 1 and 36 DF,  p-value: 0.302

Narrower confidence interval from multiple years data?

I turn to multiple years data from 2010 to 2021 for fertility rate, and 2006 to 2017 for public expenditure for family, as I guess they assume 4 year ago public expenditure for family effects fertility rate this year. Confidence interval is now much narrower, as the sample size is larger.

Code
family_expenditure_4yr_ago <- family_expenditure |>
  select(cou = country, time, fam_exp = obs_value) |> 
  mutate(time = time + 4)

fer_vs_exp_multi <- family |> 
  filter(ind == "FAM1") |> 
  select(cou, time, fertility = obs_value) |> 
  mutate(cou = if_else(cou == "OAVG", "OECD", cou)) |> 
  left_join(family_expenditure_4yr_ago,
            by = c("cou", "time")) |> 
  mutate(
    country = if_else(cou == "OECD",
                      "OECD average",
                      countrycode(cou, "iso3c", "country.name"))
    )

fer_vs_exp_multi2 <- fer_vs_exp_multi |>
  filter(cou != "OECD", time >= 2010, time <= 2021)

fer_vs_exp_multi2 |> 
  ggplot(aes(fam_exp, fertility)) +
  geom_path(aes(color = country)) +
  geom_smooth(method = "lm", formula = y ~ x) +
  coord_cartesian(xlim = c(0, 4), ylim = c(0, 3)) +
  labs(x = "Ratio of public expenditure for family to GDP\n(percent, 4 years ago)",
       y = "Fertility rate\n(percent, current year)",
       color = NULL,
       title = "Apparent positive correlation between public expenditure for family and\nfertility rate from 2010-2021 data. Is it real?",
       subtitle = "Each colored line is a country.",
       caption = "Source: OECD.stat") +
  theme(legend.position = "none")

Figure 3: Apparent positive correlation in 2010-2021
Code
fer_exp_multi_model <- lm(fertility ~ fam_exp, data = fer_vs_exp_multi2)

The slope is estimated as 0.047, which is close to their assertion of 0.05 to 0.1 based on 30 countries. P-value is less than 0.01.

Code
summary(fer_exp_multi_model)

Call:
lm(formula = fertility ~ fam_exp, data = fer_vs_exp_multi2)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.79849 -0.20412 -0.05127  0.12143  1.46186 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.55612    0.03751   41.48  < 2e-16 ***
fam_exp      0.04748    0.01649    2.88  0.00416 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3288 on 445 degrees of freedom
  (9 observations deleted due to missingness)
Multiple R-squared:  0.0183,    Adjusted R-squared:  0.0161 
F-statistic: 8.296 on 1 and 445 DF,  p-value: 0.004164

It’s a bad idea to treat cross-country multiple-year data as independent observations

Should I agree with them? Absolutely not. Each colored line does not show a positive slope. The positive slope of the fitted line comes mainly from cross-country differences. And there is no positive correlation between public spending for family and fertility rate either in Japan or OECD average in the figure below.

Code
oecd_jpn <- fer_vs_exp_multi |> 
  filter(cou %in% c("OECD", "JPN")) |> 
  filter(!is.na(fam_exp), !is.na(fertility))

oecd_jpn |> 
  ggplot(aes(fam_exp, fertility)) +
  geom_path(aes(color = country)) +
  geom_label_repel(aes(label = time),
             data = oecd_jpn |> 
               filter(time %in% c(1990, 2021))) +
  coord_cartesian(xlim = c(0, 4), ylim = c(0, 3)) +
  labs(x = "Ratio of public expenditure for family to GDP\n(percent, 4 years ago)",
       y = "Fertility rate\n(percent, current year)",
       color = NULL,
       title = "No positive correlation over years between public spending for\nfamily and fertility rate either in Japan or OECD average",
       caption = "Source: OECD.stat") +
  theme(legend.position = "top")

Figure 4: 1990-2021 in Japan and OECD average

Causality does not come from data

Still they can argue fertility rate would have fallen deeper without an increase in public expenditure for family. This counterfactual argument should be based on some assumed causal mechanism. Increasing the number of samples does not morph correlation into causality, anyway. Do not try to increase the number of samples in a weird way.