How to demean your panel data?

When you work with panel data, many methods require the data to be cross-sectionally independent. However, raw data almost always exhibit cross-sectional dependence. Unfortunately, many packages and procedures for panel data analysis in R, Stata, EViews and Gretl do not offer an option to cross-sectionally demean the data automatically (the “xtunitroot” command in Stata and some functions in the “plm” package in R are among the exceptions). Therefore, you often need to demean the data yourself.

You can cross-sectionally demean your data in several ways. Here I want to share R code that I believe is straightforward and easy to understand for practitioners with very basic programming skills or who are new to R. For this, we will use a panel covering 1985-2010 in which the cross sections are the Cuban provinces. The panel includes total investment in each province, in million pesos. The data can be downloaded here. By the way, investment is one of the very few variables for which the Cuban Statistical Office publishes regional data for a “long enough” period. By “long enough”, I mean that the time dimension allows for applying panel unit root tests.

Back to business: we start by loading the R packages and the data.

library(dplyr)
library(tidyr)
library(plm)
investment <- read.csv("~/Desktop/panels/investment_panel.csv")

Now we compute the cross-sectional mean of investment for each year:

investment <- investment %>% group_by(year) %>% mutate(across(investment, ~ mean(.), .names = "{col}_mean"))
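In symbols, the whole procedure can be written as follows. The notation is introduced here only for illustration and does not come from the code: y_{it} is investment in province i and year t, and N is the number of provinces. The first expression is the year mean computed above; the second is the cross-sectionally demeaned series.

```latex
\bar{y}_{t} = \frac{1}{N}\sum_{i=1}^{N} y_{it},
\qquad
\tilde{y}_{it} = y_{it} - \bar{y}_{t}
```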

If you want to demean several variables, let's say investment, population and gdp, the line would be:

investment <- investment %>% group_by(year) %>% mutate(across(c(investment,population,gdp), ~ mean(.), .names = "{col}_mean"))

Finally, we subtract these means from the values of investment to obtain the demeaned investment series (investment_cross).

investment$investment_cross <- investment$investment - investment$investment_mean
investment <- as.data.frame(investment)
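If you want to double-check the demeaning without relying on dplyr, the sketch below does the same thing in base R on a small made-up panel. The data frame "toy", its numbers and the province labels A, B and C are hypothetical and only serve as an illustration; they are not the Cuban data. The idea is that ave() returns the year mean repeated for every row of that year, so subtracting it gives the demeaned series, and each demeaned variable should then average to zero within every year.

```r
# Toy balanced panel: 3 hypothetical provinces over 4 years (made-up numbers)
toy <- data.frame(
  province   = rep(c("A", "B", "C"), each = 4),
  year       = rep(2001:2004, times = 3),
  investment = c(10, 12, 15, 11, 20, 18, 25, 30, 5, 6, 8, 7),
  population = c(100, 101, 102, 103, 200, 202, 204, 206, 50, 51, 52, 53)
)

# Subtract the year-specific mean from each variable.
# ave() returns the group mean repeated for every row of the group.
vars <- c("investment", "population")
for (v in vars) {
  toy[[paste0(v, "_cross")]] <- toy[[v]] - ave(toy[[v]], toy$year, FUN = mean)
}

# Sanity check: each demeaned variable should average to (numerically) zero
# within every year
year_means <- aggregate(cbind(investment_cross, population_cross) ~ year,
                        data = toy, FUN = mean)
print(year_means)
```

The same check can be run on the real data by replacing "toy" with the investment data frame; means that differ from zero by more than rounding error would suggest that the grouping variable or the subtraction went wrong.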

Now we convert the data to a pdata.frame so we can use the “cipstest” function of the “plm” package, which contains several procedures for panel data.

invest <- pdata.frame(investment, drop.index = TRUE, index = c("province", "year"))

We did this to check whether we demeaned the data correctly. If we did a good job, the CIPS unit root test (introduced in Pesaran (2007)) applied to the non-demeaned data (investment) with the demeaned option (model = “dmg”), as you can see here,

cipstest(invest$investment, lags = 2, type = "drift", model = "dmg")

should yield the same result as the test applied to the demeaned data (investment_cross) with the non-demeaned option (model = “mg”):

cipstest(invest$investment_cross, lags = 2, type = "drift", model = "mg")

As usual, I uploaded YouTube videos in English and Spanish explaining the code and illustrating that the results are the same. The resolution of both videos is not good at the beginning, but it improves afterwards. Finally, you can download the R code here.