Code to account for autocorrelation in ADF unit root tests

In my undergraduate studies, I learnt that unit root tests have low power. What is worse, if there is serial correlation left in the residuals of the Augmented Dickey-Fuller (ADF) test regression, the result may be biased. As the title of this post indicates, I will talk about how the ADF test is done in, as far as I know, all statistical software. At the end of this post you can download code that lets you correct the residual serial correlation in ADF test results.

We know the general formula of the ADF test:

ΔY_t = α + βt + θY_{t-1} + π₁ΔY_{t-1} + π₂ΔY_{t-2} + … + π_pΔY_{t-p} + ε_t

Here ΔY_t = Y_t - Y_{t-1}, and the null hypothesis of a unit root is θ = 0.

I will not discuss it at length. What matters for this post is that the lagged differences of Y appear in the formula to remove the residual serial correlation, which would otherwise bias the test. The error term should also be homoskedastic. Adding lagged differences to the formula lowers the power of the test. This is a trade-off, but Monte Carlo simulations show that it is better to correct the serial correlation.
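To make the formula concrete, here is a minimal sketch of the ADF regression built by hand in base R. The simulated series and the variable names (dy_t, y_lag1, dy_lags) are my own choices for illustration; they are not taken from the downloadable code.

```r
# Simulated random walk, used only as example data.
set.seed(1)
y <- cumsum(rnorm(120))
p <- 2                                   # number of lagged differences

dy      <- diff(y)                       # first differences of Y
lag_mat <- embed(dy, p + 1)              # col 1: dY_t; cols 2..p+1: dY_{t-1}, ..., dY_{t-p}
dy_t    <- lag_mat[, 1]
dy_lags <- lag_mat[, -1, drop = FALSE]
y_lag1  <- y[(p + 1):(length(y) - 1)]    # Y_{t-1}, aligned with dy_t
trend   <- seq_along(dy_t)               # deterministic trend term

# Regression with constant, trend, lagged level and p lagged differences.
adf_fit  <- lm(dy_t ~ trend + y_lag1 + dy_lags)

# The ADF statistic is the t value of the coefficient on Y_{t-1}.
# It must be compared with Dickey-Fuller critical values, not with
# the usual t distribution.
adf_stat <- coef(summary(adf_fit))["y_lag1", "t value"]
print(adf_stat)
```

Each extra lagged difference costs one observation and one degree of freedom, which is where the loss of power comes from.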

I performed ADF tests in EViews for years, and I wondered why the output of the test only showed the Durbin-Watson statistic. To be sure about the independence of the errors, I conducted Breusch-Godfrey tests. When I migrated to R and used ur.df, the problem persisted.

What does all the statistical software that I know do?

They select p through the Akaike Information Criterion (AIC) or another criterion.

This procedure corrected the residual correlation in most cases. However, sometimes the Breusch-Godfrey tests showed strong first, second, third or fourth order correlation (in yearly series). In these cases, I used to add lagged differences until the p-values of the Breusch-Godfrey tests were no longer significant, or until I reached a number of lags that would compromise the reliability of the ADF test.
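The manual routine described above can be sketched in base R as follows. This is only an illustration under my own assumptions: the helper names (adf_data, bg_pvalue, augment_until_clean) are hypothetical, the regression uses a constant only, the starting lag p_start stands in for the lag chosen by the information criterion, and the Breusch-Godfrey statistic is computed by hand in its LM form with pre-sample residuals set to zero. In practice, bgtest from the lmtest package does this job.

```r
# Data for the ADF regression with p lagged differences (constant only).
adf_data <- function(y, p) {
  dy <- diff(y)
  m  <- embed(dy, p + 1)                      # col 1: dY_t; others: lagged differences
  d  <- data.frame(dy = m[, 1], ylag = y[(p + 1):(length(y) - 1)])
  if (p > 0) {
    lagged <- m[, -1, drop = FALSE]
    colnames(lagged) <- paste0("dlag", seq_len(p))
    d <- cbind(d, lagged)
  }
  d
}

# Hand-rolled Breusch-Godfrey LM test: regress the residuals on the original
# regressors plus `order` lagged residuals; LM = n * R^2 ~ chi-squared(order).
bg_pvalue <- function(fit, order) {
  e <- residuals(fit)
  n <- length(e)
  E <- sapply(seq_len(order), function(k) c(rep(0, k), e[seq_len(n - k)]))
  aux <- lm.fit(cbind(model.matrix(fit), E), e)
  lm_stat <- n * sum(aux$fitted.values^2) / sum(e^2)
  pchisq(lm_stat, df = order, lower.tail = FALSE)
}

# Add lagged differences until no BG p-value (orders 1..order) is significant,
# or until q lags are reached.
augment_until_clean <- function(y, p_start = 1, q = 10, order = 5, level = 0.01) {
  stopifnot(p_start <= q)
  for (p in p_start:q) {
    fit   <- lm(dy ~ ., data = adf_data(y, p))
    pvals <- sapply(seq_len(order), function(m) bg_pvalue(fit, m))
    if (all(pvals > level)) break
  }
  list(lags = p, fit = fit, bg_pvalues = pvals, clean = all(pvals > level))
}

# Example on simulated data: a unit root process with AR(1) increments.
set.seed(42)
y   <- cumsum(as.numeric(arima.sim(list(ar = 0.5), n = 150)))
res <- augment_until_clean(y, p_start = 1, q = 10, order = 5, level = 0.01)
print(res$lags)
print(round(res$bg_pvalues, 3))
```

If clean is FALSE, the loop stopped at q lags without removing the correlation, which is exactly the situation where the reliability of the ADF test is in doubt.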

“How many observations are available?” and “How long is the time span of the series?” are important questions. In a small sample, the ADF test suffers both from the sample size and, if I don’t correct it, from the serial correlation. But with few observations I can only afford to add a few lagged differences.

As usual in my posts, I will try to keep the code as simple to understand as I can (or, better, as I think I can). My main purpose is that anyone who wants to start learning to code can use it to improve their coding skills. In fact, this is one of the easiest pieces of code that I will post in this blog. Even if you are not able to understand the code, you can work with it for a while and become familiar and confident with it. Readers with coding experience, on the other hand, can easily check for themselves what the code is doing.

Getting ready to use the code

To use the code, you only need very basic knowledge of R. In fact, you only need to know how to load the data in R (it should be a vector or a one-column matrix containing the series to which you want to apply the unit root test) and how to install the R packages aod and vars. How do you do this?

In Windows or Mac, go to the menu of RStudio, click on Tools and then on Install Packages… Then, in “Packages (separate multiple with space or comma):”, write “urca” (without the quotes) and RStudio will install the urca package. Repeat this for any other package you need.
Just in case, go to the lower-right panel of RStudio and click on the Packages tab. Then look for the urca package there and verify that it is checkmarked.
To load your data, go to the menu of RStudio, click on File and then on Import Dataset.
Very probably your data will be imported as a data.frame. Imagine that you called the imported data.frame mydata. To make it a matrix, you only need to write the following command in the R console:
mydata <- as.matrix(mydata)

Explanation of the code

Back to my long-lasting “concerns” about serial correlation: two days ago I made my life easier and wrote some code. It shows the output of the ADF test and the result of the Breusch-Godfrey test for residual correlation up to the desired order.

To be more specific, the code takes a time series and the following settings:

The maximum lag (the lag up to which the AIC or other criterion will search for p).
The information criterion that will choose p.
The order of serial correlation up to which you want to see the test result.
The significance level at which you want to check for residual correlation.
The number of lags you consider “reasonable” for augmenting the ADF formula in order to correct serial correlation. This number is not the same as the one in the first point.

I called the function adfnocorr as you can see here:

adfnocorr <- function(x, type = "drift", lags = 7, selectlags = "AIC", order = 5, order.by = NULL, q = 10, pvalu = 0.01)

where the arguments are:

x, the data containing the series on which you want to perform the unit root test. It should contain only one series.
type, lags, selectlags are the arguments of the ur.df function of the urca R package (see the documentation of that package if you need more information).
order and order.by are arguments of the bgtest R function. Here, order is the order of autocorrelation up to which you want to test the residuals of the ADF regression.
q is the maximum lag that you are willing to reach when trying to correct the autocorrelation. You should set q larger than lags.
pvalu is the significance level at which you want to perform the Breusch-Godfrey test.

You will get the ADF output (the code uses ur.df, so you get exactly the output of ur.df) with the number of lags that corrects the residual correlation, together with the p-values of the Breusch-Godfrey test statistics from the first order up to the order you set in the third point.
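For orientation, this is roughly what a plain ur.df call looks like on its own, using the same type, lags and selectlags values as the defaults shown above. The series is simulated and only there for illustration; this is not the body of adfnocorr, just the urca call that its output is based on.

```r
library(urca)

# Simulated random walk, used only as example data.
set.seed(123)
y <- cumsum(rnorm(200))

# ADF test with a constant, searching up to 7 lagged differences by AIC.
adf <- ur.df(y, type = "drift", lags = 7, selectlags = "AIC")

summary(adf)     # regression table, test statistics and critical values
adf@teststat     # tau2 (unit root) and phi1 (joint test) for type = "drift"
adf@cval         # critical values at the 1%, 5% and 10% levels
```

The unit root is rejected when tau2 is more negative than the chosen critical value in the tau2 row of adf@cval.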

Final thoughts

This code has an additional benefit. When I perform unit root tests with the ur.df function in R, I sometimes forget to delete the NAs in the series, and ur.df then stops and asks me to remove them. With this code, you will not get that error message.

You can now download the code from this link.

You can also watch a video explaining how to use the code here.

Please comment if you find any issue in the code or want to share your ideas about all this.
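If you want to do the same cleaning by hand before calling ur.df, a minimal sketch could look like the one below. The helper name drop_na_series is hypothetical, and I am assuming that simply dropping the missing values is acceptable; I added a warning for NAs in the middle of the series because removing those changes the spacing between observations.

```r
# Drop NAs from a single series (vector or one-column matrix/data.frame).
drop_na_series <- function(x) {
  x    <- as.vector(as.matrix(x))
  keep <- !is.na(x)
  if (any(keep)) {
    # Look only between the first and last observed values.
    inner <- x[min(which(keep)):max(which(keep))]
    if (anyNA(inner)) {
      warning("interior NAs removed; observations are no longer evenly spaced")
    }
  }
  x[keep]
}

# Leading and trailing NAs are dropped silently.
series <- c(NA, 2.1, 2.4, 2.2, 2.8, NA)
clean  <- drop_na_series(series)
print(clean)
```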
