The objective of this project is to examine whether there is a lasting (long-run) association between exports and imports in the United States (US), a crucial consideration for judging whether trade imbalances are sustainable. To investigate this, the Autoregressive Distributed Lag (ARDL) modelling approach to cointegration is employed. The analysis reveals that exports and imports in the US are indeed cointegrated, with the coefficient of imports on exports close to, but below, unity. These findings suggest that the US satisfies the weak form of its intertemporal budget constraint.
Deficit sustainability is testable from this benchmark model [1]:
\begin{equation*} EX_{t}=\alpha_{0}+\alpha_{1} IM_{t}+\varepsilon_{t}, \end{equation*}where $EX$, $IM$ and $\varepsilon$ denote exports, imports and the error term, respectively.
For an economy to meet its intertemporal budget constraint, the necessary (weak-form) condition is that the error term ($\varepsilon_{t}$) be stationary. If this condition fails, the economy cannot meet its budget constraint and will ultimately default on its debt, according to [2].
It is worth mentioning that, in the weak sense, deficit sustainability means that exports and imports move together in the long run but the coefficient of imports on exports ($\alpha_{1}$) is less than 1: the economy must import more than \$1 worth of goods to generate \$1 worth of exports. Provided the two series are cointegrated, the closer this coefficient is to 1, the more sustainable the imbalance.
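Before turning to the actual data, here is a minimal, self-contained sketch of this residual-based logic on simulated series: regress a hypothetical export series on a hypothetical import series by OLS and check whether the residuals are stationary. The series and the data-generating process below are purely illustrative assumptions, not the US data analysed later.
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller
# Illustrative (hypothetical) data: a random-walk "imports" series and an
# "exports" series tied to it, so the two are cointegrated by construction.
rng = np.random.default_rng(0)
im = np.cumsum(rng.normal(size=300))          # I(1) imports
ex = 0.5 + 0.9 * im + rng.normal(size=300)    # EX_t = a0 + a1*IM_t + stationary error
# Benchmark regression EX_t = a0 + a1*IM_t + e_t
ols_res = sm.OLS(ex, sm.add_constant(im)).fit()
print(ols_res.params)                         # estimates of a0 and a1
# Weak-form (necessary) condition: the residuals should be stationary.
# Note: a formal Engle-Granger test would use cointegration critical values,
# not the standard ADF ones reported here; this is only a sketch.
adf_stat, p_value = adfuller(ols_res.resid)[:2]
print(f"ADF on residuals: stat={adf_stat:.2f}, p-value={p_value:.3f}")
The need for non-standard critical values in this residual-based check is one reason the ARDL bounds approach used later in this project is attractive.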
From the Cambridge Dictionary, the meaning of "sustainable" is:
1- able to continue over a period of time: "That sort of extreme diet is not sustainable over a long period." "Solutions put in place now must be sustainable."
2- (environment, specialized) causing, or made in a way that causes, little or no damage to the environment and therefore able to continue for a long time: "A large international meeting was held with the aim of promoting sustainable development in all countries." "The website encourages sustainable fashion through swapping."
Let's start by getting the data from Federal Reserve Economic Data (FRED).
The FRED database is an extensive compilation of macroeconomic data for the United States, as well as for an increasing number of countries worldwide. This example downloads a small set of macroeconomic data directly from FRED and consolidates it into a single DataFrame. The data available in FRED can be accessed in CSV format using the following URL pattern: https://fred.stlouisfed.org/graph/fredgraph.csv?id=CODE, where CODE is the series code. The dataset used here comprises Exports of Goods and Services (EXPGS) and Imports of Goods and Services (IMPGS).
Both series are measured in billions of dollars, seasonally adjusted, at a quarterly frequency. The sample period is 1947:Q1-2022:Q4:
from pandas import read_csv
from pandas import DataFrame
codes = ['EXPGS','IMPGS']
names = ['Exports of Goods and Services','Imports of Goods and Services']
# r to disable escape
base_url = r'https://fred.stlouisfed.org/graph/fredgraph.csv?id={code}'
Each series in the list is then downloaded in a loop:
data = []
for code in codes:
    print(code)
    url = base_url.format(code=code)
    data.append(read_csv(url))
EXPGS
IMPGS
Subsequently, the data is combined into a single DataFrame by creating a dictionary in which the codes are the keys and the corresponding Series from each downloaded DataFrame are the values. In this block, the zip function pairs the two lists element-by-element into a single iterable.
time_series = {}
for code, d in zip(codes, data):
    d.index = d.DATE
    time_series[code] = d[code]
merged_data = DataFrame(time_series)
print(merged_data)
               EXPGS     IMPGS
DATE
1947-01-01    18.394     7.519
1947-04-01    19.497     8.203
1947-07-01    19.433     7.663
1947-10-01    17.636     8.347
1948-01-01    16.917     9.624
...              ...       ...
2021-10-01  2733.037  3647.745
2022-01-01  2811.225  3927.908
2022-04-01  3038.844  4074.409
2022-07-01  3064.971  3955.795
2022-10-01  3003.247  3857.366

[304 rows x 2 columns]
merged_data.tail()
DATE | EXPGS | IMPGS
---|---|---
2021-10-01 | 2733.037 | 3647.745
2022-01-01 | 2811.225 | 3927.908
2022-04-01 | 3038.844 | 4074.409
2022-07-01 | 3064.971 | 3955.795
2022-10-01 | 3003.247 | 3857.366
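One optional housekeeping step: the DATE index comes back from the CSV as plain strings, so statsmodels will later warn that it has to infer the quarterly frequency. A minimal sketch of converting the index to a proper quarterly DatetimeIndex (the 'QS' frequency string is an assumption based on the quarter-start dates above):
import pandas as pd
# Optional: turn the string DATE index into a quarterly DatetimeIndex so that
# statsmodels does not need to infer the frequency later on
merged_data.index = pd.to_datetime(merged_data.index)
merged_data = merged_data.asfreq('QS')   # quarter-start frequency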
# Calculate logarithm to base 2
import pandas as pd
import numpy as np
data = np.log2(merged_data)
data.tail()
DATE | EXPGS | IMPGS
---|---|---
2021-10-01 | 11.416289 | 11.832789
2022-01-01 | 11.456983 | 11.939545
2022-04-01 | 11.569307 | 11.992375
2022-07-01 | 11.581658 | 11.949752
2022-10-01 | 11.552307 | 11.913400
Let's have a closer look at the series. Trends in US exports and imports are plotted by:
import matplotlib.pyplot as plt
# set the plot size
plt.figure(figsize=(16, 8), dpi=150)
# plot the (log) export and import series; each call sets the curve's label
data['EXPGS'].plot(label='Export')
data['IMPGS'].plot(label='Import')
# add a title to the plot
plt.title('Trends in Export and Import for US, 1947-2022')
# add a label to the x-axis
plt.xlabel('Time')
# add a legend for the curves
plt.legend()
[Figure: Trends in Export and Import for US, 1947-2022]
Until the 1970s, exports exceeded imports; after the 1970s, the balance shifted, with imports persistently above exports. Clearly, these two series are not covariance stationary. I also apply the Augmented Dickey-Fuller (ADF) unit root test, in which, under $H_{0}$, the series has a unit root.
from statsmodels.tsa.stattools import adfuller
# perform Augmented Dickey-Fuller test
resultadfExport = adfuller(data.EXPGS)
print('ADF Statistic: %f' % resultadfExport[0])
print('p-value: %f' % resultadfExport[1])
print('Critical Values:')
for key, value in resultadfExport[4].items():
    print('\t%s: %.3f' % (key, value))
ADF Statistic: -0.385619
p-value: 0.912472
Critical Values:
	1%: -3.452
	5%: -2.871
	10%: -2.572
The p-value is not smaller than the Type I error rate of 0.05, so I cannot reject $H_{0}$. What about stationarity of the first difference of the series?
dEXPGS = data.EXPGS.diff()
dEXPGS = dEXPGS.dropna(axis=0)
resultadfdEXPGS = adfuller(dEXPGS)
print('ADF Statistic: %f' % resultadfdEXPGS[0])
print('p-value: %f' % resultadfdEXPGS[1])
print('Critical Values:')
for key, value in resultadfdEXPGS[4].items():
    print('\t%s: %.3f' % (key, value))
ADF Statistic: -9.519914
p-value: 0.000000
Critical Values:
	1%: -3.452
	5%: -2.871
	10%: -2.572
The p-value is smaller than the Type I error rate of 0.05, so I can reject $H_{0}$: the first difference of exports is stationary. Below is the same analysis for imports:
resultadfIMPGS = adfuller(data.IMPGS)
print('ADF Statistic: %f' % resultadfIMPGS[0])
print('p-value: %f' % resultadfIMPGS[1])
print('Critical Values:')
for key, value in resultadfIMPGS[4].items():
    print('\t%s: %.3f' % (key, value))
ADF Statistic: -2.093199
p-value: 0.247258
Critical Values:
	1%: -3.453
	5%: -2.872
	10%: -2.572
dIMPGS = data.IMPGS.diff()
dIMPGS = dIMPGS.dropna(axis=0)
resultadfdIMPGS = adfuller(dIMPGS)
print('ADF Statistic: %f' % resultadfdIMPGS[0])
print('p-value: %f' % resultadfdIMPGS[1])
print('Critical Values:')
for key, value in resultadfdIMPGS[4].items():
    print('\t%s: %.3f' % (key, value))
ADF Statistic: -4.818964
p-value: 0.000050
Critical Values:
	1%: -3.453
	5%: -2.872
	10%: -2.572
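So both series appear to be I(1): non-stationary in levels but stationary in first differences. Since the same ADF call is repeated for each series, in levels and in first differences, the check can optionally be wrapped in a small helper. This is just a convenience sketch reusing the data DataFrame and adfuller from above; the function name is mine:
def adf_report(series, name):
    # run the ADF test and print the statistic, p-value and critical values
    stat, p_value, _, _, crit, _ = adfuller(series.dropna())
    print(f"{name}: ADF statistic = {stat:.3f}, p-value = {p_value:.4f}")
    for level, value in crit.items():
        print(f"\t{level}: {value:.3f}")

for col in ['EXPGS', 'IMPGS']:
    adf_report(data[col], col)               # level of the series
    adf_report(data[col].diff(), 'd' + col)  # first difference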
After determining the integration properties of the series, let's examine cointegration using the ARDL approach. Residual-based tests of cointegration face a significant challenge: the researcher must be certain that the underlying regressors in the model are I(1). Given the generally low power of unit root tests, testing whether the underlying variables are I(1) introduces uncertainty into the analysis. To address this, [3] proposed estimating an error correction model of an ARDL form for the relevant variables. For example, if I aim to test for a long-run relationship between EX and IM but am unsure whether these variables are I(1) or I(0), this approach is useful. It also has notably better small-sample properties than both the Johansen and Juselius approach and the Engle and Granger approach.
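Concretely, for the two series at hand, the conditional error correction form underlying the bounds test of [3] can be sketched (in my notation, for an ARDL(p, q) with constant and trend) as
\begin{equation*} \Delta EX_{t}=c_{0}+c_{1}t+\theta_{1} EX_{t-1}+\theta_{2} IM_{t-1}+\sum_{i=1}^{p-1}\gamma_{i}\Delta EX_{t-i}+\sum_{j=0}^{q-1}\varphi_{j}\Delta IM_{t-j}+u_{t}, \end{equation*}where the bounds F-test examines $H_{0}:\theta_{1}=\theta_{2}=0$ (no level relationship) against the alternative of a possible level relationship, using critical bounds that cover both the all-I(0) and all-I(1) cases.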
I do not go into further detail on the ARDL approach here, but you can find more information at: https://davegiles.blogspot.com/2015/01/ardl-modelling-in-eviews-9.html.
First, we determine the lag length of the ARDL model using the BIC criterion:
from statsmodels.tsa.api import ARDL
from statsmodels.tsa.ardl import ardl_select_order
sel_res = ardl_select_order(data.EXPGS, 4, data[["IMPGS"]], 4, ic="bic", trend="ct")
print(f"The optimal order is: {sel_res.model.ardl_order}")
The optimal order is: (3, 1)
c:\Users\P70085977\Anaconda3\lib\site-packages\statsmodels\tsa\base\tsa_model.py:471: ValueWarning: No frequency information was provided, so inferred frequency QS-OCT will be used. self._init_dates(dates, freq)
According to the BIC, our best model is ARDL(3,1):
ARDLres = sel_res.model.fit()
ARDLres.summary()
Dep. Variable: | EXPGS | No. Observations: | 304
---|---|---|---
Model: | ARDL(3, 1) | Log Likelihood | 479.826
Method: | Conditional MLE | S.D. of innovations | 0.049
Date: | Wed, 22 Mar 2023 | AIC | -943.652
Time: | 15:08:48 | BIC | -913.995
Sample: | 10-01-1947 - 10-01-2022 | HQIC | -931.784

 | coef | std err | z | P>\|z\| | [0.025 | 0.975]
---|---|---|---|---|---|---
const | 0.0893 | 0.026 | 3.446 | 0.001 | 0.038 | 0.140
trend | 0.0003 | 0.000 | 1.659 | 0.098 | -6.49e-05 | 0.001
EXPGS.L1 | 0.9762 | 0.047 | 20.634 | 0.000 | 0.883 | 1.069
EXPGS.L2 | 0.0905 | 0.063 | 1.439 | 0.151 | -0.033 | 0.214
EXPGS.L3 | -0.1883 | 0.042 | -4.473 | 0.000 | -0.271 | -0.105
IMPGS.L0 | 0.6293 | 0.044 | 14.325 | 0.000 | 0.543 | 0.716
IMPGS.L1 | -0.5275 | 0.047 | -11.261 | 0.000 | -0.620 | -0.435
Secondly, we estimate the unrestricted error correction model (UECM):
from statsmodels.tsa.api import UECM
ecm = UECM.from_ardl(sel_res.model)
ecm_res = ecm.fit()
ecm_res.summary()
Dep. Variable: | D.EXPGS | No. Observations: | 304
---|---|---|---
Model: | UECM(3, 1) | Log Likelihood | 479.826
Method: | Conditional MLE | S.D. of innovations | 8.343
Date: | Wed, 22 Mar 2023 | AIC | -943.652
Time: | 15:03:03 | BIC | -913.995
Sample: | 10-01-1947 - 10-01-2022 | HQIC | -931.784

 | coef | std err | z | P>\|z\| | [0.025 | 0.975]
---|---|---|---|---|---|---
const | 0.0893 | 0.026 | 3.446 | 0.001 | 0.038 | 0.140
trend | 0.0003 | 0.000 | 1.659 | 0.098 | -6.49e-05 | 0.001
EXPGS.L1 | -0.1216 | 0.018 | -6.858 | 0.000 | -0.157 | -0.087
IMPGS.L1 | 0.1018 | 0.016 | 6.525 | 0.000 | 0.071 | 0.132
D.EXPGS.L1 | 0.0978 | 0.042 | 2.309 | 0.022 | 0.014 | 0.181
D.EXPGS.L2 | 0.1883 | 0.042 | 4.473 | 0.000 | 0.105 | 0.271
D.IMPGS.L0 | 0.6293 | 0.044 | 14.325 | 0.000 | 0.543 | 0.716
From here, we obtain the cointegrating vector:
ecm_res.ci_summary()
 | coef | std err | t | P>\|t\| | [0.025 | 0.975]
---|---|---|---|---|---|---
const | -0.7342 | 0.169 | -4.356 | 0.000 | -1.066 | -0.402
trend | -0.0029 | 0.002 | -1.730 | 0.084 | -0.006 | 0.000
EXPGS.L1 | 1.0000 | 0 | nan | nan | 1.000 | 1.000
IMPGS.L1 | -0.8369 | 0.051 | -16.350 | 0.000 | -0.938 | -0.736
This vector implies
\begin{equation*} \widehat{EX}_{t-1}=0.73+0.0029\,trend + 0.84\, IM_{t-1}. \end{equation*}Do not worry about the $t-1$ timing; at $t=T$ it is essentially the same relationship as at $t$. So we can now discuss the sustainability of trade imbalances, but first we have to check the validity of the cointegration. Looking at the t-statistic and p-value on $IM_{t-1}$ in the cointegrating vector: $-16.35$ and $0.000<0.05$, so the system is cointegrated. Secondly, I look at the bounds F-test:
ecm_res.bounds_test(case=4, cov_type='nonrobust', cov_kwds=None, use_t=True, asymptotic=True, nsim=10000, seed=None)
BoundsTestResult
Stat: 16.57512
Upper P-value: 8.56e-10
Lower P-value: 1.58e-10
Null: No Cointegration
Alternative: Possible Cointegration
Having concluded that a long-run relationship exists, let's perform another analysis: which form of the intertemporal budget constraint is valid for the US, weak or strong? To answer this, we test $H_{0}:\alpha_{1}=1$. The test statistic is calculated as
\begin{equation*} t_{stat}=\frac{\widehat{\alpha_{1}}-1}{s.e.(\widehat{\alpha_{1}})}. \end{equation*}
t_stat = (-ecm_res.ci_params.IMPGS - 1) / ecm_res.ci_bse.IMPGS
print(t_stat)
-3.1860211325857444
The degrees of freedom for this test equal the number of observations less the number of estimated parameters, roughly 300 here. With that many degrees of freedom, the 95 per cent critical values of the t-distribution for a two-sided test are essentially those of the standard normal:
# Import Library
import scipy.stats
q=1-.05/2
# To find the T critical value
tc975=scipy.stats.t.ppf(q,df=308)
tc025=scipy.stats.t.ppf(1-q,df=308)
print(tc975)
print(tc025)
1.9676960046163405
-1.9676960046163405
Because t-stat $=-3.18<-1.96$, I reject the null hypothesis that $\alpha_{1}=1$. Finally, I conclude that the US satisfies only the weak form of its intertemporal budget constraint.
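Equivalently, one can compute a two-sided p-value for this t statistic directly. A quick sketch, reusing t_stat from above and the same degrees of freedom as in the critical-value calculation:
import scipy.stats
# two-sided p-value for H0: alpha_1 = 1, using the t-distribution from above
p_value_alpha1 = 2 * scipy.stats.t.cdf(-abs(t_stat), df=308)
print(p_value_alpha1)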
[1] Narayan, P. K., & Narayan, S. (2019). The sustainability of Fiji's budget deficit: an econometric analysis.
[2] Hakkio, C. S., & Rush, M. (1991). Cointegration: how short is the long run? Journal of International Money and Finance, 10(4), 571-581.
[3] Pesaran, M. H., Shin, Y., & Smith, R. J. (2001). Bounds testing approaches to the analysis of level relationships. Journal of Applied Econometrics, 16(3), 289-326.