When I first arrived here as an international student about two years ago, I was shocked to see that the shelves for many steel products at IKEA were empty. After some research, I found out that the US government had imposed tariffs on steel products coming from China because of the Trade War. Since then, I have wanted to know how the Trade War would affect our lives and how it would influence the US economy.
This is a very broad topic. Moreover, most of the relevant data are not published by the government, and people have recently been more concerned about the coronavirus than anything else, so I needed to break the topic down and focus only on what is important. After several rounds of research, I found that GDP (also known as total output in economics) is the best indicator of an economy. Some of the most relevant elements affected by the Trade War are imports from China, exports to China, and the unemployment rate. The unemployment rate is relevant because, with restrictions and tariffs on goods imported from China, US businesses are likely to create more jobs locally, driving down the unemployment rate as a result.
It is hard to reach a conclusion on how the Trade War has affected the US economy given the limited data online and my limited time and knowledge, but it is worthwhile to see how these three elements are correlated with GDP.
Most of the data for this research were obtained from the FRED Economic Research website. The data cleaning process was more complicated than expected because all of the series are time series: some were recorded monthly and others quarterly, and the tables used different notations and units. To make further analysis easier, I combined the data and created several versions for different models.
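The frequency alignment described above can be sketched as follows. This is a minimal Python illustration, not the original cleaning script (the analysis itself was done in R). The function name `monthly_to_quarterly`, the choice to average the months within each quarter, and all sample values are assumptions made for the example.

```python
from collections import defaultdict


def monthly_to_quarterly(monthly):
    """Average monthly observations into quarterly ones.

    `monthly` maps (year, month) -> value; the result maps (year, quarter) -> mean.
    Averaging is an assumption; flow series such as trade values could be summed instead.
    """
    buckets = defaultdict(list)
    for (year, month), value in monthly.items():
        quarter = (month - 1) // 3 + 1
        buckets[(year, quarter)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in sorted(buckets.items())}


# Hypothetical values, for illustration only (not FRED data).
unemployment_monthly = {
    (2019, 1): 4.0, (2019, 2): 3.8, (2019, 3): 3.9,
    (2019, 4): 3.6, (2019, 5): 3.6, (2019, 6): 3.6,
}
gdp_quarterly = {(2019, 1): 100.0, (2019, 2): 101.0}

unemployment_quarterly = monthly_to_quarterly(unemployment_monthly)

# Keep only the quarters that are present in both series.
merged = {
    q: (gdp_quarterly[q], unemployment_quarterly[q])
    for q in gdp_quarterly
    if q in unemployment_quarterly
}
```

Once every series is on the same quarterly index, the columns can be combined into one table per model.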
This is an example of what the cleaned data look like. Logarithms of both imports and exports were taken to show the trend more clearly, since the log series fluctuate less. Because the trade data between the US and China used here begin in 1985, all of the following analysis and modeling involving imports and exports is based on data since then.
Otherwise, analysis and modeling are based on data since 1949.
In the graph above, the blue dots represent the trend of imports from China to the US since 1985 and the orange dots represent the trend of exports from the US to China since 1985. Both clearly show an increasing trend until around 2018, when they decreased slightly due to the Trade War.
Here is another graph showing the trend of net imports from China since 1985, computed by subtracting exports to China from imports from China. The increasing trend shows that the US has become very reliant on Chinese goods and services.
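The two transformations applied to the trade series, taking logarithms and subtracting exports from imports, can be sketched in a few lines. This is a Python illustration under stated assumptions: the natural logarithm is assumed (the base is not specified in the text), and the numbers are made up rather than taken from FRED.

```python
import math

# Hypothetical trade values in millions of dollars (not actual FRED figures).
imports_from_china = [3500.0, 41000.0, 39000.0]
exports_to_china = [300.0, 9800.0, 8700.0]

# The logarithm compresses the scale, so long-run growth is easier to read.
log_imports = [math.log(v) for v in imports_from_china]
log_exports = [math.log(v) for v in exports_to_china]

# Net imports: imports from China minus exports to China, period by period.
net_imports = [m - x for m, x in zip(imports_from_china, exports_to_china)]
```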
Since GDP = C (consumption) + I (investment) + G (government spending) + NX (net exports, that is, exports minus imports), imports and exports enter total output directly, so a strong relationship between them and total output is expected.
The graph on the left shows the increasing real GDP in blue and potential GDP in orange. The graph on the right shows the unemployment rate in blue and the natural rate of unemployment in orange. I would also like to know the relationship between them.
The unemployment gap is shown in blue and the output gap is shown in orange. There is a clear negative relationship between the two. In economics, this relationship is described by a model known as Okun's Law.
Okun's Law is an empirically observed relationship between unemployment and losses in a country's production. Here U is the unemployment rate, Un is the natural rate of unemployment, Y is real GDP, Yp is potential output, and c is a constant that varies across countries. The "gap version" states that for every 1% increase in the unemployment rate, a country's GDP will be roughly an additional 2% lower than its potential GDP.
The simplest way to estimate the constant c is to divide each difference between U and Un by the corresponding difference between Y and Yp. The resulting ratios should be negative, and the ratio for 2018 is indeed markedly negative.
The mean of these ratios is also negative.
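The ratio calculation described above can be sketched as follows. This is a Python illustration with made-up numbers, not the original R computation; the variable names simply mirror the symbols U, Un, Y and Yp from the text.

```python
from statistics import mean

# Hypothetical annual observations (illustrative only, not FRED data).
U = [5.0, 6.5, 4.0, 3.9]            # unemployment rate, percent
Un = [4.8, 4.9, 4.6, 4.5]           # natural rate of unemployment, percent
Y = [100.0, 97.0, 104.0, 106.0]     # real GDP
Yp = [100.5, 101.0, 103.0, 104.5]   # potential GDP

# One ratio per period: unemployment gap divided by the difference Y - Yp.
# Periods where Y equals Yp are skipped to avoid dividing by zero.
ratios = [
    (u - un) / (y - yp)
    for u, un, y, yp in zip(U, Un, Y, Yp)
    if y != yp
]

c_estimate = mean(ratios)
```

A ratio of this kind becomes very large in magnitude whenever Y is close to Yp, so a handful of periods can dominate the mean.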
However, plotting both sets of data reveals some outliers, which suggests that the estimate of c above could be heavily influenced by them. A better model is needed to investigate the relationship between the output gap and the unemployment gap.
The adjusted line in this plot shows that there is a somewhat linear relationship between output gap and unemployment gap. To see if a linear regression model is applicable, more tests are needed.
Density plots are drawn for both sets of data. Both are approximately normally distributed, which is consistent with using a linear model for the relationship between them.
A very strong correlation of -0.869 reinforces the linear model assumption.
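For reference, the correlation coefficient quoted above is the Pearson correlation. A minimal Python sketch of the calculation follows; the helper name `pearson` and the sample gaps are hypothetical and are not the data behind the reported -0.869.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical gaps in percentage points (illustrative only).
unemployment_gap = [-0.5, 0.0, 1.0, 2.5, 4.0]
output_gap = [0.6, -0.1, -1.8, -3.2, -5.6]

r = pearson(unemployment_gap, output_gap)
```

In a regression with a single predictor, R-squared equals the square of this coefficient; squaring -0.869 gives about 0.755, which agrees with the R-squared of 0.7563 reported for the fitted model up to rounding.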
Using the lm function in R, the following linear model is fitted:
Output Gap = -1.2937 * Unemployment Gap - 0.2039
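The fit itself was produced with lm in R. As an illustration of what a least-squares fit of this form computes, here is a minimal Python sketch; the helpers `fit_line` and `r_squared` and the sample data are hypothetical, so the resulting numbers will not match the coefficients reported above.

```python
def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(x, y, slope, intercept):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot


# Hypothetical gaps in percentage points (illustrative only).
unemployment_gap = [-0.5, 0.0, 1.0, 2.5, 4.0]
output_gap = [0.6, -0.1, -1.8, -3.2, -5.6]

slope, intercept = fit_line(unemployment_gap, output_gap)
fit_quality = r_squared(unemployment_gap, output_gap, slope, intercept)
```

The slope plays the role of the coefficient on the unemployment gap, and the intercept is the output gap predicted when the unemployment gap is zero.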
The model appears to fit well. The p-value for the intercept is statistically significant in the t-test at the 0.001 level, and the standard errors for both the intercept and the coefficient are very small. The p-value for the F-test is statistically significant at the 0.05 level. The R-squared value of 0.7563 is quite high, indicating that the regression fits the data well.
To further examine the prediction accuracy of the model, I separated the data into training and test samples, and generated the following diagnostic measures.
All the measures have values very similar to those of the original regression model. The correlation accuracy is about 88.2%, which is very high and indicates that the actual and predicted values move in similar directions. Hence, the regression model is quite accurate.
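The train/test check described above can be sketched as follows. This is a Python illustration on synthetic data, not the original R code: the 80/20 split, the random seed, the helper names and the data-generating line are all assumptions. "Correlation accuracy" is taken here to mean the correlation between actual and predicted values on the test sample.

```python
import math
import random


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


random.seed(0)

# Synthetic data: a negative linear relationship plus noise (illustrative only).
x = [i * 0.25 - 2.0 for i in range(40)]
y = [-1.3 * a - 0.2 + random.gauss(0, 0.5) for a in x]

# Assumed 80/20 random split into training and test rows.
indices = list(range(len(x)))
random.shuffle(indices)
cut = int(0.8 * len(indices))
train, test = indices[:cut], indices[cut:]

# Fit on the training rows only, then predict the held-out rows.
slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
actual = [y[i] for i in test]
predicted = [slope * x[i] + intercept for i in test]

correlation_accuracy = pearson(actual, predicted)
```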
The k-fold cross validation also provides some insight into the accuracy of the prediction model. The data are split into k mutually exclusive random portions. Keeping each portion in turn as test data, I built the model on the remaining k-1 portions and calculated the mean squared error of the predictions. This was repeated for each of the k portions, and finally the average of these k mean squared errors was computed. This metric can be used to compare different linear models.
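The procedure just described can be sketched in Python as below. This is an illustration under stated assumptions rather than the code used for the analysis: the helper names, the way rows are dealt into folds, the seed and the sample data are all hypothetical.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def k_fold_mse(x, y, k, seed=0):
    """Average of the k held-out mean squared errors."""
    indices = list(range(len(x)))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled rows into k mutually exclusive portions.
    folds = [indices[i::k] for i in range(k)]
    fold_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        squared = [(y[i] - (slope * x[i] + intercept)) ** 2 for i in held_out]
        fold_errors.append(sum(squared) / len(squared))
    return sum(fold_errors) / k


# Hypothetical gaps in percentage points (illustrative only).
unemployment_gap = [-0.5, 0.0, 0.4, 1.0, 1.6, 2.5, 3.1, 4.0, 4.4, 5.0]
output_gap = [0.6, -0.1, -0.9, -1.8, -2.0, -3.2, -4.5, -5.6, -5.9, -6.8]

cv_error = k_fold_mse(unemployment_gap, output_gap, k=5)
```

A lower value of `cv_error` means the model predicts held-out data more closely, which is what makes the metric usable for comparing candidate models fitted to the same response.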
All the lines on the graph are nearly parallel and very close to one another, and the symbols of each color are not widely dispersed. This further supports the validity of the prediction model.
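The second model regresses total output on the unemployment rate, imports from China, and exports to China:
total output = -232 * unemployment rate + 0.2659 * import from china - 0.2656 * export to china + 11186.4007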
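A regression with several predictors, like the one above, is fitted by solving the least-squares normal equations. The Python sketch below illustrates the idea on synthetic data; the helpers `solve` and `fit_multiple`, the predictor values and the coefficients used to generate the response are all hypothetical, and the original fit was produced in R.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_multiple(rows, y):
    """Least squares with an intercept; returns [intercept, b1, b2, ...]."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic predictors: (unemployment rate, imports, exports), illustrative only.
rows = [(1.0, 2.0, 3.0), (2.0, 1.0, 0.0), (3.0, 5.0, 1.0),
        (4.0, 2.0, 2.0), (5.0, 7.0, 4.0), (6.0, 1.0, 5.0)]
# Response generated from known coefficients so the fit can be checked.
y = [10.0 - 2.0 * u + 0.5 * m - 0.25 * x for u, m, x in rows]

coefficients = fit_multiple(rows, y)
```

Because the response here is generated without noise, the fit should recover the generating coefficients. With real data, predictors that move closely together make the matrix being solved nearly singular, and the individual estimates become unstable.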
The p-values for all the coefficients and the intercept show very little statistical significance in the t-test, and their standard errors vary widely. The p-value for the F-test, however, is statistically significant at the 0.05 level, and the R-squared value of 0.9179 is quite high.
To further examine the prediction accuracy of the model, I separated the data into training and test samples, and generated the following diagnostic measures.
As with model 1, all the measures have values very similar to those of the original regression model. The correlation accuracy is about 96.3%, which is even higher than that of the previous model, but the overall accuracy of this model remains unclear.
The k-fold cross validation result shows that the symbols of each color are widely scattered around the lines, indicating that although this model has some statistical significance, it may not be very accurate.
The R-squared value increased from model 1 to model 2, but this may simply be because model 2 has more predictors. The change in adjusted R-squared, which accounts for the number of predictors, is therefore the more informative comparison.
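Adjusted R-squared penalizes R-squared for the number of predictors using the standard formula 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. The Python sketch below applies it to the two reported R-squared values. The sample size `n_obs` is a hypothetical placeholder, since the number of observations is not stated in the text, so the outputs are illustrative only.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R-squared for n observations and p predictors (intercept excluded)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical sample size; the actual number of observations is not given.
n_obs = 60

adj_model_1 = adjusted_r_squared(0.7563, n_obs, p=1)  # one predictor
adj_model_2 = adjusted_r_squared(0.9179, n_obs, p=3)  # three predictors
```

The penalty grows as p approaches n, so the adjustment matters most when a model has many predictors relative to the amount of data.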
Model 2 predicts the relationship between total output and the unemployment rate, imports from China, and exports to China, while model 1 only predicts the relationship between the output gap and the unemployment gap. Both the t-test and the k-fold cross validation show that model 1 is the better prediction model, even though in practice model 2 may be more useful and easier to compute. Model 2 seems less accurate because its predictors (unemployment rate, exports to China, and imports from China) are correlated with one another to some degree, which produces multicollinearity and makes the individual t-test results unreliable.
Although model 2 has more relevant predictors and measures a more direct relationship between total GDP and the factors, model 1 has stronger statistical evidence supporting its accuracy. Therefore, the prediction model Output Gap = -1.2937 * Unemployment Gap - 0.2039 better fits the data.
As predicted by the model, there is a negative relationship between the unemployment rate and output, a positive relationship between imports and total output, and a negative relationship between exports and GDP. Although total trade between the US and China decreased in quantity and value, the US economy has not been hugely affected, since the US also trades with many other countries for substitutes. The unemployment rate in the US also declined, indicating that more job vacancies have been created domestically since the start of the trade war, which has stimulated economic performance.
Prabhakaran, Selva. “Linear Regression”.
http://r-statistics.co/Linear-Regression.html
Office of the United States Trade Representative. “The People’s Republic of China”.
https://ustr.gov/countries-regions/china-mongolia-taiwan/peoples-republic-china
Federal Reserve Economic Data. “Trade between the U.S. and China: Steady as she goes?”
https://fredblog.stlouisfed.org/2020/02/trade-between-the-u-s-and-china-steady-as-she-goes/?utm_source=series_page&utm_medium=related_content&utm_term=related_resources&utm_campaign=fredblog
Spring 2020
Simin Na