Introduction to Econometrics coursework

For this assignment I will examine whether a linear regression model is suitable for estimating the relationship between the Human Development Index (HDI) and its components. Linear regression is a statistical technique that relates the change in one variable to changes in one or more other variables; the representation of that relationship is called the linear regression model. Variables are measurements that can take on different possible values, recorded either for a recurring event at regular intervals or for different instances of similar events. A dependent variable is a variable whose value depends on the values of other variables in the model. An independent variable, by contrast, is one whose value does not depend on the other variables in the model. The dependent variable here is HDI, which will be regressed against four independent variables: life expectancy at birth, mean years of schooling, expected years of schooling and Gross National Income per capita. The model can therefore be written as

$$Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \beta_4 x_{4i} + \varepsilon_i$$

where $Y_i$ is the HDI of observation $i$, $\beta_0$ is a constant, $\beta_1$, $\beta_2$, $\beta_3$ and $\beta_4$ are the coefficients on the four independent variables, and $\varepsilon_i$ denotes the random error term.
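As an illustration of how the coefficients of such a model can be estimated, here is a minimal sketch of ordinary least squares using the normal equations $(X'X)b = X'y$. It is not the calculation used for this coursework: the data are made up (two regressors instead of four, no error term), and the function names `solve_linear_system` and `ols_coefficients` are hypothetical. In practice a numerical library would normally do this work.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [A | b] so row operations act on both sides.
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back-substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - known) / M[i][i]
    return x


def ols_coefficients(rows, y):
    """Return [b0, b1, ..., bk] solving the normal equations (X'X) b = X'y."""
    X = [[1.0] + list(r) for r in rows]  # the leading 1.0 is the constant term
    k = len(X[0])
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    Xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(XtX, Xty)


# Made-up data generated exactly from y = 1 + 2*x1 + 3*x2 (no error term).
rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
coefficients = ols_coefficients(rows, y)
print([round(b, 6) for b in coefficients])
```

Because the made-up data contain no error term, the estimates should come out very close to the true values 1, 2 and 3. With real data the error term means the estimates only approximate the underlying coefficients.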
$R^2$ measures how much of the variation in the response variable ($y$) is explained by the explanatory variables ($x$). Its value ranges between 0 and 1: the higher the value, the larger the share of the variation in the dependent variable that is accounted for by the independent variables. The $R^2$ value therefore shows how reliably the regression represents the actual data when estimating values of Human Development. It is computed as $R^2 = 1 - \frac{\sum e_i^2}{\sum y_i^2}$, where $\sum y_i^2$ is the total sum of squares (TSS), with $y_i$ measured as a deviation from its mean, and $\sum e_i^2$ is the residual sum of squares (RSS). The closer $R^2$ is to 1, the better the regression line fits the data, and a value of exactly 1 represents a perfect fit.
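The formula above can be sketched in a few lines of Python. The function name `r_squared` and the HDI-like numbers are made up for illustration and are not taken from the coursework data set.

```python
def r_squared(actual, fitted):
    """R^2 = 1 - RSS/TSS, with TSS measured around the mean of `actual`."""
    mean_actual = sum(actual) / len(actual)
    rss = sum((a - f) ** 2 for a, f in zip(actual, fitted))  # residual sum of squares
    tss = sum((a - mean_actual) ** 2 for a in actual)        # total sum of squares
    return 1 - rss / tss


# Made-up HDI-like values, for illustration only.
actual = [0.50, 0.60, 0.70, 0.80, 0.90]
fitted = [0.52, 0.59, 0.69, 0.81, 0.89]
print(round(r_squared(actual, fitted), 4))  # RSS = 0.0008, TSS = 0.10, so R^2 = 0.992
```

If the fitted values equal the actual values exactly, RSS is zero and the function returns 1, the perfect-fit case.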
For my data, I regressed the dependent variable against all of the independent variables and computed an $R^2$ of 0.9933 (99.33%). This shows that the independent variables together explain almost all of the variation in HDI, so the observations lie very close to the fitted regression. The adjusted $R^2$ attempts to give a more honest estimate than $R^2$ by taking the number of independent variables into account. It is computed as $\bar{R}^2 = 1 - (1 - R^2)\frac{N-1}{N-k-1}$.
When the number of observations (N) is small and the number of independent variables (k) is large, there will be a much greater difference between $R^2$ and adjusted $R^2$, because the ratio (N-1)/(N-k-1) will be well above 1. By contrast, when the number of observations is very large compared to the number of independent variables, $R^2$ and adjusted $R^2$ will be much closer, because the ratio (N-1)/(N-k-1) approaches 1. The adjusted $R^2$ is therefore the more reliable measure of fit when the sample is small relative to the number of variables. The computed adjusted $R^2$ is 0.9922 (99.22%), which is only slightly lower than the $R^2$ value and is still very high. The matrix method is used to compute the coefficient for each independent variable.
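The effect of the ratio (N-1)/(N-k-1) can be seen in a short sketch. The $R^2$ of 0.9933 and k = 4 come from the text; the sample sizes are hypothetical, because the number of observations is not stated here, and the function name `adjusted_r_squared` is an assumption for illustration.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than estimated coefficients")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


R2 = 0.9933  # reported in the text
K = 4        # four explanatory variables
for n in (10, 30, 300):  # hypothetical sample sizes, not taken from the data set
    ratio = (n - 1) / (n - K - 1)
    print(n, round(ratio, 3), round(adjusted_r_squared(R2, n, K), 4))
```

With these inputs the ratio falls from 1.8 at n = 10 towards 1 at n = 300, and the adjusted value rises towards 0.9933 accordingly. A sample of 30 would give 0.9922, matching the reported figure, but that sample size remains an assumption.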
The multiple correlation coefficient is the square root of $R^2$, so we get $\sqrt{0.9933} \approx 0.9966$. The graph confirms that HDI is very strongly and positively correlated with the independent variables.

Point estimate

From the data set I will compute point estimates to judge whether the independent variables are good at predicting the Human Development Index.
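One way to judge point estimates is the percentage error between each predicted and actual HDI value. The sketch below assumes the error is measured as (predicted - actual) / actual and that the average is taken over absolute values so that positive and negative errors cannot cancel. Both conventions are assumptions, since the exact formula is not given in the text, and the numbers are made up.

```python
def percentage_error(actual, predicted):
    """Signed percentage error of a point estimate relative to the actual value."""
    return (predicted - actual) / actual * 100


def mean_absolute_percentage_error(actuals, predictions):
    """Average of the absolute percentage errors, so the signs cannot cancel out."""
    errors = [abs(percentage_error(a, p)) for a, p in zip(actuals, predictions)]
    return sum(errors) / len(errors)


# Made-up actual and predicted HDI values, for illustration only.
actual_hdi = [0.50, 0.80, 0.40]
predicted_hdi = [0.51, 0.78, 0.43]

for a, p in zip(actual_hdi, predicted_hdi):
    print(round(percentage_error(a, p), 2))  # roughly +2.0, -2.5 and +7.5

print(round(mean_absolute_percentage_error(actual_hdi, predicted_hdi), 2))
```

In this made-up example one observation has an error above 5%, but the average absolute error is about 4%, which is the kind of situation the discussion of the results has to weigh up.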
From the table I can deduce that the percentage error for the majority of observations lies within ±5%, with a few above ±5%. The sign of the percentage error does not matter here, since it only indicates whether the model over- or under-estimates; it is the size of the error that counts. The few errors above ±5% might suggest that the model is not suitable, but the average percentage error I calculated lies within ±5%, which is low and considered acceptable. From this I conclude that linear regression is suitable for estimating the relationship between HDI and its components.

I have also dropped two explanatory variables to see whether the $R^2$ or adjusted $R^2$ falls significantly.
The table above shows that when I dropped the explanatory variables “mean years of schooling” and “life expectancy at birth”, the $R^2$ fell to 0.9325 and the adjusted $R^2$ fell to 0.9275. From these values I can deduce that the reduced regression still explains a large share of the variation in HDI using the remaining explanatory variables.