What is a measure of the strength of the relationship between two variables?
Definition: Statistical measures that show a relationship between two or more variables are called Measures of Relationship. Correlation and Regression are commonly used measures of relationship. In this blog, we will understand the Covariance measure and its calculation steps. Part 2 of this blog will explain the calculation of Correlation. (Related read: Linear Regression Blog Series)
Covariance
Covariance is a measure of the joint variability of two random variables (X, Y). For example, consider the Income and Expense of households: households with a higher Income (say X) tend to have relatively higher Expenses (say Y), and vice versa. This kind of relationship between two variables is called joint variability and is measured through Covariance and Correlation. Covariance is represented as Cov(X, Y) (see Wikipedia). The covariance can be Positive, Negative, or Zero.
Positive Covariance: When variable X takes a higher value, the corresponding value of variable Y also tends to be higher, and vice versa. Example: Income and Expense of a household. As X takes higher values, the corresponding values of Y are on the higher side.
Negative Covariance: When variable X takes a higher value, the corresponding value of variable Y tends to be lower, and vice versa. Example: Price and Demand. As the Price of a commodity increases, its Demand decreases.
Zero Covariance or No Covariance: There is no linear relationship between variable X and variable Y. Note: in practice, Zero Covariance means the covariance is zero or near zero.
Formula
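The content under this heading did not survive in the text. As a reconstruction, the sample covariance used in the hands-on calculation later in this post can be written as follows, where the bars denote sample means and n is the number of observations (the symbols x_i, y_i are notation introduced here):

```latex
\operatorname{Cov}(X, Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}
```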
Hands-on Example
To understand the concept of covariance, it helps to work through an example by hand. Sample survey data for 15 households is given below. The fields are the Monthly Income, Monthly Expense, and Annual Income of each household.
Mthly_HH_Income  Mthly_HH_Expense  Annual_HH_Income
5000             8000              64200
6000             7000              79920
10000            4500              112800
10000            2000              97200
12500            12000             147000
14000            8000              196560
15000            16000             167400
18000            20000             216000
19000            9000              218880
20000            9000              220800
20000            18000             278400
22000            25000             279840
23400            5000              292032
24000            10500             316800
24000            10000             244800
Scatter Plot
A scatter plot is the best way to see the linear relationship between X and Y visually. The two scatter plots (Monthly Income against Annual Income, and Monthly Income against Monthly Expense) show that Monthly Income has positive covariance with both variables. However, the linear relationship between Monthly Income and Annual Income appears much stronger than the one between Monthly Income and Monthly Expense. The strength of the linear relationship between two continuous variables is measured by a statistic called Correlation.
Covariance Calculations
Let us denote Monthly Household Income as X and Monthly Household Expense as Y. Then the covariance of Monthly Income and Expense is:
Cov(X,Y) = sum( (X - mean(x)) * (Y - mean(y)) ) / (n - 1)
Mean calculation
# Calculating mean(X)
mean(x) = (5000+6000+10000+10000+12500+14000+15000+18000+19000+20000+20000+22000+23400+24000+24000) / 15
mean(x) = 242900 / 15
mean(x) = 16193.33
# Calculating mean(Y)
mean(y) = (8000+7000+4500+2000+12000+8000+16000+20000+9000+9000+18000+25000+5000+10500+10000) / 15
mean(y) = 164000 / 15
mean(y) = 10933.33
Intermediate covariance calculation steps
Monthly Inc. (X)  Monthly Exp. (Y)  X - mean(x)  Y - mean(y)  (X - mean(x)) * (Y - mean(y))
5000              8000              -11193.33    -2933.33     32833777.78
6000              7000              -10193.33    -3933.33     40093777.78
10000             4500              -6193.33     -6433.33     39843777.78
10000             2000              -6193.33     -8933.33     55327111.11
12500             12000             -3693.33     1066.67      -3939555.56
14000             8000              -2193.33     -2933.33     6433777.78
15000             16000             -1193.33     5066.67      -6046222.22
18000             20000             1806.67      9066.67      16380444.44
19000             9000              2806.67      -1933.33     -5426222.22
20000             9000              3806.67      -1933.33     -7359555.56
20000             18000             3806.67      7066.67      26900444.44
22000             25000             5806.67      14066.67     81680444.44
23400             5000              7206.67      -5933.33     -42759555.56
24000             10500             7806.67      -433.33      -3382888.89
24000             10000             7806.67      -933.33      -7286222.22
Sum of (X - mean(x)) * (Y - mean(y)) = 223293333.33
Final covariance calculation step
n = 15
mean(x) = 16193.33
mean(y) = 10933.33
sum( (X - mean(x)) * (Y - mean(y)) ) = 223293333.33
# Therefore, the sample covariance of Monthly Household Income and Expense is
Cov(X,Y) = sum( (X - mean(x)) * (Y - mean(y)) ) / (n - 1)
Cov(X,Y) = 223293333.33 / (15 - 1) => 223293333.33 / 14
Cov(X,Y) = 15949523.81
Cov(Monthly Income , Monthly Expense) = 15949523.81
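The hand calculation above can be cross-checked with a few lines of code. The following is a minimal sketch in plain Python, assuming the 15 income and expense values from the survey table; the function name `sample_covariance` is introduced here for illustration and is not part of the original post.

```python
# Monthly income (X) and monthly expense (Y) for the 15 surveyed households.
income = [5000, 6000, 10000, 10000, 12500, 14000, 15000, 18000,
          19000, 20000, 20000, 22000, 23400, 24000, 24000]
expense = [8000, 7000, 4500, 2000, 12000, 8000, 16000, 20000,
           9000, 9000, 18000, 25000, 5000, 10500, 10000]


def sample_covariance(x, y):
    """Sample covariance: sum of deviation products divided by (n - 1)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # One product per household, as in the intermediate table.
    products = [(xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)]
    return sum(products) / (n - 1)


cov_xy = sample_covariance(income, expense)
print(round(cov_xy, 2))  # 15949523.81
```

Dividing by n - 1 rather than n gives the sample covariance, which matches the formula used in the steps above.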
Interpretation of Covariance
Disadvantage of Covariance
Application of Variance-Covariance: Beta of Stock
The variance and covariance measures do not have any business meaning by themselves. However, they are used in the calculation of other statistics and techniques such as ANOVA, R-Squared, hypothesis testing, statistical inference, and more. One practical application of Variance-Covariance is in calculating the Beta of a Stock. Beta is a concept that measures the expected move in a stock relative to movements in the overall market. (Investopedia article on Beta of Stock)
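As a rough illustration of this application, Beta is commonly computed as the covariance of the stock's returns with the market's returns, divided by the variance of the market's returns. The sketch below assumes that definition; the return series are made-up numbers for illustration only (the stock is constructed to move 1.5 times the market), and the helper names are hypothetical.

```python
def sample_cov(x, y):
    """Sample covariance with an (n - 1) denominator."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def beta(stock_returns, market_returns):
    """Beta = Cov(stock, market) / Var(market)."""
    market_var = sample_cov(market_returns, market_returns)
    return sample_cov(stock_returns, market_returns) / market_var


# Hypothetical periodic returns, NOT real market data.
market = [0.01, -0.02, 0.03, 0.00, 0.02]
# Constructed so the stock moves 1.5x the market in every period.
stock = [1.5 * m for m in market]

print(beta(stock, market))  # close to 1.5 by construction
```

A Beta above 1 suggests the stock tends to move more than the market, and a Beta below 1 suggests it tends to move less.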
Correlation
Formula
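The formula under this heading did not survive in the text. Assuming the post uses the standard Pearson correlation coefficient (which is consistent with the value of 0.396 reported below), it can be written in terms of the sample covariance and the sample standard deviations s_X and s_Y:

```latex
r = \frac{\operatorname{Cov}(X, Y)}{s_X \, s_Y},
\qquad
s_X = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}},
\qquad
s_Y = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}}
```

Because the covariance is divided by both standard deviations, r has no units and always lies between -1 and +1.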
Positive, Negative, Zero Correlations
The two variables (X, Y) can have a Positive correlation, a Negative correlation, or Zero correlation.
Hands-on Example
Let's calculate the correlation coefficient between two variables (Monthly Income, Monthly Expense) for the 15 sample households in the survey data given in the table below.
Mthly_HH_Income  Mthly_HH_Expense  Annual_HH_Income
5000             8000              64200
6000             7000              79920
10000            4500              112800
10000            2000              97200
12500            12000             147000
14000            8000              196560
15000            16000             167400
18000            20000             216000
19000            9000              218880
20000            9000              220800
20000            18000             278400
22000            25000             279840
23400            5000              292032
24000            10500             316800
24000            10000             244800
Correlation Calculations
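The step-by-step working for this section is missing from the text, so here is a minimal sketch of one way to compute it in plain Python. It assumes the Pearson correlation coefficient and the 15 income and expense values from the survey table; the function name `pearson_r` is introduced here for illustration.

```python
import math

# Monthly income (X) and monthly expense (Y) for the 15 surveyed households.
income = [5000, 6000, 10000, 10000, 12500, 14000, 15000, 18000,
          19000, 20000, 20000, 22000, 23400, 24000, 24000]
expense = [8000, 7000, 4500, 2000, 12000, 8000, 16000, 20000,
           9000, 9000, 18000, 25000, 5000, 10500, 10000]


def pearson_r(x, y):
    """Pearson correlation: Cov(X, Y) / (sd(X) * sd(Y)).

    The (n - 1) factors in the covariance and the standard deviations
    cancel, so the raw sums of squares and products are used directly.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(income, expense)
print(round(r, 3))  # 0.396
```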
The correlation between Monthly Income and Monthly Expense is 0.396. Therefore, there is a low positive correlation between Monthly Household Income (X) and Monthly Household Expense (Y).
What is a measure of the strength of the relationship between two variables in psychology?
A correlation coefficient is a number from -1 to +1 that indicates the strength and direction of the relationship between variables. The correlation coefficient is usually represented by the letter r. The sign indicates the direction of the relationship, and the size of the number (its absolute value) indicates the strength.
What is a measure of the relationship between two variables termed?
Correlation is a statistical technique that is used to measure and describe a relationship between two variables. Usually the two variables are simply observed, not manipulated. The correlation requires two scores from the same individuals. These scores are normally identified as X and Y.