Cross Validated is a question and answer site for people interested in statistics, machine learning, data analysis, data mining, and data visualization.

Let ${(x_1,x_2,z_1,z_2)}$ be real-valued vectors of equal length with

$${\hat{r}_p(x_1,x_2) \approx \hat{r}_p(z_1,z_2) > 0}$$

$${\hat{r}_p(x_1,z_1) \approx \hat{r}_p(x_2,z_2) > 0}$$

where $\hat{r}_p$ denotes the estimated (sample) Pearson correlation coefficient. Now perform two simple linear regressions:

$$x_1 = \hat\beta_{0,1}+\hat\beta_{1,1}z_{1}+\hat{e}_1$$ $$x_2 = \hat\beta_{0,2}+\hat\beta_{1,2}z_{2}+\hat{e}_2$$

(Each coefficient carries two subscripts to make clear that the two regression equations do not share coefficients.)

The purpose of the regressions is to extract the residuals $\hat{e}_i$ as versions of the $x_i$ that are adjusted for (i.e., uncorrelated with) the corresponding $z_i$.
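
The regression-and-residual step can be sketched in Python with NumPy. This is a minimal sketch, assuming ordinary least squares with an intercept (fitted here with `np.polyfit` at degree 1). The function names `residuals` and `adjusted_correlation` are hypothetical, and the simulated data only exercise the code; they are not tuned to satisfy the approximate equalities above.

```python
import numpy as np


def residuals(x, z):
    """Residuals of the OLS fit x = b0 + b1 * z + e (simple regression with intercept)."""
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    b1, b0 = np.polyfit(z, x, 1)  # polyfit returns the slope first, then the intercept
    return x - (b0 + b1 * z)


def adjusted_correlation(x1, x2, z1, z2):
    """Pearson correlation between e1 (x1 regressed on z1) and e2 (x2 regressed on z2)."""
    e1 = residuals(x1, z1)
    e2 = residuals(x2, z2)
    return np.corrcoef(e1, e2)[0, 1]


# Hypothetical simulated data, only to exercise the functions.
rng = np.random.default_rng(0)
n = 500
z1 = rng.normal(size=n)
z2 = 0.5 * z1 + rng.normal(size=n)
x1 = z1 + rng.normal(size=n)
x2 = z2 + rng.normal(size=n)

print("r(x1, x2) =", np.corrcoef(x1, x2)[0, 1])
print("r(e1, e2) =", adjusted_correlation(x1, x2, z1, z2))
```

Because each fit includes an intercept, each residual vector has (numerically) zero mean and zero sample correlation with its own regressor, which is exactly the sense in which the $\hat{e}_i$ are "adjusted for" the $z_i$.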

My question is whether any general statement can be made about the magnitude of ${\hat{r}_p(\hat{e}_1,\hat{e}_2)}$ relative to that of ${\hat{r}_p(x_1,x_2)}$. That is, should the estimated correlation between the $x_i$ be expected to grow or to shrink after "adjusting for" (removing the linear correlation with) the $z_i$?
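
For reference, the residual correlation can be written exactly in terms of the pairwise sample correlations. This is a standard identity for ordinary least squares with an intercept, assuming $|\hat{r}_p(x_i,z_i)|<1$; below, $r_{ab}$ is shorthand for $\hat{r}_p(a,b)$ and $s_a$ for the sample standard deviation of $a$. Because $\hat{e}_i = x_i - \hat\beta_{0,i} - \hat\beta_{1,i}z_i$ and the intercepts do not affect covariances,

$$\widehat{\mathrm{cov}}(\hat{e}_1,\hat{e}_2)=\widehat{\mathrm{cov}}(x_1,x_2)-\hat\beta_{1,2}\,\widehat{\mathrm{cov}}(x_1,z_2)-\hat\beta_{1,1}\,\widehat{\mathrm{cov}}(z_1,x_2)+\hat\beta_{1,1}\hat\beta_{1,2}\,\widehat{\mathrm{cov}}(z_1,z_2).$$

Substituting $\hat\beta_{1,i}=r_{x_iz_i}\,s_{x_i}/s_{z_i}$ and $s_{\hat{e}_i}^2=s_{x_i}^2\left(1-r_{x_iz_i}^2\right)$ gives

$$\hat{r}_p(\hat{e}_1,\hat{e}_2)=\frac{r_{x_1x_2}-r_{x_1z_1}\,r_{z_1x_2}-r_{x_2z_2}\,r_{x_1z_2}+r_{x_1z_1}\,r_{x_2z_2}\,r_{z_1z_2}}{\sqrt{\left(1-r_{x_1z_1}^2\right)\left(1-r_{x_2z_2}^2\right)}}.$$

When $z_1=z_2$ this reduces to the usual first-order partial correlation. In general, the cross-correlations $r_{x_1z_2}$ and $r_{z_1x_2}$ also appear in the numerator, and the two assumed approximate equalities do not constrain them.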

Doesn't this question answer itself (in the negative)? If you make the magnitudes of the $\epsilon_i$ small, then you can induce any correlation you please between them without appreciably changing any of the relationships among the $X_i$ and $Z_i$. –  whuber Oct 5 '12 at 12:53
    
I can't follow. The $\epsilon_i$ are the residuals from the linear regressions. They are determined by the model fit. What choice do I have regarding their magnitude? –  miura Oct 5 '12 at 13:49

You are asking a theoretical question. You can construct data by specifying the $X_i$, $Z_i$, and $\epsilon_i$ subject to the constraints implied by your assumptions. You can easily meet all those constraints (about correlations) by choosing $\epsilon_i$ of small magnitude. –  whuber Oct 5 '12 at 14:42
    
I am thankful for the attention my question is receiving. However, my question arises from a practical problem in which my data have already been sampled and are not specified by me. I thought it beneficial to pose the question as generally as possible, but it now appears that posing it in terms of random variables instead of data has made it a wholly different question. –  miura Oct 6 '12 at 9:34

It is correct that questions about random variables are usually different than questions about data. But since nobody has attempted to answer yet, why not just edit this question so that it asks what you really want to know? –  whuber Oct 7 '12 at 16:52
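
whuber's construction in the comments can be illustrated with a small simulation. This is a hypothetical sketch: the helper names (`residuals`, `corr`, `construct`), the seed, the sample size, the perturbation scale of 0.01 and the correlation of 0.6 between $z_1$ and $z_2$ are all illustrative assumptions. Each $x_i$ is $z_i$ plus a small perturbation, and the two perturbations are drawn with an arbitrary correlation `rho_eps`, so the assumed relations among the $x_i$ and $z_i$ hold approximately whatever `rho_eps` is.

```python
import numpy as np


def residuals(x, z):
    """Residuals of the OLS fit x = b0 + b1 * z + e."""
    b1, b0 = np.polyfit(z, x, 1)  # slope first, then intercept
    return x - (b0 + b1 * z)


def corr(a, b):
    """Sample Pearson correlation of two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]


def construct(rho_eps, scale=0.01, n=10_000, seed=1):
    """Hypothetical data: x_i = z_i + scale * eps_i, with corr(eps_1, eps_2) = rho_eps."""
    rng = np.random.default_rng(seed)
    z1 = rng.normal(size=n)
    z2 = 0.6 * z1 + 0.8 * rng.normal(size=n)  # population corr(z1, z2) = 0.6
    eps = rng.multivariate_normal([0.0, 0.0], [[1.0, rho_eps], [rho_eps, 1.0]], size=n)
    x1 = z1 + scale * eps[:, 0]
    x2 = z2 + scale * eps[:, 1]
    return x1, x2, z1, z2


for rho in (-0.9, 0.0, 0.9):
    x1, x2, z1, z2 = construct(rho)
    e1, e2 = residuals(x1, z1), residuals(x2, z2)
    print(f"rho_eps={rho:+.1f}  r(x1,x2)={corr(x1, x2):.3f}  r(z1,z2)={corr(z1, z2):.3f}  "
          f"r(x1,z1)={corr(x1, z1):.4f}  r(e1,e2)={corr(e1, e2):.3f}")
```

Across the three settings, $\hat{r}_p(x_1,x_2)$ and $\hat{r}_p(x_1,z_1)$ should barely move, while the residual correlation should roughly track `rho_eps`.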
