Weird News : In the comments section of Denise Minger’s post on July 16, 2010, which discusses some of the data from the China Study (as a follow up to a previous post on the same topic), Denise herself posted the data she used in her analysis. This data is from the China Study. So I decided to take a look at that data and do a couple of multivariate analyzes with it using WarpPLS (warppls.com).
First I built a model that explores relationships with the goal of testing the assumption that the consumption of animal protein causes colorectal cancer, via an intermediate effect on total cholesterol. I built the model with various hypothesized associations to explore several relationships simultaneously, including some commonsense ones. Including commonsense relationships is usually a good idea in exploratory multivariate analyses.
The model is shown on the graph below, with the results. (Click on it to enlarge. Use the "CRTL" and "+" keys to zoom in, and CRTL" and "-" to zoom out.) The arrows explore causative associations between variables. The variables are shown within ovals. The meaning of each variable is the following: aprotein = animal protein consumption; pprotein = plant protein consumption; cholest = total cholesterol; crcancer = colorectal cancer.
The path coefficients (indicated as beta coefficients) reflect the strength of the relationships; they are a bit like standard univariate (or Pearson) correlation coefficients, except that they take into consideration multivariate relationships (they control for competing effects on each variable). A negative beta means that the relationship is negative; i.e., an increase in a variable is associated with a decrease in the variable that it points to.
The P values indicate the statistical significance of the relationship; a P lower than 0.05 means a significant relationship (95 percent or higher likelihood that the relationship is real). The R-squared values reflect the percentage of explained variance for certain variables; the higher they are, the better the model fit with the data. Ignore the “(R)1i” below the variable names; it simply means that each of the variables is measured through a single indicator (or a single measure; that is, the variables are not latent variables).
I should note that the P values have been calculated using a nonparametric technique, a form of resampling called jackknifing, which does not require the assumption that the data is normally distributed to be met. This is good, because I checked the data, and it does not look like it is normally distributed. So what does the model above tell us? It tells us that:
- As animal protein consumption increases, colorectal cancer decreases, but not in a statistically significant way (beta=-0.13; P=0.11).
- As animal protein consumption increases, plant protein consumption decreases significantly (beta=-0.19; P<0.01). This is to be expected.
- As plant protein consumption increases, colorectal cancer increases significantly (beta=0.30; P=0.03). This is statistically significant because the P is lower than 0.05.
- As animal protein consumption increases, total cholesterol increases significantly (beta=0.20; P<0.01). No surprise here. And, by the way, the total cholesterol levels in this study are quite low; an overall increase in them would probably be healthy.
- As plant protein consumption increases, total cholesterol decreases significantly (beta=-0.23; P=0.02). No surprise here either, because plant protein consumption is negatively associated with animal protein consumption; and the latter tends to increase total cholesterol.
- As total cholesterol increases, colorectal cancer increases significantly (beta=0.45; P<0.01). Big surprise here!
Why the big surprise with the apparently strong relationship between total cholesterol and colorectal cancer? The reason is that it does not make sense, because animal protein consumption seems to increase total cholesterol (which we know it usually does), and yet animal protein consumption seems to decrease colorectal cancer.
When something like this happens in a multivariate analysis, it usually is due to the model not incorporating a variable that has important relationships with the other variables. In other words, the model is incomplete, hence the nonsensical results. As I said before
The China Study again: A multivariate analysis suggesting that schistosomiasis rules!
Selasa, 06 Desember 2011
Langganan:
Posting Komentar (Atom)
0 komentar:
Posting Komentar