There are 3 basic concepts of reproducibility in research:
Reproducible experiments
Reproducible analysis
Reproducible inference
May 26, 2016
There are 3 basic concepts of reproducibility in research:
Reproducible experiments
Another lab can do the same experiment and obtain similar results.
Reproducible analysis
Another analyst can redo the analysis with the same data and obtain identical results.
Reproducible inference
After reproducing the experiment in another lab, similar scientific inferences will be made.
Today:
We have done an experiment in which we measured a phenotypic score on each sample (e.g. a biopsy result) and also measured the expression of 1000 genes in each sample.
Our objective is to determine which genes are associated with the phenotypic score, and to develop a prediction equation for the phenotype.
We pursue 2 analyses:
Correlation between gene expression and phenotypic score.
Division of the samples into "low" and "high" scores, followed by a t-test to determine whether gene expression differs between the low-score and high-score groups.
There are some genes with correlation as low as -0.6 or as high as 0.6 with the phenotype.
These are good candidates for genes that are important for determining or predicting the phenotype.
Let's select the 10 genes with the highest absolute correlation and see how well they predict the phenotypic score.
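A minimal sketch of this selection step in R, the language of the lecture's own snippets. The lecture's data are not shown, so the snippet generates placeholder values; `expr` (genes in rows, samples in columns), `n_samples`, `n_genes`, the seed and the Normal draws are all assumptions. It ends by building `regfit` and `anov`, the objects used in the ANOVA code that follows.

```r
# Placeholder data (hypothetical, not the lecture's data): 20 samples, 1000 genes.
set.seed(2016)
n_samples <- 20
n_genes <- 1000
pheno <- rnorm(n_samples)
expr <- matrix(rnorm(n_genes * n_samples), nrow = n_genes, ncol = n_samples)

# Correlation of each gene (row of expr) with the phenotypic score.
cors <- as.vector(cor(t(expr), pheno))

# Indices of the 10 genes with the largest absolute correlation.
top10 <- order(abs(cors), decreasing = TRUE)[1:10]

# Samples in rows, the 10 selected genes in columns.
sigGenes <- t(expr[top10, ])

# Regress the phenotype on all 10 selected genes at once;
# a matrix predictor appears as a single 10-d.f. term in the ANOVA table.
regfit <- lm(pheno ~ sigGenes)
anov <- anova(regfit)
anov
summary(regfit)$r.squared
```

Passing the 10 genes as one matrix term is what produces a single `sigGenes` row with 10 d.f. in the ANOVA table rather than ten separate rows.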
anov=anova(regfit)
anov
## Analysis of Variance Table
##
## Response: pheno
##           Df  Sum Sq Mean Sq F value    Pr(>F)
## sigGenes  10 25.5240 2.55240  14.074 0.0002485 ***
## Residuals  9  1.6322 0.18136
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Note that \(R^2 = 0.94\) and the p-value is \(2.48 \times 10^{-4}\), so the regression is highly significant.
Now let's do an alternative analysis, classifying the phenotype as "low" or "high" and using an eBayes t-test to distinguish between them.
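Both summary numbers can be recovered from the ANOVA table by hand, which makes clear where they come from: \(R^2\) is the model sum of squares divided by the total sum of squares, and the p-value is the upper tail of the F distribution with (10, 9) d.f. The values below are copied from the table; the variable names are illustrative.

```r
# Sums of squares and degrees of freedom copied from the ANOVA table.
ss_model <- 25.5240
ss_resid <- 1.6322
df_model <- 10
df_resid <- 9

# R^2: share of the total sum of squares explained by the 10 genes.
r2 <- ss_model / (ss_model + ss_resid)

# F statistic: ratio of the model and residual mean squares.
f_stat <- (ss_model / df_model) / (ss_resid / df_resid)

# Upper-tail p-value from the F(10, 9) distribution.
p_value <- pf(f_stat, df_model, df_resid, lower.tail = FALSE)

c(R2 = round(r2, 3), F = round(f_stat, 2), p = signif(p_value, 3))
```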
Note that this is a 2-sample t-test with 18 d.f., so values greater than 2.1 (or less than -2.1) are significant at p < 0.05. There are several of these.
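The 2.1 cut-off is the two-sided 5% critical value of a t distribution with 18 d.f. The sketch below computes it and then runs one t-test per gene on placeholder data. Two assumptions to flag: "low"/"high" is taken to be a median split (two groups of 10), and an ordinary pooled-variance `t.test` from base R stands in for limma's `eBayes`, whose moderated t adds prior degrees of freedom to the 18 residual ones.

```r
# Two-sided 5% critical value for a t statistic with 18 d.f.
t_crit <- qt(0.975, df = 18)
t_crit

# Placeholder data (hypothetical, not the lecture's data): 20 samples, 1000 genes.
set.seed(2016)
n_samples <- 20
n_genes <- 1000
pheno <- rnorm(n_samples)
expr <- matrix(rnorm(n_genes * n_samples), nrow = n_genes, ncol = n_samples)

# Assumption: "high" = above the median score, giving two groups of 10.
high <- pheno > median(pheno)

# Ordinary pooled-variance t-test for each gene (base R stand-in for limma's eBayes).
tstats <- apply(expr, 1, function(g) {
  unname(t.test(g[high], g[!high], var.equal = TRUE)$statistic)
})

# Number of genes beyond the +/- 2.1 threshold.
sum(abs(tstats) > t_crit)
```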
We might also wonder if the most statistically significant genes in this analysis match the ones from the correlation analysis.
plot(cors,efit.out$t[,2],xlab="Correlation",ylab="t-statistic")
We see that there is a very high correlation between the correlation statistic and the t-statistic, so almost the same set of genes will be selected.
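The agreement between the two gene-wise statistics can be summarized with a single number and with the overlap of the top-ranked genes. As before, this is a sketch on placeholder data; the median split and the base R t-test are assumptions standing in for the lecture's eBayes analysis.

```r
# Placeholder data (hypothetical, not the lecture's data): 20 samples, 1000 genes.
set.seed(2016)
n_samples <- 20
n_genes <- 1000
pheno <- rnorm(n_samples)
expr <- matrix(rnorm(n_genes * n_samples), nrow = n_genes, ncol = n_samples)

# Statistic 1: correlation of each gene with the continuous score.
cors <- as.vector(cor(t(expr), pheno))

# Statistic 2: pooled t-test of high vs low (assumed median split).
high <- pheno > median(pheno)
tstats <- apply(expr, 1, function(g) {
  unname(t.test(g[high], g[!high], var.equal = TRUE)$statistic)
})

# How closely the two statistics agree across genes.
cor(cors, tstats)

# How many of the 10 top-ranked genes the two statistics share.
top_cor <- order(abs(cors), decreasing = TRUE)[1:10]
top_t <- order(abs(tstats), decreasing = TRUE)[1:10]
length(intersect(top_cor, top_t))
```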
Naturally, we feel that we have done a great job of this experiment, but our collaborator wants to redo it to verify.
We compare results:
Of course we don't expect to obtain the exact same correlations, but the pattern is fairly similar.
What about the regression on the top 10 genes?
Our ANOVA table:
##           Df   Sum Sq  Mean Sq  F value  Pr(>F)
## sigGenes  10 25.52398  2.55240 14.07385 0.00025
## Residuals  9  1.63222  0.18136
Collaborator's ANOVA table:
##           Df   Sum Sq  Mean Sq  F value  Pr(>F)
## sigGenesC 10 16.91495  1.69150 13.82306 0.00027
## Residuals  9  1.10131  0.12237
These are quite similar.
Similarly, if we compute the 2-sample t-tests:
The correspondence looks equally good if we consider a heatmap of gene expression and other typical measures.
But there is a problem:
Let's look, for example, at the genes with |cor| > 0.5 or p < 0.05 in our study and in our collaborator's study.
Here are the genes with correlation less than -0.5 or greater than 0.5.
Here are the genes with t less than -2.1 or greater than 2.1.
Even though overall our collaborator's results seem similar to ours, the resulting gene lists are very different.
What went wrong?
The two "studies" cited here are "in silico" studies.
For each, I generated 20 "phenotypic scores" and then independently generated 1000 Normally distributed gene expression values for each "sample".
All the gene expression values are independent of the phenotypic scores and of each other.
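The data-generating process just described can be sketched in a few lines. The lecture does not say how the scores were drawn, so standard Normal scores are an assumption, as are the seed and the helper name `simulate_study`. The sketch then builds the |cor| > 0.5 gene list for each "study" and intersects them.

```r
set.seed(2016)
n_samples <- 20
n_genes <- 1000

# One in-silico "study" (hypothetical helper): a random score plus
# expression values drawn independently of the score and of each other.
simulate_study <- function() {
  pheno <- rnorm(n_samples)  # assumption: scores drawn from N(0, 1)
  expr <- matrix(rnorm(n_genes * n_samples), nrow = n_genes, ncol = n_samples)
  list(pheno = pheno, expr = expr)
}

ours <- simulate_study()
collab <- simulate_study()

# Gene-wise correlation with the phenotype in each study.
cors_ours <- as.vector(cor(t(ours$expr), ours$pheno))
cors_collab <- as.vector(cor(t(collab$expr), collab$pheno))

# Gene lists using the |cor| > 0.5 cut-off from the text.
list_ours <- which(abs(cors_ours) > 0.5)
list_collab <- which(abs(cors_collab) > 0.5)

length(list_ours)
length(list_collab)
intersect(list_ours, list_collab)
```

Each run produces a gene list of similar length in both studies, while the intersection is typically tiny or empty.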
So why do we get such similar results for the 2 "studies"?
First, let's look at the correspondence between the actual results: for example, the correlation between the two sets of gene-wise correlations is -0.02.
Similarly, we can look at the correspondence between the t-values, which also have a correlation of -0.02.
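The two facts above fit together: gene by gene the studies do not agree, yet the spread of the correlations is the same in both. For Normal data with no true association, the sample correlation satisfies \(E[r^2] = 1/(n-1)\), so with \(n = 20\) its standard deviation is about \(1/\sqrt{19} \approx 0.23\) in any study of this size. The sketch below checks both facts on simulated data; the helper `null_cors` and the seed are illustrative.

```r
set.seed(2016)
n_samples <- 20
n_genes <- 1000

# Correlations of pure-noise genes with a pure-noise score (hypothetical helper).
null_cors <- function() {
  pheno <- rnorm(n_samples)
  expr <- matrix(rnorm(n_genes * n_samples), nrow = n_genes, ncol = n_samples)
  as.vector(cor(t(expr), pheno))
}

cors_ours <- null_cors()
cors_collab <- null_cors()

# Gene-by-gene agreement between the two studies: expected to be near 0.
cor(cors_ours, cors_collab)

# Spread of the correlations: similar in both studies and close to 1 / sqrt(n - 1).
c(ours = sd(cors_ours), collab = sd(cors_collab),
  theory = 1 / sqrt(n_samples - 1))
```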
Even though the data were generated at random, we appeared to obtain significant (even highly significant) results.
In addition, the magnitudes of the correlations, t-statistics, regression \(R^2\), etc. were very concordant between the two totally independent experiments.
To understand whether outcomes of our biological experiments have a biological interpretation, we need to understand the behavior of our statistical methods when the data are random.
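One way to study that behavior is to rerun the whole pipeline many times on pure noise and look at the distribution of the result. The sketch repeats "select the 10 most correlated genes, then regress the score on them" 200 times (an arbitrary number of repetitions; the helper `top10_r2` and the seed are also illustrative). As a reference point, for 10 predictors chosen *without* looking at the score, \(R^2\) under no association follows a \(\mathrm{Beta}(k/2, (n-k-1)/2)\) distribution with mean \(k/(n-1) = 10/19 \approx 0.53\).

```r
set.seed(2016)
n_samples <- 20
n_genes <- 1000
n_reps <- 200

# One null replicate of the lecture's analysis (hypothetical helper):
# pick the 10 genes most correlated with the score, then regress the score on them.
top10_r2 <- function() {
  pheno <- rnorm(n_samples)
  expr <- matrix(rnorm(n_genes * n_samples), nrow = n_genes, ncol = n_samples)
  cors <- as.vector(cor(t(expr), pheno))
  top10 <- order(abs(cors), decreasing = TRUE)[1:10]
  sigGenes <- t(expr[top10, ])
  summary(lm(pheno ~ sigGenes))$r.squared
}

# Distribution of R^2 when every gene is pure noise.
r2_null <- replicate(n_reps, top10_r2())
summary(r2_null)

# Reference: mean R^2 for 10 predictors chosen without selection, k / (n - 1).
10 / (n_samples - 1)
```

Comparing `summary(r2_null)` with the reference value shows how much of a large \(R^2\) can come from selecting genes on the same data used to fit the regression.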