Threats to Validity in the Design and Conduct of Preclinical Efficacy Studies

Unintentional bias and poor repeatability in preclinical studies have been gaining attention lately. As guidelines have been published, Friends of FSH Research has followed suit by reviewing grant requests in light of them. The Grant Program information references some basic material on this issue, and now also references the 2013 paper by Henderson et al., published in PLOS Medicine, "Threats to Validity in the Design and Conduct of Preclinical Efficacy Studies," which identifies the most common recommendations through a systematic review of guidance gathered from multiple sources. The paper's abstract follows.


Background: The vast majority of medical interventions introduced into clinical development prove unsafe or ineffective. One prominent explanation for the dismal success rate is flawed preclinical research. We conducted a systematic review of preclinical research guidelines and organized recommendations according to the type of validity threat (internal, construct, or external) or programmatic research activity they primarily address.

Methods and Findings: We searched MEDLINE, Google Scholar, Google, and the EQUATOR Network website for all preclinical guideline documents published up to April 9, 2013 that addressed the design and conduct of in vivo animal experiments aimed at supporting clinical translation. To be eligible, documents had to provide guidance on the design or execution of preclinical animal experiments and represent the aggregated consensus of four or more investigators. Data from included guidelines were independently extracted by two individuals for discrete recommendations on the design and implementation of preclinical efficacy studies. These recommendations were then organized according to the type of validity threat they addressed. A total of 2,029 citations were identified through our search strategy. From these, we identified 26 guidelines that met our eligibility criteria—most of which were directed at neurological or cerebrovascular drug development. Together, these guidelines offered 55 different recommendations. Some of the most common recommendations included performance of a power calculation to determine sample size, randomized treatment allocation, and characterization of disease phenotype in the animal model prior to experimentation.

Conclusions: By identifying the most recurrent recommendations among preclinical guidelines, we provide a starting point for developing preclinical guidelines in other disease domains. We also provide a basis for the study and evaluation of preclinical research practice.
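
Two of the most common recommendations named in the abstract, a power calculation to determine sample size and randomized treatment allocation, can be illustrated with a short sketch. This is not taken from the paper: the function names are hypothetical, the sample size formula is the standard normal approximation for comparing two group means, and the default significance level (0.05) and power (0.80) are assumed values chosen only for illustration.

```python
import math
import random
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Animals per group for a two-sided, two-group comparison of means.

    Uses the normal approximation n = 2 * ((z_alpha + z_power) / d) ** 2,
    where d is the standardized effect size (difference in means / SD).
    An exact t-test calculation typically gives a slightly larger n.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


def randomize_allocation(animal_ids, groups=("control", "treatment"), seed=None):
    """Assign animals to groups at random while keeping group sizes balanced."""
    rng = random.Random(seed)  # a recorded seed makes the allocation reproducible
    shuffled = list(animal_ids)
    rng.shuffle(shuffled)
    # Deal the shuffled animals out to the groups in turn.
    return {animal: groups[i % len(groups)] for i, animal in enumerate(shuffled)}


if __name__ == "__main__":
    n = sample_size_per_group(effect_size=1.0)
    allocation = randomize_allocation(range(1, 2 * n + 1), seed=2013)
    print(f"Animals per group: {n}")
    for animal, group in sorted(allocation.items()):
        print(animal, group)
```

The point of the sketch is that both steps happen before any animal is treated: the group size is fixed in advance from the expected effect, and group membership is decided by a random draw rather than by the investigator.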