@westonRecommendationsIncreasingTransparency
Recommendations for Increasing the Transparency of Analysis of Preexisting Data Sets
() - Sara J Weston, Stuart J Ritchie, Julia M Rohrer, Andrew K Przybylski
Journal::
Link::
DOI::
Links::
Tags:: #paper #Pre-Analysis
Cite Key:: [@westonRecommendationsIncreasingTransparency]
Abstract
Secondary data analysis, or the analysis of preexisting data, provides a powerful tool for the resourceful psychological scientist. Never has this been more true than now, when technological advances enable both sharing data across labs and continents and mining large sources of preexisting data. However, secondary data analysis is easily overlooked as a key domain for developing new open-science practices or improving analytic methods for robust data analysis. In this article, we provide researchers with the knowledge necessary to incorporate secondary data analysis into their methodological toolbox. We explain that secondary data analysis can be used for either exploratory or confirmatory work, and can be either correlational or experimental, and we highlight the advantages and disadvantages of this type of research. We describe how transparency-enhancing practices can improve and alter interpretations of results from secondary data analysis and discuss approaches that can be used to improve the robustness of reported results. We close by suggesting ways in which scientific subfields and institutions could address and improve the use of secondary data analysis.
Notes
"is also possible to use preexisting data to test theories in a confirmatory fashion. However, this endeavor comes with an important caveat: Many commonly applied statistical tests were developed under specific assumptions. For example, null-hypothesis significance testing assumes that the statistical test is chosen prior to data collection; this is part of what makes data peeking so problematic in research (Armitage, McPherson, & Rowe, 1969; Munafò et al., 2017). Consequently, researchers conducting secondary data analyses that might help confirm a theory must take extra steps to ensure the robustness of their results." (Weston et al :15)
"Table 1. Approaches for Improving Inferences Based on (Secondary) Data Analysis" (Weston et al :16)
"Another approach to limit false-positive findings is setting a conservative alpha level. For example, researchers might want to use a level of .005 instead of .05 (Benjamin et al., 2018), or decrease their alpha as a function of sample size to balance error rates (Lakens, 2018). Note that this suggestion is by no means limited to secondary data analysis." (Weston et al :16)
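The sample-size-dependent alpha idea can be sketched in a few lines. This is only an illustration: the square-root scaling rule and the `base_n` reference point below are hypothetical choices for the example, not the specific rule Lakens (2018) proposes.

```python
import math

def scaled_alpha(n, base_alpha=0.05, base_n=100):
    """Shrink the significance threshold as the sample grows.

    Illustrative rule only: halve alpha for every fourfold increase
    in n relative to a (hypothetical) reference sample of base_n.
    """
    return base_alpha / math.sqrt(n / base_n)

def decide(p, n):
    """Declare significance against the sample-size-adjusted alpha."""
    return p < scaled_alpha(n)
```

With this rule, p = .03 counts as significant at n = 100 (alpha = .05) but not at n = 400 (alpha = .025), which is the intended trade-off between Type I and Type II error rates in large samples.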
"The existence of multiple, independent, large-scale survey studies also allows for evaluation of generalizability in the context of secondary data analyses. In this kind of multicohort coordinated analysis (suggested by Hofer & Piccinin, 2009), researchers can test the same (or similar) analytic models in different samples, representing, for example, different geographic locations or cohorts, or different measurement instruments. Results can be pooled to better estimate an effect size and evaluate heterogeneity across differences in populations and methods." (Weston et al :16)
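Pooling cohort-level results to "better estimate an effect size and evaluate heterogeneity" is standard meta-analytic machinery. A minimal sketch, assuming each cohort contributes an effect estimate and its standard error (function name and fixed-effect choice are mine, not the paper's):

```python
def pool_fixed_effect(estimates, ses):
    """Inverse-variance (fixed-effect) pooling across cohorts.

    Returns the pooled estimate, its standard error, and Cochran's Q,
    a simple statistic for heterogeneity across cohorts.
    """
    weights = [1.0 / se**2 for se in ses]          # precision weights
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = (1.0 / sum(weights)) ** 0.5
    # Cochran's Q: weighted squared deviations from the pooled estimate
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    return pooled, pooled_se, q
```

A large Q relative to its degrees of freedom (number of cohorts minus one) would flag heterogeneity across populations or instruments, in which case a random-effects model is usually preferred.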
"It has been argued that a major flaw of the way research is currently reported is that exploratory research is often written up as if it were confirmatory all along (Wagenmakers, Wetzels, Borsboom, van der Maas, & Kievit, 2012). Clearly identifying exploratory analyses helps readers better assess the robustness of a particular result and opens the door for high-quality confirmatory follow-up research. We recommend that researchers omit p values and other tests of significance from exploratory analyses, as these cannot be interpreted properly without a confirmatory framework" (Weston et al :16)
"The objective of attempts to increase transparency is to live up to the ideal summarized in the motto of the United Kingdom's Royal Society: Nullius in verba, or "take nobody's word for it."" (Weston et al :20)
"We recommend several ways in which researchers can transparently document a secondary data analysis:" (Weston et al :20)
"First, researchers can provide links to codebooks and instructions for accessing the data." (Weston et al :20)
"Second, researchers should communicate how the data have been used" (Weston et al :20)
"note of caution is warranted: It is quite likely that a researcher's history analyzing a particular preexisting data set is not limited to what has been published. Researchers should disclose any analysis that is relevant to the current project" (Weston et al :21)
"Third, researchers can document the data-wrangling and -analysis pipeline. Sharing the analytic script is not always considered part of sharing materials, depending on the journal, but it is especially important for researchers using preexisting data" (Weston et al :21)
"Fourth, we recommend that secondary data analysis be preregistered. As in the case of primary data analysis, preregistration should occur before the analyses are conducted." (Weston et al :21)
"On the basis of empirical testing, Young and Holsteen (2017) described three different degrees of model robustness: First, the result may hold no matter how the model is specified (i.e., the finding is robust). Second, the result may depend on some specific model ingredients, such as a particular covariate (i.e., there is systematic variability). Third, the result may depend on a very specific combination of parameters and arise only in one (or a few) of many possible models ("knife edge" specification)" (Weston et al :22)
"The simplest way to probe the robustness of a finding is to perform robustness checks (also known as sensitivity analyses)" (Weston et al :22)
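A robustness check of the Young and Holsteen kind can be sketched as a small specification curve: refit the model under every subset of candidate covariates and inspect how the focal coefficient moves. This is a generic illustration (names like `spec_curve` are mine), not the authors' implementation.

```python
import itertools
import numpy as np

def spec_curve(y, x_focal, covariates):
    """Fit y ~ x_focal plus every subset of the candidate covariates
    (via least squares) and collect the focal coefficient from each
    specification. A stable coefficient across specs suggests a robust
    finding; strong dependence on one covariate suggests systematic
    variability; an effect in only one spec is a "knife edge" result.
    """
    names = list(covariates)
    coefs = []
    for k in range(len(names) + 1):
        for subset in itertools.combinations(names, k):
            X = np.column_stack(
                [np.ones_like(x_focal), x_focal]
                + [covariates[c] for c in subset]
            )
            beta, *_ = np.linalg.lstsq(X, y, rcond=None)
            coefs.append(beta[1])  # coefficient on the focal predictor
    return coefs
```

Plotting the sorted coefficients (the "specification curve") then shows at a glance whether the sign and rough magnitude of the effect survive across reasonable model choices.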