Understanding the Latest Push for Scientific Integrity in Social and Behavioral Research
The quest for reliable knowledge has long been the cornerstone of scientific progress, particularly in fields like psychology, sociology, economics, and political science, collectively known as the social and behavioral sciences. A groundbreaking study published in the prestigious journal Nature has thrust this pursuit into the spotlight by systematically attempting to replicate 274 specific claims from 164 previously published quantitative papers. This massive endeavor, part of the DARPA-funded SCORE project, offers unprecedented insight into the replicability of findings in these disciplines.
Replicability, the ability of independent investigators to obtain consistent results using the same procedures and comparable new data, is fundamental to building trustworthy science. The study, carried out by more than 290 researchers, including Andrew H. Tyner from the Center for Open Science and Anna Lou Abatayo from Wageningen University, targeted papers published between 2009 and 2018 across 54 journals. By focusing on claims that reported positive, statistically significant results, the team aimed to test whether those findings hold up under rigorous re-examination.
The Methodology Behind the Replications
To ensure fairness and statistical power, the replications were designed with meticulous care. Each attempt had a median power of 99.6% to detect the originally reported effect sizes, far surpassing the power of typical studies. Original materials and stimuli were used whenever available, and protocols were preregistered and peer reviewed through a standardized internal process. This approach minimized bias and maximized the chances of an accurate test of each claim.
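For readers who want a feel for what 99.6% power means in practice, here is a minimal sketch of the kind of calculation involved, using a Fisher z approximation for a Pearson correlation. The effect size and sample size below are illustrative assumptions, not values from any particular replication, and the study's own power analyses may have used different methods.

```python
import math

from scipy.stats import norm

def replication_power(r_original: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power to detect a Pearson correlation of r_original
    with n observations, via the Fisher z transformation."""
    z_effect = math.atanh(r_original)      # Fisher z of the original effect
    se = 1.0 / math.sqrt(n - 3)            # standard error of Fisher z
    z_crit = norm.ppf(1 - alpha / 2)       # two-sided critical value (~1.96)
    return float(norm.cdf(z_effect / se - z_crit))

# Illustrative assumption: original effect r = 0.25, replication n = 300
print(f"power = {replication_power(0.25, 300):.3f}")  # ~0.993
```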
The sample spanned diverse subfields within social and behavioral sciences, including experimental psychology, behavioral economics, and social neuroscience. Replications were conducted using both new data collections and secondary datasets where appropriate, providing a robust test of generalizability. All data, code, and materials are openly available on the Open Science Framework, exemplifying transparent research practices.
Key Findings: A Mixed Picture of Replication Success
The results paint a sobering yet nuanced picture. Of the 274 claims, 151 (55.1%, with a 95% confidence interval of 49.2–60.9%) produced statistically significant results in the same direction as the originals. Aggregated at the paper level, to account for papers contributing multiple claims, the equivalent of 80.8 of the 164 papers (49.3%, 95% CI 43.8–54.7%) replicated successfully; the fractional count presumably reflects averaging over each paper's claims. These rates hover around 50%, echoing earlier replication efforts like the Reproducibility Project: Psychology while extending the scope dramatically.
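The headline proportion is easy to verify. The sketch below reproduces the claim-level rate and its interval using a Wilson score interval; the interval method is an assumption on our part, though it matches the reported bounds.

```python
from statsmodels.stats.proportion import proportion_confint

successes, total = 151, 274   # replicated claims / claims attempted
rate = successes / total      # 0.551 -> 55.1%
low, high = proportion_confint(successes, total, alpha=0.05, method="wilson")
print(f"{rate:.1%} (95% CI {low:.1%}-{high:.1%})")  # 55.1% (95% CI 49.2%-60.9%)
```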
Effect sizes tell an even starker story. Original studies reported a median Pearson's r of 0.25 (a moderate effect), while replications yielded a median r of 0.10, an 82.4% reduction in shared variance (95% CI 67.8–88.2%). Part of this decline is expected from statistical phenomena such as regression to the mean and the winner's curse, in which effects that cross the significance threshold tend to be overestimates and only those positive results are selected for replication. Nonetheless, it underscores the fragility of many findings.
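Because shared variance is simply r squared, the scale of this drop can be sanity-checked in a few lines. Plugging in the two reported medians gives roughly 84%; the study's 82.4% figure is presumably computed across the full set of claims rather than from the medians alone.

```python
r_original, r_replication = 0.25, 0.10   # reported median effect sizes
shared_original = r_original ** 2        # shared variance is r squared
shared_replication = r_replication ** 2
reduction = 1 - shared_replication / shared_original
print(f"reduction in shared variance = {reduction:.1%}")  # 84.0% from the medians
```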
Disciplinary Variations and What They Reveal
Replication rates showed modest variation across disciplines, ranging from 42.5% to 63.1%, though some estimates carry high uncertainty due to smaller sample sizes within subfields. For instance, certain areas of psychology and economics mirrored past projects' ~40-60% success rates, while others fared slightly better. This suggests that replicability challenges are not confined to one corner of the social and behavioral sciences but permeate the broader landscape.
Thirteen different metrics for assessing replication success—ranging from p-value comparisons to effect size overlaps—produced estimates between 28.6% and 74.8%, with a median of 49.3%. Such variability highlights the need for standardized criteria in evaluating reproducibility.
Companion Insights on Analytical Robustness
Complementing the replication work, a parallel paper in Nature examined analytical robustness by having multiple independent analysts re-examine data from 100 studies. Shockingly, only 34% of reanalyses matched the original results within a tight tolerance (±0.05 Cohen's d), rising to 57% with broader margins. Conclusions aligned in 74% of cases, but 24% were inconclusive and 2% reversed the original effect. This reveals how researcher choices in data processing, statistical models, and software can profoundly influence outcomes, adding another layer to the reproducibility puzzle.
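To make the tolerance criterion concrete, here is a toy sketch of how reanalyses might be classified against an original estimate. The effect sizes and the broader ±0.20 margin are hypothetical assumptions; the article specifies only the tight ±0.05 band.

```python
import numpy as np

def match_rates(d_original, d_reanalysis, tight=0.05, broad=0.20):
    """Fraction of reanalyses whose Cohen's d lands within a tight or a
    broad tolerance of the original estimate."""
    diff = np.abs(np.asarray(d_reanalysis) - np.asarray(d_original))
    return (diff <= tight).mean(), (diff <= broad).mean()

# Hypothetical effect sizes: five originals vs. independent reanalyses
tight_rate, broad_rate = match_rates(
    [0.30, 0.50, 0.20, 0.45, 0.10],
    [0.28, 0.62, 0.21, 0.30, 0.12],
)
print(f"within +/-0.05: {tight_rate:.0%}; within +/-0.20: {broad_rate:.0%}")
```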
Explore the analytical robustness findings in detail.
Context Within the Broader Reproducibility Crisis
This study arrives amid a decade-long reproducibility crisis that has rocked the social sciences since around 2011. High-profile failures, such as the inability to replicate power posing or ego depletion effects, sparked debates about questionable research practices like p-hacking and about publication bias favoring novel, positive results. Earlier efforts, including replications of 21 Nature and Science papers (Camerer et al., 2018), told a similar story, with roughly 60% of findings replicating and effect sizes shrinking to about half their original magnitudes.
Yet, progress is evident. Initiatives like preregistration, open data sharing, and registered reports have gained traction in universities worldwide, fostering a culture of transparency. The SCORE project's scale—replicating across hundreds of claims—provides the most comprehensive benchmark to date, signaling that while problems persist, systematic assessment is advancing the field.
Implications for Researchers and Academic Institutions
For graduate students and early-career researchers in higher education, these findings are a call to action. Universities must integrate replicability training into curricula, emphasizing large sample sizes, transparent reporting, and multi-lab collaborations. Faculty hiring committees could prioritize candidates with open science commitments, as evidenced by the growing demand for reproducible-research skills in research positions.
Institutions face pressure to reform incentives: from rewarding flashy publications to valuing rigorous, replicable work. Funding bodies like NSF and ERC are increasingly mandating data sharing, potentially reshaping grant allocations.
Solutions and Best Practices Emerging from the Data
The study identifies promising paths forward. High-powered designs and preregistration correlate with better outcomes in prior work. Here's a breakdown of actionable steps:
- Preregister studies: Outline hypotheses, analyses, and exclusions upfront to curb flexibility.
- Increase sample sizes: Aim for 80-90% power to detect realistic effects.
- Share materials openly: Enable direct replications by peers.
- Use multiverse analysis: Explore how robust results are to analytical choices (see the sketch after this list).
- Collaborate across labs: Pool resources for larger, more reliable tests.
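As a concrete illustration of the multiverse idea, the sketch below runs the same correlation test under every combination of two common analytical choices, an outlier exclusion rule crossed with an outcome transformation, on synthetic data. The choices and data are illustrative assumptions, not the SCORE protocol.

```python
import itertools

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.2 * x + rng.normal(size=200)   # synthetic data with a modest true effect

# Illustrative analytical choices a researcher might plausibly make
outlier_rules = {"none": np.inf, "2.5 SD": 2.5, "3 SD": 3.0}
transforms = {"raw": lambda v: v, "rank": stats.rankdata}

for (rule, cutoff), (name, tf) in itertools.product(
        outlier_rules.items(), transforms.items()):
    keep = np.abs(stats.zscore(y)) < cutoff   # apply the exclusion rule
    r, p = stats.pearsonr(tf(x[keep]), tf(y[keep]))
    print(f"outliers={rule:>6}, transform={name:>4}: r={r:.3f}, p={p:.4f}")
```

If the estimated effect stays stable across all the specifications, that is evidence of robustness; if it flips sign or loses significance under reasonable alternatives, the original finding deserves more caution.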
Adopting these practices could boost replicability rates significantly, as suggested by recent high-rigor projects reporting replication rates as high as 97%.
Stakeholder Perspectives and Real-World Impacts
Experts like Brian Nosek, co-founder of the Center for Open Science, hail SCORE as a milestone in systematizing confidence in evidence. Policymakers drawing on behavioral insights for public health or economic interventions must now scrutinize findings more rigorously to avoid misguided applications.
In academia, poor replicability erodes public trust, potentially affecting enrollment in social science programs and funding. Conversely, addressing it positions universities as leaders in credible science.
Future Outlook: Toward a More Replicable Science
Looking ahead, prediction markets and AI-assisted forecasting may preemptively flag fragile findings, as explored in SCORE extensions. Journals are piloting badges for open practices, and platforms like OSF facilitate verification.
Read the full Nature study to dive deeper. As social and behavioral sciences evolve, embracing replicability will ensure their contributions to understanding human behavior remain robust and impactful.
For those navigating academic careers amid these shifts, resources like comprehensive CV guides can help showcase reproducible research prowess.
Photo by Hakim Menikh on Unsplash