Understanding Research Credibility: HDR Analysts Contribute to SCORE

Systematizing Confidence in Open Research and Evidence (SCORE) was a large collaborative research project designed to improve judgments about scientific credibility in the social and behavioral sciences. Back when Peter and I were psychology professors, we independently chose to join this initiative, the results of which were recently published in Nature.  

In working with clients at HDR, we often face the question of how much confidence to place in a model result or empirical finding. Decision-making happens under uncertainty, so part of the job is deciding not just what the evidence says, but how much weight it should carry. That broader question is one reason the results of the SCORE project are relevant to our current work.

SCORE was a DARPA-funded multi-method collaboration involving 865 researchers. As part of the initiative, the credibility of published findings was evaluated across three dimensions: reproducibility, robustness, and replicability.  

 

  • Researchers for the reproducibility study (Nature | Open Access) examined whether re-running an original analysis on the original data from published research articles produces the same result reported by the original authors. Only about 54% of sampled papers were precisely reproducible. Papers from political science and economics journals had higher reproducibility rates than those from other disciplines. Paper recency and journal data sharing policies also predicted reproducibility.
  • Researchers for the robustness study (Nature | Open Access) tested whether conclusions hold when reasonable alternative analytical choices are applied to the same data. While 74% of the re-analyses reached the same conclusion as the original authors, quantitative results like effect sizes varied substantially.
  • Researchers for the replicability study (Nature | Open Access) attempted independent replications of 274 claims drawn from 164 published papers. The replications were carefully designed, used the original materials when possible, and were peer-reviewed in advance. Only 55% of claims replicated with statistically significant results in the original direction (see Figure 1). Replication rates varied somewhat across disciplines and replication criteria.

Figure 1: Each point shows the original and replication effect sizes for a replicated claim. Point size reflects the number of claims per paper. Replication effect sizes are shown as positive when the observed relationship follows the same direction as the original effect, and negative when the relationship is in the opposite direction. Points are classified as successful if the replication is statistically significant (p < .05, two-sided) and in the same direction as the original effect; otherwise, they are classified as failed.
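
The classification rule in the Figure 1 caption can be written down in a few lines. The sketch below is only an illustration of that rule, not SCORE's actual analysis code: the function names and the example numbers are hypothetical, and a real analysis would also need to handle different effect size metrics and alternative replication criteria.

```python
from typing import Iterable, Tuple


def classify_replication(original_effect: float,
                         replication_effect: float,
                         p_value: float,
                         alpha: float = 0.05) -> str:
    """Label one replication using the significance-and-direction criterion."""
    # Same direction: both effects share a sign (a zero effect has no direction).
    same_direction = original_effect * replication_effect > 0
    significant = p_value < alpha
    return "successful" if (significant and same_direction) else "failed"


def replication_rate(claims: Iterable[Tuple[float, float, float]]) -> float:
    """Share of claims labeled successful."""
    labels = [classify_replication(o, r, p) for o, r, p in claims]
    return labels.count("successful") / len(labels)


# Hypothetical (original effect, replication effect, two-sided p) triples.
example_claims = [
    (0.45, 0.30, 0.003),   # significant, same direction
    (0.38, 0.05, 0.410),   # same direction, not significant
    (0.52, -0.20, 0.020),  # significant, opposite direction
    (0.29, 0.25, 0.012),   # significant, same direction
]

print(replication_rate(example_claims))  # prints 0.5
```

Note that a replication can fail this criterion in two different ways: by not reaching significance, or by reaching significance in the wrong direction.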

These investigations remind us that research credibility is not a single property. A finding can survive one test and fail another. Reproducing an analysis, obtaining similar conclusions under alternative specifications, and observing the same result in new data each tell us something different. When the same findings are repeatedly observed, confidence in the robustness and reliability of the results increases. In fact, replication rates are essential for estimating the probability of a hypothesis being true (see Doug’s paper in The American Statistician for more on this point).
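
To make the link between replication rates and the probability of a hypothesis being true concrete, here is a minimal sketch using Bayes' rule. It is a generic textbook-style illustration, not the method from the paper mentioned above. The function names are hypothetical, and the power (0.90) and false positive rate (0.025, half of a two-sided .05 because the direction must also match) are assumed values chosen purely for illustration.

```python
def implied_share_true(replication_rate: float,
                       power: float,
                       false_positive_rate: float) -> float:
    """Solve R = s * power + (1 - s) * fpr for s, the share of true claims.

    Assumes every true claim replicates with probability `power` and every
    false claim "replicates" with probability `false_positive_rate`.
    """
    if power <= false_positive_rate:
        raise ValueError("power must exceed the false positive rate")
    share = (replication_rate - false_positive_rate) / (power - false_positive_rate)
    return min(1.0, max(0.0, share))  # keep the estimate inside [0, 1]


def update_on_replication(prior: float,
                          power: float,
                          false_positive_rate: float,
                          replicated: bool) -> float:
    """Bayes' rule: probability a claim is true after one replication attempt."""
    if replicated:
        like_true, like_false = power, false_positive_rate
    else:
        like_true, like_false = 1 - power, 1 - false_positive_rate
    numerator = prior * like_true
    return numerator / (numerator + (1 - prior) * like_false)


# Illustrative inputs only: 0.55 is the replication rate quoted above,
# while power and false_positive_rate are assumptions.
prior = implied_share_true(0.55, power=0.90, false_positive_rate=0.025)
after_success = update_on_replication(prior, 0.90, 0.025, replicated=True)
after_failure = update_on_replication(prior, 0.90, 0.025, replicated=False)
print(prior, after_success, after_failure)
```

Under these assumed inputs, a successful replication raises the probability that a claim is true and a failed one lowers it. The size of the shift depends heavily on the assumed power and false positive rate, so the outputs should be read as an illustration of the logic rather than as estimates.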

Confidence about research findings has value beyond the academic community. In applied work, we rely on empirical research to support our claims. Many HDR projects involve integrating multiple forms of evidence, each with different strengths and limitations. Historical observations, expert judgment, and external benchmarks may all serve as inputs into a model. Methods rooted in probabilistic reasoning and uncertainty quantification provide a framework for combining these sources while making confidence levels explicit. Rather than treating evidence as simply true or false, such approaches recognize that confidence should increase as findings remain consistent across multiple lines of inquiry and decrease when conclusions depend heavily on particular assumptions or analytical choices.

SCORE’s datasets, methods, and findings are openly available. Take a look! This initiative represents one of the most comprehensive efforts to quantify reliability in published social and behavioral science, and Peter and I are proud to have played a small part in it. 

Let’s Get Personal: Does Personality Matter for Decision Makers?

Personality is one of the most extensively studied areas in psychology. It has a substantial genetic component, is difficult to modify, and tends to be relatively stable across the lifespan (though subtle trait-level changes do occur). Research on personality and decision-making has repeatedly demonstrated that under different decision conditions, certain personality profiles tend to perform better than others. These findings do not suggest that any personality profile is inherently good or bad at decision-making. Rather, recognizing the conditions under which individuals with certain profiles make better or worse decisions allows for targeted interventions.

One widely used framework for describing personality is the Five Factor Model (FFM), commonly referred to as the Big Five, developed by McCrae and Costa. If you are not familiar with it, the Big Five is a well-established model that groups personality characteristics into five traits, each measured along a spectrum, meaning individuals can score higher or lower on any of them. The five traits are Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism. You can use the acronyms OCEAN or CANOE to remember them. As an aside, I used OCEAN as my starting Wordle guess for a long time (because of the vowels) until one day it came back all yellow, and my heart sank when I realized the answer was CANOE (so close to getting it in one).

Research has examined how each of these traits can influence decisions for better or for worse. While personality traits themselves are relatively stable, being aware of how your personality may bias your decisions is critical if you want to mitigate those biases and improve decisions.

Determining where you fall on these traits is fairly straightforward. Many validated online assessments are free to use, and I have linked one at the bottom of this post. Be honest when answering if you want accurate results. When I was a psychology professor, I had to remind my General Psychology students that if they truly wanted insight, they had to answer truthfully rather than trying to “game the system” to get socially desirable results. The short questionnaire linked includes 50 items that reflect the five traits and are summed into composite scores. While many of these assessments rely on rating scales and scoring methods that HDR would not fully endorse for high-stakes decision analysis, they are generally validated and directionally consistent for identifying trait tendencies, and they can provide useful insight into traits relevant to decision making.
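
For readers curious how a 50-item questionnaire turns into five composite scores, here is a minimal sketch of summed scoring. Everything specific in it is an assumption for illustration: the 1 to 5 rating scale, ten items per trait, items cycling through the traits in OCEAN order, and the set of reverse-keyed items. The actual scoring key of the linked test may differ.

```python
from typing import Dict, Iterable, Sequence

TRAITS = ["Openness", "Conscientiousness", "Extraversion",
          "Agreeableness", "Neuroticism"]


def score_big_five(responses: Sequence[int],
                   reverse_keyed: Iterable[int] = (),
                   scale_max: int = 5) -> Dict[str, int]:
    """Sum 50 item ratings into five composite trait scores.

    Hypothetical layout: item i (0-based) belongs to TRAITS[i % 5].
    Items listed in `reverse_keyed` are flipped before summing, so a
    rating of 1 counts as `scale_max` and vice versa.
    """
    if len(responses) != 50:
        raise ValueError("expected exactly 50 responses")
    reverse = set(reverse_keyed)
    scores = {trait: 0 for trait in TRAITS}
    for i, rating in enumerate(responses):
        if not 1 <= rating <= scale_max:
            raise ValueError(f"item {i} is outside the 1..{scale_max} scale")
        value = (scale_max + 1 - rating) if i in reverse else rating
        scores[TRAITS[i % 5]] += value
    return scores


# Answering the midpoint on every item gives 10 items x 3 = 30 per trait.
neutral = score_big_five([3] * 50)
print(neutral)
```

With ten items per trait on a 1 to 5 scale, each composite score falls between 10 and 50. Simple sums like this are easy to compute, but they treat rating-scale steps as equally spaced, which is one reason such scores are best read as directional.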

Going in order of the OCEAN acronym, Openness, formally Openness to Experience, reflects the extent to which individuals are imaginative, intellectually curious, and open to new ideas. Individuals who score high in this trait tend to tolerate ambiguity and uncertainty more comfortably when making complex judgments. Those who score lower tend to prefer structure and familiarity. This dimension is not a reflection of intelligence, but of cognitive style. Highly open individuals may sometimes over-explore possibilities, whereas less open individuals may resist creative alternatives. Awareness of this tendency allows you to introduce structured mechanisms such as pre-mortems to surface alternative perspectives and reduce blind spots.

Conscientiousness reflects how reliable, organized, disciplined, and plan-oriented an individual is. Highly conscientious individuals tend to engage in more structured and careful analysis before acting. In decision-making contexts, this tendency often translates to more deliberate planning and reduced error from oversight. If you score lower on this trait, you may benefit from building additional structure, planning checkpoints, and analytical rigor into your workflow. In How to Measure Anything in Project Management, Doug Hubbard and his co-authors, Alex Budzier and Andreas Leed, argue that project managers can often economically justify investing more time in planning and analysis given the high rate of cost overruns and delays in projects.

Extraversion reflects the degree to which an individual seeks stimulation and social interaction. More extraverted individuals tend to be assertive, outgoing, and energized by social engagement, while those lower in extraversion, often referred to as introverts, tend to prefer smaller settings and quieter reflection. Extraversion is relevant to decision-making because it is often associated with greater risk tolerance and faster action. At HDR, we emphasize empirically quantifying organizational risk tolerance. Without doing so, both external factors and internal characteristics such as a leader’s degree of extraversion can unintentionally influence risk-related decisions.

Agreeableness reflects the extent to which an individual is cooperative, warm, and trusting. Highly agreeable individuals may avoid conflict and be less inclined toward skepticism, which can reduce critical evaluation in certain decisions. Conversely, individuals low in agreeableness may provide strong adversarial scrutiny but risk dismissing valid ideas prematurely. A balance between cooperation and healthy skepticism is often ideal for group decision-making. While trait levels themselves are relatively stable, awareness of these tendencies allows individuals to adjust their behavior intentionally.

Finally, Neuroticism reflects emotional reactivity and sensitivity to stress. Individuals higher in neuroticism tend to experience greater anxiety and stress reactivity, which can impair performance in high-pressure decision environments. Under lower-pressure conditions, decision performance among such individuals is often comparable to that of individuals lower in neuroticism. Recognizing how stress interacts with this trait can help decision makers structure environments that mitigate stress-related decision errors.

Numerous personality models exist (the FFM is just one of them), and many of them provide useful frameworks for understanding how trait tendencies may affect decision behavior. No trait or combination of traits makes someone inherently poor at decision-making; with awareness and the right safeguards, individuals can measurably improve their decisions.

 

Big Five Personality Test