Scientific tools, open source, and large integers
From time to time, academic discussions arise about the validity of different software programs. In a statistical-modeling interest group, a colleague shared the following text, humorously implying that it scored a point for Stata in its battle with R.
When a controversial topic comes up repeatedly, I tend to write something about it, as I did with causality and the nature of psychiatric illnesses.
The author asked an AI to perform a code audit, testing different estimators (in packages written in R, Stata, and Python) and found anomalies in the values estimated via R. They then asked Claude to debug and discovered a problem with big integers in R, whose solution depended on scaling the values (e.g., transforming 136 into 1.36). See a previous discussion about big integers, and also this text about infinite numbers.
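The original report does not include the audited code, but the failure mode is easy to reproduce. A minimal Python sketch (assuming the usual numeric ceilings: R's integer type is 32-bit, and its default numeric, like Python's float, is a 64-bit double that represents every integer exactly only up to 2**53) shows why rescaling sidesteps the problem:

```python
# Minimal sketch (not the audited code) of the two ceilings that bite
# "big integer" computations in R-like numeric systems.

INT32_MAX = 2**31 - 1   # R's integer type is 32-bit: 2147483647
DOUBLE_EXACT = 2**53    # doubles represent every integer exactly only up to here

# Past 2**53, consecutive integers collapse onto the same double:
print(float(DOUBLE_EXACT) == float(DOUBLE_EXACT + 1))  # True: the +1 is lost
print(float(DOUBLE_EXACT) == float(DOUBLE_EXACT + 2))  # False: 2 ulps apart

# Scaling (e.g., 136 -> 1.36) keeps intermediate values far below these
# ceilings, so sums and products stay in the exactly-representable range:
raw = [136, 272, 544]
scaled = [x / 100 for x in raw]
print(scaled)
```

The trick is the same one behind standardizing variables before model fitting: small-magnitude values spend the 53 bits of the double's significand on the digits that actually carry information.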
Technical details aside, such comparisons feed the idea that open-source software is less reliable than closed-source commercial alternatives. This is often true: a randomly chosen open-source package may have undergone fewer quality and security assessments, and some notorious supply-chain attacks have exploited exactly this gap.
However, for scientific purposes I usually advocate open-source software as the gold standard, a position that has already led to fruitful discussions.1
Note that, not by chance, the model tested was originally implemented by its authors (Callaway and Sant’Anna) in R; they emphasize the open-source nature of the code in the abstract.
Difference-in-differences. Abstract: (...) Open-source software is available for implementing the proposed methods.
When I discuss this topic, I see a connection to the very concept of science. Many people think of scientific results as those closest to the truth, to correctness; verifiability, however, is the concept more closely tied to the methodological rigor that distinguishes science from other traditions.
A good analogy can be found in security and cryptography. In chip production, an end consumer will prefer a ‘closed’ chip carrying an ‘INMETRO’ seal certifying a rare failure rate in testing. But someone objectively (scientifically) probing the security of chips for critical applications (e.g., elevators, airplanes, nuclear power plants) cannot even analyze a closed architecture. For this reason, the cryptographic and security software that underpins the internet usually has open source code.
Similarly, in a scientific context, the verifiability of procedures is essential. It is not possible to assess the quality of the different estimators listed without access to the underlying processes.
1. Including a heated disagreement with Ron Bosch (https://hsph.harvard.edu/profile/ronald-bosch/), who also enjoys stats, music, and reviews.


