Methodology · field notes

Corroboration is a scalar, not a vibe.

By Steven Haller · February 12, 2026

When I send a draft to counsel, the first question is not "is this true?" — it is "how confident are you that this is true?"

The default answer most reporters give is a sentence. The right answer is a number.

What follows is the working methodology of a small data-journalism shop that publishes counts the state would prefer not to publish. It is not theory. It is the actual machinery that runs in the background of every report we ship. Other newsrooms are welcome to take any of it, adapt any of it, ignore any of it. I write it down because I would have killed for a document like this in the first year of the work.

the floor — three sources, two channels

Every published claim on this site is staked on a minimum of three independent sources from at least two independent channels.

A source is a single artifact: a court docket entry, an autopsy summary, a FOIA return, an interview transcript, a vendor's contract, a database row, a witness statement. A channel is the class of artifact: the public-records system is one channel, person-to-person interview is another, a contracting database is a third, a leaked container image is a fourth.

Three sources, two channels is a deliberately low floor. It is also higher than what most newsrooms enforce on deadline. Two sources from the same channel — say, two court filings in the same docket — are not two pieces of evidence; they are one piece of evidence quoted twice. A reporter who has not learned that distinction will produce confident-sounding work that collapses the first time a defendant's counsel digs into the citations.

If a story sits at one source, we do not publish. If a story sits at two sources from one channel, we do not publish; we go find a second channel. If after a reasonable effort the second channel does not exist, we either kill the story or publish it as uncorroborated, single-channel, source-named, with a confidence number that reflects that. We have done that exactly twice in three years. Once it turned out we were right. Once we issued a correction. Both times the label did the work it was supposed to do.

the scalar — 0 to 1

Confidence is a number from 0 to 1. Below 0.60, we defer publication. The floor is not arbitrary; it is the threshold at which our retrospective error rate on a population of similar prior claims crossed a level we were willing to live with. We rederive the floor every six months against the current population of corrections.

The number is assigned by the lead reporter and reviewed by a second editor on every claim that ends up in a published count. It is not a posterior probability in a formal Bayesian sense. It is a structured guess, with structure. The structure looks like this:

baseline:              0.50
+ corroboration:       up to +0.30 (three+ sources, two+ channels)
+ source quality:      up to +0.15 (primary documents over secondary)
+ direct attribution:  +0.05 (named on-the-record human source)
- contradictions:      down to -0.40 (any unresolved counter-evidence)
- recency:             -0.05 to -0.20 (artifacts older than the claim window)
- self-interest:       -0.05 to -0.15 (sources with incentive to misrepresent)

This is not formula physics. The weights are negotiated, in writing, between two humans. The point is not that the number is precisely right. The point is that the act of producing a number forces an argument the reporter would otherwise be allowed to wave at.

Confidence is doing work in the sentence. It tells counsel how much exposure the publication is taking on. It tells the subject of the report how much room they have to respond. It tells the reader, in advance, how much of the writer's reputation is staked on this paragraph. If you cannot put a number on it, you do not actually know what you mean.

the ledger

Every claim in a published count lives as a row in a plain-text ledger. The ledger is not glamorous. It looks like this.

2026-01-22  claim_id=mi-deaths-2023-117
            claim: Decedent A.B. died in MDOC custody, Macomb Reg., 2023-09-08
            sources:
              s1  court  Macomb Co. Probate, case 23-2167, file 9
              s2  foia   MDOC FOIA return 2024-1142, response page 6
              s3  ledger county-coroner Macomb 2023 spreadsheet, row 411
            channels: court | foia | coroner
            confidence: 0.84
            reviewer: jc (2026-01-23)
            contradictions: none on record
            right-of-reply: family notified 2026-01-19, declined comment
            publication: counted-deaths-michigan-2015-2024.csv row 1822

The ledger lives in a private repository. It is not the publishable artifact. The publishable artifact is the count, and the dataset, and the report. The ledger is the audit trail behind them. Every count we publish is reproducible, in the sense that two reviewers should be able to walk the ledger row by row and either agree with the published number or surface the disagreement. We have had reviewers from outside our organisation walk the full ledger twice. Both times we ended up changing two or three rows. That is what the audit is for.

what changes when confidence is a scalar

Three things change when you commit to a confidence number.

First, internal disagreements get resolved by argument over the weights rather than by argument over the sentence. "I think this is a 0.55, you think it is a 0.72, where exactly are we apart?" is a much more productive editorial conversation than "I am not sure about this one." The disagreement has a coordinate.

Second, the population of published claims can be retrospectively evaluated. Six months later, we audit a stratified sample of last quarter's claims at confidence band 0.60 to 0.70, and another at 0.85+, and we see whether the higher band has produced fewer corrections than the lower band. If it has not, the weights are wrong, and we tune. Without a number, every error becomes a story, and the stories never converge.

Third, the scalar gives the subject of a report a coordinate to argue against. If we publish a claim at 0.78 and the subject's counsel writes with two pieces of counter-evidence we had not seen, we recompute. The new number might be 0.61, still above floor, with a published note. It might be 0.43, below floor, and we retract. The right-of-reply is no longer a moral abstraction. It is a delta on a number we already wrote down.

the right-of-reply isn't a courtesy

If the subject of the report is a person, I write them first.

If the subject is an institution, I write the named press contact and counsel, and I show them, in the letter, exactly the claim I am going to publish, the confidence I am going to assign it, and the sources behind it. I give a deadline. Usually seven business days. The deadline is the deadline regardless of the response. What can change is the confidence number, on the basis of what comes back. Most of the time something does come back. Sometimes the response moves the number up. Sometimes down. Sometimes it kills the story. All three are correct outcomes.

Newsroom culture has internalised the right-of-reply as a courtesy. It is not. It is an evidence-gathering step. The subject of a claim is, by definition, a primary source. Treating them as such is not deference. It is method.

what this is not

This is not a license to publish anything you can put a number on. The floor of 0.60 is meaningful only because it is paired with a floor of three sources and two channels. A confident-feeling claim, sourced from one channel, with a single artifact, is not a 0.60 claim regardless of how confident the reporter is. The structure prevents that. Reporters who try to back into a high number to override the source floor get pushed back, every time, in the second-editor pass.

It is also not a substitute for shoe leather. Most of the work, still, is reading documents, sending letters, calling people, waiting for FOIA responses, going to courthouses. The scalar does not generate evidence; it accounts for it. If the underlying corroboration work has not been done, no number on top of it is meaningful.

If you operate a newsroom, a research desk, an OSINT volunteer team, or any other organisation that publishes claims about people who did not ask to be written about, I would offer two unsolicited suggestions.

Pick a floor. Write it down. Publish it on a methodology page so your readers know what you mean. Closed Press's page does this, and so does ours, and we both intend the public commitment as a constraint on our own behavior. Floor below which we defer: 0.60. Floor below which you defer: yours to set. The point is not the specific number. The point is that there is one.

Then keep the ledger. It is the boring administrative work. It is also the thing that makes the publication defensible six months later when a defendant's counsel asks to see your math.