EpistemicaLab studies the epistemology of AI-mediated science.

Frontier models now do most of the writing, most of the searching, and an increasing share of the reading. The bottleneck on scientific progress is shifting from generator capability to verifier capability — from what we can produce to what we can trust.

We build the methodology — the verifiers, audit protocols, claim formats, and provenance standards — that lets credible research keep pace as frontier capability accelerates.

Verifiers are the binding constraint.

Capability is bounded by what we can verify. The history of science is the history of verifier engineering — the microscope, the statistical test, peer review, the formal proof. AI accelerates the generator side dramatically, but verifiers are what compound. A claim with a fast, cheap, model-independent verifier becomes a building block. A claim without one rots into vibes-research, regardless of how confidently a frontier model produced it.

Our research focuses on the verifier engineering that turns unbounded LLM output into bounded research output: cross-family audit protocols, paired-bootstrap intervals, oracle-hashing, formal proof checkers, and identifiability-error metrics. We treat the verifier term as the load-bearing factor in the equation research_capability = question_quality × generator × verifier × iteration_speed × compute, because verifier quality, more than generator quality, decides whether a research output can be trusted.
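As one illustration of the verifiers listed above, here is a minimal sketch of a paired-bootstrap interval for the difference between two systems scored on the same items. The function name, its parameters, and the choice of a percentile interval are assumptions made for this example, not a description of the lab's actual tooling.

```python
import random
from statistics import mean


def paired_bootstrap_ci(scores_a, scores_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean per-item difference (a - b).

    Hypothetical helper: scores_a[i] and scores_b[i] must be scores of two
    systems on the same item i, so each resample keeps the pair together.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty, equal-length score lists")

    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    rng = random.Random(seed)  # fixed seed so the interval is reproducible

    # Resample items with replacement and record the mean difference each time.
    resampled = sorted(mean(rng.choices(diffs, k=n)) for _ in range(n_resamples))

    lo = resampled[int((alpha / 2) * n_resamples)]
    hi = resampled[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return mean(diffs), lo, hi


if __name__ == "__main__":
    # Toy per-item correctness scores (1 = correct, 0 = wrong), invented for the demo.
    system_a = [1, 1, 1, 0, 1, 1, 0, 1]
    system_b = [0, 1, 0, 0, 1, 0, 0, 1]
    point, lo, hi = paired_bootstrap_ci(system_a, system_b)
    print(f"mean difference {point:.3f}, 95% interval [{lo:.3f}, {hi:.3f}]")
```

Resampling paired differences rather than each system separately removes item-level difficulty from the comparison, which usually tightens the interval. An interval that straddles zero suggests the observed difference should not yet be treated as a building block.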

Claims as first-class artifacts.

Treat the claim as the unit of research. Whatever the delivery format — paper, code, dataset, post — the value of the work depends on whether its individual claims can be independently traced, tested, and trusted. We design methods that make claims statable, testable, hashable, and version-controllable, so each one can be audited on its own.
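To make "statable, testable, hashable, and version-controllable" concrete, a claim could be stored as a small record whose identifier is a hash of its own canonical serialisation. The sketch below is one possible shape; the `Claim` class and every field name in it are hypothetical and chosen only for illustration.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Claim:
    """Hypothetical claim record; all field names are illustrative assumptions."""

    statement: str         # the claim in one plain sentence (statable)
    test_command: str      # how an independent party re-checks it (testable)
    evidence_sha256: str   # digest of the data or output the claim rests on
    version: int = 1       # bumped whenever the statement or evidence changes

    def claim_id(self) -> str:
        # Canonical JSON (sorted keys, fixed separators) so that identical
        # content always produces the identical identifier.
        canonical = json.dumps(
            asdict(self), sort_keys=True, separators=(",", ":"), ensure_ascii=False
        )
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    evidence = hashlib.sha256(b"toy evidence bytes").hexdigest()
    claim = Claim(
        statement="Method X beats baseline Y on the toy benchmark.",
        test_command="python evaluate.py --toy",  # placeholder command
        evidence_sha256=evidence,
    )
    print(claim.claim_id())
```

Because the identifier depends on every field, editing the statement, the test, or the evidence yields a new identifier, so an audit can always name exactly which version of a claim it examined.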

No load-bearing claim survives a single model.

Cross-family auditing is non-negotiable. When one frontier model becomes both the dominant generator and the dominant verifier, that model's blind spots become the field's consensus. The structure of disagreement matters: models from the same training culture cluster together, models tuned for deeper reasoning find errors lighter models miss, and switching to a different model family changes the verdict more often than scaling the same model up.

We treat any load-bearing claim as untrusted until it has been audited across at least three model families spanning different reasoning depths. A single-model evaluation carries roughly the epistemic weight of a single-arm clinical trial: suggestive, not conclusive.
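The trust rule above can be written as a small gate. This is a sketch under stated assumptions: the `Audit` record and its labels are hypothetical, "spanning different reasoning depths" is read as at least two distinct depth labels, and a claim is trusted only if every recorded audit upheld it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Audit:
    """Hypothetical record of one model's verdict on one claim."""

    model_family: str     # illustrative label, e.g. "family_a"
    reasoning_depth: str  # illustrative label, e.g. "light" or "deep"
    upheld: bool          # True if this audit found no error in the claim


def is_trusted(audits, min_families=3, min_depths=2):
    """Return True only if coverage is broad enough and no audit objected.

    Assumptions: min_depths=2 is this sketch's reading of "different
    reasoning depths"; unanimity among audits is likewise an assumption.
    """
    families = {a.model_family for a in audits}
    depths = {a.reasoning_depth for a in audits}
    enough_coverage = len(families) >= min_families and len(depths) >= min_depths
    return enough_coverage and all(a.upheld for a in audits)


if __name__ == "__main__":
    audits = [
        Audit("family_a", "light", True),
        Audit("family_b", "deep", True),
    ]
    print(is_trusted(audits))  # two families only, so the claim stays untrusted
    audits.append(Audit("family_c", "deep", True))
    print(is_trusted(audits))  # three families, two depths, all upheld
```

Counting distinct families rather than distinct models reflects the point that scaling one model up adds less independent evidence than switching families.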

The lab studies its own methodology.

Method must be its own object of study. A lab that uses AI to do research must also study how AI changes research, or it inherits the failure modes silently. Several are visible already: monoculture (a few frontier models become the field's blind spot), barbell collapse (AI commodifies the middle of research careers), training-loop contamination (AI papers train next-generation models), cognitive offloading, an attention tragedy of the commons, and corpus poisoning. We treat each as a research question with operationalised sub-questions and verifiable signals, not as a discussion topic.

Open work.

We share questions, verifiers, and negative results. Code is on GitHub; papers will appear on arXiv as they pass review. We believe that researchers who can read each other's verifiers, not only each other's papers, build a faster and more honest science.