DEFENSE DEPTH AND STRUCTURAL COHERENCE
A Formal Framework for Theory Evaluation in the Information Age
David Lowe
December 2025
Abstract
Contemporary academic evaluation relies on proxy metrics — citation count, impact factor, peer-review consensus — that measure the diffusion of a theory rather than its epistemic durability. This monograph argues that such metrics are categorically inadequate: a theory may accrue widespread citation while remaining structurally vulnerable to a single well-formed objection. To address this deficit, we introduce two complementary evaluation systems. The Universal Theory Defense Grading System (UTDGS) quantifies a theory’s adversarial resilience — its capacity to anticipate, engage, and resolve objections — across five weighted dimensions. The Structural Coherence Invariants (SCI) framework identifies twelve domain-agnostic properties that any persistent information system must satisfy to avoid entropic collapse. Both systems are formally derived from first principles, operationally defined, and applied in case studies to General Relativity, String Theory, Quantum Interpretation pluralism, and a cross-domain unification framework designated Theophysics. We conclude with proposals for institutional adoption and explicit defeat conditions for the framework itself.
PART I: THE PROBLEM
1. The Fragmentation of Contemporary Science
The natural sciences have produced, over the last four centuries, a series of unifications that stand as the defining achievements of systematic inquiry. Newtonian mechanics united terrestrial and celestial motion under a single set of laws. Maxwell’s equations demonstrated that electricity, magnetism, and light are manifestations of a single underlying field. Einstein’s general relativity subsumed Newtonian gravity as a special case within a broader geometric framework. These achievements share a structural property: they resolved previously unconnected domains into a single explanatory system by identifying the deeper mechanism that generated the apparent diversity.
The post-1927 period presents a different picture. Quantum mechanics, the most empirically successful physical theory ever formulated, has generated not a single unified interpretation after a century of effort but rather a proliferating ecology of incompatible ontological proposals — Copenhagen, Many-Worlds, Pilot Wave, Relational, QBism, and their derivatives — none of which commands consensus among the theory’s leading practitioners. Cosmology has introduced Dark Matter and Dark Energy as placeholder variables to preserve the fit of the standard model against anomalous data, while the Hubble Tension — a statistically significant discrepancy between independent measurements of the universe’s expansion rate — remains unresolved. Consciousness research has produced a rich empirical literature of neural correlates while yielding no mechanistic theory of the subjective character of experience.
This monograph does not treat these failures as contingent or as problems amenable to incremental solution within current methodology. It treats them as diagnostic of a systematic methodological deficit. The specific hypothesis is as follows: the current evaluation framework of academic science rewards the diffusion of theories (citations, impact) rather than their durability (logical consistency, objection-resolution, scope bounding). A methodology that rewards diffusion generates the incentive structures for fragmentation. A methodology that rewards durability would generate the incentive structures for unification.
2. The Inadequacy of Current Evaluation Metrics
Academic evaluation currently relies on the following instruments:
Citation count measures how frequently a work appears in the reference lists of subsequent publications. It is a proxy for influence, not for correctness. The phlogiston theory of combustion was widely cited throughout the eighteenth century; its citation record provides no evidence of its validity. High-citation theories have been comprehensively refuted (e.g., steady-state cosmology); low-citation theories have proved foundational (e.g., Boltzmann’s statistical mechanics in its initial reception).
Impact factor measures the average citation rate of a journal, not of any specific claim. Assigning credibility to a claim on the basis of its publication venue is an institutional fallacy. High-impact journals retract papers; low-prestige venues have published results of lasting significance.
Peer review measures consensus among a credentialed community at a specific moment. Consensus is neither a sufficient nor a necessary condition for truth. Heliocentrism in Galileo’s day, continental drift, and the bacterial origin of ulcers all lacked peer consensus when first advanced. Consensus regarding the luminiferous ether and the uniqueness of Euclidean geometry persisted for centuries before being overturned. Peer review is a gatekeeping mechanism, not a truth-detection mechanism.
The H-index measures the productivity of an individual researcher, defined as the number h such that h papers have each been cited at least h times. It conflates quantity of output with quality of contribution. Prolific publication of incrementally modified results will generate a high H-index without advancing the state of knowledge.
Replication measures whether an experimental result is reproducible. Replicability is a necessary but not sufficient condition for the truth of a theoretical claim. A false hypothesis may generate reliably replicable experimental outcomes if the confounding variables are systematically ignored. The replication criterion, furthermore, cannot be applied to theoretical claims that generate no specific experimental predictions — a category that encompasses significant portions of contemporary physics and all of philosophy of science.
| Metric | What It Measures | Why It Is Insufficient |
|---|---|---|
| Citation Count | Popularity / Diffusion | Popular theories have been refuted; true theories have been unpopular |
| Impact Factor | Journal Prestige | Prestige tracks institutional consensus, not accuracy |
| Peer Review | Gatekeeping Consensus | Consensus is neither necessary nor sufficient for truth |
| H-Index | Researcher Productivity | Volume of output does not correlate with durability of claims |
| Replication | Reproducibility | Necessary but not sufficient; does not apply to non-predictive theories |
3. The Missing Dimension: Defense Depth
No current metric measures the following property: given the strongest possible objection to a theoretical claim, how well does the theory respond? This property — which we term Defense Depth — is the primary determinant of a theory’s long-term survival. A theory that has never been pressed against its strongest objectors has not been tested in any epistemically meaningful sense. Its citation count, impact factor, and replication rate tell us how widely it has been accepted, not how well it would survive adversarial scrutiny.
The absence of a Defense Depth metric has a predictable consequence: academic incentives push authors toward the presentation of claims without the anticipation of serious objections. A paper that identifies its own strongest critics and responds to them in detail is longer, harder to write, and more vulnerable to specific rebuttal than a paper that asserts its claims on thin evidential grounds. Under current incentive structures, the rational strategy is to under-defend. The result is a literature systematically organized around assertions rather than around the resolution of disagreements.
This monograph proposes that the unit of epistemic measurement should shift from the claim (what is asserted) to the claim-objection-response chain (what survives scrutiny). A claim defended to five layers of logical depth under steelmanned adversarial pressure has a categorically different epistemic status from a claim supported only by citation. The metrics introduced in Part II operationalize this distinction.
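The shift from claim to claim-objection-response chain admits a minimal computational sketch. The following Python structure is illustrative only: the class names, the 0-3 response scale (borrowed from the scoring procedure in Part II), and the counting rule for depth are assumptions of this sketch, not part of the monograph's formal apparatus.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Response:
    text: str
    strength: int  # 0 = absent, 1 = deferred, 2 = partial, 3 = fully resolved


@dataclass
class Objection:
    text: str
    response: Optional[Response] = None


@dataclass
class Claim:
    text: str
    objections: list[Objection] = field(default_factory=list)

    def defense_depth(self) -> int:
        # One illustrative counting rule: an objection contributes to depth
        # only when its response fully resolves it (strength 3).
        return sum(
            1 for o in self.objections
            if o.response is not None and o.response.strength == 3
        )
```

On this encoding, a bare assertion has depth zero regardless of how often it is cited, which is precisely the distinction the chain-based unit of measurement is meant to register.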
PART II: FORMAL FRAMEWORK
4. Axioms of Theoretical Robustness
The following axioms are the minimal commitments required to ground the evaluation framework. Each is stated formally, defended independently, and accompanied by its defeat condition.
Axiom 1: Internal Consistency Requirement
Any theoretical system that contains or entails a logical contradiction is epistemically disqualified regardless of its empirical track record.
Defense: The principle of explosion (ex contradictione quodlibet) establishes that from a contradiction, any proposition can be derived. A contradictory theory therefore generates no constraints on what is true, and its predictive successes carry no evidential weight — they are logically trivial consequences of an inconsistent premise set. This is a formal property of classical logic, not a substantive metaphysical commitment.
Defeat Condition: This axiom would be defeated by a demonstration that a logically contradictory theory generates reliable, non-trivial predictions that are not derivable from any consistent theory. No such demonstration has been produced. Paraconsistent logics, which tolerate contradiction without explosion, represent a live alternative but require explicit adoption as a foundational choice — they cannot be invoked post-hoc to rescue a theory from identified contradictions.
Axiom 2: Scope Bounding Requirement
Any theoretical system that cannot specify the class of observations that would falsify it is not making a determinate claim about reality.
Defense: A theory that can accommodate any possible observational result by adjusting its parameters or postulating auxiliary hypotheses does not constrain the world — it merely redescribes it. Popper’s falsifiability criterion captures this in its canonical form: a scientific theory must exclude some possible states of affairs. This axiom generalizes that requirement beyond the domain of empirically testable claims to any theoretical claim, including formal and philosophical ones. A philosophical theory that can be made consistent with any conceivable evidence by reinterpretation is not explaining anything.
Defeat Condition: This axiom would be defeated by a demonstration that an unfalsifiable theory is capable of generating genuine explanatory closure — that is, of answering ‘why’ questions in a way that excludes alternative answers and is not equivalent to tautology. The burden of demonstration is on the proponent of the unfalsifiable theory.
Axiom 3: Update Capacity Requirement
Any theoretical system that cannot, in principle, be revised in response to new evidence or valid logical argument is not functioning as an epistemic instrument.
Defense: The function of a theory is to track truth. A theory that has no mechanism for revision in response to disconfirming evidence is committed to its current claims regardless of what the world turns out to be like. This is equivalent to abandoning the epistemic function of theory entirely. Note that this axiom does not require that theories be revised at every anomaly — it requires only that revision be possible in principle and that the conditions for revision be specified.
Defeat Condition: This axiom would be defeated by a demonstration that a permanently fixed theoretical system — one that explicitly forecloses all revision — reliably converges on truth. This would require an account of how a non-revisable system could be sensitive to the distinction between true and false propositions.
Axiom 4: Logical Transparency Requirement
Any theoretical system making non-trivial claims must be capable of rendering its major inferential steps explicit, such that the logical dependency structure between claims is auditable.
Defense: A theory whose conclusions cannot be derived from its stated premises via auditable logical steps is not a theory but an assertion. The requirement of logical transparency does not demand formal proof in every case, but it does require that the inference from premises to conclusions be reconstructable by a competent reader. Hidden inferential steps, where conclusions are presented as following from premises that do not in fact entail them, are a pervasive source of epistemic error.
Defeat Condition: This axiom would be defeated by a demonstration that some class of genuine theoretical knowledge is in principle non-articulable — that valid inferences can be identified as valid without being stateable. Tacit knowledge of the kind identified by Polanyi may represent a weaker challenge, but Polanyi’s account of tacit knowledge does not entail that the implicit can never be made explicit; it merely notes that not everything has yet been made explicit.
5. Defense Depth: Formal Definition
Defense Depth is the measure of a theory’s capacity to sustain its central claims against adversarial pressure. We define it formally as a function of five components, each of which measures a distinct structural property of the claim-defense architecture.
5.1 The Width Principle
The required depth of defense scales with the controversy level of the claim. A claim that is uncontested within the relevant field requires less extensive defense than a claim that challenges the field’s foundational commitments. We formalize this as the Width Principle:
The required defense width W(c) for a claim c is a monotonically increasing function of the controversy level K(c) of that claim.
Practically: a low-controversy empirical claim (“water boils at 100°C at standard pressure”) requires only a claim, the supporting evidence, and a response to obvious objections. A high-controversy theoretical claim (“consciousness is not reducible to physical processes”) requires, in addition, explicit engagement with the strongest counterarguments, deep evidentiary grounding, and specification of the conditions under which the claim would be abandoned. The Width Principle establishes that under-defended high-controversy claims are automatically suspect, regardless of their citation count.
| Controversy Level | Example Claim | Required Defense Width |
|---|---|---|
| Low | “Water boils at 100°C” | 3 levels: Claim → Evidence → Objection Response |
| Moderate | “Consciousness is substrate-independent” | 4 levels: + Mechanistic Grounding |
| High | “Standard cosmological model requires revision” | 5 levels: + Foundational Axioms |
| Maximal | “Physics and theology describe the same domain” | 5+ levels: + Meta-Grounding + Defeat Conditions |
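The Width Principle's monotonicity requirement can be sketched as a simple lookup, assuming the 1-5 controversy scale used by the scoring algorithm in Part II and reading the table's "5+" as 6. The specific values are an assumption of this sketch; the principle itself demands only that the function never decrease as controversy rises.

```python
def required_defense_width(controversy: int) -> int:
    """Monotone map from controversy level K(c) in 1..5 to required
    defense width W(c). Values follow the table above; levels 1-2 are
    both treated as 'Low'. Illustrative, not normative."""
    if not 1 <= controversy <= 5:
        raise ValueError("controversy level must be in 1..5")
    # Low -> 3 levels, Moderate -> 4, High -> 5, Maximal ('5+') -> 6
    return {1: 3, 2: 3, 3: 4, 4: 5, 5: 6}[controversy]
```

Any such table satisfies the principle provided the mapping is monotonically non-decreasing; the diagnostic use is to flag claims whose achieved width falls below the required value.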
5.2 The Five Defense Components
The UTDGS scores a theory on the following five dimensions:
Component 1 — Objection Anticipation (25% weight). This dimension measures whether the theory proactively identifies its strongest critics. A theory that does not anticipate its objectors has not been pre-tested adversarially. Operationally: we assess the presence and specificity of language acknowledging potential counterarguments (“one might object that…”; “critics have argued…”; “the strongest challenge to this position is…”). Theories that contain no such language score zero on this dimension.
Component 2 — Response Strength (25% weight). This dimension measures the persuasive force of the theory’s responses to the objections it identifies. A theory that identifies objections but fails to resolve them — or resolves them only by appeal to authority, dismissal, or deferral to future work — scores poorly here. Operationally: we assess the presence of resolution language (“this objection fails because…”; “the apparent conflict dissolves when we note that…”) and the logical adequacy of the resolution.
Component 3 — Evidence Depth (20% weight). This dimension measures how deep the evidentiary chain extends from the theory’s claims. We distinguish five levels of evidential grounding: (1) bare assertion; (2) citation of authority; (3) mechanistic explanation; (4) grounding in foundational axioms; (5) demonstration that denial of the axioms leads to contradiction. Most published papers operate at levels 1-2. Theories grounded at levels 4-5 are epistemically robust against the obsolescence of the authorities cited at levels 1-2.
Component 4 — Chain Completeness (15% weight). This dimension measures whether the logical chains from premises to conclusions are complete, with no inferential gaps that the reader must fill implicitly. Operationally: we identify each major inferential step and assess whether it is stated explicitly and whether it is valid. Incomplete chains — where a conclusion is presented as following from premises that do not in fact entail it — constitute logical debt that represents unresolved vulnerability.
Component 5 — Width Adequacy (15% weight). This dimension assesses whether the depth of defense is appropriate to the controversy level of the claim, applying the Width Principle. A high-controversy claim defended at only two levels of depth is penalized, regardless of how well it performs on the other four dimensions.
6. Structural Coherence Invariants
The Structural Coherence Invariants (SCI) framework identifies twelve properties that any persistent information system — whether a physical structure, a biological organism, a social institution, or a theoretical framework — must satisfy to avoid entropic collapse. The framework draws on the formal literature in systems theory, information theory, and thermodynamics. The twelve properties are presented below with their formal definitions, system functions, and failure modes.
We note at the outset that these properties map onto a classical ethical and theological vocabulary. This mapping is not incidental — it reflects the fact that the classical vocabulary was developed precisely to characterize the properties of systems that persist. We adopt the mapping explicitly, using the traditional names as labels for formally defined structural properties, while insisting that the properties themselves are domain-agnostic and carry no normative weight beyond the claim that systems which lack them collapse.
| Invariant (Label) | Formal Definition | System Function | Failure Mode |
|---|---|---|---|
| F1 — Grace | Entropy absorption capacity | Recovery from local failures without systemic collapse | Brittle failure under first anomaly |
| F2 — Hope | Non-terminal failure states | Architecture permits future state-space expansion | Deadlock; systemic despair |
| F3 — Patience | Iterative convergence | Accuracy improves over successive refinement epochs | Premature optimization; drift |
| F4 — Faithfulness | Signal fidelity over time | Core axiomatic structure maintained under pressure | Corruption; conceptual drift |
| F5 — Self-Control | Scope bounding | Explicit definition of what the theory cannot explain | Unfalsifiability; totalizing claims |
| F6 — Love | Positive-sum orientation | Integration with adjacent systems generates value | Parasitic; zero-sum collapse |
| F7 — Peace | Internal consistency | Absence of logical contradictions in the axiom set | Logical explosion; self-negation |
| F8 — Truth | Signal-to-reality correspondence | High correlation between model and observation | Hallucination; narrative override |
| F9 — Humility | Update capacity | Mechanism for revising priors based on new evidence | Dogmatic calcification |
| F10 — Goodness | Generative surplus | System produces more order than it consumes | Entropic decay; parasitism |
| F11 — Unity | Integration across sub-domains | Coherence without siloing; no internal fragmentation | Fracture; silo formation |
| F12 — Joy | Positive feedback resonance | Self-sustaining motivation in practitioners and sub-systems | Burnout; apathy attractors |
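One way to operationalize the SCI audit is as a checklist mapping each invariant label to a verdict and reporting which collapse modes remain open. The verdict categories mirror those used in the case studies of Part III (pass, partial, fail, indeterminate); the helper names are assumptions of this sketch, not part of the formal framework.

```python
from enum import Enum


class Verdict(Enum):
    PASS = "pass"
    PARTIAL = "partial"
    FAIL = "fail"
    INDETERMINATE = "indeterminate"


# Labels condensed from the table above.
SCI_INVARIANTS = {
    "F1": "Grace / entropy absorption",
    "F2": "Hope / non-terminal failure states",
    "F3": "Patience / iterative convergence",
    "F4": "Faithfulness / signal fidelity",
    "F5": "Self-Control / scope bounding",
    "F6": "Love / positive-sum orientation",
    "F7": "Peace / internal consistency",
    "F8": "Truth / signal-reality match",
    "F9": "Humility / update capacity",
    "F10": "Goodness / generative surplus",
    "F11": "Unity / integration",
    "F12": "Joy / positive feedback resonance",
}


def collapse_modes(verdicts: dict[str, "Verdict"]) -> list[str]:
    """Return the invariants not fully satisfied; each names an open
    collapse mode. Unassessed invariants default to indeterminate."""
    return [
        f"{key} ({SCI_INVARIANTS[key]})"
        for key in SCI_INVARIANTS
        if verdicts.get(key, Verdict.INDETERMINATE) is not Verdict.PASS
    ]
```

An empty return value corresponds to a system that has, per the necessity argument below, defended itself against all twelve collapse modes simultaneously.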
6.1 The Necessity Argument
The claim of the SCI framework is stronger than the claim that these properties are beneficial. We claim that they are necessary — that a system which lacks any one of them will, over sufficient time, collapse under entropic pressure. The necessity argument proceeds as follows:
A system lacking F1 (Grace / Error Absorption) has no mechanism for recovering from errors. Since errors are statistically inevitable in any system operating in a non-trivial environment, such a system will be destroyed by its first non-trivial error. No graceless system can persist indefinitely.
A system lacking F7 (Peace / Internal Consistency) contains at least one contradiction in its axiom set. By the principle of explosion in classical logic, such a system entails every proposition. A system that entails everything has no predictive or explanatory power. No inconsistent system can function as an epistemic instrument.
A system lacking F9 (Humility / Update Capacity) cannot revise its beliefs in response to evidence. Such a system will eventually encounter evidence that its current beliefs fail to accommodate. Unable to revise, it must either ignore the evidence (generating increasing divergence from reality) or collapse. No non-updatable system can track a changing world.
A system lacking F5 (Self-Control / Scope Bounding) makes claims about everything. A theory that claims to explain everything is unfalsifiable in practice, since any apparent counterevidence can be reinterpreted as a confirmation within an unbounded theoretical vocabulary. Such a theory does not constrain reality; it merely redescribes it.
These four arguments establish the necessity of F1, F5, F7, and F9. The necessity arguments for the remaining invariants follow analogous structures and are not reproduced here in full. The key structural point is that each invariant addresses a specific collapse mode: a system that satisfies all twelve has defended itself against all twelve collapse modes simultaneously.
7. The UTDGS Scoring Algorithm
The UTDGS produces a composite score from 0 to 100 for any text-based theoretical claim. The algorithm operates on the following procedure:
Step 1: Controversy Calibration. The claim is assessed for its controversy level on a scale of 1-5, where 1 represents uncontested empirical fact and 5 represents fundamental challenge to a field’s foundational assumptions. This calibration determines the Width Adequacy baseline for Component 5.
Step 2: Objection Identification. The evaluator identifies all objections acknowledged within the text, categorized as: (a) proactive objections anticipated by the author; (b) implicit objections addressed without explicit acknowledgment; (c) objections that are absent but should be present given the controversy level. Category (a) receives full weight; (b) receives partial weight; (c) generates a penalty.
Step 3: Response Assessment. For each identified objection, the response is assessed on a four-point scale: 0 (absent); 1 (acknowledged but deferred); 2 (partially addressed); 3 (fully resolved with explicit logical grounding). The mean response score, weighted by the strength of the objection, generates the Component 2 score.
Step 4: Evidence Chain Mapping. The inferential structure from premises to conclusions is mapped, and the deepest point of grounding for each major claim is identified according to the five-level evidence depth taxonomy defined in Section 5.2. The mean depth, weighted by the centrality of the claim, generates the Component 3 score.
Step 5: Chain Completeness Audit. Each major inferential step is assessed for validity and explicitness. The proportion of steps that are both valid and explicit generates the Component 4 score. Gaps are flagged as logical debt.
Step 6: Width Adequacy Check. The achieved defense width (the depth to which the most contested claims are defended) is compared to the required width (derived from the controversy level established in Step 1). Shortfalls generate proportional penalties in Component 5.
Step 7: Composite Score. The five component scores are combined at the weights specified in Section 5.2, producing a composite UTDGS score in the range [0, 100].
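Step 7 can be sketched directly. The weights are those specified in Section 5.2; the one assumption added here, not stated in the algorithm, is that each component score is first normalized to [0, 100] so that the weighted sum also lies in [0, 100]. The dictionary key names are illustrative.

```python
# Component weights from Section 5.2, expressed as fractions of the composite.
WEIGHTS = {
    "objection_anticipation": 0.25,
    "response_strength": 0.25,
    "evidence_depth": 0.20,
    "chain_completeness": 0.15,
    "width_adequacy": 0.15,
}


def utdgs_composite(scores: dict[str, float]) -> float:
    """Weighted sum of the five component scores, each assumed to be
    normalized to [0, 100], yielding a composite in [0, 100]."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("all five components must be scored")
    for value in scores.values():
        if not 0.0 <= value <= 100.0:
            raise ValueError("component scores must lie in [0, 100]")
    return sum(WEIGHTS[key] * scores[key] for key in WEIGHTS)
```

Because the weights sum to 1, the composite inherits the [0, 100] range of the components; for example, scores of 80, 60, 40, 100, and 20 (in the key order above) combine to 61.0.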
PART III: APPLICATIONS
8. Case Study I — General Relativity
General Relativity (GR) is widely acknowledged as one of the most precisely confirmed physical theories in the history of science. Its empirical success is not in dispute. The present analysis is not an assessment of its empirical adequacy but of its structural properties as an epistemic document.
8.1 UTDGS Analysis
Objection Anticipation: Einstein’s original 1915 paper and subsequent elaborations by Einstein himself, Eddington, and others explicitly engage with the alternative theories available at the time (Newtonian gravity, Nordström’s scalar theory). GR’s theoretical development includes explicit comparative engagement with predecessor frameworks. Score: High.
Response Strength: The three classical tests of GR (Mercury’s perihelion precession, light deflection by the Sun, gravitational redshift) were proposed by Einstein as specific empirical predictions distinguishing GR from Newtonian mechanics. The theory was presented with explicit defeat conditions. Score: High.
Evidence Depth: GR’s foundational structure is grounded in the Equivalence Principle and the requirement of general covariance, both of which are stated as axioms with explicit motivations. The inferential chain from these axioms to the field equations, while mathematically dense, is in principle auditable. Score: Moderate to High.
Chain Completeness: The formal derivation of GR’s field equations is complete within the mathematical framework of differential geometry. However, the physical interpretation of the formalism — particularly regarding the meaning of spacetime curvature — has been the subject of ongoing philosophical dispute. Score: Moderate.
Width Adequacy: GR’s principal outstanding challenge — its incompatibility with quantum mechanics — is widely acknowledged but has not been resolved. The theory’s documentation does not contain a resolution of this objection; it acknowledges it as an open problem. Score: Moderate (appropriate for a resolved theory with one remaining open frontier).
8.2 SCI Analysis
F7 (Internal Consistency): GR is internally consistent within its own domain. However, taken together with quantum mechanics, the two theories are mutually inconsistent at Planck scales. The inconsistency is with the broader theoretical framework of fundamental physics, not within GR itself. Conditional pass.
F5 (Scope Bounding): GR explicitly restricts itself to the domain of macroscopic spacetime geometry. It does not claim to describe quantum phenomena. Pass.
F9 (Update Capacity): GR has been refined (e.g., in its application to cosmology) and has absorbed anomalies (e.g., by introducing the cosmological constant, later removed and then reintroduced). The theory has demonstrated some update capacity, though resistance to fundamental revision in the direction of quantum gravity has been notable. Partial pass.
F11 (Integration): GR’s failure to integrate with quantum mechanics represents a significant deficit on this invariant. The most significant open problem in fundamental physics is precisely this integration failure. This is an identified structural weakness.
9. Case Study II — String Theory
String Theory is a research programme rather than a settled theory: it proposes to unify the four fundamental forces by modeling fundamental particles as one-dimensional extended objects (strings) rather than as point particles. It is included here because it represents a clear instantiation of structural fragility under the SCI framework, and its case illustrates the diagnostic utility of the invariants.
9.1 UTDGS Analysis
Objection Anticipation: The String Theory literature does contain engagement with objections, particularly since the development of the landscape problem and the critiques by Smolin, Woit, and others. However, much of this engagement is reactive rather than proactive. Score: Moderate.
Response Strength: The most serious objection to String Theory — that it generates no specific, testable predictions distinguishable from those of the Standard Model at currently accessible energies — has not been resolved. The standard response (that the theory is correct but its predictions lie beyond current experimental reach) is a deferral, not a resolution. Score: Low to Moderate.
Width Adequacy: Given that String Theory makes fundamental claims about the nature of spacetime, matter, and unification, its controversy level is maximal. The defense width achieved by the theory’s documentation falls substantially below what the Width Principle requires. Score: Low.
9.2 SCI Analysis
F5 (Scope Bounding): The landscape problem — the existence of an estimated 10^500 distinct vacuum states, each corresponding to a different low-energy physics — represents a near-total failure of scope bounding. In principle, any observed physics can be accommodated by some region of the string theory landscape. This is the structural failure mode of unfalsifiability. Fail.
F9 (Update Capacity): The theory has been extended and revised extensively over its history (bosonic strings, superstrings, M-theory, the landscape). However, revisions have uniformly been in the direction of expansion rather than falsification. No revision has resulted in the elimination of predictions or the reduction of free parameters. The update capacity is asymmetric: the theory grows to accommodate challenges rather than being refined by them. Partial Fail.
F8 (Truth / Signal-Reality Match): String Theory’s signal-to-reality correspondence is currently undefined, since the theory makes no predictions distinguishable from the Standard Model at accessible energies. A theory that generates no distinguishable empirical signal cannot score on this invariant. Indeterminate.
10. Case Study III — Quantum Interpretation Pluralism
The plurality of quantum mechanical interpretations is presented here not as a problem to be resolved but as a diagnostic indicator. The coexistence of twelve or more mutually incompatible interpretations — each claiming to describe the ontological content of the same formalism — after a century of effort, constitutes evidence of a structural failure in the methodology by which the interpretations have been evaluated.
The interpretations (Copenhagen, Many-Worlds, Pilot Wave, Relational, QBism, Consistent Histories, and their variants) all share a common property: they reproduce the empirical predictions of the quantum formalism exactly. They differ in their ontological commitments — in what they claim exists and how measurement outcomes come about. Since the empirical predictions are identical, empirical evidence cannot discriminate between them. The standard methodology of theory choice by empirical adequacy is therefore impotent in this domain.
The SCI framework offers a different mode of discrimination. We can ask, for each interpretation: Does it satisfy F7 (Internal Consistency)? The Measurement Problem — the problem of explaining why quantum superpositions appear to collapse to definite outcomes under observation — represents an internal inconsistency in standard formulations of quantum mechanics, since the linear Schrödinger equation does not predict collapse, but the Born rule applied to measurement outcomes presupposes it. Any interpretation that resolves this inconsistency scores higher on F7 than one that defers it as a measurement postulate. We can ask: Does it satisfy F5 (Scope Bounding)? The Many-Worlds interpretation, for example, posits an unbounded ontology of branching universes that is in principle unobservable; its scope bounding is compromised by the same logic that compromises String Theory’s. We can ask: Does it satisfy F9 (Update Capacity)? An interpretation that specifies what observations would cause its revision is epistemically preferable to one that is formulated so as to be consistent with any possible result.
These discriminations do not resolve the interpretational debate. They provide a structural vocabulary for characterizing why the debate has been intractable: none of the competing interpretations satisfies all twelve invariants, and the current methodology has no instrument for identifying this fact.
11. Case Study IV — Theophysics as a Cross-Domain Framework
The preceding case studies apply the UTDGS and SCI framework to established scientific theories. This case study applies the same framework to a recently developed cross-domain unification framework designated Theophysics, which proposes to unify physics, information theory, and theology under a common axiomatic structure. The framework is the work of a single investigator and has not undergone peer review. It is included here precisely because its structural properties differ from those of the established theories in ways that illuminate the diagnostic value of the metrics.
11.1 Structural Design
The Theophysics framework was developed, according to its author’s documentation, in reverse order from the standard theoretical methodology. Rather than beginning with observations and constructing hypotheses to explain them, the framework began with the identification of the strongest possible objections to any theory that crosses domain boundaries, and constructed its axioms to survive those objections. This procedure is formally equivalent to designing a system for adversarial robustness: the architecture is determined by the attack surface.
The framework makes explicit use of what we have termed the Defense Lattice structure: for each major claim, a set of “kill conditions” is specified — observational or logical results that would falsify the claim. This is a direct implementation of the Scope Bounding requirement (F5) and the Update Capacity requirement (F9) at the level of theoretical architecture rather than as post-hoc additions.
11.2 UTDGS Analysis
Objection Anticipation: The framework explicitly identifies and engages the major competing positions — Materialism, Nihilism, Theodicy, Logical Positivism, and Instrumentalism — as adversarial cases to be addressed before axioms are accepted. Score: High.
Response Strength: Responses to the identified objections are grounded in formal apparatus (information theory, entropy, formal semantics) rather than in theological assertion. The framework explicitly separates the formal structure of its claims from the theological vocabulary used to label them, a distinction that preserves the logical force of the responses in non-theological contexts. Score: High.
Evidence Depth: The framework chains from theological terms to formal information-theoretic definitions to thermodynamic properties, reaching levels 4-5 on the evidence depth scale for its central claims. Score: High.
Limitations: The framework is a single investigator’s work and has not been subjected to systematic peer examination. The UTDGS scores reflect the structure of the documentation as written; they do not validate the underlying claims. Independent adversarial testing is required to confirm that the identified “kill conditions” are genuine and that the responses to objections are logically adequate.
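The component assessments above can be aggregated into a single score. The monograph specifies five weighted UTDGS components, but only three are assessed in this case study; the component names are taken from the text, while the weights, the 0-1 scale, and the mapping of "High" to a numeric value are illustrative assumptions, not the published scoring rule.

```python
# Hypothetical sketch of UTDGS score aggregation. The weights, the 0-1
# scale, and the numeric mapping of qualitative scores ("High" -> 0.9)
# are illustrative assumptions; the monograph's five-component weighting
# is not reproduced here.

WEIGHTS = {
    "objection_anticipation": 0.3,
    "response_strength": 0.4,
    "evidence_depth": 0.3,
}

def utdgs_score(components):
    """Weighted sum over the assessed components (assumed 0-1 scale)."""
    return round(sum(WEIGHTS[k] * v for k, v in components.items()), 3)

theophysics = {
    "objection_anticipation": 0.9,  # Score: High (illustrative mapping)
    "response_strength": 0.9,       # Score: High
    "evidence_depth": 0.85,         # levels 4-5 on the depth scale
}
print(utdgs_score(theophysics))  # 0.885
```

The point of the sketch is the structure, not the numbers: the aggregate is only as meaningful as the component assessments feeding it, which is why the Limitations paragraph above insists on independent adversarial testing.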
11.3 SCI Analysis
F1 (Grace / Error Absorption): The framework explicitly distinguishes between its Primitives (foundational commitments that anchor the entire system) and its Stances (derived positions that can be revised without structural collapse). This architectural distinction is a direct implementation of error absorption: errors in derived positions do not propagate back to the foundational level. Pass.
F7 (Peace / Internal Consistency): The framework’s central architectural constraint is that no axiom is accepted unless it holds within both physics and theology. This cross-domain consistency requirement functions as an ongoing internal coherence audit. The theoretical unification claim itself — that the two domains describe the same underlying structure — is the primary source of whatever internal consistency the framework achieves. Conditional Pass.
F5 (Self-Control / Scope Bounding): The framework limits its claims to “the observable consequences of moral coherence” and explicitly identifies its theological primitives as irreducible rather than derivable. This is a scope boundary, though a broader one than those of the preceding case studies. Score: Moderate.
F11 (Unity / Integration): The framework was explicitly engineered to maximize this invariant. The central research constraint — that no axiom is accepted unless it holds across domains — generates integration as a design property rather than an emergent one. Score: High.
| Metric | General Relativity | String Theory | Quantum Interpretation Plurality | Theophysics |
|---|---|---|---|---|
| Defense Depth (UTDGS) | High | Low | Indeterminate | High |
| F5 — Scope Bounding | High | Fail | Mixed | Moderate |
| F7 — Internal Consistency | Conditional Pass | Pass (internal) | Fail (Measurement Problem) | Conditional Pass |
| F9 — Update Capacity | Moderate | Partial Fail | Low | High |
| F11 — Integration | Fail (QM gap) | Moderate | Fail | High |
| Empirical Validation | Very High | Indeterminate | High (formalism) | Early Stage |
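The summary table above can be re-encoded as data, so that cross-theory comparisons of the kind made in the case studies become explicit queries. The labels are transcribed from the table; the ordinal ranking used to compare them is an illustrative assumption, since the monograph treats these as qualitative assessments rather than points on a single scale.

```python
# The comparison table re-encoded as data. Labels are transcribed from
# the table; the ORDINAL ranking is an illustrative assumption.

ORDINAL = {"Fail": 0, "Partial Fail": 1, "Low": 1, "Mixed": 2,
           "Moderate": 2, "Conditional Pass": 3, "High": 4}

TABLE = {
    "General Relativity": {"F5": "High", "F9": "Moderate", "F11": "Fail"},
    "String Theory":      {"F5": "Fail", "F9": "Partial Fail", "F11": "Moderate"},
    "Theophysics":        {"F5": "Moderate", "F9": "High", "F11": "High"},
}

def better_on(metric, a, b):
    """True if theory `a` outranks theory `b` on `metric` under ORDINAL."""
    return ORDINAL[TABLE[a][metric]] > ORDINAL[TABLE[b][metric]]

print(better_on("F5", "General Relativity", "String Theory"))  # True
```

Encoding the table this way also exposes its limits: an ordinal comparison across metrics (is "High" on F9 worth more than "Conditional Pass" on F7?) is exactly the kind of judgment the metrics are designed to organize, not replace.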
PART IV: LIMITS AND IMPLICATIONS
12. Objections and Responses
12.1 Objection: The UTDGS is itself subject to the Width Principle
The UTDGS makes a high-controversy claim — that defense depth is a primary determinant of epistemic value — without itself having been subjected to the adversarial scrutiny it prescribes. The framework’s own controversy level is high; it should therefore be defended to the depth it requires of others.
Response: This objection is correct, and we accept its force. The present document is an attempt to satisfy the Width Principle as applied to the UTDGS itself. The axiomatic structure of Part II is designed to ground the framework’s claims at levels 4-5 on the evidence depth scale. The defeat conditions specified in Section 13 are an explicit response to the Width Adequacy requirement. We acknowledge that the current document represents an initial attempt at this defense and that independent adversarial testing is required before the framework can be considered adequately defended.
12.2 Objection: The SCI invariants are not domain-agnostic — they encode a specific value system
The mapping of the SCI invariants onto the classical theological vocabulary (Grace, Humility, Love, etc.) is not incidental but indicative. The framework is covertly importing a normative agenda under the guise of structural necessity.
Response: The mapping to traditional vocabulary is heuristic rather than constitutive. The formal definitions given in Section 6 are stated in terms of system properties — entropy absorption capacity, update mechanisms, scope bounding — that make no reference to the theological vocabulary. The question of whether these properties are necessary for systemic persistence is a claim about information theory and systems dynamics, not about theology. The theological vocabulary is used because it has historically tracked the same structural properties; but the validity of the framework does not depend on the validity of the theology. An evaluator who finds the theological vocabulary misleading should replace the labels with the formal definitions throughout; the structural claims remain unchanged.
12.3 Objection: The UTDGS can be gamed by writing lengthy defenses of weak claims
A theory could score highly on UTDGS by anticipating many objections with superficially plausible responses, without any of those responses being logically adequate. The metric measures the presence of defense structure, not its quality.
Response: This objection identifies a genuine limitation of the metric as an automated scoring system. The Component 2 assessment (Response Strength) is intended to address this: it explicitly assesses the logical adequacy of responses, not merely their presence. However, the assessment of logical adequacy does require genuine evaluation rather than keyword detection. We acknowledge that UTDGS is most useful as a screening instrument and a structural prompt — it identifies the presence and depth of defense architecture — and that the quality of individual responses requires substantive evaluation by a competent reader. The metric is not a substitute for philosophical judgment; it is a structural instrument for organizing the exercise of that judgment.
12.4 Objection: Defense depth is retrospective — it measures how well a theory has defended itself against known objections, not how well it would survive unknown ones
The strongest objections to any theory are precisely the ones that have not yet been formulated. A theory that has addressed all known objections may be structurally unprepared for the objections that will emerge as the field develops.
Response: This objection is correct and identifies a fundamental limitation of any adversarial evaluation framework. The UTDGS measures robustness against the class of objections that have been formulated at the time of evaluation. It cannot measure robustness against future objections that have not yet been articulated. This is a limitation of the metric, not a defeat of the framework. The response to this limitation is twofold: (1) the SCI invariants provide a prospective complement to the retrospective UTDGS — a theory that satisfies all twelve invariants has structural properties that should generalize to novel challenges; (2) the framework explicitly includes Update Capacity (F9) as a required invariant, specifically to address the case where novel challenges cannot be anticipated.
13. Defeat Conditions for the Framework
The following conditions would, if demonstrated, constitute genuine defeats of the framework’s central claims. We state them explicitly as a requirement of the Scope Bounding invariant (F5).
Defeat of the UTDGS Central Claim: If it were demonstrated that theories scoring highly on UTDGS systematically fare worse in long-run predictive success than theories scoring poorly, the claim that defense depth is a primary determinant of epistemic value would be defeated. This requires longitudinal empirical study of the relationship between structural defense properties and predictive success over multi-decade timescales.
Defeat of the SCI Necessity Claim: If it were demonstrated that a system lacking one of the twelve identified invariants nevertheless persists indefinitely without structural modification, the necessity claim for that invariant would be defeated. Note that the defeat condition requires persistence without modification — a system that survives by acquiring the missing invariant does not constitute a counterexample.
Defeat of the Domain-Agnosticity Claim: If it were demonstrated that the twelve invariants apply to some classes of system but not others, and that the boundary condition cannot be specified by the framework, the domain-agnosticity claim would be defeated. The framework would then apply only to the identified class of systems.
Defeat of the Cross-Domain Application Claim: If it were demonstrated that the SCI invariants, when applied to theoretical frameworks, fail to track any property that competent theorists recognize as theoretically significant, the claim that the framework provides a useful evaluation instrument would be defeated. This requires showing that SCI scores are uncorrelated with expert quality assessments across a suitably diverse sample of theories.
14. Implications for Institutional Practice
The framework has several implications for academic evaluation practice if its central claims are accepted. We state these as proposals subject to empirical test, not as recommendations.
Proposal 1 — Defense Documentation Requirement. Academic journals could require, as a condition of publication for high-controversy claims (controversy level 3 or higher on the UTDGS scale), explicit statement of: (a) the three to five strongest objections to the central claim; (b) substantive responses to each; (c) explicit specification of defeat conditions. This would impose no requirement on the content of the responses but would require that they be present.
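Proposal 1 is a presence check, not a quality check, and can be stated as one. The sketch below assumes a submission schema (field names, list-of-strings representation) that the proposal does not specify; it checks only that the required defense structure exists.

```python
# Minimal sketch of the Proposal 1 check: claims at controversy level 3
# or higher must state 3-5 objections, a response to each, and at least
# one defeat condition. The field names and schema are illustrative
# assumptions; the proposal specifies the requirement, not a format.

def meets_defense_documentation(submission):
    """Check presence (not logical adequacy) of defense documentation."""
    if submission.get("controversy_level", 0) < 3:
        return True  # requirement applies only to high-controversy claims
    objections = submission.get("objections", [])
    responses = submission.get("responses", [])
    defeats = submission.get("defeat_conditions", [])
    return (3 <= len(objections) <= 5
            and len(responses) == len(objections)
            and len(defeats) >= 1)

example = {
    "controversy_level": 4,
    "objections": ["O1", "O2", "O3"],
    "responses": ["R1", "R2", "R3"],
    "defeat_conditions": ["D1"],
}
print(meets_defense_documentation(example))  # True
```

As Section 12.3 concedes for the UTDGS generally, a check of this kind screens for structure; the adequacy of the responses themselves remains a matter for reviewers.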
Proposal 2 — Structural Audit in Dissertation Evaluation. PhD committees could assess dissertations for the structural properties identified by the SCI invariants, specifically: (a) internal consistency (F7); (b) scope bounding (F5); (c) update capacity (F9); (d) error absorption architecture (F1). These assessments do not require the committee to evaluate the truth of the dissertation’s claims; they require assessment of its structural properties.
Proposal 3 — Public Defense Depth Scoring. An independent institution could publish UTDGS and SCI scores for canonical theories in each field, providing a public record of which claims have been adequately defended against adversarial scrutiny and which have not. This would introduce a comparative dimension into theory evaluation that current metrics do not provide.
We note that each of these proposals is itself subject to the Width Principle: they are high-controversy claims about institutional change and should be defended to greater depth than this monograph provides. The present document establishes the conceptual foundations for such proposals; it does not constitute a complete defense of them as policy recommendations.
15. Conclusion
This monograph has argued that contemporary academic evaluation is categorically inadequate as a truth-detection mechanism. The proxy metrics currently in use — citation count, impact factor, peer-review consensus, H-index, replication — measure the diffusion and institutional acceptance of theories, not their structural durability under adversarial scrutiny. The consequence of optimizing for diffusion rather than durability is a systematic incentive toward under-defended claims, which generates the conditions for academic fragmentation.
As an alternative, we have introduced two evaluation systems grounded in formal considerations about information systems and logical structure. The UTDGS quantifies adversarial resilience across five weighted components, operationalizing the Width Principle: the required defense depth scales with the controversy level of the claim. The SCI framework identifies twelve structural invariants whose necessity is grounded in arguments from information theory and systems dynamics, and whose absence in any system predicts specific, identifiable collapse modes.
These frameworks were applied to four case studies: General Relativity, String Theory, Quantum Interpretation pluralism, and Theophysics. The application revealed that the metrics discriminate between theories in ways that reflect genuine structural differences: GR’s empirical success is matched by substantial but not complete structural robustness; String Theory’s theoretical ambition is structurally undermined by its failure of scope bounding; Quantum Interpretation plurality represents an intractable dispute that current methodology cannot resolve but that the SCI framework can structurally characterize; and Theophysics demonstrates structural properties (defense lattice architecture, cross-domain integration requirement) that score highly on the proposed metrics while requiring empirical validation at the level of its specific claims.
The framework has been defended against four significant objections and has specified explicit defeat conditions. It is not a finished product. It is a proposal for a shifted methodology — one in which the primary question asked of any theoretical claim is not “how widely has this been accepted?” but “how well does this survive its strongest critics?” The metrics introduced here are instruments for applying that question systematically.
Truth, on this account, persists by structural coherence. The task of evaluation methodology is to measure coherence rather than popularity. This monograph has proposed instruments for doing so.
The framework’s formal implementation in Python is available for replication. Source code for the utdgs_scorer, fruits_scorer, and compare_theories modules enables independent auditing of the scoring algorithms and application to new corpora.