Coherence Scoring Framework - Sensitivity Analysis Study
Scientific Validation of a Universal Coherence Metric
What’s In This Directory
This directory contains the complete methodology, source code, and results for our sensitivity analysis of the coherence scoring framework.
Files
| File | Description |
|---|---|
| METHODOLOGY_SENSITIVITY_ANALYSIS.md | Full methodology paper (READ THIS FIRST) |
| sensitivity_analyzer.py | Sensitivity test suite (ablation, topology, labels, adversarial) |
| score_moral_decay.py | Example application: scoring America’s moral decline |
| unified_scorer.py | Core coherence scoring engine |
| rubrics/ | YAML files defining all detection rules |
| results/ | Test outputs and reports |
Quick Start
1. Read the Methodology
Start with METHODOLOGY_SENSITIVITY_ANALYSIS.md to understand:
- What we’re testing
- Why we’re testing it this way
- How to falsify the framework
- Full transparency into our approach
2. Run the Tests
```shell
cd [this_directory]
python sensitivity_analyzer.py
```

This will:
- Test all 31 components (ablation)
- Scramble graph topology
- Swap theological labels for neutral ones
- Run adversarial attacks
- Generate full report
Output: sensitivity_report.txt
3. Score Your Own Document
```python
from unified_scorer import UnifiedCoherenceScorer

scorer = UnifiedCoherenceScorer(rubrics_path="rubrics/")
result = scorer.score_document(your_text, "Document Name")

print(f"Coherence (chi): {result.chi:.2f}/10")
print(f"Confidence (kappa): {result.kappa:.2%}")
print(f"Robustness (rho): {result.rho:.2%}")
```

The Core Claim
Most frameworks tune parameters until they get the answer they want.
We don’t.
Instead, we test whether the structure itself is necessary:
- Ablation: Remove components → does system degrade?
- Topology: Change connections → does coherence collapse?
- Labels: Swap names → does math still work?
- Adversarial: Feed nonsense → does it correctly reject?
If the structure survives these tests, it’s not arbitrary.
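The ablation test above can be sketched as a loop that removes one component at a time and measures the relative score drop. The scoring function and the 10% load-bearing threshold below are illustrative assumptions, not the shipped implementation:

```python
def score_with_components(text, components):
    """Toy stand-in for the real scorer: the score is the fraction
    of enabled components whose keyword appears in the text."""
    lowered = text.lower()
    hits = sum(1 for c in components if c in lowered)
    return 10.0 * hits / max(len(components), 1)

def ablation_test(text, components, threshold=0.10):
    """Return the baseline score and the components whose removal
    drops the score by more than `threshold` (the load-bearing ones)."""
    baseline = score_with_components(text, components)
    load_bearing = []
    for comp in components:
        reduced = [c for c in components if c != comp]
        score = score_with_components(text, reduced)
        if baseline > 0 and (baseline - score) / baseline > threshold:
            load_bearing.append(comp)
    return baseline, load_bearing
```

A component counts as load-bearing only if its removal measurably hurts the whole, which is exactly what the 0/31 preliminary result below would fail to show.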
Key Features
1. NO Parameter Tuning
We do not adjust weights to fit data. Components are either:
- Present (1.0)
- Absent (0.0)
No middle ground. No curve-fitting.
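In code, binary weighting is a one-liner; this sketch assumes keyword-style components and is not the actual detection logic:

```python
def component_weights(text, components):
    """Binary scoring: each component is either present (1.0) or
    absent (0.0) in the document; no fractional or tuned weights."""
    lowered = text.lower()
    return {c: 1.0 if c in lowered else 0.0 for c in components}
```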
2. Full Transparency
All detection rules are in human-readable YAML files:
fruit_matrix.yaml- 12 Fruits of the Spiritvariable_rubric.yaml- 10 Master Equation variablesconstraint_rubric.yaml- 9 Universal constraintsdefense_rubric.yaml- Evidence quality metrics
No hidden logic. No black boxes.
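A rubric file of this kind can be loaded with pyyaml in a few lines. The field names and component entries below are illustrative, not the actual schema of fruit_matrix.yaml:

```python
import yaml  # third-party: pip install pyyaml

# Hypothetical rubric shaped like the YAML files described above.
RUBRIC_YAML = """
components:
  - name: love
    keywords: [love, charity]
  - name: joy
    keywords: [joy, gladness]
"""

rubric = yaml.safe_load(RUBRIC_YAML)
for comp in rubric["components"]:
    print(comp["name"], "->", comp["keywords"])
```

Because the rules live in data files rather than code, anyone can audit or modify them without reading the scorer.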
3. Falsification-First
We define upfront what would prove us wrong:
- If <5 components are load-bearing → Framework is redundant
- If scrambled topology works equally → Structure is arbitrary
- If neutral labels fail → Framework is semantic, not structural
- If keyword spam scores high → Framework can be gamed
4. Open for Attack
We invite adversarial testing:
- Try to game the system
- Find counterexamples
- Break the structure
- Prove it’s no better than random
If you break it, we’ll fix it or admit failure.
Falsification Criteria
The framework FAILS if:
| Test | Failure Condition |
|---|---|
| Ablation | <5 components are load-bearing |
| Topology | Random graphs work equally well |
| Labels | Neutral labels cause >10% degradation |
| Adversarial | Keyword spam scores high |
| Null Hypothesis | Random functions perform equally |
The framework PASSES if:
| Test | Success Condition |
|---|---|
| Ablation | ≥10 components are load-bearing |
| Topology | Structure changes cause ≥15% degradation |
| Labels | Neutral labels cause <5% change |
| Adversarial | ≥66% of attacks correctly rejected |
| Null Hypothesis | Signal-to-noise ratio > 2.0 |
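The PASS conditions in the table can be encoded directly as a checklist; the result-dict keys here are assumed names for illustration, not the analyzer's real output fields:

```python
def framework_passes(results):
    """Check the PASS conditions from the table above.
    The keys of `results` are hypothetical field names."""
    return {
        "ablation": results["load_bearing_count"] >= 10,
        "topology": results["topology_degradation"] >= 0.15,
        "labels": results["label_swap_change"] < 0.05,
        "adversarial": results["attack_rejection_rate"] >= 0.66,
        "null_hypothesis": results["signal_to_noise"] > 2.0,
    }
```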
Current Status: Testing in progress
Preliminary Results
Test Document
Sample text describing Master Equation framework (~400 words)
Findings
Ablation (Component Necessity):
- Load-bearing components: 0/31 (0%)
- Interpretation: either the framework is too robust or the test document is too simple
Topology (Structure Sensitivity):
- Scrambled mappings: Δχ = +0.06 (+1.2%)
- Interpretation: the topology change did NOT degrade the score
Label Independence: (In progress)
Adversarial Resistance: (In progress)
Concern
Framework may be TOO INSENSITIVE to structural changes.
Possible Causes:
- Test document is too simple
- Thresholds (10%, 15%, 5%) are too strict
- Framework genuinely is too robust
Next Steps:
- Test on diverse corpus (100+ documents)
- Include high-coherence (physics papers) and low-coherence (word salad)
- Calibrate thresholds against human expert ratings
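Calibration against expert ratings can start with a rank correlation between framework scores and human scores. This sketch uses a plain-Python Spearman rho (no tie handling, no SciPy dependency), which is enough for a first pass:

```python
def spearman_rho(xs, ys):
    """Spearman rank correlation between two equal-length score
    lists, assuming no tied values."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0] * len(vals)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))
```

A high rho between framework scores and expert ratings would justify the thresholds; a low one would support the "too insensitive" concern above.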
Comparison to Other Frameworks
| Feature | Our Framework | Typical Frameworks |
|---|---|---|
| Parameter Tuning | NONE | Extensive |
| Structural Tests | YES | Rare |
| Open Source | YES | Rare |
| Falsifiable | YES | Difficult |
| Adversarial Invited | YES | No |
Key Differentiator: We test structure, not fit.
How to Replicate
System Requirements
- Python 3.9+
- numpy and pyyaml (pathlib ships with the standard library)
- Any modern PC (no GPU needed)
Installation
```shell
pip install numpy pyyaml
```

Run Full Test Suite
```shell
python sensitivity_analyzer.py > report.txt
```

Expected Runtime
- Ablation tests: ~2 minutes
- Topology tests: ~1 minute
- Label swap: ~30 seconds
- Adversarial: ~1 minute
- Total: ~5 minutes
Outputs
- sensitivity_report.txt - Human-readable results
- sensitivity_analysis_report.json - Machine-readable summary
- Console output showing test progress
Invitation for Adversarial Testing
We Want You to Break This
Seriously. Try to:
- Game the system: Create documents that score high but are nonsense
- Find label dependence: Show that neutral labels break the math
- Prove structure doesn’t matter: Show random frameworks work equally
- Demonstrate curve-fitting: Prove we’re just pattern-matching
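One concrete way to attempt the first attack: generate a document that is dense in rubric keywords but carries no coherent argument. If such spam scores high, the adversarial criterion above fails. The term list here is made up for illustration:

```python
import random

def keyword_spam(terms, n_sentences=5, seed=0):
    """Build grammatical nonsense stuffed with rubric keywords:
    each sentence strings together randomly chosen terms."""
    rng = random.Random(seed)
    sentences = []
    for _ in range(n_sentences):
        picks = rng.sample(terms, k=min(3, len(terms)))
        sentences.append("The " + " of ".join(picks) + " converges.")
    return " ".join(sentences)

attack = keyword_spam(["coherence", "entropy", "grace", "topology"])
print(attack)
```

Feeding `attack` to the scorer and recording its chi value is a complete, reproducible attack submission.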
How to Submit
- Create your attack document/test
- Run it through the scorer
- Document unexpected behavior
- Submit to: [contact info]
Future Bounty Program
We plan to reward successful attacks:
- Prove framework is gameable → $[TBD]
- Prove label dependence → $[TBD]
- Prove no better than random → $[TBD]
Philosophical Foundation
Why This Matters
Most frameworks claim to be “scientific” but:
- Tune parameters until they fit
- Can’t be proven wrong
- Hide their logic
- Don’t invite attack
This is storytelling with equations, not science.
Our Standard
“A theory that cannot be falsified is not scientific.”
— Karl Popper
We define upfront what would prove us wrong, then invite the world to try.
If the framework survives, it earns credibility not through our claims, but through surviving attacks.
Status & Roadmap
Current Status: DRAFT
- ✅ Methodology documented
- ✅ Test suite implemented
- ⏳ Full testing in progress
- ⏳ Diverse corpus needed
- ⏳ Expert validation needed
Next Milestones
Phase 1: Complete Testing (Week 1)
- Run full test suite on 100+ documents
- Collect preliminary results
- Identify structural weaknesses
Phase 2: Expert Validation (Month 1)
- Collect human expert ratings
- Compare framework scores to expert scores
- Calibrate thresholds
Phase 3: Adversarial Hardening (Month 2)
- Invite attacks
- Fix identified weaknesses
- Retest
Phase 4: Public Release (Month 3)
- Publish methodology paper
- Release open-source code
- Launch public dashboard
Citation
(To be formatted upon publication)
License
[To be determined - likely MIT or CC-BY for academic use]
Contact
[To be added]
Acknowledgments
This work was developed as part of the Theophysics project, exploring the intersection of physics, theology, and information theory.
We are grateful to all who contribute critiques, attacks, and adversarial tests. Your skepticism makes this stronger.
Last Updated: January 11, 2026
Version: 1.0 (DRAFT)
Repository: [To be added]
Canonical Hub: CANONICAL_INDEX