Data Acquisition Plan: The “Two-Day Sweep”

Extracted from Methodology Notes (Codex Plan)

Overview

To gather all datasets with minimal friction, a structured two-day manual download plan is outlined below. The plan assumes no coding — just navigating websites, downloading files (CSV/XLS/PDF), and logging time.

Estimated Total Time: ~12—14 hours over 2 days.


Day 1: Acquire Society & Individual Domain Data (Approx. 7 hours)

1. Trust in Government — Pew dataset (30 min)

  • Source: Pew Research Center, “Public Trust in Government: 1958-2025” fact sheet.
  • URL: pewresearch.org (Search for “Public Trust in Government”)
  • Action: Click “Download data as .csv” link under the main trust trend chart.
  • Cleaning: Open CSV, verify 1958+ data. Reformat/transpose to one row per year.

2. General Social Trust — GSS data (45 min)

  • Source: GSS Data Explorer (gssdataexplorer.norc.org).
  • Variable: “TRUST” (General Social Survey, 1972-2022 Cumulative File).
  • Action: Extract trend report for TRUST over time or download full dataset.
  • Metric: % responding “Most people can be trusted”.
  • Cross-check: Use Pew/Commentary figures (1973: 47%, recent ~32%).

3. Crime Rate — FBI UCR (60 min)

  • Source: FBI Crime Data Explorer (cde.ucr.cjis.gov) or DisasterCenter.
  • Action:
    • Option 1: FBI CDE United States Violent Crime 1960-2020 Export CSV.
    • Option 2: DisasterCenter table Copy to Excel Save CSV.
  • Metric: Violent Crime rate per 100k.
  • Note: Append 2020-2022 from BJS reports if missing.

4. Income Gini — Census/FRED (30 min)

  • Source: FRED (Federal Reserve Economic Data).
  • Series: “GINI Index for the United States (SIPOVGINIUSA)“.
  • URL: fred.stlouisfed.org
  • Action: Download CSV (1967-2023).
  • Check: Verify against Census PDF reports.

5. Family (Unmarried Birth %) — CDC/NCHS (45 min)

  • Source: National Vital Statistics Report or Washington State OFM.
  • URL: ofm.wa.gov (Search “Percent of Births to Unmarried Mothers”)
  • Action: Copy year and % columns (1960-2021).
  • Metric: % of births to unmarried mothers.

6. Voter Turnout (30 min)

  • Source: United States Elections Project (electproject.org).
  • Action: Download “National Turnout Rates 1789—Present” CSV.
  • Metric: Turnout % of VAP (Voting Age Population).
  • Filter: Copy presidential years (1960+) into master dataset.

Wrap-up Day 1

  • Organize into master spreadsheet/database.
  • Check for anomalies/gaps.

Day 2: Acquire Logos/Physics Domain Data & Remaining Metrics (Approx. 6—7 hours)

1. Trust in Media — Gallup (45 min)

  • Source: Gallup News, “Americans’ Trust in Media”.
  • URL: news.gallup.com
  • Action: Find historical trend table (1972-present).
  • Data Points: e.g., 1972: ~68%; 1997: 53%; 2016: 32%; 2022: 34%.
  • Metric: % trusting mass media “fair amount” or “fully”.

2. Media Fragmentation — TV Audience (60 min)

  • Source: Nielsen reports, EBSCO Research Starters, textbooks.
  • Metric: % TV audience watching Big 3 networks.
  • Data Points:
    • 1950s-70s: >90%
    • 1990: ~61%
    • 2000: ~49%
  • Action: Create CSV of Year vs. Big3 Share. Interpolate if necessary.

3. Polarization — Voteview (45 min)

  • Source: Voteview.com.
  • Dataset: House/Senate Members CSV (HSall_members.csv) or Polarization Table.
  • Action: Download CSV.
  • Metric: Difference in party means (DW-NOMINATE) for each Congress.
  • Filter: 1971 onward.

4. Economic Policy Uncertainty (EPU) (15 min)

  • Source: policyuncertainty.com.
  • Dataset: US Monthly EPU Index.
  • Action: Download Excel/CSV.
  • Processing: Compute annual averages (1985-2025).

5. VIX Volatility Index (15 min)

  • Source: FRED.
  • Series: “VIXCLS”.
  • Action: Download CSV (1990-present).
  • Processing: Compute annual averages.

6. Suicide Rate — CDC (30 min)

  • Source: CDC WISQARS, HealthData.gov, or Kaggle.
  • Metric: Suicide mortality rate (age-adjusted) per 100k.
  • Filter: Total population, 1960-2020.
  • Check: 2000: ~10.4; 2018: ~14.2.

7. Life Expectancy — FRED (10 min)

  • Source: FRED.
  • Series: Life Expectancy at Birth (USA).
  • Action: Download CSV.

8. Overdose Deaths — CDC (30 min)

  • Source: CDC Wonder or NCHS Data Briefs.
  • Query: “All drug overdose deaths, 1999—2022, age-adjusted rate”.
  • Action: Export CSV.

9. Religious Attendance — Gallup/GSS (30 min)

  • Source: GSS Data Explorer (“ATTEND” variable) or Gallup reports.
  • Metric: % attending weekly.
  • Action: Generate trend report 1972-2022.

10. Quality Check & Assembly (60 min)

  • Compile all into PostgreSQL table structure or master CSV.
  • Align years.
  • Highlight gaps.
  • Scan for outliers.

Fred API Automation (The “Backbone”)

Use the FRED API to automate ~40-50% of the indicators.

Tier 1: FRED API Targets

  • Gini Index
  • Economic stability metrics
  • VIX
  • EPU
  • Life Expectancy
  • Labor metrics
  • Health proxies
  • Inflation / Volatility
  • Consumer Sentiment

Tier 2: Manual Downloads (Static)

  • Trust (Pew, Gallup, GSS)
  • Crime (FBI)
  • Family Structure (NVSS)
  • Polarization (Voteview)
  • Media Fragmentation (Nielsen)

Postgres Schema Target

CREATE TABLE tci_indicators (
    year SMALLINT PRIMARY KEY,
    trust_gov_pct REAL,        -- Pew
    social_trust_pct REAL,     -- GSS
    violent_crime_rate REAL,   -- FBI UCR
    gini_index REAL,           -- FRED
    unmarried_birth_pct REAL,  -- NVSS
    turnout_pct REAL,          -- Elections Project
    happy_pct REAL,            -- GSS
    suicide_rate REAL,         -- CDC
    life_expectancy REAL,      -- FRED
    overdose_rate REAL,        -- CDC
    weekly_worship_pct REAL,   -- GSS/Gallup
    media_trust_pct REAL,      -- Gallup
    tv_big3_share_pct REAL,    -- Nielsen/Manual
    cong_polarization REAL,    -- Voteview
    epu_index REAL,            -- PolicyUncertainty/FRED
    vix_index REAL,            -- FRED
    trust_gov_std REAL         -- Derived volatility
);