Data Acquisition Plan: The “Two-Day Sweep”
Extracted from Methodology Notes (Codex Plan)
Overview
To gather all datasets with minimal friction, a structured two-day manual download plan is outlined below. The plan assumes no coding — just navigating websites, downloading files (CSV/XLS/PDF), and logging time.
Estimated Total Time: ~12—14 hours over 2 days.
Day 1: Acquire Society & Individual Domain Data (Approx. 7 hours)
1. Trust in Government — Pew dataset (30 min)
- Source: Pew Research Center, “Public Trust in Government: 1958-2025” fact sheet.
- URL: pewresearch.org (Search for “Public Trust in Government”)
- Action: Click “Download data as .csv” link under the main trust trend chart.
- Cleaning: Open CSV, verify 1958+ data. Reformat/transpose to one row per year.
2. General Social Trust — GSS data (45 min)
- Source: GSS Data Explorer (gssdataexplorer.norc.org).
- Variable: “TRUST” (General Social Survey, 1972-2022 Cumulative File).
- Action: Extract trend report for TRUST over time or download full dataset.
- Metric: % responding “Most people can be trusted”.
- Cross-check: Use Pew/Commentary figures (1973: 47%, recent ~32%).
3. Crime Rate — FBI UCR (60 min)
- Source: FBI Crime Data Explorer (cde.ucr.cjis.gov) or DisasterCenter.
- Action:
- Option 1: FBI CDE → United States → Violent Crime → 1960-2020 → Export CSV.
- Option 2: DisasterCenter table → Copy to Excel → Save CSV.
- Metric: Violent Crime rate per 100k.
- Note: Append 2020-2022 from BJS reports if missing.
4. Income Gini — Census/FRED (30 min)
- Source: FRED (Federal Reserve Economic Data).
- Series: “GINI Index for the United States (SIPOVGINIUSA)“.
- URL: fred.stlouisfed.org
- Action: Download CSV (1967-2023).
- Check: Verify against Census PDF reports.
5. Family (Unmarried Birth %) — CDC/NCHS (45 min)
- Source: National Vital Statistics Report or Washington State OFM.
- URL: ofm.wa.gov (Search “Percent of Births to Unmarried Mothers”)
- Action: Copy year and % columns (1960-2021).
- Metric: % of births to unmarried mothers.
6. Voter Turnout (30 min)
- Source: United States Elections Project (electproject.org).
- Action: Download “National Turnout Rates 1789—Present” CSV.
- Metric: Turnout % of VAP (Voting Age Population).
- Filter: Copy presidential years (1960+) into master dataset.
Wrap-up Day 1
- Organize into master spreadsheet/database.
- Check for anomalies/gaps.
Day 2: Acquire Logos/Physics Domain Data & Remaining Metrics (Approx. 6—7 hours)
1. Trust in Media — Gallup (45 min)
- Source: Gallup News, “Americans’ Trust in Media”.
- URL: news.gallup.com
- Action: Find historical trend table (1972-present).
- Data Points: e.g., 1972: ~68%; 1997: 53%; 2016: 32%; 2022: 34%.
- Metric: % trusting mass media “fair amount” or “fully”.
2. Media Fragmentation — TV Audience (60 min)
- Source: Nielsen reports, EBSCO Research Starters, textbooks.
- Metric: % TV audience watching Big 3 networks.
- Data Points:
- 1950s-70s: >90%
- 1990: ~61%
- 2000: ~49%
- Action: Create CSV of Year vs. Big3 Share. Interpolate if necessary.
3. Polarization — Voteview (45 min)
- Source: Voteview.com.
- Dataset: House/Senate Members CSV (
HSall_members.csv) or Polarization Table. - Action: Download CSV.
- Metric: Difference in party means (DW-NOMINATE) for each Congress.
- Filter: 1971 onward.
4. Economic Policy Uncertainty (EPU) (15 min)
- Source: policyuncertainty.com.
- Dataset: US Monthly EPU Index.
- Action: Download Excel/CSV.
- Processing: Compute annual averages (1985-2025).
5. VIX Volatility Index (15 min)
- Source: FRED.
- Series: “VIXCLS”.
- Action: Download CSV (1990-present).
- Processing: Compute annual averages.
6. Suicide Rate — CDC (30 min)
- Source: CDC WISQARS, HealthData.gov, or Kaggle.
- Metric: Suicide mortality rate (age-adjusted) per 100k.
- Filter: Total population, 1960-2020.
- Check: 2000: ~10.4; 2018: ~14.2.
7. Life Expectancy — FRED (10 min)
- Source: FRED.
- Series: Life Expectancy at Birth (USA).
- Action: Download CSV.
8. Overdose Deaths — CDC (30 min)
- Source: CDC Wonder or NCHS Data Briefs.
- Query: “All drug overdose deaths, 1999—2022, age-adjusted rate”.
- Action: Export CSV.
9. Religious Attendance — Gallup/GSS (30 min)
- Source: GSS Data Explorer (“ATTEND” variable) or Gallup reports.
- Metric: % attending weekly.
- Action: Generate trend report 1972-2022.
10. Quality Check & Assembly (60 min)
- Compile all into PostgreSQL table structure or master CSV.
- Align years.
- Highlight gaps.
- Scan for outliers.
Fred API Automation (The “Backbone”)
Use the FRED API to automate ~40-50% of the indicators.
Tier 1: FRED API Targets
- Gini Index
- Economic stability metrics
- VIX
- EPU
- Life Expectancy
- Labor metrics
- Health proxies
- Inflation / Volatility
- Consumer Sentiment
Tier 2: Manual Downloads (Static)
- Trust (Pew, Gallup, GSS)
- Crime (FBI)
- Family Structure (NVSS)
- Polarization (Voteview)
- Media Fragmentation (Nielsen)
Postgres Schema Target
CREATE TABLE tci_indicators (
year SMALLINT PRIMARY KEY,
trust_gov_pct REAL, -- Pew
social_trust_pct REAL, -- GSS
violent_crime_rate REAL, -- FBI UCR
gini_index REAL, -- FRED
unmarried_birth_pct REAL, -- NVSS
turnout_pct REAL, -- Elections Project
happy_pct REAL, -- GSS
suicide_rate REAL, -- CDC
life_expectancy REAL, -- FRED
overdose_rate REAL, -- CDC
weekly_worship_pct REAL, -- GSS/Gallup
media_trust_pct REAL, -- Gallup
tv_big3_share_pct REAL, -- Nielsen/Manual
cong_polarization REAL, -- Voteview
epu_index REAL, -- PolicyUncertainty/FRED
vix_index REAL, -- FRED
trust_gov_std REAL -- Derived volatility
);