Documentation for StateNavigate Data
This document describes all data that StateNavigate makes available for download. If you have questions about any of these data, please feel free to reach out to mary@statenavigate.org.
Demographics
These files contain demographic information about each district, derived from either the 2020 Census block data or the 2018-2023 5-year American Community Survey (ACS) tract data, both conducted and published by the U.S. Census Bureau. Each file contains the following columns (data marked with * are derived from the 2020 Census; all other data are estimated from the ACS):
district: District number
Total: Total district population*
Households: Number of households in the district*
NH_White: Number of nonhispanic white residents*
NH_Black: Number of nonhispanic black residents*
NH_AIAN: Number of nonhispanic American Indian or Alaska native residents*
NH_Asian: Number of nonhispanic Asian residents*
NH_NHPI: Number of nonhispanic Pacific Islander residents*
NH_Oth: Number of nonhispanic residents that identify as members of a single race other than white, black, American Indian, Alaska native, Asian, or Pacific Islander*
NH_Multi: Number of nonhispanic residents that identify as members of multiple races*
Hispanic: Number of Hispanic residents*
PCT_NH_White: Percent of district residents that identify as nonhispanic white*
PCT_NH_Black: Percent of district residents that identify as nonhispanic black*
PCT_NH_AIAN: Percent of district residents that identify as nonhispanic American Indian or Alaska native*
PCT_NH_Asian: Percent of district residents that identify as nonhispanic Asian*
PCT_NH_NHPI: Percent of district residents that identify as nonhispanic Pacific Islander*
PCT_NH_Oth: Percent of district residents that are nonhispanic and identify as members of a single race other than white, black, American Indian, Alaska native, Asian, or Pacific Islander*
PCT_NH_Multi: Percent of district residents that are nonhispanic and identify as members of multiple races*
pres_2024: Margin of victory in the 2024 presidential race within the district boundaries. Positive numbers indicate that Donald Trump won the district; negative numbers indicate that Kamala Harris won the district.
no_hs_degree: Percent of district residents who do not have a high school degree
hs_degree: Percent of district residents whose highest educational attainment is a high school degree
college_bachelor: Percent of district residents whose highest educational attainment is a bachelor’s degree
college_graduate_plus: Percent of district residents whose highest educational attainment is beyond a bachelor’s degree
white_no_college: Percent of white district residents who do not have a bachelor’s degree
white_college: Percent of white district residents who have a bachelor’s degree
median_income: Estimate of the individual median earnings for residents aged 25+ in the district, calculated as a weighted average of the median earnings of each census tract that intersects with the district, weighted based upon the population of the tract included in the district
born_in_state: Percent of district residents born in the state
natural_born_outside_state: Percent of district residents that are natural born American citizens and were born in a different state
naturalized_citizen: Percent of district residents that are naturalized American citizens
non_us_citizen: Percent of district residents that are not American citizens
PCT Under 18: Percent of district residents under the age of 18*
PCT 18 to 24: Percent of district residents between the ages of 18 and 24*
PCT 25 to 34: Percent of district residents between the ages of 25 and 34*
PCT 35 to 44: Percent of district residents between the ages of 35 and 44*
PCT 45 to 54: Percent of district residents between the ages of 45 and 54*
PCT 55 to 64: Percent of district residents between the ages of 55 and 64*
PCT 65 and Over: Percent of district residents over the age of 64*
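The median_income column is described as a population-weighted average of tract-level medians. The following is a minimal sketch of that kind of calculation, not StateNavigate's actual code. The function name and the input shape (a list of pairs holding each tract's median earnings and the tract population that falls inside the district) are assumptions made for illustration.

```python
def weighted_median_income(tracts):
    """Population-weighted average of tract median earnings.

    `tracts` is a list of (median_earnings, population_in_district) pairs,
    one per census tract that intersects the district. This input shape is
    hypothetical and chosen only for the example.
    """
    total_pop = sum(pop for _, pop in tracts)
    if total_pop == 0:
        return None  # no population overlap, so no estimate is possible
    return sum(median * pop for median, pop in tracts) / total_pop


# Two tracts: 1,000 district residents in a tract with median earnings of
# 40,000 and 3,000 district residents in a tract with median earnings of 60,000.
example_tracts = [(40000, 1000), (60000, 3000)]
district_estimate = weighted_median_income(example_tracts)
```

Note that a weighted average of medians is an approximation of the district median, which is why the column is described as an estimate.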
Election Results
These files contain election results according to both county lines and district boundaries. Each download contains three files: County Data, District Data, and Topline Data. All three files share the following columns:
candidate: Full name of candidate in this election.
election_full: Identifier for the election. Note that elections with “state_leg” as part of their identification refer to the chamber for which the data files belong; for example, in the files referring to the State Senate, these indicate State Senate results.
party: Party of candidate in this election.
incumbent: This column is currently not in use.
election_id: Shortened identifier for the election.
district: District identifier, where relevant.
votes: Total votes earned by the candidate in the corresponding geography.
vote_pct: Percent of votes earned by the candidate in the corresponding geography.
The County Data files carry election results by county for each state. Note that for legislative elections, the results indicated are the results for whatever portion of the relevant district lies within the relevant county boundaries. In the case that multiple districts intersect the same county, the total voteshare is aggregated by party, and the candidate will appear as “DEM candidate” to indicate that multiple candidates are included. This file carries the following additional columns:
county: Name of the county for which the results are calculated.
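The party-level aggregation described for counties that intersect multiple districts can be pictured with the sketch below. It is an illustration under stated assumptions, not the pipeline that produced the files: the function name is hypothetical, the party values are assumed to be codes such as "DEM", and vote_pct is assumed to be on a 0 to 100 scale.

```python
from collections import defaultdict


def aggregate_by_party(rows):
    """Combine per-district candidate rows within one county into party totals.

    `rows` is a list of dicts with "party" and "votes" keys (hypothetical
    input shape). Returns one row per party, labelled "<party> candidate".
    """
    totals = defaultdict(int)
    for row in rows:
        totals[row["party"]] += int(row["votes"])

    all_votes = sum(totals.values())
    return [
        {
            "candidate": f"{party} candidate",
            "party": party,
            "votes": votes,
            # Assumed 0-100 scale; the real files may use a different scale.
            "vote_pct": 100 * votes / all_votes if all_votes else 0.0,
        }
        for party, votes in totals.items()
    ]


# Portions of two districts that fall inside the same county.
county_rows = [
    {"party": "DEM", "votes": 300},
    {"party": "REP", "votes": 200},
    {"party": "DEM", "votes": 100},
    {"party": "REP", "votes": 400},
]
combined = {r["party"]: r for r in aggregate_by_party(county_rows)}
```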
The District Data files carry election results within the relevant district boundaries.
The Topline Data files carry the overall election results for each of the relevant races.
Campaign Finance - Candidates
These files contain information about candidates in each chamber. Each row in the file represents a candidate that appeared on the ballot for a specific election. If a candidate appeared on more than one ballot for the same seat (e.g., in both a primary and general election), the file contains one row for each ballot on which the candidate appeared. Specifically, the file contains the following columns:
id: Internal StateNavigate identifier for the campaign being run. Note that multiple rows may carry the same id, if they represent the same candidate appearing on multiple ballots seeking the same seat.
candidate_id: Internal StateNavigate identifier for the candidate running for office. Note that this value remains constant across election cycles if the same candidate runs for office more than once.
party: Full name of the candidate’s party
party_code: Standard abbreviation for the candidate’s party
incumbent: Boolean for whether the candidate is an incumbent. Blanks indicate that data has not yet been collected.
candidate_name: Full name of the candidate
candidate_last_name: Last name of the candidate
finance_report_start: Start date of the campaign finance filing report that includes the date of the relevant election. If such a report is not available, the most recent report prior to the date of the election will be substituted here.
finance_report_end: End date of the campaign finance filing report that includes the date of the relevant election. If such a report is not available, the most recent report prior to the date of the election will be substituted here.
cf_raised: Total raised over the entirety of the campaign, as of the relevant filing report.
cf_spent: Total spent over the entirety of the campaign, as of the relevant filing report.
cf_cash_on_hand: Total cash on hand, as of the relevant filing report.
cf_loans: Outstanding loans, as of the relevant filing report.
cycle: Campaign cycle of the candidate’s election.
special: Boolean indicating whether the election is a special election. Blanks should be interpreted as FALSE.
stage: Stage of the election for this row.
election_date: Date of the election for this row.
chamber: Name of the chamber for the seat. In the case of statewide elections, this column holds the name of the office being sought (e.g., “Governor” or “Attorney General”).
chamber_type: Type of the chamber for the seat (upper, lower, or statewide)
district: District number (or other identifier) for the seat. In the case of statewide elections, this column will be blank.
state: Name of the state in which the seat is located.
Campaign Finance - Donations
These files contain donations given to candidates by individuals, organizations, and PACs, as reported to the relevant state authority on campaign finance, either by the campaign or the donor. These files contain the following columns:
id: Internal StateNavigate identifier for the donation.
campaign_id: Internal StateNavigate identifier for the campaign receiving the donation. Note that this column can be cross-referenced with the column “id” in the candidate downloads.
candidate_name: Full name of the candidate.
cycle: Cycle in which the donation was received.
name: Name of the donor.
amount: Amount of the donation.
date: Date of the donation.
purpose: Purpose of the donation, if indicated on filing reports.
Campaign Finance - Expenditures
These files contain expenditures made by candidates in the course of their campaigns, as reported to the relevant state authority on campaign finance. These files contain the following columns:
id: Internal StateNavigate identifier for the expenditure.
campaign_id: Internal StateNavigate identifier for the campaign spending the money. Note that this column can be cross-referenced with the column “id” in the candidate downloads.
candidate_name: Full name of the candidate.
cycle: Cycle in which the expense was made.
name: Name of the payee.
amount: Amount of the expense.
date: Date of the expense.
purpose: Purpose of the expense, if indicated on filing reports.
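Because campaign_id in the donation and expenditure files matches id in the candidate files, the two can be joined. The sketch below shows one way to do this with plain dictionaries. The function names and the sample rows are hypothetical. Since several candidate rows may share the same id (one per ballot), the lookup keeps only the first row seen for each id.

```python
from collections import defaultdict


def total_by_campaign(transactions):
    """Sum the "amount" column per campaign_id (donations or expenditures)."""
    totals = defaultdict(float)
    for t in transactions:
        totals[t["campaign_id"]] += float(t["amount"])
    return dict(totals)


def attach_candidates(totals, candidates):
    """Map each campaign_id to (candidate_name, total amount)."""
    info = {}
    for c in candidates:
        # Several rows can share an id (e.g., primary and general); keep one.
        info.setdefault(c["id"], c)
    return {
        cid: (info[cid]["candidate_name"] if cid in info else None, amount)
        for cid, amount in totals.items()
    }


# Made-up rows that use only column names documented above.
candidates = [
    {"id": "c1", "candidate_name": "Jane Doe", "stage": "primary"},
    {"id": "c1", "candidate_name": "Jane Doe", "stage": "general"},
]
donations = [
    {"campaign_id": "c1", "amount": "100.00"},
    {"campaign_id": "c1", "amount": "250.50"},
    {"campaign_id": "c9", "amount": "10"},  # no matching candidate row
]
joined = attach_candidates(total_by_campaign(donations), candidates)
```

In practice the rows would come from the downloaded files, for example through `csv.DictReader`.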
W-NOMINATE Vote Data
These files contain the data on each vote taken in the specified chamber that was used to calculate W-NOMINATE scores, along with the results of the W-NOMINATE algorithm as applied to that state. For technical documentation on GMP and other technical parameters, see footnote 1. These files contain the following columns:
vote_event_id: Identifier for the vote
correctYea: Number of yea votes correctly predicted by W-NOMINATE
wrongYea: Number of votes predicted to be yea by W-NOMINATE, but were actually nay
wrongNay: Number of votes predicted to be nay by W-NOMINATE, but were actually yea
correctNay: Number of nay votes correctly predicted by W-NOMINATE
GMP: Geometric Mean Probability (the geometric mean for the likelihood functions of the underlying optimizations); a measure of goodness of fit
PRE: Proportional Reduction in Error; another measure of goodness of fit
spread1D: Spread of the W-NOMINATE scores in the first dimension (horizontal axis)
spread2D: Spread of the W-NOMINATE scores in the second dimension (vertical axis)
midpoint1D: Midpoint of the W-NOMINATE scores in the first dimension
midpoint2D: Midpoint of the W-NOMINATE scores in the second dimension
totalYea: Total number of yea votes
totalNay: Total number of nay votes
cv1: First parameter of the cutting line (the line that separates ‘yeas’ from ‘nays’)
cv2: Second parameter of the cutting line
angle1: Angle of the normal vector (atan(cv1, cv2)*180/pi)
angle1adj: Angle adjusted to be non-negative
cv1adj: cv1 adjusted to be non-negative
cv2adj: cv2 adjusted to be non-negative
angle2: Angle of the normal vector using adjusted CVs
angle2adj: Angle of the normal vector plus 90 degrees
bill_id: Identifier for the bill
start_date: Date on which the vote began
motion_text: Text of the motion on which the vote is being held.
result: Result of the vote.
session_identifier: Identifier for the session in which the vote is held.
title: Title of the underlying bill.
identifier: Identifier for the underlying bill.
abstract: Abstract for the underlying bill.
url: Link to the bill on the relevant state agency’s website.
yes: Total voting yes.
no: Total voting no.
not voting: Total not voting on the motion.
abstain: Total abstaining from the vote on the motion.
other: Total legislators in the relevant body whose vote cannot be classified as yes, no, not voting, or abstention.
total_votes: Total votes on the motion.
roll_call_id: Shortened identifier for the voting event.
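As a rough illustration of how the four classification counts relate to PRE, the sketch below uses the usual textbook definition: the number of votes on the minority side, minus the classification errors, divided by the number of votes on the minority side. This is an assumption about how the column is derived, and the function name is hypothetical; values in the files come from the W-NOMINATE software and may differ, for example if the totals count legislators differently.

```python
def proportional_reduction_in_error(correct_yea, wrong_yea, wrong_nay, correct_nay):
    """PRE for one roll call from the four classification counts.

    wrong_yea = predicted yea, actually nay
    wrong_nay = predicted nay, actually yea
    """
    total_yea = correct_yea + wrong_nay  # votes actually cast as yea
    total_nay = correct_nay + wrong_yea  # votes actually cast as nay
    minority = min(total_yea, total_nay)
    if minority == 0:
        return None  # unanimous vote: PRE is undefined
    errors = wrong_yea + wrong_nay
    return (minority - errors) / minority


# 60 yeas and 40 nays, with 5 errors on each side.
example_pre = proportional_reduction_in_error(
    correct_yea=55, wrong_yea=5, wrong_nay=5, correct_nay=35
)
```

A PRE of 1 means every vote was classified correctly, while a PRE of 0 means the model did no better than predicting that everyone voted with the majority.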
W-NOMINATE Vote Log
These files contain all votes by all legislators on all motions considered when calculating W-NOMINATE scores. These files contain the following columns:
vote_event_id: Identifier for the vote event considered. Note that this can be cross-referenced with the vote_event_id column in the W-NOMINATE Vote Data files.
voter_id: Identifier for the legislator voting.
yea_prob: Probability of the legislator voting in favor of the motion, as calculated by W-NOMINATE.
vote_desc: The legislator’s actual vote on the legislation.
people_id: Shortened identifier for the legislator voting.
roll_call_id: Shortened identifier for the vote event considered.
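The yea_prob and vote_desc columns are enough to compute a geometric mean probability for any subset of votes, such as all votes by one legislator or all votes on one motion. The sketch below is illustrative only. It assumes that yea_prob lies between 0 and 1 and that vote_desc uses the labels "yes" and "no"; the real labels may differ, so they are passed in as parameters. Rows with any other label are skipped.

```python
import math


def geometric_mean_probability(votes, yea_label="yes", nay_label="no"):
    """Geometric mean of the probability assigned to each observed choice."""
    eps = 1e-12  # keeps log() defined if a probability is exactly 0 or 1
    logs = []
    for v in votes:
        p = min(max(float(v["yea_prob"]), eps), 1.0 - eps)
        if v["vote_desc"] == yea_label:
            logs.append(math.log(p))
        elif v["vote_desc"] == nay_label:
            logs.append(math.log(1.0 - p))
        # Other labels (e.g., not voting) are ignored.
    if not logs:
        return None
    return math.exp(sum(logs) / len(logs))


# Made-up rows: each scored vote was given probability 0.8 for the observed choice.
sample_votes = [
    {"yea_prob": "0.8", "vote_desc": "yes"},
    {"yea_prob": "0.2", "vote_desc": "no"},
    {"yea_prob": "0.5", "vote_desc": "not voting"},
]
sample_gmp = geometric_mean_probability(sample_votes)
```

Values closer to 1 indicate that the model assigned high probability to the votes that were actually cast.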
W-NOMINATE Legislators
These files contain the W-NOMINATE scores for each individual legislator (see footnote 1 for technical references). These files contain the following columns:
voter_id: Identifier for the legislator.
name: Name of the legislator.
party: Current party affiliation of the legislator.
district: District name/number for the legislator.
correctYea: Number of yea votes cast by this legislator correctly predicted by W-NOMINATE.
wrongYea: Number of votes predicted to be yea by W-NOMINATE, but were actually cast as nay.
wrongNay: Number of votes predicted to be nay by W-NOMINATE, but were actually cast as yea.
correctNay: Number of nay votes cast by this legislator correctly predicted by W-NOMINATE.
GMP: Geometric Mean Probability (the geometric mean for the likelihood functions of the underlying optimizations); a measure of goodness of fit
PRE: Proportional Reduction in Error; another measure of goodness of fit
coord1D: First dimension W-NOMINATE score for the legislator.
coord2D: Second dimension W-NOMINATE score for the legislator.
se1D: standard error of the first dimension scores
se2D: standard error of the second dimension scores
Corr.1: correlation between the first and second dimension scores
people_id: Shortened identifier for the legislator voting.
[1] For documentation on GMP and other technical parameters, see Poole, K. T., & Rosenthal, H. (1983). “A Spatial Model for Legislative Roll Call Analysis.” Carnegie Mellon Working Paper; and Poole, K., Lewis, J. B., Lo, J., & Carroll, R. (2011). “Scaling Roll Call Votes with wnominate in R.” Journal of Statistical Software, 42(14), 1–21. https://doi.org/10.18637/jss.v042.i14