Skip to main content

cs2370 Notes: 30 Spreadsheets

··2 mins

Spreadsheets

Chapter: https://automatetheboringstuff.com/2e/chapter13/

Incindiary claim: Using a spreadsheet to calculate anything important is malpractice.

  • A spreadsheet is a way to write a computer program.
  • Computer programs for import stuff should be testable, reusable, and maintainable.
  • Spreadsheets embed the input data for your program. The only way to run a spreadsheet on different data is to copy the whole thing and edit in the new data.
  • Not only that - spreadsheets are error prone. They like to do things like turning fractions into dates.

Spreadsheets can be a reasonable tool for exploring data, but any results where getting something wrong would cause harm should be replicated using a real computer program.

Now that I’ve said that, spreadsheets can also be used as a way to store data rather than as a way to calculate - and that makes them potentially useful as the input or output of a computer program.

“Pork Barrel Spending”

Looking around for a moderately complicated spreadsheet, I found Congressionally Directed Spending FY23. This is a list of local projects that congresspeople have gotten federal funding for.

We’ve got amount and column, so we can start asking questions:

  • What’s the total funding to each state?
import openpyxl as xl
from openpyxl.utils import column_index_from_string as l2i
import re

wb = xl.load_workbook("FY2023 CDS.xlsx")
sheet = wb["results"]

amount_col = l2i("H")
states_col = l2i("I")


totals = {}


for row in range(8, sheet.max_row):
    amount = sheet.cell(column=amount_col, row=row).value
    states = sheet.cell(column=states_col, row=row).value

    if states == None:
        continue
    
    xs = re.split(r',\s*', states)

    for st in xs:
        amt = totals.get(st, 0)
        amt += int(amount) / len(xs)
        totals[st] = amt


out_wb = xl.Workbook()
sh = out_wb.active
sh.title = "CDS Totals by State"

sh["A1"] = "State"
sh["B1"] = "Total"

ii = 2
for st in sorted(totals.keys()):
    sh[f"A{ii}"] = st
    sh[f"B{ii}"] = totals[st]
    ii += 1

out_wb.save("output.xlsx")

California gets all the money. Why?

Let’s normalize by population.

Here’s the congressional apportionment spreadsheet, which gives populations as of 2020 by state.

awb = xl.load_workbook("apportionment-2020.xlsx")

sh = awb.active

for row in range(5, 55):
    state = sh[f"A{ii}"].value
    pop = sh[f"B{ii}"].value

    # Wait, that's state names not abbreviations. 

We can probably figure this out.

Data sources: