Electricity activity and emissions data -- 2005 #98

Merged
merged 64 commits into from
Jul 11, 2024

Conversation

LimerickSam
Collaborator

@LimerickSam LimerickSam commented Jul 5, 2024

Checklist

Please complete this checklist as a courtesy to the PR reviewer.

Code and styling

  • All of the files/scripts I added are in the right place and named appropriately. See the README for details.
  • I have not used setwd()
  • I have used file.path(here::here(), "file_name") to source any scripts or read in data
  • I have not added any large datasets, unless absolutely necessary (explain)
  • I have commented my code, particularly in hard to understand areas
  • I have added additional package dependencies as necessary with renv::install()
  • I have run styler::style_dir(".", recursive = TRUE, filetype = c("R", "qmd"))
  • Plots
    • If plotly, use source = "chunk-name" in plot_ly()
    • Use formatting with councilR::plotly_layout()?
  • Chunk formatting
    • All chunks named
    • All figure or table chunks have caption
    • out.width: "95%" and/or out.height: "500px". If a specific height is needed, use pixels.

Document editing

  • I have ensured that modified documents knit successfully from render_for_publication.R
  • I have fixed any missing citations, cross references, hyperlinks
  • I have reviewed my contributions for typos and misspellings.

GitHub and project management

  • I have identified and assigned at least one Reviewer (Liz, Sam, Laine) to this PR
  • I have assigned myself to this PR
  • I have updated the status in the GitHub Project

…roportions.R and new cprg_county_proportions.RDS generated; **/desktop.ini added to gitignore
…ole of WI and in-scope counties to support proportional allocation of activity data
…census block data to utility service area data and thereby enable estimation of utility service area populations
… electricity emissions into Minnesota county-level table
…computing resources to process block-level data at scale
… counties/utilities to enable aggregation and calc of proportions added
…g, add code for utility/utility-county rollups
…umbers get only totals and NA for sector for now
…eflect that sectoral breakdowns from NREL only affect 2021 data
@pawilfahrt pawilfahrt requested a review from zeyandell July 8, 2024 18:22
Collaborator

@zeyandell zeyandell left a comment

@LimerickSam Is 2005 activity data included as a change in this PR? I'm seeing that data sourced/calculated in energy/data-raw/processed_mn_electricUtil_activityData.R and wisconsin_2005_2021_utilityProportions.R but it doesn't seem like they're included as changes here. Have they already been reviewed?

@zeyandell
Collaborator

Disregard, I see them now. Don't know how I missed that.

@zeyandell zeyandell closed this Jul 8, 2024
@zeyandell zeyandell reopened this Jul 8, 2024
Collaborator

@zeyandell zeyandell left a comment

@LimerickSam @pawilfahrt The WI county-level population estimates for 2021 used in _energy/data-raw/wisconsin_2005_2021_utilityProportions don't seem to be lining up with 2020 ACS values, based on the browser dashboard. I'm worried that this is because of all the GIS work combining and recombining data with small geographies. Confusing this further, the saved RDS wi_popBlocks_2021_withUtility.RDS doesn't have the same population totals that I get when I run lines 95-152.

I used this code:
wi_pop_2021_filtered %>% filter(county == "Pierce") %>% summarize(total_pop = sum(pop2020)), where wi_pop_2021_filtered <- read_rds(here("_energy", "data-raw", "wi_popBlocks_2021_withUtility.RDS")), and got a result of total_pop = 67465. The ACS 2020 dashboard shows Pierce County's 2020 population as 42212.

@LimerickSam
Collaborator Author

This is a really good catch, but I don't think it is cause for concern. The .RDS file you're referencing is not meant to be a source for county-level population estimates at all. That population number is a ROUGH estimate whose only goal is to facilitate calculating the proportion of population within a utility's WHOLE service area that falls into just our counties. Population counts embedded in wi_popBlocks_2021_withUtility.RDS should only be considered valid when tabulated/aggregated to UTILITY service territory, and NO other geography.

The incorrect population number you're seeing for that county is an artifact of an unfortunate feature of WI's utility service territory data: some utility polygons in this dataset overlap, which means that a given census block centroid may be associated with 2 or more utilities (however many utility service area polygons overlay it). This is despite the fact that electric utilities are regulated in WI and consumers' utility assignment is based on their location.

I did a good deal of research on:

1) why the overlaps exist (likely shared facilities, joint ownership of infrastructure, and/or simplification by the data maintainer);
2) ways I could reconcile the data (e.g., keeping just the SMALLEST utility in cases of overlap, so that municipal and co-op utilities aren't subsumed by Xcel's massive footprint; this is computationally difficult and rests on strong assumptions I wasn't prepared to make); and
3) how the overlaps affect my basic workflow, which spatially joins census block centroids to utility service area polygons and uses the result to calculate each utility's service area population within a given county as a proportion of that "in-scope" utility's statewide operations.

I determined that the method is actually sound for this purpose, since the goal was never accurate population tabulation. The goal was to estimate what proportion of the population in a utility's entire service territory was/is in each county in our study area. If we think about each utility individually, we can make this estimate even if some of the blocks are shared across utilities: we simply take what we know about each utility (i.e., the service area the state provides) and sum the population within that space. Some utilities, by nature of their overlap, will reference some of the same census blocks, but we can still feel good about our proportion estimate.

We only use this proportion to allocate a utility's total activity when we don't have customer counts by county (which we use in preference to population for allocation).
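
To make the overlap effect concrete, here is a minimal sketch on toy data. It is not the code in wisconsin_2005_2021_utilityProportions.R: the table `blocks_with_utility`, its column names, the utility labels, and every population number are hypothetical and only illustrate the idea. Block "b3" stands in for a census block centroid that falls inside two overlapping utility polygons.

```r
library(dplyr)

# Hypothetical result of a block-centroid-to-utility spatial join:
# one row per (block, utility) pair, so block "b3" appears twice.
blocks_with_utility <- tibble::tribble(
  ~block_id, ~county,  ~utility,    ~pop,
  "b1",      "Pierce", "Utility A", 100,
  "b2",      "Pierce", "Utility A", 200,
  "b3",      "Pierce", "Utility A", 50,
  "b3",      "Pierce", "Utility B", 50,
  "b4",      "Dunn",   "Utility A", 150,
  "b5",      "Dunn",   "Utility B", 150
)

# Share of each utility's whole service-area population in each county.
utility_county_props <- blocks_with_utility %>%
  group_by(utility) %>%
  mutate(estTotalServiceAreaPop = sum(pop)) %>%
  group_by(utility, county, estTotalServiceAreaPop) %>%
  summarize(estServiceAreaPop = sum(pop), .groups = "drop") %>%
  mutate(propUtilityPopInCounty = estServiceAreaPop / estTotalServiceAreaPop)
# Utility A: Pierce 0.70, Dunn 0.30; Utility B: Pierce 0.25, Dunn 0.75

# Summing the joined table by county counts b3 once per utility.
naive_county_pop <- blocks_with_utility %>%
  group_by(county) %>%
  summarize(total_pop = sum(pop))
# Pierce: 400 (inflated)

# Counting each block once gives the toy county's actual total.
dedup_county_pop <- blocks_with_utility %>%
  distinct(block_id, county, pop) %>%
  group_by(county) %>%
  summarize(total_pop = sum(pop))
# Pierce: 350
```

In this toy case the per-utility proportions still sum to 1 within each utility, while the county total taken straight from the joined table is inflated by the shared block.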

Basically, that file should be interpreted only on a utility-by-utility basis, and really not even as a population figure for utilities. Given these overlaps, we are by definition overestimating for any given utility, since its polygon probably covers some space where it doesn't actually have customer accounts. That is unknowable, but it doesn't keep us from tabulating proportions that we can be confident in. Aggregations to county from that file will also by definition be invalid, since they count a census block centroid once for every overlapping utility polygon it falls in. In the end, these population tabulations are really only useful as inputs to one key equation: propUtilityPopInCounty = estServiceAreaPop / estTotalServiceAreaPop. This equation can be accurate when grouped to utility or utility-county, but the actual population numbers that go into it shouldn't really be used anywhere else. I hope this made sense and wasn't too roundabout?
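
As a sketch, the key equation can be written out per utility and county. The symbols are assumptions introduced here, not names from the code: $B_u$ is the set of census blocks whose centroid falls inside utility $u$'s polygon, $C_c$ is the set of blocks in county $c$, and $p_b$ is the population of block $b$.

```latex
\[
\text{propUtilityPopInCounty}_{u,c}
  = \frac{\text{estServiceAreaPop}_{u,c}}{\text{estTotalServiceAreaPop}_{u}}
  = \frac{\sum_{b \in B_u \cap C_c} p_b}{\sum_{b \in B_u} p_b},
\qquad
\sum_{c} \text{propUtilityPopInCounty}_{u,c} = 1 .
\]
```

The sum over $c$ runs over every county the utility's territory touches, in scope or not. Because numerator and denominator are both taken within a single utility's $B_u$, a block shared with another utility never enters the same ratio twice. Summing $p_b$ across utilities for one county, by contrast, counts a shared block once per overlapping polygon.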

@LimerickSam LimerickSam added the energy Energy label Jul 10, 2024
Collaborator

@pawilfahrt pawilfahrt left a comment

Thanks Sam and Zoey. My review was more at a scan level since Zoey was so thorough and I know there's more to come here. Everything runs well and looks sensible at the end, so pushing it through!

@pawilfahrt pawilfahrt merged commit 7d9ac48 into dev-2005-baseline Jul 11, 2024
@pawilfahrt pawilfahrt deleted the 83-electricity-data-sourcing---2005 branch July 11, 2024 20:25
Labels
energy Energy
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Electricity data sourcing - 2005
3 participants