Electricity activity and emissions data -- 2005 #98
…helpful QA resources
…roportions.R and new cprg_county_proportions.RDS generated; **/desktop.ini added to gitignore
…ole of WI and in-scope counties to support proportional allocation of activity data
…census block data to utility service area data and thereby enable estimation of utility service area populations
… electricity emissions into Minnesota county-level table
…computing resources to process block-level data at scale
… counties/utilities to enable aggregation and calc of proportions added
…g, add code for utility/utility-county rollups
…e, calculate proportions, write to RDS
…exists) to data pipeline
…umbers get only totals and NA for sector for now
…eflect that sectoral breakdowns from NREL only affect 2021 data
@LimerickSam Is 2005 activity data included as a change in this PR? I'm seeing that data sourced/calculated in energy/data-raw/processed_mn_electricUtil_activityData.R and wisconsin_2005_2021_utilityProportions.R, but it doesn't seem like they're included as changes here. Have they already been reviewed?
Disregard, I see them now. Don't know how I missed that.
@LimerickSam @pawilfahrt The WI county-level population estimates for 2021 used in _energy/data-raw/wisconsin_2005_2021_utilityProportions don't seem to line up with 2020 ACS values, based on the browser dashboard. I'm worried that this is because of all the GIS work combining and recombining data with small geographies. To confuse things further, the saved RDS wi_popBlocks_2021_withUtility.RDS doesn't have the same population totals that I get when I run lines 95-152.
I used this code: wi_pop_2021_filtered %>% filter(county == "Pierce") %>% summarize(total_pop = sum(pop2020)), where wi_pop_2021_filtered <- read_rds(here("_energy", "data-raw", "wi_popBlocks_2021_withUtility.RDS")), and got a result of total_pop = 67465. The ACS 2020 dashboard shows Pierce County's 2020 population as 42212.
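One way to probe a discrepancy like this is to check whether any census block appears in more than one row of the table. The sketch below uses a small made-up data frame, not the real RDS file; the `block_id` and `utility` column names are assumptions for illustration, and only `county` and `pop2020` come from the code quoted above.

```r
library(dplyr)

# Hypothetical stand-in for the saved block table (NOT the real data).
# Block "b2" appears twice because it is attached to two utilities.
blocks <- data.frame(
  block_id = c("b1", "b2", "b2", "b3"),
  county   = "Pierce",
  utility  = c("A", "A", "B", "B"),
  pop2020  = c(10, 20, 20, 30)
)

# Blocks that occur in more than one row
dupes <- blocks %>%
  count(block_id, name = "n_rows") %>%
  filter(n_rows > 1)

# Summing every row counts repeated blocks more than once
naive_total <- sum(blocks$pop2020)

# Summing each block only once
dedup_total <- blocks %>%
  distinct(block_id, .keep_all = TRUE) %>%
  summarize(total_pop = sum(pop2020)) %>%
  pull(total_pop)
```

If `dupes` is non-empty, a plain `sum()` over the table will exceed the true block-level total, which is one possible explanation for an inflated county figure.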
This is a really good catch, but I don't think it is ultimately cause for concern. The .RDS file you're referencing is not meant to be used as a source for county-level population estimates at all. That population number is a ROUGH estimate whose only goal is to facilitate calculating the proportion of population within a utility's WHOLE service area that falls into just our counties. Population counts embedded in wi_popBlocks_2021_withUtility.RDS should only be considered valid when tabulated/aggregated to UTILITY service territory, and NO other geography.

The incorrect population number you're associating with a county there is an artifact of an unfortunate feature of WI's utility service territory data -- some utility polygons in this dataset overlap, which means that a given census block centroid may be associated with 2 or more utilities (however many utility service area polygons overlay it). This is despite the fact that electric utilities are regulated in WI and consumers' utility assignment is based on their location.

I did a good deal of research on 1) why this is (likely shared facilities, joint ownership of infrastructure, and/or simple oversimplification by the data maintainer); 2) ways I could reconcile this data (e.g., keeping just the SMALLEST utility in cases of overlap, since I want to make sure municipal and co-op utilities aren't subsumed by Xcel's massive footprint; this is computationally difficult and rests on strong assumptions I wasn't prepared to make); and 3) the impact of this data reality on my basic workflow, which spatially joins census block centroids to utility service area polygons as an input to calculating utility service area populations within a given county as a proportion of "in-scope" utilities' statewide operations. I determined that this method was actually spot on, since the goal was never accurate population tabulation; it was estimating what proportion of the population in a utility's entire service territory is in each county in our study area.

If we think about each utility individually, we can make this estimate even if some of the blocks are shared across utilities -- we are simply using what we know about each utility (i.e., what the state provides) and taking the population within that space. Some utilities, by nature of their overlap, will reference some of the same census blocks, but we can still feel good about our proportion estimate. We only use this proportion to allocate a utility's total activity when we don't have customer counts by county (which we use in preference to population for allocation).

Basically, that file should be interpreted only on a utility-by-utility basis, and really not even as a population figure for utilities -- given these overlaps, we are by definition overestimating for any given utility, since its polygon probably covers some space where it doesn't actually have customer accounts. But this is unknowable, and it doesn't keep us from tabulating proportions that we can be confident in. Aggregations to county from that file will also be invalid by definition, since they count a census block centroid once for every overlapping utility polygon it falls within.

In the end, these population tabulations are only useful as inputs to one key equation: propUtilityPopInCounty = estServiceAreaPop / estTotalServiceAreaPop. This equation can be accurate when grouped to utility or utility-county, but the population numbers that go into it shouldn't really be used anywhere else. I hope this made sense and wasn't too roundabout?
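The allocation logic described in this reply can be illustrated with toy numbers. This is a minimal sketch under stated assumptions, not the code in the PR: the data frame, the `block_id`, `utility`, and `pop` columns, and the county values are all made up; only the three names in the equation (`estServiceAreaPop`, `estTotalServiceAreaPop`, `propUtilityPopInCounty`) come from the comment.

```r
library(dplyr)

# Hypothetical result of joining block centroids to utility polygons:
# one row per (block, utility) pair. Block "b3" lies under two
# overlapping polygons, so it is listed for both utility A and B.
blocks <- data.frame(
  block_id = c("b1", "b2", "b3", "b6", "b3", "b4", "b5"),
  county   = c("Pierce", "Pierce", "Pierce", "Other",
               "Pierce", "St. Croix", "Other"),
  utility  = c("A", "A", "A", "A", "B", "B", "B"),
  pop      = c(100, 200, 300, 400, 300, 400, 500)
)

# Share of each utility's estimated service-area population by county
utility_county <- blocks %>%
  group_by(utility, county) %>%
  summarize(estServiceAreaPop = sum(pop), .groups = "drop") %>%
  group_by(utility) %>%
  mutate(
    estTotalServiceAreaPop = sum(estServiceAreaPop),
    propUtilityPopInCounty = estServiceAreaPop / estTotalServiceAreaPop
  ) %>%
  ungroup()

# County totals from the same table double count the shared block
county_naive <- blocks %>%
  group_by(county) %>%
  summarize(total_pop = sum(pop), .groups = "drop")
```

In this toy case the proportions for each utility sum to 1 and stay internally consistent, while the Pierce total in `county_naive` is 900 even though the distinct Pierce blocks only hold 600 people. That is the same kind of inflation as the county figure questioned above.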
…ce census_county_population.R /.RDS + state_population.RDS is doing that work for us now!
…_population_proportions
… to reflect updated columns in cprg_population_proportions
…th new population changes
…ssions.RDS" This reverts commit 5481142.
…com/Metropolitan-Council/ghg-cprg into 83-electricity-data-sourcing---2005
Thanks Sam and Zoey. My review was more at a scan level since Zoey was so thorough and I know there's more to come here. Everything runs well and looks sensible at the end, so pushing it through!
Checklist
Please complete this checklist as a courtesy to the PR reviewer.
Code and styling
setwd()
file.path(here::here(), "file_name") to source any scripts or read in data
renv::install()
styler::style_dir(".", recursive = TRUE, filetype = c("R", "qmd"))
source = "chunk-name" in plot_ly()
councilR::plotly_layout()
out.width: "95%" and/or out.height: "500px". If a specific height is needed, use pixels.
Document editing
render_for_publication.R
GitHub and project management