Faceted map example chart #1711
Try this:

```python
import altair as alt
from vega_datasets import data

states = alt.topo_feature(data.us_10m.url, 'states')
source = data.income.url

alt.Chart(source).mark_geoshape().encode(
    shape=alt.Shape(field='geo', type='geojson'),
    color='pct:Q',
    column='group:N',
    tooltip=['name:N', 'group:N', 'pct:Q']
).transform_lookup(
    lookup='id',
    from_=alt.LookupData(data=states, key='id'),
    as_='geo'
).properties(
    width=75,
    height=150
).project(
    type='albersUsa'
)
```

I noticed there is no shorthand for
Here's an example using the LA riots sample dataset:

```python
import altair as alt
from vega_datasets import data

df = data.la_riots()
n = alt.topo_feature(
    'https://gist.githubusercontent.com/irisslee/70039051188dac8f64e14182b5a459a9/raw/2412c45551cff577f7b10604ca523bd3f4dd31d3/countytopo.json',
    'county'
)

LAbasemap = alt.Chart(n).mark_geoshape(
    fill='lightgray',
    stroke='white'
).properties(width=400, height=400).project('mercator')

points = alt.Chart().mark_circle().encode(
    longitude='longitude:Q',
    latitude='latitude:Q',
    size=alt.value(15),
    color='gender:N'
)

alt.layer(LAbasemap, points, data=df).facet('gender:N')
```
That's a nice example of the mechanics of a faceted map, but I think for this particular dataset the visualization would be more effective without splitting it across facets.
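A minimal sketch of what that single-map alternative could look like, assuming the same gist basemap and columns as the snippet above (it is Iris Lee's code unchanged except for dropping the `.facet()` call):

```python
import altair as alt
from vega_datasets import data

df = data.la_riots()
n = alt.topo_feature(
    'https://gist.githubusercontent.com/irisslee/70039051188dac8f64e14182b5a459a9/raw/2412c45551cff577f7b10604ca523bd3f4dd31d3/countytopo.json',
    'county'
)

LAbasemap = alt.Chart(n).mark_geoshape(
    fill='lightgray',
    stroke='white'
).properties(width=400, height=400).project('mercator')

points = alt.Chart().mark_circle().encode(
    longitude='longitude:Q',
    latitude='latitude:Q',
    size=alt.value(15),
    color='gender:N'   # gender distinguished by colour within one map
)

# Same layers as before, just without .facet('gender:N')
alt.layer(LAbasemap, points, data=df)
```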
What do you see as an ideal example of a faceted map for the gallery?
I haven't been able to come up with a good example.
I added one already in #1714.
In news graphics, the most common case for a faceted map is when you want to create a set of "mini multiples" that compare quantitative values on a shared scale across a set of competing nominative values. A current example would be mapping the location of campaign donors across America for the 20+ Democratic presidential candidates.

If you want something in that ballpark, I think we should look for a sample 50-state dataset that has a nominative facet where the different categories show some variety across the country.
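A rough sketch of that pattern; the donor-totals file and its columns here are purely hypothetical placeholders, and the structure mirrors the income-by-state example at the top of the thread:

```python
import altair as alt
from vega_datasets import data

states = alt.topo_feature(data.us_10m.url, 'states')
# Hypothetical tidy file: one row per (state id, candidate) with a donor total
donors = 'donors_by_state.csv'

alt.Chart(donors).mark_geoshape().encode(
    shape=alt.Shape(field='geo', type='geojson'),
    color='total:Q',                             # quantitative value on a shared scale
    facet=alt.Facet('candidate:N', columns=5)    # one mini map per candidate
).transform_lookup(
    lookup='id',
    from_=alt.LookupData(data=states, key='id'),
    as_='geo'
).properties(
    width=120,
    height=80
).project(
    type='albersUsa'
)
```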
I think facets by time series segment or by a quantitative bracket are interesting, but I'd wager that both are much less common than charts that facet by a nominative category.
What does a facet by quantitative data look like? Although years can be a quantitative data type as well, aren't they used as nominative categories here?

```python
import altair as alt
from vega_datasets import data

countries = alt.topo_feature(data.world_110m.url, 'countries')
source = 'https://raw.githubusercontent.com/mattijn/datasets/master/cities_prediction_population.csv'

base = alt.Chart(countries).mark_geoshape(
    fill='lightgray',
    stroke='white',
    strokeWidth=0.2
).properties(width=300, height=200).project('naturalEarth1')

cities = alt.Chart().mark_circle().encode(
    latitude='lat:Q',
    longitude='lon:Q',
    size=alt.Size('population:Q', scale=alt.Scale(range=[0, 1000]), legend=alt.Legend(title="Population (million)")),
    fill=alt.value('green'),
    stroke=alt.value('white'),
    tooltip=['city:N', 'population:Q']
)

alt.layer(base, cities, data=source).facet(
    facet='year:N',
    columns=2,
    title='The 20 Most Populous Cities in the World by 2100'
)
```

Based on https://www.visualcapitalist.com/animated-map-worlds-populous-cities-2100/
Perhaps I am not using the term nominative correctly, but in the example you give I would say you are still grouping an ordinal time series at the end of the day. The result is an example that is slightly more complex, and less common, than one where the dataset already possesses a simple categorical column, like political candidate in my earlier example, or like gender in the one given by Iris Lee.
Yeah, my example is more ordinal than nominal.
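For completeness, a sketch of what a genuinely quantitative facet could look like, binning a numeric column into brackets; this assumes the la_riots table has an age column and is not code from the discussion itself:

```python
import altair as alt
from vega_datasets import data

df = data.la_riots()

# Facet the point map by a binned quantitative field rather than a nominal one
alt.Chart(df).mark_circle(size=15).encode(
    longitude='longitude:Q',
    latitude='latitude:Q',
    color='gender:N',
    facet=alt.Facet('age:Q', bin=alt.Bin(maxbins=4), columns=2)
).project('mercator').properties(width=200, height=200)
```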
In my opinion, the best Altair examples import from

With those requirements, I'm not sure there's a suitable dataset in the current example list other than the LA riots dataset used by @irisslee. However, that set may require the import of outside geographies for the base map, something I think we should also aim to avoid. Unless we can find a good candidate with the examples, or solve the issue of the base map for the riots data, I think we should consider nominating a new example dataset for vega_datasets to document this relatively common news chart.
@mattijn was this not solved by?

If we were going by the issue title/description alone, your example (US Income by State: Wrapped Facet) seems like the solution. Reading through the thread, it isn't clear to me what the additional requirements would be for closing the issue.

Note: I'm trying to do some housekeeping on old issues, e.g. closing, labelling, adding relationships.
Happy new year @dangotbanned! Maybe something like species richness for a limited number of species. Taken from here, but then preferably aggregated per county, instead of raster cells.
@mattijn in that case, how about we source a suitable dataset and open an issue in https://github.com/vega/vega-datasets? We could then have access to it after dealing with:
Sounds good to me!
Great @mattijn! I'm not 100% sure the license for the dataset you linked would work for us. A potential starting point might be here though: https://github.com/datasets
Two data sources for consideration:
- Zillow Research Data: https://www.zillow.com/research/data/
- A sample US housing dataset derived from the Zillow data.
A demo I created a few years ago for using the Zillow housing data. I am not quite sure about the dataset license though. Demo: https://www.youtube.com/watch?v=gMghAsNuTbw

If the Zillow dataset license does not work for you, https://datacommons.org/ has a lot of other open datasets.
#1711 (comment), #1711 (comment)

Thanks for showing interest in this issue @giswqs! I've just updated the description with an overview of what work I'm thinking we need to do. Hope you find that helpful. I don't personally work with spatial data, but I'm quite impressed by your GitHub profile! 🧠
The species data looks like a good candidate for a new dataset. We can probably leverage the county data already present in us-10m.json. The source of the chart above, per the paper, is:
It's probably not necessary to use all 1697 species. Instead, we can focus on four common species:
Data are available for download here but will need some processing. I'll work to put this together and will share a sample visualization to see if it's suitable.
Source data:
Target visualization:
Questions:
Any practical tips or code snippets would be very welcome.
The sizes are relatively large indeed. I would use an approach that incorporates geowombat for clipping the extent of each county (snapped to raster) before counting the 'year-round' pixels. It will probably be a process that simmers for a few hours per species, but the following can function as a good starting point to get the job done. Rasterstats also might function, but given the size of the tif, I prefer more control.

```python
import geopandas as gpd
import geowombat as gw
import pandas as pd

def compute_count_yearround(gdf_row, bounds, raster_fp):
    with gw.config.update(ref_bounds=bounds, ref_tar=raster_fp):
        with gw.open(raster_fp, dtype=int) as ds:
            ds = ds.gw.mask(gdf_row, keep='in')
            # I think 3 equals 'year-round'
            count_yearround = (ds == 3).sum().compute().item()
    return count_yearround

# bAMROx is 'Bird, American Robin'
raster_fp = "bAMROx_CONUS_HabMap_2001v1.tif"
vector_fp = "us10m.topo.json"

gdf = gpd.read_file(vector_fp, driver='TopoJSON')
gdf = gdf.set_crs(epsg=4326)  # degrees
gdf = gdf.to_crs(epsg=5070)   # meters; and same crs as `raster_fp`
gdf = gdf[~gdf.is_empty]      # filter out empty geoms

# slice first 5 rows to test method
gdf = gdf.iloc[0:5]

pixel_count_county = []
for ix, row in gdf.iterrows():
    # print(ix)
    bounds = row.geometry.bounds
    gdf_row = gdf.loc[ix:ix]
    id_count = compute_count_yearround(gdf_row, bounds, raster_fp)
    pixel_count_county.append((row.id, id_count, row.geometry.area))

df = pd.DataFrame(pixel_count_county, columns=["id", "bAMROx_count", "area"])
# for area percentage computation use pixel (spatial) resolution of 30 meters
df["bAMROx_pct"] = (df.bAMROx_count * (30*30) / df.area) * 100
df[['id', 'bAMROx_pct']]
```
@mattijn am I understanding this correctly, that you're saying processing this data will take 8-16 hours?
As is, maybe? Depending on hardware. It can execute in parallel with chunks of counties. But without ingesting the raw geotiffs in dedicated systems, I think this is likely as expected. Potentially we can resample to larger pixel sizes before counting to get a rough idea of the numbers.
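For what it's worth, a rough sketch of that chunked-parallel idea, reusing `compute_count_yearround`, `gdf`, and `raster_fp` from the snippet above and assuming a fork-based start method so workers inherit those globals:

```python
from concurrent.futures import ProcessPoolExecutor

import numpy as np
import pandas as pd

def process_chunk(index_chunk):
    # Run the per-county count for one chunk of GeoDataFrame index labels
    rows = []
    for ix in index_chunk:
        row = gdf.loc[ix]
        count = compute_count_yearround(gdf.loc[ix:ix], row.geometry.bounds, raster_fp)
        rows.append((row.id, count, row.geometry.area))
    return rows

if __name__ == "__main__":
    # Split counties into one chunk per worker and process the chunks in parallel
    chunks = np.array_split(np.asarray(gdf.index), 8)
    with ProcessPoolExecutor(max_workers=8) as pool:
        results = [rec for chunk in pool.map(process_chunk, chunks) for rec in chunk]
    df = pd.DataFrame(results, columns=["id", "bAMROx_count", "area"])
```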
@mattijn I think the main factor would be this:

```python
# NOTE: 3226 rows
gdf = gdf[~gdf.is_empty]  # filter out empty geoms

# slice first 5 rows to test method
# gdf = gdf.iloc[0:5]  # <------ (Assuming you'd want to use all rows)

for ix, row in gdf.iterrows():
    ...
```

Because up in this function you would be reading the same file 3226 times:

```python
def compute_count_yearround(gdf_row, bounds, raster_fp):
    with gw.config.update(ref_bounds=bounds, ref_tar=raster_fp):
        with gw.open(raster_fp, dtype=int) as ds:
            ...
```

Also,

```python
pixel_count_county = [
    (
        row.id,
        compute_count_yearround(gdf.loc[ix:ix], row.geometry.bounds, raster_fp),
        row.geometry.area,
    )
    for ix, row in gdf.iterrows()
]
```

Ideally though, avoiding

Note: I'm not sure what file
Side note: @mattijn changing these arguments to

```python
# gpd.read_file(vector_fp, driver='TopoJSON')
gpd.read_file(vector_fp, layer="counties")
```

But it removes these warnings we also get when building the docs:

```
../altair/.venv/Lib/site-packages/pyogrio/raw.py:198: RuntimeWarning: driver TopoJSON does not support open option DRIVER
  return ogr_read(
../altair/.venv/Lib/site-packages/pyogrio/geopandas.py:265: UserWarning: More than one layer found in 'ff7a7e679c46f2d1eb85cc92521b990f1a7a5c7a.json': 'counties' (default), 'states', 'TopoJSON'. Specify layer parameter to avoid this warning.
  result = read_func(
```
Surely one should only batch the for loop; why would you want to read the vector file so many times? By using the config manager of geowombat one avoids reading the full tiff, and that is especially the beauty of this code. I don't think the listcomp is the bottleneck here. I really don't think the numbers are bad as is (given the source tif is ~15 billion pixels), but I'm always open to learning faster methods to compute this!
I'm confused, this part (which I'm assuming you mean by vector file) is only read once in either case:

```python
vector_fp = "us10m.topo.json"
gdf = gpd.read_file(vector_fp, driver='TopoJSON')
```

I'm suggesting that what you originally proposed would read the same file 3226 times:

```python
with gw.open(raster_fp, dtype=int) as ds:
```

Maybe this is normal in the spatial domain?

@mattijn Could you link/point to how I'd download
OK, now I see — you included the code section that reads the geopandas dataframe only to state that it has 3226 rows.
If necessary we can always choose a less common species, e.g. barking treefrog (49 MB)... or we could show a single state or region rather than the whole USA... but it would be cool if the generation script could be configured to allow any species to be processed in a reasonable amount of time.
Thanks @mattijn! I'd downloaded the right file before (

So the file is at this path (if anyone else is new to this):

Edit: I'm probably not going to get to this soon, so don't let me be the hold-up on anyone else making progress.
A faster method to compute this is using exactextract, which recently gained Python bindings. Using this approach it took me less than 8 minutes to compute the fractions in all counties for a single species.

```python
import geopandas as gpd
from exactextract import exact_extract

vector_fp = "us10m.topo.json"
raster_fp = "bAMROx_CONUS_HabMap_2001v1.tif"

gdf = gpd.read_file(vector_fp, layer='counties')
gdf = gdf.set_crs(epsg=4326)  # degrees
gdf = gdf.to_crs(epsg=5070)   # meters; and same crs as `raster_fp`
gdf = gdf[~gdf.is_empty]      # filter out empty geoms
gdf.columns = ['county_id', 'geometry']
gdf = gdf.iloc[0:5]

# default_value: specifies a value to be used for NODATA pixels instead of ignoring them
# use 255, since source raster is 8bit integer
df = exact_extract(
    rast=raster_fp,
    vec=gdf,
    ops=["unique(default_value=255)", "frac(default_value=255)"],
    include_cols='county_id',
    output='pandas',
    progress=True
)
df = df.explode(['unique', 'frac'])
df = df[df.unique == 3].drop('unique', axis=1)
```

The fractions are slightly off, but close enough, compared to the previous snap-to-raster geowombat approach. Using this approach we are much closer to computing all of this in a reasonable amount of time.
Great - with this, I can make an example visualization to share shortly.
Awesome, looking forward! I think this also can be a use-case to highlight the
Btw, I realised the

Meaning the data can be derived using:

```python
import geopandas as gpd
import pandas as pd
import numpy as np
from exactextract import exact_extract

vector_fp = "us10m.topo.json"
gdf = gpd.read_file(vector_fp, layer='counties')
gdf = gdf.set_crs(epsg=4326)  # degrees
gdf = gdf.to_crs(epsg=5070)   # meters; and same crs as `raster_fp`
gdf = gdf[~gdf.is_empty]      # filter out empty geoms
gdf.columns = ['county_id', 'geometry']

raster_fps = [
    "bAMROx_CONUS_HabMap_2001v1.tif",
    "aAMBUx_CONUS_HabMap_2001v1.tif",
    "mWTDEx_CONUS_HabMap_2001v1.tif",
    "rCOGAx_CONUS_HabMap_2001v1.tif"
]

df = exact_extract(
    rast=raster_fps,
    vec=gdf,
    ops=["unique(default_value=255)", "frac(default_value=255)"],
    include_cols='county_id',
    output='pandas',
    progress=True
)

# List to store results
all_data = []

for raster in raster_fps:
    stem = raster.split('.')[0]
    species = stem.split('_')[0]
    unique_col = f"{stem}_unique"
    frac_col = f"{stem}_frac"

    # Convert columns to NumPy arrays
    unique_values = np.array(df[unique_col].to_list(), dtype=object)
    frac_values = np.array(df[frac_col].to_list(), dtype=object)

    # Repeat county_id based on array lengths
    county_ids = np.repeat(df["county_id"].values, [len(arr) for arr in unique_values])

    # Flatten arrays
    unique_flat = np.concatenate(unique_values)
    frac_flat = np.concatenate(frac_values)

    # Filter where unique == 3
    mask = unique_flat == 3

    temp_df = pd.DataFrame({
        "county_id": county_ids[mask],
        "species": species,
        "pct": frac_flat[mask]
    })
    all_data.append(temp_df)

# Combine results into a single DataFrame
df_melted = pd.concat(all_data, ignore_index=True)
df_melted.to_csv('species_counties.csv', index=False)
```

If you do not have the time to do this by yourself, here is the data that comes out of this: species_counties.csv

And for the viz I had a quick attempt as such:

```python
import altair as alt
import pandas as pd
from vega_datasets import data

alt.data_transformers.disable_max_rows()

# Load data
df = pd.read_csv('species_counties.csv')
counties = alt.topo_feature(data.us_10m.url, 'counties')

# Create maps for each species
charts = [
    alt.Chart(counties, title=species)
    .mark_geoshape(tooltip=True)
    .encode(
        color=alt.Color('pct:Q', scale=alt.Scale(scheme='inferno'))
    )
    .transform_lookup(
        lookup='id',
        from_=alt.LookupData(data=df[df['species'] == species], key='county_id', fields=['pct'])
    )
    .project(type='albersUsa')
    .properties(width=180, height=130)
    for species in df['species'].unique()
]

# Concatenate charts into a grid layout
chart = alt.concat(*charts, columns=2).configure_mark(invalid='show')
chart
```

I did realise we do something similar in the docs here: https://altair-viz.github.io/user_guide/marks/geoshape.html#facet-a-map. Vega-Editor link: https://vega.github.io/editor/#/gist/5e2141d8e73ea6acd96812bda364ed1d/vl5_species.json
@mattijn looking quite good
Would you be able to take a look at this PR? There appears to be an abandoned PR that would fix geoshape facet. Would be great if we could have this example working with
Agree it would be better, but unfortunately I'm not confident with TypeScript.
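For reference, a sketch of the facet-channel spelling this would hopefully allow, modelled on the income-by-state example at the top of the thread and on the species_counties.csv columns above; I haven't verified how it behaves against the open upstream issue:

```python
import altair as alt
import pandas as pd
from vega_datasets import data

alt.data_transformers.disable_max_rows()

counties = alt.topo_feature(data.us_10m.url, 'counties')
df = pd.read_csv('species_counties.csv')  # produced by the processing snippet above

alt.Chart(df).mark_geoshape().encode(
    shape=alt.Shape(field='geo', type='geojson'),
    color=alt.Color('pct:Q', scale=alt.Scale(scheme='inferno')),
    facet=alt.Facet('species:N', columns=2)
).transform_lookup(
    lookup='county_id',
    from_=alt.LookupData(data=counties, key='id'),
    as_='geo'
).project(
    type='albersUsa'
).properties(
    width=180,
    height=130
)
```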
@mattijn Would this arrow file suffice? Do we want to change up the species before setting this for posterity?

Below is a sample visualization (and script) to see if the species selection + dataset format is ok. (We can enhance the tooltip in the actual example.)

Also, would there be a straightforward way to spot check a single county's calculation of the pct field? If so I'd like to include this in vega-datasets#684 to highlight the accuracy of the technique used.

Species Habitat Visualization Code

```python
import altair as alt
import pyarrow as pa
import requests
import io
from vega_datasets import data

# Download and load the Arrow file
url = "https://github.com/vega/vega-datasets/raw/c4ff52d22bfa2e70a1b40554cc4c56df216838f3/data/species.arrow"
response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
table = pa.ipc.RecordBatchFileReader(io.BytesIO(response.content)).read_all()

# Disable row limit for Altair (if using vegafusion, this may not be needed)
alt.data_transformers.disable_max_rows()

# Load US counties topology
counties = alt.topo_feature(data.us_10m.url, 'counties')

# Create a chart for each unique species (limiting to first 4 for demonstration)
# With pyarrow, we can use filter directly on the table.
species_list = table.column('CommonName').to_pylist()
species_list = list(dict.fromkeys(species_list))[:4]  # Get unique species and limit

charts = [
    alt.Chart(counties, title=species)
    .mark_geoshape(tooltip=True)
    .encode(
        color=alt.Color(
            'percent_habitat:Q',
            scale=alt.Scale(
                domain=[0, 1],
                scheme='viridis',
                zero=True,
                nice=False
            ),
            title='Habitat Percentage',
            legend=alt.Legend(format=".0%")  # Format within the Legend object
        ),
        tooltip=[
            alt.Tooltip('county_id:N', title='County ID'),
            alt.Tooltip('percent_habitat:Q', title='Habitat %', format='.2%')
        ]
    )
    .transform_lookup(
        lookup='id',
        from_=alt.LookupData(
            # Filter the pyarrow table directly, no need for to_pandas()
            data=table.filter(pa.compute.equal(table.column('CommonName'), species)),
            key='county_id',
            fields=['percent_habitat']
        )
    )
    .transform_calculate(
        percent_habitat="datum.percent_habitat === null ? 0 : datum.percent_habitat"
    )
    .project(type='albersUsa')
    .properties(width=300, height=200)
    for species in species_list
]

# Combine charts into a grid
chart = alt.concat(*charts, columns=2).configure_view(
    stroke=None
).configure_mark(
    invalid='filter'
)

# Display the chart
chart
```
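On the spot-check question, one low-tech sketch would be to rerun exact_extract for a single county geometry and compare its 'year-round' fraction against percent_habitat in the Arrow file. This assumes the source GeoTIFF and counties TopoJSON are available locally, and the county id shown is a hypothetical example that may need adjusting to the id dtype in your data:

```python
import geopandas as gpd
from exactextract import exact_extract

raster_fp = "bAMROx_CONUS_HabMap_2001v1.tif"
county_id = 6037  # hypothetical example (Los Angeles County); adjust to match your id values

# Same county preparation as in the earlier snippets
gdf = gpd.read_file("us10m.topo.json", layer="counties")
gdf = gdf.set_crs(epsg=4326).to_crs(epsg=5070)
gdf = gdf[~gdf.is_empty]
gdf.columns = ["county_id", "geometry"]

# Recompute the fraction for just one county
one = gdf[gdf["county_id"] == county_id]
res = exact_extract(
    rast=raster_fp,
    vec=one,
    ops=["unique(default_value=255)", "frac(default_value=255)"],
    include_cols="county_id",
    output="pandas",
)
res = res.explode(["unique", "frac"])
print(res[res["unique"] == 3])  # compare 'frac' with the dataset's percent_habitat for this county
```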
This commit introduces a new example to the Altair documentation showcasing choropleth maps faceted by category. The example visualizes the distribution of suitable habitat for different species across US counties, using the proposed new Species Habitat dataset from vega-datasets (vega/vega-datasets#684).

Key features of this example:
- Demonstrates the use of `alt.Chart.mark_geoshape()` for geographical visualizations.
- Shows how to create faceted maps for comparing categorical data across geographic regions.
- Utilizes the `transform_lookup` and `transform_calculate` transforms for data manipulation within Altair.
- Uses a CSV data file temporarily hosted in the vega-datasets repository branch (pending dataset merge).

This example addresses issue #1711, which requested a faceted map example for the Altair documentation.

Co-authored-by: Mattijn van Hoek <mattijn@gmail.com>
Tasks

Based on #1711 (comment), #1711 (comment)

Tip: The tasks below should be completed in order of appearance. Earlier tasks can be performed by any contributor (new or old) familiar with `alt.Chart.mark_geoshape`.

- Example (not `altair`) (#1711 (comment)) @mattijn

Maybe something like species richness for a limited number of species. Similar to below, from here, but then preferably aggregated per county, instead of raster cells: