I wanted a database of all zip codes in the country, with their lat & long. And I wanted it to be easy to import and easy to update.
I found this gist, which was great and did most of the hard work, but I wanted to:
- Put the scripts in a single location
- Have the database in dolthub for easy querying, updating, and sharing
- Be able to easily update the zip code data
Later on, I wanted the population in the database as well, so I added that in.
The cool thing about dolthub is you can use it as an API. Get a random zip code:
http "https://www.dolthub.com/api/v1alpha1/iloveitaly/zip_codes_with_lat_and_lng/main?q=SELECT+*%0AFROM+%60zip_codes%60%0AORDER+BY+RAND%28%29%0ALIMIT+1%3B"
Even better, get a random zip code by population:
https://www.dolthub.com/api/v1alpha1/iloveitaly/zip_codes_with_lat_and_lng/main?q=SELECT+*%0AFROM+%28%0A++++SELECT+*%0A++++FROM+%60zip_codes%60%0A++++ORDER+BY+%60population%60+DESC%0A++++LIMIT+100%0A%29+AS+top_100%0AORDER+BY+RAND%28%29%0ALIMIT+1%3B
- Download the latest zip code database ("gazetteer") from the Census Bureau. This contains lat/lng and zip code data.
- Transform this data and import it into a Dolthub database.
- Separately, manually download population data from the Census Bureau.
- Transform this data, import into dolt, and then update the existing zip code data with the population data.
The entrypoint to this process is bin/download-gazetteer
.
- JSON
- CSV
- CSV with PK ID
- SQL (mysql dialect)
Where did this data come from?
- https://www.census.gov/geographies/reference-files/time-series/geo/gazetteer-files.html
- "ZIP Code Tabulation Areas"
This is documented in the bin/download-gazetteer
script.
The data is available on Dolthub here:
https://www.dolthub.com/repositories/iloveitaly/zip_codes_with_lat_and_lng
- Update download URL in `bin/download-gazetteer
- Run
bin/download-gazetteer
- Profit
- clean up the population stuff, mostly vibe-coded