Import company affiliations from cncf/devstats into GrimoireLab Sorting Hat database.
Setting Sorting Hat database parameters: you can either provide the full database connection string (DSN) via SH_DSN=..., or provide all or some parameters individually via SH_* environment variables. SH_DSN has higher priority: if it is provided, no SH_* parameters are used. When using SH_* parameters, only SH_PASS is required; all other parameters have default values.
Sorting Hat database connection parameters:
- SH_DSN - provides the full database connection string, for example: SH_DSN='shuser:shpassword@tcp(shhost:shport)/shdb?charset=utf8'
- SH_USER - user name, defaults to shuser.
- SH_PASS - password, required.
- SH_PROTO - protocol, defaults to tcp.
- SH_HOST - host, defaults to localhost.
- SH_PORT - port, defaults to 3306.
- SH_DB - database name, defaults to shdb.
- SH_PARAMS - additional parameters that can be specified via ?param1=value1&param2=value2&...&paramN=valueN, defaults to ?charset=utf8. You can use SH_PARAMS='-' to specify empty params.
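The precedence and defaults described above can be sketched in shell. This is only an illustration of the documented rules, not json2hat's actual code: build_dsn is a hypothetical helper, and treating an empty SH_PARAMS the same as an unset one is an assumption.

```shell
#!/bin/sh
# Sketch: assemble a connection string from SH_* variables using the
# documented defaults. build_dsn is a hypothetical helper, not part of json2hat.
build_dsn() {
  # SH_DSN has priority: when it is set, no other SH_* variable is consulted.
  if [ -n "${SH_DSN:-}" ]; then
    printf '%s\n' "$SH_DSN"
    return 0
  fi
  # SH_PASS is the only variable without a default.
  if [ -z "${SH_PASS:-}" ]; then
    echo "SH_PASS is required" >&2
    return 1
  fi
  # Assumption: an empty SH_PARAMS is treated like an unset one.
  params="${SH_PARAMS:-?charset=utf8}"
  # A literal '-' means "no additional parameters".
  if [ "$params" = "-" ]; then
    params=""
  fi
  printf '%s:%s@%s(%s:%s)/%s%s\n' \
    "${SH_USER:-shuser}" "$SH_PASS" "${SH_PROTO:-tcp}" \
    "${SH_HOST:-localhost}" "${SH_PORT:-3306}" "${SH_DB:-shdb}" "$params"
}
```

Calling build_dsn with only SH_PASS set prints a string in the same shape as the SH_DSN example, with every other field taken from its default.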
To clean up existing company affiliations (delete from the organizations and enrollments tables), set the SH_CLEANUP variable.
Testing connection:
- SH_TEST_CONNECT - set this variable to only test the connection.
json2hat needs to read the cncf/devstats affiliations JSON file. It first tries to read a local JSON file and falls back to a remote file.
You can set the local file path via SH_LOCAL_JSON_PATH=/path/to/github_users.json. The default value is github_users.json. If the local file is found, no remote file is read.
You can set the remote file path via SH_REMOTE_JSON_PATH=http://some.url.org/path/to/github_users.json. The default value is https://github.com/cncf/devstats/raw/master/github_users.json. This file is only read when reading the local JSON file fails. If neither the local nor the remote file can be read, the program exits with a fatal error message.
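The local-then-remote lookup can be pictured with a small shell sketch. It is an approximation under stated assumptions, not json2hat's implementation: fetch_with_fallback is a hypothetical helper, and using curl for the remote read is a choice made here for illustration.

```shell
#!/bin/sh
# Sketch: read a local file if it is readable, otherwise fall back to a
# remote URL. fetch_with_fallback is a hypothetical helper, not part of json2hat.
fetch_with_fallback() {
  local_path="$1"
  remote_url="$2"
  # A readable local file wins; the remote URL is never contacted.
  if [ -r "$local_path" ]; then
    cat "$local_path"
    return 0
  fi
  # -f: fail on HTTP errors, -sS: quiet but still report errors,
  # -L: follow redirects.
  if curl -fsSL "$remote_url"; then
    return 0
  fi
  # Neither source could be read: report the failure to the caller.
  echo "cannot read $local_path or $remote_url" >&2
  return 1
}
```

A caller would pass the value of SH_LOCAL_JSON_PATH (or github_users.json) as the first argument and the value of SH_REMOTE_JSON_PATH (or the default URL) as the second, and stop on a non-zero return.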
json2hat needs to read the cncf/devstats company acquisitions/name mapping YAML file. It first tries to read a local YAML file and falls back to a remote file.
You can set the local file path via SH_LOCAL_YAML_PATH=/path/to/companies.yaml. The default value is companies.yaml. If the local file is found, no remote file is read.
You can set the remote file path via SH_REMOTE_YAML_PATH=http://some.url.org/path/to/companies.yaml. The default value is https://github.com/cncf/devstats/raw/master/companies.yaml. This file is only read when reading the local YAML file fails. If neither the local nor the remote file can be read, the program exits with a fatal error message.
json2hat reads this file for mappings.
json2hat is packaged as a docker image docker.io/dajohn/json2hat. You can use the scripts from the docker/ directory to manage the docker image.
Scripts (most require setting your docker username, for example: docker login; DOCKER_USER=your_user_name ./docker/docker_scriptname.sh):
- docker/docker_build.sh - builds the json2hat docker image. The image uses a multi layer setup to build the smallest possible output. It doesn't even have bash. See Dockerfile for details. The image is only about 6 MB in size.
- docker/docker_run.sh - executes json2hat from within the container. You should pass SH_* variables to control the Sorting Hat database connection and the affiliations JSON path.
- docker/docker_publish.sh - publishes the json2hat image to your docker hub.
- docker/docker_pull.sh - pulls the json2hat image from your docker hub.
- docker/docker_remove.sh - removes the generated json2hat docker image.
- docker/docker_cleanup.sh - removes the generated json2hat docker image and executes docker system prune.
- Replace env with prod, test, or local: ./json2hat.sh env.
- Pass ONLY_GGH_USERNAME=1 if you want to match username only for git and GitHub source.
- Pass ONLY_GGH_NAME=1 if you want to match name only for git and GitHub source.
- Clear NO_PROFILE_UPDATE env if you do not want import to be able to update country and other profile data.
- Pass REPLACE=1 env if you want to replace any existing affiliations found (will only touch affiliations with project_slug like cncf/* or cncf-f).
- Pass DRY_RUN=1 to avoid any DB writing.
- Pass SKIP_BOTS=1 to avoid auto marking bots.
- Pass ONLY_GGH_USERNAME=1 to match usernames only for git or GitHub usernames.
- Pass ONLY_GGH_NAME=1 to match names only for git or GitHub names.
- Use NAME_MATCH=n to specify how to match using name: 0 - do not match using name, 1 - match only when there is a single hit, 2 - match on multiple hits. The default is 1.
- Set ORGS_RO=1 to skip adding any new organizations. It will then dump a CSV file with the missing org names and won't add any enrollments to orgs that were not found (directly, in lowercase, or via the acquisition or mapping YAMLs).
- Set MISSING_ORGS_CSV=filename.csv to specify the name of the file that will contain missing orgs (only when ORGS_RO is used). The default is missing.csv.
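One possible reading of the three NAME_MATCH modes is sketched below. name_match_allowed is a hypothetical helper, not part of json2hat, and the interpretation of mode 2 as "accept the match for one or more hits" is an assumption; the tool itself may treat multiple hits differently.

```shell
#!/bin/sh
# Sketch: decide whether a name-based match is accepted.
# name_match_allowed is a hypothetical helper, not part of json2hat.
# $1: NAME_MATCH mode (0, 1 or 2); $2: number of identities sharing the name.
name_match_allowed() {
  mode="${1:-1}"   # the documented default mode is 1
  hits="$2"
  case "$mode" in
    0) return 1 ;;             # never match using name
    1) [ "$hits" -eq 1 ] ;;    # match only when there is a single hit
    2) [ "$hits" -ge 1 ] ;;    # assumption: one or more hits are accepted
    *) echo "unknown NAME_MATCH: $mode" >&2; return 2 ;;
  esac
}
```

For example, name_match_allowed 1 2 returns a non-zero status because mode 1 rejects ambiguous names, while name_match_allowed 2 2 returns zero under the assumption above.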
You should call the DA affiliations API map_org_names after a successful CNCF affiliations data import.