|
| 1 | +# A more sensible Collector environment variable resolution |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +The OpenTelemetry Collector supports three different syntaxes for |
| 6 | +environment variable resolution which differ in their syntax, semantics |
| 7 | +and allowed variable names. Before we stabilize confmap, we need to |
| 8 | +address several issues related to environment variables. This document |
| 9 | +describes: the current (as of v0.97.0) behavior of the Collector, the |
| 10 | +goals that an environment variable resolution should aim for, existing |
| 11 | +deviations from these goals and the desired behavior after making some |
| 12 | +changes. |
| 13 | + |
| 14 | +CLI environment variable resolution has a single syntax (`--config env:ENV`) |
| 15 | +and it is considered out of scope for this document, focusing |
| 16 | +instead on expansion within the Collector configuration. |
| 17 | + |
| 18 | +How to get from the current to desired behavior is also considered out |
| 19 | +of scope and will be discussed on individual PRs. It will likely involve |
| 20 | +one or multiple feature gates, warnings and transition periods. |
| 21 | + |
| 22 | +## Goals of an expansion system |
| 23 | + |
| 24 | +The following are considered goals of the expansion system: |
| 25 | + |
| 26 | +1. ***Expansion should happen only when the user expects it***. We |
| 27 | + should aim to expand when the user expects it and keep the original |
| 28 | + value when we don't (e.g. because the syntax is used for something |
| 29 | + different). |
| 30 | +2. ***Expansion should have predictable behavior***. |
| 31 | +3. ***Multiple expansion methods, if present, should have similar behavior.*** |
| 32 | + Switching from `${env:ENV}` to `${ENV}` or vice versa |
| 33 | + should not lead to (too many?) surprises. |
| 34 | +4. ***When the syntax overlaps, expansion should be aligned with*** |
| 35 | + [***the expansion defined by the Configuration Working Group***](https://github.com/open-telemetry/opentelemetry-specification/blob/032213cedde54a2171dfbd234a371501a3537919/specification/configuration/file-configuration.md#environment-variable-substitution). |
| 36 | + |
| 37 | +## Current behavior |
| 38 | + |
| 39 | +The Collector supports three different syntaxes for environment variable |
| 40 | +resolution: |
| 41 | + |
| 42 | +1. The *naked syntax*, `$ENV`. |
| 43 | +2. The *braces syntax*, `${ENV}`. |
| 44 | +3. The *env provider syntax*, `${env:ENV}`. |
| 45 | + |
| 46 | +These differ in the character set allowed for environment variable names |
| 47 | +as well as the type of parsing they return. Escaping is supported in all |
| 48 | +syntaxes by using two dollar signs. |
| 49 | + |
| 50 | +### Type casting rules |
| 51 | + |
| 52 | +A provider or converter takes a string and returns some sort of value |
| 53 | +after potentially doing some parsing. This gets stored in a |
| 54 | +confmap.Conf. When unmarshalling, we use mapstructure with |
| 55 | +WeaklyTypedInput enabled, which does a lot of implicit casting. The |
| 56 | +details of this type casting are complex and are outlined on issue |
| 57 | +[#9532](https://github.com/open-telemetry/opentelemetry-collector/issues/9532). |
| 58 | + |
| 59 | +When using this notation in inline mode (e.g. |
| 60 | +`http://endpoint/${env:PATH}`) we also do manual implicit type |
| 61 | +casting with a similar approach to mapstructure. These are outlined |
| 62 | +[here](https://github.com/open-telemetry/opentelemetry-collector/blob/fc4c13d3c2822bec39fa9d9658836d1a020c6844/confmap/expand.go#L124-L139). |
| 63 | + |
| 64 | +### Naked syntax |
| 65 | + |
| 66 | +The naked syntax is supported via the expand converter. It is |
| 67 | +implemented using the [`os.Expand`](https://pkg.go.dev/os#Expand) stdlib |
| 68 | +function. This syntax supports identifiers made up of: |
| 69 | + |
| 70 | +1. ASCII alphanumerics and the `_` character |
| 71 | +2. Certain special characters if they appear alone typically used in |
| 72 | + Bash: `*`, `#`, `$`, `@`, `!`, `?` and `-`. |
| 73 | + |
| 74 | +You can see supported identifiers in this example: |
| 75 | +[`go.dev/play/p/YfxLtYbsL6j`](https://go.dev/play/p/YfxLtYbsL6j). |
| 76 | + |
| 77 | +The environment variable value is taken as-is and the type is always |
| 78 | +string. |
| 79 | + |
| 80 | +### Braces syntax |
| 81 | + |
| 82 | +The braces syntax is supported via the expand converter. It is also |
| 83 | +implemented using the os.Expand stdlib function. This syntax supports |
| 84 | +any identifiers that don't contain `}`. Again, refer to the os.Expand |
| 85 | +example to see how it works in practice: |
| 86 | +[`go.dev/play/p/YfxLtYbsL6j`](https://go.dev/play/p/YfxLtYbsL6j). |
| 87 | + |
| 88 | +The environment variable value is taken as-is and the type is always |
| 89 | +string. |
| 90 | + |
| 91 | +### `env` provider |
| 92 | + |
| 93 | +The `env` provider syntax is supported via (you guessed it) the `env` |
| 94 | +provider. Its implementation is custom made. This syntax supports any |
| 95 | +identifier that does not have a $ in it, to support recursive |
| 96 | +resolution (e.g. `${env:${http://example.com}}` would get the |
| 97 | +environment variable whose name is stored in the URL |
| 98 | +`http://example.com`). |
| 99 | + |
| 100 | +The environment variable value is parsed by the yaml.v3 parser to an |
| 101 | +any-typed variable. The yaml.v3 parser mostly follows the YAML v1.2 |
| 102 | +specification with [*some exceptions*](https://github.com/go-yaml/yaml#compatibility). |
| 103 | +You can see |
| 104 | +how it works for some edge cases in this example: |
| 105 | +[`go.dev/play/p/RtPmH8aZA1X`](https://go.dev/play/p/RtPmH8aZA1X). |
| 106 | + |
| 107 | +### Issues of current behavior |
| 108 | + |
| 109 | +#### Unintuitive behavior on unset environment variables |
| 110 | + |
| 111 | +When an environment variable is empty, all syntaxes return an empty |
| 112 | +string with no warning given; this is frequently unexpected but can also |
| 113 | +be used intentionally. This is especially unintuitive when the user did |
| 114 | +not expect expansion to happen. Two examples where this is unexpected |
| 115 | +are the following: |
| 116 | + |
| 117 | +1. **Opaque values such as passwords that contain `$`** (issue |
| 118 | + [#8215](https://github.com/open-telemetry/opentelemetry-collector/issues/8215)). |
| 119 | + If the $ is followed by an alphanumeric character or one of the |
| 120 | + special characters, it's going to lead to false positives. |
| 121 | +2. **Prometheus relabel config** (issue |
| 122 | + [`contrib#9984`](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/9984)). |
| 123 | + Prometheus uses `${1}` in some of its configuration values. We |
| 124 | + resolve this to the value of the environment variable with name |
| 125 | + '`1`'. |
| 126 | +3. **Other uses of $** (issue |
| 127 | + [`contrib#11846`](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/11846)). |
| 128 | + If a product requires the use of `$` in some field, we would most |
| 129 | + likely interpret it as an environment variable. This is not |
| 130 | + intuitive for users. |
| 131 | + |
| 132 | +#### Unexpected type casting |
| 133 | + |
| 134 | +When using the env syntax we parse its value as YAML. Even if you are |
| 135 | +familiar with YAML, because of the implicit type casting rules and the |
| 136 | +way we store intermediate values, we can get unintuitive results. |
| 137 | + |
| 138 | +The most clear example of this is issue |
| 139 | +[*#8565*](https://github.com/open-telemetry/opentelemetry-collector/issues/8565): |
| 140 | +When setting a variable to value `0123` and using it in a string-typed |
| 141 | +field, it will end up as the string `"83"` (where as the user would |
| 142 | +expect the string to be `0123`). |
| 143 | + |
| 144 | +#### We are less restrictive than the Configuration WG |
| 145 | + |
| 146 | +The Configuration WG defines an [*environment variable expansion feature |
| 147 | +for SDK |
| 148 | +configurations*](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/configuration/file-configuration.md#environment-variable-substitution). |
| 149 | +This accepts only non empty alphanumeric + underscore identifiers |
| 150 | +starting with alphabetic or underscore. If the Configuration WG were to |
| 151 | +expand this in the future (e.g. to include other features present in |
| 152 | +Bash-like syntax), we would not be able to expand our braces syntax to |
| 153 | +support new features without breaking users. |
| 154 | + |
| 155 | +## Desired behavior |
| 156 | + |
| 157 | +*This section is written as if the changes were already implemented.* |
| 158 | + |
| 159 | +The Collector supports **two** different syntaxes for environment |
| 160 | +variable resolution: |
| 161 | + |
| 162 | +1. The *braces syntax*, `${ENV}`. |
| 163 | +2. The *env provider syntax*, `${env:ENV}`. |
| 164 | + |
| 165 | +These both have** the same character set and behavior**. They both use |
| 166 | +the env provider under the hood. |
| 167 | + |
| 168 | +The naked syntax supported in BASH is not supported in the Collector. |
| 169 | +Escaping is supported by using two dollar signs. Escaping is also |
| 170 | +honored for unsupported identifiers like `${1}`. |
| 171 | + |
| 172 | +### Type casting rules |
| 173 | + |
| 174 | +The environment variable value is parsed by the yaml.v3 parser to an |
| 175 | +any-typed variable and the original representation as a string is stored |
| 176 | +for numeric types. The `yaml.v3` parser mostly follows the YAML v1.2 |
| 177 | +specification with [*some |
| 178 | +exceptions*](https://github.com/go-yaml/yaml#compatibility). You can see |
| 179 | +how it works for some edge cases in this example: |
| 180 | +[*https://go.dev/play/p/RtPmH8aZA1X*](https://go.dev/play/p/RtPmH8aZA1X). |
| 181 | + |
| 182 | +When unmarshalling, we use mapstructure with WeaklyTypedInput |
| 183 | +**disabled**. We check via a hook an AsString method from confmap.Conf |
| 184 | +and use its return value when it is valid and we are mapping to a string |
| 185 | +field. This method has default casting rules for unambiguous scalar |
| 186 | +types but may return the original representation depending on the |
| 187 | +construction of confmap.Conf |
| 188 | + |
| 189 | +For using this notation in inline mode (e.g.`http://endpoint/${env:PATH}`), we |
| 190 | +use the AsString method from confmap.Conf. |
| 191 | + |
| 192 | +### Character set |
| 193 | + |
| 194 | +An environment variable identifier must be a nonempty ASCII alphanumeric |
| 195 | +or underscore starting with an alphabetic or underscore character. Its |
| 196 | +maximum length is 200 characters. Both syntaxes support recursive |
| 197 | +resolution. |
| 198 | + |
| 199 | +When an invalid identifier is found, a warning log is emitted, and the |
| 200 | +string is kept as-is. |
| 201 | + |
| 202 | +### Comparison table with current behavior |
| 203 | + |
| 204 | +This is a comparison between the current and desired behavior for |
| 205 | +loading a field with the braces syntax, `env` syntax. |
| 206 | + |
| 207 | +| Raw value | Field type | Current behavior, `${ENV}`, single field | Current behavior, `${env:ENV}` , single field | Desired behavior, entire field | Desired behavior, inline string field | |
| 208 | +|--------------|------------|------------------------------------------|------------------------------------------------|--------------------------------|---------------------------------------| |
| 209 | +| `123` | integer | 123 | 123 | 123 | n/a | |
| 210 | +| `0123` | integer | 83 | 83 | 83 | n/a | |
| 211 | +| `0123` | string | 0123 | 83 | 0123 (83?) | 0123 | |
| 212 | +| `0xdeadbeef` | string | 0xdeadbeef | 3735928559 | 0xdeadbeef (3735928559?) | 0xdeadbeef | |
| 213 | +| `"0123"` | string | "0123" | 0123 | 0123 | 0123 | |
| 214 | +| `!!str 0123` | string | !!str 0123 | 0123 | 0123 | 0123 (!!str 0123) | |
| 215 | +| `t` | boolean | true | true | Error: mapping string to bool | n/a | |
| 216 | +| `23` | boolean | true | true | Error: mapping integer to bool | n/a | |
0 commit comments