Skip to content

Commit 8e0344e

Browse files
committedMar 18, 2020
RFC: reduce lifecycle script environment size
PR-URL: #90 Credit: @isaacs Close: #90 Reviewed-by: @isaacs
1 parent ddd9793 commit 8e0344e

File tree

1 file changed

+146
-0
lines changed

1 file changed

+146
-0
lines changed
 
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,146 @@
1+
# Reduce the environment variables provided to lifecycle scripts
2+
3+
In versions of npm up to v6, the following items are all placed into the
4+
environment of scripts run for various lifecycle events (install, prepare,
5+
etc., as well as explicit scripts such as test and start).
6+
7+
- `PATH` Configured to include `.../node_modules/.bin` for current and all
8+
parent `node_modules` directories.
9+
- `npm_package_*` for all `package.json` values in the current package for
10+
which the lifecycle event is running.
11+
- `npm_config_*` for all npm configuration values that do not start with a
12+
`_` character.
13+
- `npm_lifecycle_event` the current lifecycle event.
14+
- `npm_lifecycle_script` the command being run.
15+
- `npm_node_execpath` the path to the Node.js executable npm is using.
16+
- `npm_execpath` the path to the npm executable being run.
17+
18+
The suggestion presented here is to remove (or vastly reduce) the
19+
`npm_config_*` and `npm_package_*` environment variables from the context
20+
of lifecycle scripts, and potentially also add new fields that may be more
21+
useful to more users.
22+
23+
## Motivation
24+
25+
Lifecycle scripts are run in many different contexts throughout the npm
26+
codebase.
27+
28+
- Explicit scripts are run directly from the `lib/run-script.js`
29+
command implementation.
30+
- Build scripts are run from the context of the tree building logic, which
31+
is moving to a new implementation with `@npmcli/arborist` in v7.
32+
- Prepare scripts are run by `pacote` when it creates a tarball for
33+
publication or when it installs `git` dependencies.
34+
35+
All of this necessitates passing around a single configuration object,
36+
which has some problems.
37+
38+
1. It is tedious and error-prone, and has led to a more complicated
39+
codebase
40+
2. While we have not had security issues with it in the past, it runs the
41+
risk of exposing something sensitive in a context where it should not be
42+
exposed.
43+
3. It invites users to fork package behavior based on npm configuration,
44+
which should be a contract between the user and npm, and not between the
45+
user, npm, and the publisher.
46+
4. While the package.json data does not have as many of these problems, it
47+
is also largely unnecessary (and not widely used). The `package.json`
48+
file is readily available and easily parsed, and most scripts that would
49+
depend on package data simply read it directly.
50+
5. The environment is created anew for every script that's run. This could
51+
be optimized further, but as it currently stands, it's pretty
52+
inefficient.
53+
6. Lastly, exposing the full configuration and package.json makes the
54+
environment significantly larger, and can lead to problems on
55+
memory-constrained systems.
56+
57+
The advantage of including `npm_config_*` values in the lifecycle
58+
environment is that npm commands run from within lifecycle events will have
59+
the same config values as the process that spawned them, since `env` values
60+
will override any other values except explicit command line flags.
61+
62+
For example, a script named `release` may run tests, update the changelog,
63+
and then publishe the package. Running `npm run release --otp=123456` will
64+
put the two-factor auth one-time password into the `npm_config_otp`
65+
environment variable, so that the subsequent `npm publish` command will
66+
have the one-time password provided in the config.
67+
68+
## Detailed Explanation
69+
70+
1. Remove `npm_package*` values from the script lifecycle environment.
71+
2. Provide a new field, `npm_package_json`, with the path to the
72+
`package.json` file.
73+
3. Provide a new field, `npm_command`, with the canonical name of the
74+
command being run.
75+
3. Remove all `npm_config_*` values from the script lifecycle environment
76+
_except_:
77+
1. `npm_config_userconfig`
78+
2. `npm_config_globalconfig`
79+
3. Environment variables corresponding to any non-default config
80+
values.
81+
4. Add `npm_package_from`, `npm_package_resolved`, and
82+
`npm_package_integrity` for the package whose lifecycle event is
83+
running, if it's part of an install. (This addresses the needs of build
84+
tools, as discussed in
85+
[#38](https://github.com/npm/rfcs/pull/38#issuecomment-529182151).)
86+
5. `PATH` will continue to be provided as it currently is, so that scripts
87+
find their dependencies' executables first.
88+
89+
This makes it easier to find and rely on package.json data, while ensuring
90+
that config defaults are maintained, without blowing up the size of the
91+
environment for lifecycle processes, or requiring access to the npm config
92+
subsystem in every npm CLI dependency.
93+
94+
## Rationale and Alternatives
95+
96+
Possible alternatives:
97+
98+
### Just go ahead and pass around the whole config object like we do today
99+
100+
This is not ideal for the reasons mentioned above, but also, it makes it
101+
virtually assured that Arborist remains tightly coupled to the npm cli.
102+
While _some_ degree of coupling is unavoidable, having to provide a valid
103+
npm config object would make this coupling much tighter than necessary.
104+
105+
### Inversion of Control on the npm-lifecycle environment creation
106+
107+
Rather than provide a config object matching a given interface, provide
108+
`npm-lifecycle` with a method that can build up and return the environment
109+
object.
110+
111+
This approach would address the tight coupling between cli and arborist,
112+
but it doesn't address the other problems with having a giant config object
113+
dumped into the environment.
114+
115+
## Implementation
116+
117+
The npm CLI will set all non-default config flags in the environment so
118+
that scripts and sub-scripts will have them set in their configs by default
119+
at the env level.
120+
121+
Replace `npm-lifecycle` with a lighter-weight approach, published as
122+
`@npmcli/run-script`.
123+
124+
Instead of building the environment up from the config and package data,
125+
`@npmcli/run-script` will only set `npm_package_json` to the path to the
126+
package.json file for the package being run, take an object to define
127+
additional environment variables, and always inherit the environment in the
128+
child process being run.
129+
130+
Because the npm CLI sets the relevant config fields, they'll be inherited
131+
to the child processes automatically. Arborist will use the environment
132+
option to pass in the `npm_package_from`, `npm_package_resolved`, and
133+
`npm_package_integrity` values.
134+
135+
## Prior Art
136+
137+
npm v6 and yarn both do roughly the same thing, though they have different
138+
config values.
139+
140+
## Downsides
141+
142+
Some modules today use `npm_config_argv`. These will have to be updated to
143+
use other means to get this information.
144+
145+
Where the argv is being parsed in order to determine the command being run,
146+
the `npm_command` environ provides a safer approach.

0 commit comments

Comments
 (0)
Please sign in to comment.