[MRESOLVER-466] Site documentation updates #409

Merged: 9 commits, Jan 24, 2024
204 changes: 204 additions & 0 deletions src/site/markdown/common-misconceptions.md
@@ -0,0 +1,204 @@
# Common Misconceptions
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

Due to the smooth transition from Maven2 to Maven3 (and soon
Maven4), and the fact that Maven2 plugins kept working with Maven3, often
even without change, some misconceptions have crept in
as well. Despite the marvel of "compatibility", Maven3 resolution
differs quite a bit from Maven2, and the sole reason is an actual improvement
in the area of resolution: it became much more precise (and, due to
that, lost some "bad" habits present in Maven2). Here, we will try to
enumerate some of the most common misconceptions.

## Misconception No. 1: How Resolver Works

(Simplified)

The most typical use case for Resolver is to "transitively resolve"
dependencies. To achieve this, Resolver internally performs 3 steps
(each also exposed via the API as a distinguished call):
"collect", "transform" and "resolve".

The "collect" step comes first: it builds the "dirty tree" (dirty graph)
of artifacts. It is important to remark that in the "collect" step, while
the graph is being built, Maven uses only POMs. Hence, when collecting an
artifact that was never downloaded to your local repository, it will
download **the POMs only**. Using POMs, Resolver is able to build the current
"node" of the graph, but also to figure out the outgoing edges and adjacent nodes
of the current node, and so on. Which dependency is chosen to continue with from
the current node's POM is decided by various (configured) criteria.

The "transform" step transforms the "dirty graph": this is where conflict resolution
happens. It is here that Resolver applies various rules to resolve conflicting
versions, conflicting scopes, and so on. If a "verbose tree" is asked for,
conflict resolution does not remove graph nodes; it merely marks the conflicts
and the conflict "winner". Thus, a "verbose tree" cannot be resolved.

Finally, the "resolve" step runs, in which the (transformed) graph node artifacts
are resolved, basically ensuring (and downloading, if needed) that their
corresponding files (e.g. JAR files) are present in the local repository.
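
As a rough illustration of these steps, here is a toy model in plain Java (this is NOT the real Resolver API; the artifact names and the "remote repository" map are invented, and the "transform" step is skipped since this toy graph has no conflicts). It shows that "collect" touches POM metadata only, while "resolve" is the step that fetches the actual files:

```java
import java.util.*;

// Toy model of the Resolver steps (NOT the real Resolver API; names invented).
// "Collect" walks POM metadata only; "resolve" is the only step touching files.
public class CollectThenResolve {

    // Hypothetical remote repository: artifact -> its POM-declared dependencies.
    static final Map<String, List<String>> POMS = Map.of(
            "app", List.of("guice"),
            "guice", List.of("guava"),
            "guava", List.of());

    static final List<String> downloadedPoms = new ArrayList<>();
    static final List<String> downloadedJars = new ArrayList<>();

    // "Collect": breadth-first graph build, downloading POMs only.
    static List<String> collect(String root) {
        List<String> nodes = new ArrayList<>();
        Deque<String> queue = new ArrayDeque<>(List.of(root));
        while (!queue.isEmpty()) {
            String node = queue.remove();
            downloadedPoms.add(node + ".pom"); // only the POM is fetched here
            nodes.add(node);
            queue.addAll(POMS.get(node));
        }
        return nodes;
    }

    // "Resolve": ensure the files of the (transformed) graph are present.
    static void resolve(List<String> nodes) {
        for (String node : nodes) {
            downloadedJars.add(node + ".jar");
        }
    }

    public static void main(String[] args) {
        List<String> graph = collect("app");
        System.out.println(downloadedJars.isEmpty()); // true: collect fetched no JARs
        resolve(graph);
        System.out.println(downloadedJars); // [app.jar, guice.jar, guava.jar]
    }
}
```
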

It is important to state that the selection of nodes by various criteria,
among others the configured scope filters, happens in the "collect" step. And here we
come to the notion of "runtime classpath" vs. "test classpath".

In Resolver, maybe unintuitively, the "scope filter" is usually used (but does
not have to be; this is just how it IS used in Maven Core, probably for historical
reasons) to express "what should be omitted". The default session filter in Maven
is set up like this:

```
new ScopeDependencySelector("test", "provided")
```

This means that the "current dependency node" dependencies in "test" and "provided" scope
will simply be omitted from the graph. In other words, this filter builds
the "downstream runtime classpath" of the supplied artifact (i.e. "what is needed by the
artifact at runtime when I depend on it").

With a selector like this:

```
new ScopeDependencySelector("provided")
```

the "downstream dependency test classpath" would be built. Aside from serving as an example,
this selector is actually never used, as the "test classpath" makes sense only in the
scope of the "current project", not for "downstream dependent projects".

Note: these are NOT "Maven related" notions yet; Maven is nowhere in the picture here,
and these are not the classpaths used by the Compiler or Surefire plugins. This is merely
a showcase of how Resolver works.


## Misconception No. 2: "Test Classpath" Is a Superset of "Runtime Classpath"

**Wrong.** As can be seen above, for the runtime classpath we leave out "test" scoped
dependencies. This was true in Maven2, where the test classpath really was a superset of the runtime one,
but it does not hold anymore in Maven3. And this has interesting consequences. Let us show an example:

(Note: the very same scenario as explained below for Guice+Guava would work for Jackson Databind+Core, etc.)

Assume your project is using Google Guice, so you have declared it as a dependency:

```
<dependency>
  <groupId>com.google.inject</groupId>
  <artifactId>guice</artifactId>
  <version>${guiceVersion}</version>
</dependency>
```

All fine and dandy. At the same time, you want to avoid any use of Guava. We all know Guava is a direct dependency
of Guice. This is fine, since, as we know, the best practice is to declare all dependencies your code compiles
against. By not declaring Guava here, analysis tools will report code that touches Guava as having an "undeclared dependency".

But let's go one step further: it turns out that, to set up your unit tests, you **do need** Guava. So what now? Nothing special, just
add it as a test dependency, so your POM looks like this:

```
<dependency>
  <groupId>com.google.inject</groupId>
  <artifactId>guice</artifactId>
  <version>${guiceVersion}</version>
</dependency>
<dependency>
  <groupId>com.google.guava</groupId>
  <artifactId>guava</artifactId>
  <version>${guavaVersion}</version>
  <scope>test</scope>
</dependency>
```

Running the `dependency:tree` goal in verbose mode for this project outputs this tree:

```
[INFO] --- dependency:3.6.1:tree (default-cli) @ DEMO-PROJECT ---
[INFO] DEMO-PROJECT
[INFO] +- com.google.inject:guice:jar:6.0.0:compile
[INFO] | +- javax.inject:javax.inject:jar:1:compile
[INFO] | +- jakarta.inject:jakarta.inject-api:jar:2.0.1:compile
[INFO] | +- aopalliance:aopalliance:jar:1.0:compile
[INFO] | \- (com.google.guava:guava:jar:31.0.1-jre:compile - omitted for duplicate)
[INFO] \- com.google.guava:guava:jar:31.0.1-jre:test (scope not updated to compile)
[INFO] +- com.google.guava:failureaccess:jar:1.0.1:test
[INFO] +- com.google.guava:listenablefuture:jar:9999.0-empty-to-avoid-conflict-with-guava:test
[INFO] +- com.google.code.findbugs:jsr305:jar:3.0.2:test
[INFO] +- org.checkerframework:checker-qual:jar:3.12.0:test
[INFO] +- com.google.errorprone:error_prone_annotations:jar:2.7.1:test
[INFO] \- com.google.j2objc:j2objc-annotations:jar:1.3:test
```

And it is right: this IS the "test classpath" **of the project**, and it contains a conflict, as noted by the "omitted for duplicate"
and "scope not updated to compile" remarks next to the Guava nodes.

So this setup results in the following:
* when you compile, Guava is NOT on the compile classpath, so you cannot even touch it (by mistake)
* when test-compile and test-execute run, Guava is present on the classpath, as expected

So far so good, but what happens when this library is consumed downstream by someone? When it starts being used as a library?
Nothing, all works as expected!

When a downstream project declares a dependency on this project, the downstream project will get this graph (from
the node that is your library):

```
[INFO] --- dependency:3.6.1:tree (default-cli) @ DOWNSTREAM-PROJECT ---
[INFO] DOWNSTREAM-PROJECT
[INFO] \- DEMO-PROJECT:compile
[INFO] \- com.google.inject:guice:jar:6.0.0:compile
[INFO] +- javax.inject:javax.inject:jar:1:compile
[INFO] +- jakarta.inject:jakarta.inject-api:jar:2.0.1:compile
[INFO] +- aopalliance:aopalliance:jar:1.0:compile
[INFO] \- com.google.guava:guava:jar:31.0.1-jre:compile
[INFO] +- com.google.guava:failureaccess:jar:1.0.1:compile
[INFO] +- com.google.guava:listenablefuture:jar:9999.0-empty-to-avoid-conflict-with-guava:compile
[INFO] +- com.google.code.findbugs:jsr305:jar:3.0.2:compile
[INFO] +- org.checkerframework:checker-qual:jar:3.12.0:compile
[INFO] +- com.google.errorprone:error_prone_annotations:jar:2.7.1:compile
[INFO] \- com.google.j2objc:j2objc-annotations:jar:1.3:compile
```

So what happens here? First, revisit "How Resolver Works": there you will see that for the "runtime classpath" of a
dependency, the "test" and "provided" scoped dependencies of the dependency artifact **are not even considered**. They are simply
omitted. Not skipped, but completely omitted, as if they did not even exist. Hence, in this graph there is
**no conflict happening** (as the "test" scoped Guava is completely omitted during the "collect" step), and everything
goes as expected.
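
The Guice/Guava scenario above can be sketched with a small model in plain Java (again NOT the real Resolver API; "demo" and "downstream" stand in for DEMO-PROJECT and DOWNSTREAM-PROJECT, and "nearest wins" plus scope omission applied only to transitive dependencies are simplifications of what Resolver actually does):

```java
import java.util.*;

// Toy model: collect with scope omission + "nearest wins" conflict resolution.
public class TestVsRuntimeClasspath {

    record Dep(String artifact, String scope) {}

    // Hypothetical POM metadata, mirroring the example above.
    static final Map<String, List<Dep>> POMS = Map.of(
            "demo", List.of(new Dep("guice", "compile"), new Dep("guava", "test")),
            "guice", List.of(new Dep("guava", "compile")),
            "guava", List.of(),
            "downstream", List.of(new Dep("demo", "compile")));

    // Breadth-first collect. Like the scope selector described earlier, the
    // omitted scopes apply to transitive dependencies only, never to the
    // root's direct ones.
    static Map<String, String> collect(String root, Set<String> omitScopes) {
        Map<String, String> winners = new LinkedHashMap<>(); // first seen = nearest = winner
        record Node(String artifact, String scope, int depth) {}
        Deque<Node> queue = new ArrayDeque<>(List.of(new Node(root, "compile", 0)));
        while (!queue.isEmpty()) {
            Node n = queue.remove();
            if (n.depth() > 1 && omitScopes.contains(n.scope())) {
                continue; // omitted: never enters the graph, so it cannot conflict
            }
            if (n.depth() > 0) {
                winners.putIfAbsent(n.artifact(), n.scope()); // loser: "omitted for duplicate"
            }
            for (Dep d : POMS.get(n.artifact())) {
                queue.add(new Node(d.artifact(), d.scope(), n.depth() + 1));
            }
        }
        return winners;
    }

    public static void main(String[] args) {
        Set<String> omit = Set.of("test", "provided");
        // The project's own (test) view: direct guava:test wins the conflict.
        System.out.println(collect("demo", omit));       // {guice=compile, guava=test}
        // The downstream view: guava:test is omitted, guice's guava:compile survives.
        System.out.println(collect("downstream", omit)); // {demo=compile, guice=compile, guava=compile}
    }
}
```
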

### Important Consequences

One maybe not so obvious consequence can be explained with the use of `maven-assembly-plugin`. Let's assume you want to
assemble your module's "runtime" dependencies.

If you do it from "within" the project, for example in the package phase, your packaging will be incomplete:
Guava will be missing! But if you do it from "outside" the project (i.e. a subsequent module of the build, or
a downstream dependency), the assembly will contain Guava as well.

This is a [Maven Assembly plugin bug](https://issues.apache.org/jira/browse/MASSEMBLY-1008), somewhat explained
in [MRESOLVER-391](https://issues.apache.org/jira/browse/MRESOLVER-391). In short, the Maven Assembly plugin resolves the
"project test classpath" and then "cherry-picks runtime-scoped nodes" from it, which, as we can see in this case,
is wrong: unlike in Maven2, you need to build different graphs for the "runtime" and "test" classpaths.
For the Assembly plugin, the problem is that, as a Mojo, it requests the "test classpath", then it reads its configuration
(the assembly descriptor, which is the point where it learns about the required scopes), and then it "filters"
the resolved "test classpath" by the runtime scopes. And that is wrong, as Guava is in test scope. Instead, the plugin
should read the configuration first, ask Resolver for the "runtime classpath", and filter that. In turn, this problem
does not exist with `maven-war-plugin`, as the "war" Mojo asks for the "compile+runtime" scope. Of course, the WAR use case
is much simpler than the Assembly use case, as the former always packages the same scope, while Assembly receives a complex
configuration and exposes a much more complex modus operandi.
71 changes: 57 additions & 14 deletions src/site/markdown/expected-checksums.md
@@ -59,13 +59,14 @@
in their response. Since the advent of modern repository managers, most of
them already send checksums (usually the "standard" SHA-1 and MD5)
in their response headers. Moreover, Maven Central, and even the Google mirror of Maven Central,
send them as well. By extracting these checksums from the response, we can get hashes
that were provided by the remote repository along with its content. This saves one HTTP round-trip, as we
get both content and checksums in one response.

Finally, the **Remote External** checksums are the "classic" checksums we all know: they are laid down
next to the artifact files (external, in other words) on the remote repository, according
to the remote repository layout. To obtain a Remote External checksum, a new HTTP request against the remote repository is
required. The order of requested checksums will follow the order given in the layout configuration,
asking for checksums in the same order as the parameter lists algorithm names.

During retrieval of a single artifact, these strategies are executed in the order specified above,
and only if the current strategy has "no answer" is the next strategy attempted. Hence, if
@@ -78,7 +79,26 @@
be probably satisfied by the "Remote Included" strategy, and "Remote External" will be skipped.
The big win here is that by obtaining hashes using the "Remote Included" rather than the "Remote External"
strategy, we can halve the count of HTTP requests needed to download an artifact.

Related configuration keys:
* `aether.layout.maven2.checksumAlgorithms`: a comma-separated list of checksum algorithms. Order is important, as
the transport will ask for them in the specified order (default is "SHA-1,MD5"), and the first checksum received
and matched stops the integrity validation algorithm.

Note: since Maven 3.9.x you can use the expression `${session.rootDirectory}/.mvn/checksums/` to store checksums along with
your sources, as `session.rootDirectory` will resolve to an absolute path pointing to the root directory of your project (where
the `.mvn` directory usually is).
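
For example (a sketch; the key name is listed above, and adding SHA-256 assumes the remote repository actually serves SHA-256 checksums), this could go on the command line or in `.mvn/maven.config`:

```
-Daether.layout.maven2.checksumAlgorithms=SHA-256,SHA-1,MD5
```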


### Provided Checksums

There is a Resolver SPI, `ProvidedChecksumsSource`, that makes it possible to feed Provided Checksums to Resolver ahead
of the actual transport. These checksums are used **during transport only** to verify the integrity of the transported payload
(artifacts). Hence, Provided checksums are NOT usable to verify the integrity of already cached artifacts (unless you build
with an empty local repository, of course, which forces all of your artifacts to go through transport).

Resolver provides one SPI implementation out of the box: one that simply delegates to "trusted checksums".

### Remote Included Checksums

**Note: Remote Included checksums work only with transport-http, they do NOT work with transport-wagon!**

@@ -87,6 +107,11 @@
count, since many repository services alongside Maven Central emit the reference checksums in
the artifact response itself (as HTTP headers). Hence, we are able to get the
artifact and the reference "expected" checksum using only one HTTP round-trip.

Related configuration keys:
* `aether.connector.basic.smartChecksums`: enables or disables Remote Included checksums.
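
For example, to disable the feature (a sketch; the key is listed above):

```
-Daether.connector.basic.smartChecksums=false
```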

The Remote Included checksum feature supports several "strategies" for extracting checksums from HTTP response headers.


#### Sonatype Nexus 2

@@ -100,21 +125,30 @@
Emitted by: Sonatype Nexus2 only.

Maven Central emits the `x-checksum-sha1` and `x-checksum-md5` headers along with the artifact response.
Google GCS, on the other hand, uses the `x-goog-meta-checksum-sha1` and `x-goog-meta-checksum-md5`
headers. Resolver will detect all of these and use their values.

Emitted by: Maven Central, GCS, some CDNs and probably more.


### Remote External checksums

These are the "classic" checksums that have existed since Maven 1. They are laid down in the remote repository, next
to the payload file (e.g. "lib.jar" and its checksum "lib.jar.sha1"). While they are the oldest kind of Resolver checksums,
their shortcoming is that most often only SHA-1 and MD5 are produced. Basically, the consumer is tied to only those checksum
algorithms that the remote repository provides. Similarly, given that both the payload and the checksum come
from the same origin, unless that origin is trusted (like Maven Central is), this may be seen as a risk.


## Trusted Checksums

All the "expected" checksums discussed above are used in transport only: they are all
about URLs, HTTP requests and responses, or require Transport-related API elements.

`TrustedChecksumsSource` is an SPI component that is able to deliver "expected" checksums
for a given artifact without the use of any transport API element. In other words, this
API is not bound to transport, but is generic.

Since they map almost one-to-one onto the transport "Provided Checksums" strategy, Resolver provides an
implementation that delegates Provided to Trusted checksums (making Provided and Trusted
checksums equivalent, transport-wise).

@@ -124,11 +158,19 @@
Trusted Checksums is ArtifactResolver post-processing.
This new functionality, at the cost of checksum calculation overhead, is able to validate all
the resolved artifacts against Trusted Checksums, thus making sure that all resolved
artifacts are "validated" with some known (possibly even cryptographically strong) checksum
provided by the user. This new feature may come in handy in cases when the user cannot trust the local
repository, as it may be shared with some other unknown or even untrusted parties.

Moreover, using the Resolver Trusted Checksums post-processor, one can "record" the checksums,
for example when executing in a known "pristine" and safe environment, and reuse the produced
checksums for distribution within an organization.

The Trusted Checksums feature provides two source implementations out of the box.

Related configuration keys:
* `aether.trustedChecksumsSource.*`
* `aether.artifactResolver.postProcessor.trustedChecksums.*`

### Summary File Trusted Checksums Source

The summary file source uses a single file in a GNU coreutils-compatible format: each
@@ -144,5 +186,6 @@
Each summary file contains information for a single checksum algorithm.
This source mimics the Maven local repository layout, and stores checksums in a similar layout
to the one the Maven local repository itself uses for checksums.

Here, just like in the Maven local repository, the sparse directory can contain checksums for multiple algorithms,
as the algorithm is encoded in the checksum file path (the extension).

4 changes: 4 additions & 0 deletions src/site/markdown/remote-repository-filtering.md
@@ -82,6 +82,10 @@
directory. It will be referred to in this document with the `${filterBasedir}` placeholder.
To explicitly set the filter basedir, use the following setting: `-Daether.remoteRepositoryFilter.${filterName}.basedir=somePath`,
where "somePath" can be a relative path (then it is resolved against the local repository root) or an absolute path (then it is used as is).

Since Maven 3.9.x you can use an expression like `${session.rootDirectory}/.mvn/rrf/` to store filter data along with
your sources, as `session.rootDirectory` will resolve to an absolute path pointing to the root directory of your project (where
the `.mvn` directory usually is).
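
For example (a sketch; the "prefixes" filter name is described below, and the path is illustrative):

```
-Daether.remoteRepositoryFilter.prefixes.basedir=${session.rootDirectory}/.mvn/rrf/
```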

### The Prefixes Filter

The "prefixes" named filter relies on a file containing a list of "repository prefixes" available from a given repository.
1 change: 1 addition & 0 deletions src/site/site.xml
@@ -33,6 +33,7 @@
under the License.
<item name="About Local Repository" href="local-repository.html"/>
<item name="Remote Repository Filtering" href="remote-repository-filtering.html"/>
<item name="Third-party Integrations" href="third-party-integrations.html"/>
<item name="Common Misconceptions" href="common-misconceptions.html"/>
<item name="Upgrading Resolver" href="upgrading-resolver.html"/>
<item name="JavaDocs" href="apidocs/index.html"/>
<item name="Source Xref" href="xref/index.html"/>