diff --git a/src/site/markdown/common-misconceptions.md b/src/site/markdown/common-misconceptions.md
new file mode 100644
index 000000000..ea2d453d0
--- /dev/null
+++ b/src/site/markdown/common-misconceptions.md
@@ -0,0 +1,194 @@
+# Common Misconceptions
+
+Due to the smooth transition from Maven2 to Maven3 (and soon Maven4), and the fact
+that Maven2 plugins kept working with Maven3, maybe even without change, some
+misconceptions have crept in as well. Despite the marvel of "compatibility", Maven3
+resolution differs quite a lot from Maven2, and the sole reason is an actual
+improvement in the area of resolution: it became much more precise (and, due to that,
+lost some "bad" habits present in Maven2). Here, we will try to enumerate some of the
+most common misconceptions.
+
+## Misconception No1: How Resolver Works
+
+(Simplified)
+
+The most typical use case for Resolver is to "transitively resolve" dependencies. To
+achieve this, Resolver internally performs 3 steps (each of them also exposed via the
+API as a distinguished call): "collect", "transform" and "resolve".
+
+The "collect" step runs first and builds the "dirty tree" (dirty graph) of artifacts.
+It is important to remark that in the "collect" step, while the graph is being built,
+Maven uses POMs only. Hence, when collecting an Artifact that was never downloaded to
+your local repository, it will download **the POMs only**. Using POMs, Resolver is able
+to build the current "node" of the graph, but also to figure out the outgoing vertices
+and adjacent nodes of the current node, and so on. Which dependency is chosen to
+continue with from the current node's POM is decided by various (configured) criteria.
+
+The "transform" step transforms the "dirty graph": this is where conflict resolution
+happens. It is here that Resolver applies various rules to resolve conflicting
+versions, conflicting scopes, and so on. If a "verbose tree" is asked for, conflict
+resolution does not remove graph nodes, it merely marks the conflicts and the conflict
+"winner". Thus, a "verbose tree" cannot be resolved.
+
+Finally, the "resolve" step runs, in which the (transformed) graph node artifacts are
+resolved, basically ensuring (and downloading, if needed) that their corresponding
+files (i.e. JAR files) are present in the local repository.
+
+It is important to state that the selection of nodes by various criteria, among others
+by the configured scope filters, happens in the "collect" step. And here we come to
+the notion of "runtime graph" vs. "test graph".
+
+In Resolver, maybe un-intuitively, the "scope filter" is usually used as "what should
+be omitted" (it does not have to be, this is just how it IS used in Maven Core,
+probably for historical reasons). The default session filter in Maven is set up
+like this:
+
+```
+    new ScopeDependencySelector("test", "provided")
+```
+
+This means that the dependencies of the current dependency node that are in "test" or
+"provided" scope will simply be omitted from the graph. In other words, this filter
+builds the "downstream runtime classpath" of the supplied artifact (i.e. "what is
+needed by the artifact at runtime when I depend on it").
+
+Note: these are NOT "Maven related" notions yet, Maven is nowhere in the picture here,
+and these are not the classpaths used by the Compiler or Surefire plugins; this is
+merely a showcase of how Resolver works.
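+
+To make these steps tangible, here is a minimal sketch (assuming the Resolver 1.x API,
+with `RepositorySystem`, session and remote repositories already obtained elsewhere,
+e.g. injected into a Mojo; the Guice coordinates are just an example) that first
+collects and transforms the graph, then resolves it:
+
+```java
+import java.util.List;
+
+import org.eclipse.aether.RepositorySystem;
+import org.eclipse.aether.RepositorySystemSession;
+import org.eclipse.aether.artifact.DefaultArtifact;
+import org.eclipse.aether.collection.CollectRequest;
+import org.eclipse.aether.collection.CollectResult;
+import org.eclipse.aether.graph.Dependency;
+import org.eclipse.aether.repository.RemoteRepository;
+import org.eclipse.aether.resolution.DependencyRequest;
+import org.eclipse.aether.resolution.DependencyResult;
+
+class ResolveSketch {
+    static DependencyResult resolveTransitively(RepositorySystem system,
+            RepositorySystemSession session, List<RemoteRepository> remoteRepositories)
+            throws Exception {
+        // "collect" + "transform": build the dirty graph using POMs only,
+        // then apply conflict resolution to it
+        CollectRequest collect = new CollectRequest(
+                new Dependency(new DefaultArtifact("com.google.inject:guice:6.0.0"), "compile"),
+                remoteRepositories);
+        CollectResult graph = system.collectDependencies(session, collect);
+        // "resolve": ensure the corresponding files (i.e. JAR files) are present
+        // in the local repository, downloading them if needed
+        return system.resolveDependencies(session, new DependencyRequest(graph.getRoot(), null));
+    }
+}
+```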
+
+## Misconception No2: "Test graph" Is Superset Of "Runtime graph"
+
+**Wrong**. As can be seen from above, for the runtime graph we leave out "test" scoped
+dependencies. This was true in Maven2, where the test graph really was a superset of
+the runtime graph, but it does not hold anymore in Maven3. And this has interesting
+consequences. Let me show an example:
+
+(Note: the very same scenario as explained below for Guice+Guava would work for
+Jackson Databind+Core, etc.)
+
+Assume your project is using Google Guice, so you have declared it as a dependency:
+
+```
+    <dependency>
+      <groupId>com.google.inject</groupId>
+      <artifactId>guice</artifactId>
+      <version>${guiceVersion}</version>
+    </dependency>
+```
+
+All fine and dandy. At the same time, you want to avoid any use of Guava. We all know
+Guava is a direct dependency of Guice. This is fine, since, as we know, the best
+practice is to declare all dependencies your code compiles against. By not declaring
+Guava here, analysis tools will report any code touching Guava as an "undeclared
+dependency".
+
+But let's go one step further: it turns out that to set up your unit tests, you **do
+need** Guava. So what now? Nothing, just add it as a test dependency, so your POM
+looks like this:
+
+```
+    <dependency>
+      <groupId>com.google.inject</groupId>
+      <artifactId>guice</artifactId>
+      <version>${guiceVersion}</version>
+    </dependency>
+    <dependency>
+      <groupId>com.google.guava</groupId>
+      <artifactId>guava</artifactId>
+      <version>${guavaVersion}</version>
+      <scope>test</scope>
+    </dependency>
+```
+
+The `dependency:tree` goal, run on this project, outputs this verbose tree:
+
+```
+[INFO] --- dependency:3.6.1:tree (default-cli) @ DEMO-PROJECT ---
+[INFO] DEMO-PROJECT
+[INFO] +- com.google.inject:guice:jar:6.0.0:compile
+[INFO] |  +- javax.inject:javax.inject:jar:1:compile
+[INFO] |  +- jakarta.inject:jakarta.inject-api:jar:2.0.1:compile
+[INFO] |  +- aopalliance:aopalliance:jar:1.0:compile
+[INFO] |  \- (com.google.guava:guava:jar:31.0.1-jre:compile - omitted for duplicate)
+[INFO] \- com.google.guava:guava:jar:31.0.1-jre:test (scope not updated to compile)
+[INFO]    +- com.google.guava:failureaccess:jar:1.0.1:test
+[INFO]    +- com.google.guava:listenablefuture:jar:9999.0-empty-to-avoid-conflict-with-guava:test
+[INFO]    +- com.google.code.findbugs:jsr305:jar:3.0.2:test
+[INFO]    +- org.checkerframework:checker-qual:jar:3.12.0:test
+[INFO]    +- com.google.errorprone:error_prone_annotations:jar:2.7.1:test
+[INFO]    \- com.google.j2objc:j2objc-annotations:jar:1.3:test
+```
+
+And it is right, this IS the "test graph" **of the project**, and it contains a
+conflict, as noted by the "omitted for duplicate" and "scope not updated to compile"
+remarks next to the Guava nodes (see the sketch after the list below for how this
+verbose mode maps to Resolver configuration).
+
+So this setup ensures that:
+* when you compile, Guava is NOT on the compile classpath, so you cannot even touch it
+  (by mistake)
+* when test-compile and test-execute run, Guava IS present on the test classpath, as
+  expected
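+
+As an aside, the "verbose tree" shown above corresponds to Resolver's verbose conflict
+resolution mode. A sketch of how it can be switched on programmatically (the property
+constant comes from Resolver's `ConflictResolver` transformer; remember that a graph
+collected this way is for inspection only, it cannot be resolved):
+
+```java
+import org.eclipse.aether.DefaultRepositorySystemSession;
+import org.eclipse.aether.RepositorySystemSession;
+import org.eclipse.aether.util.graph.transformer.ConflictResolver;
+
+class VerboseGraphSketch {
+    static RepositorySystemSession verboseCopy(RepositorySystemSession session) {
+        DefaultRepositorySystemSession copy = new DefaultRepositorySystemSession(session);
+        // keep conflict "losers" in the graph, merely marked (e.g. "omitted for duplicate"),
+        // instead of removing them during the "transform" step
+        copy.setConfigProperty(ConflictResolver.CONFIG_PROP_VERBOSE, Boolean.TRUE);
+        return copy;
+    }
+}
+```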
+
+So far, so good. But what happens when this library is consumed downstream by someone?
+When it becomes used as a library? Nothing, all works as expected!
+
+When a downstream project declares a dependency on this project, the downstream
+project will get this graph (from the node that is your library):
+
+```
+[INFO] --- dependency:3.6.1:tree (default-cli) @ DOWNSTREAM-PROJECT ---
+[INFO] DOWNSTREAM-PROJECT
+[INFO] \- DEMO-PROJECT:compile
+[INFO]    \- com.google.inject:guice:jar:6.0.0:compile
+[INFO]       +- javax.inject:javax.inject:jar:1:compile
+[INFO]       +- jakarta.inject:jakarta.inject-api:jar:2.0.1:compile
+[INFO]       +- aopalliance:aopalliance:jar:1.0:compile
+[INFO]       \- com.google.guava:guava:jar:31.0.1-jre:compile
+[INFO]          +- com.google.guava:failureaccess:jar:1.0.1:compile
+[INFO]          +- com.google.guava:listenablefuture:jar:9999.0-empty-to-avoid-conflict-with-guava:compile
+[INFO]          +- com.google.code.findbugs:jsr305:jar:3.0.2:compile
+[INFO]          +- org.checkerframework:checker-qual:jar:3.12.0:compile
+[INFO]          +- com.google.errorprone:error_prone_annotations:jar:2.7.1:compile
+[INFO]          \- com.google.j2objc:j2objc-annotations:jar:1.3:compile
+```
+
+So what happens here? First, revisit "How Resolver Works": there you will see that for
+the "runtime graph" of a dependency, the "test" and "provided" scoped dependencies of
+the dependency artifact **are not even considered**. They are simply omitted. Not
+skipped, but completely omitted, as if they did not even exist. Hence, no conflict
+happens in the graph (as the "test" scoped Guava is completely omitted during the
+"collect" step), and everything goes as expected.
+
+### Important Consequences
+
+One maybe not so obvious consequence can be explained with the use of the
+`maven-assembly-plugin`. Let's assume you want to assemble your module's "runtime"
+dependencies.
+
+If you do it from "within" the project, for example in the package phase, your
+packaging will be incomplete: Guava will be missing! But if you do it from "outside"
+the project (i.e. a subsequent module of the build, or a downstream dependency), the
+assembly will contain Guava as well.
+
+This is a [Maven Assembly plugin bug](https://issues.apache.org/jira/browse/MASSEMBLY-1008),
+somewhat explained in [MRESOLVER-391](https://issues.apache.org/jira/browse/MRESOLVER-391).
+In short, the Maven Assembly plugin considers the "project test graph" and then
+"cherry-picks" the runtime scoped nodes from it, which, as we can see in this case, is
+wrong. Unlike in Maven2, you need to build different graphs for the "runtime" and
+"test" classpaths. For the Assembly plugin, the problem is that, as a Mojo, it requests
+the "test graph", then it reads its configuration (the assembly descriptor, and this is
+the point where it learns about the required scopes), and then it "filters" the
+resolved "test graph" for runtime scopes. And that is wrong, as Guava is in test scope.
+Instead, the plugin should read the configuration first, then ask Resolver for the
+"runtime graph" and filter that. In turn, this problem does not exist with the
+`maven-war-plugin`, as the "war" Mojo asks for resolution of the "compile+runtime"
+scope. Of course, the WAR use case is much simpler than the Assembly one: the former
+always packages the same scope, while Assembly receives a complex configuration and
+exposes a much more complex modus operandi.
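+
+For plugin authors, a sketch of the "ask for the runtime graph up front" approach
+(using the stock selectors from `maven-resolver-util`; the exact selector composition
+here is an assumption, modeled after Maven's default session): configure the session
+so that "test" and "provided" dependencies are omitted during "collect", instead of
+filtering an already built test graph afterwards:
+
+```java
+import org.eclipse.aether.DefaultRepositorySystemSession;
+import org.eclipse.aether.RepositorySystemSession;
+import org.eclipse.aether.util.graph.selector.AndDependencySelector;
+import org.eclipse.aether.util.graph.selector.ExclusionDependencySelector;
+import org.eclipse.aether.util.graph.selector.OptionalDependencySelector;
+import org.eclipse.aether.util.graph.selector.ScopeDependencySelector;
+
+class RuntimeGraphSketch {
+    static RepositorySystemSession runtimeGraphSession(RepositorySystemSession session) {
+        DefaultRepositorySystemSession copy = new DefaultRepositorySystemSession(session);
+        // omit "test" and "provided" already at "collect" time: this yields a true
+        // runtime graph, not a test graph with runtime scoped nodes cherry-picked from it
+        copy.setDependencySelector(new AndDependencySelector(
+                new ScopeDependencySelector("test", "provided"),
+                new OptionalDependencySelector(),
+                new ExclusionDependencySelector()));
+        return copy;
+    }
+}
+```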
diff --git a/src/site/markdown/expected-checksums.md b/src/site/markdown/expected-checksums.md
index 236137d3b..6038d8c12 100644
--- a/src/site/markdown/expected-checksums.md
+++ b/src/site/markdown/expected-checksums.md
@@ -59,13 +59,14 @@ in their response.
 Since advent of modern Repository Managers, most of them already sends checksums
 (usually the "standard" SHA-1 and MD5) in their response headers.
 Moreover, Maven Central, and even Google Mirror of Maven Central sends them as well.
 By extracting these checksums from response, we can get hashes
-that were provided by remote repository along with its content.
+that were provided by the remote repository along with its content. This saves one HTTP
+round-trip, as we get both the content and the checksums in one response.
 
-Finally, the **Remote External** checksums are the classic checksums we all know: They are laid down
+Finally, the **Remote External** checksums are the "classic" checksums we all know: they are laid down
 next to Artifact files, external in other words on the remote repository, according
-to remote repository layout. To obtain Remote External checksum, new request again remote repository is
-required. The order of requested checksums will follow the order given in `aether.checksums.algorithms`,
-it asks for checksums in same order as the parameter contains algorithm names.
+to remote repository layout. To obtain a Remote External checksum, a new HTTP request against the
+remote repository is required. The order of requested checksums follows the order given in the layout
+configuration: checksums are asked for in the same order as the algorithm names appear in the parameter.
 
 During single artifact retrieval, these strategies are executed in above specified
 order, and only if current strategy has "no answer", the next strategy is attempted. Hence, if
@@ -78,7 +79,26 @@
 be probably satisfied by "Remote Included" strategy and "Remote External" will be
 
 The big win here is that by obtaining hashes using "Remote Included" and not by
 "Remote External" strategy, we can halve the count of HTTP requests to download an Artifact.
 
-### Remote Included Strategies
+Related configuration keys:
+* `aether.layout.maven2.checksumAlgorithms` A comma-separated list of checksum algorithms. Order is
+  important, as the transport will ask for them in the specified order (default is "SHA-1,MD5"), and
+  the first checksum received and matched stops the integrity validation.
+
+Note: Since Maven 3.9.x you can use the expression `${session.rootDirectory}/.mvn/checksums/` to store
+checksums along with your sources, as `session.rootDirectory` becomes an absolute path pointing to the
+root directory of your project (where the `.mvn` directory usually is).
+
+### Provided Checksums
+
+There is a Resolver SPI, `ProvidedChecksumsSource`, that makes it possible to feed Provided Checksums
+to Resolver ahead of the actual transport. These checksums are used **during transport only** to verify
+the integrity of the transported payload (artifacts). Hence, Provided Checksums are NOT usable to
+verify the integrity of already cached artifacts (unless you build with an empty local repository, of
+course, which forces all of your artifacts to go through transport).
+
+Resolver provides one SPI implementation out of the box: one that simply delegates to "Trusted Checksums".
+
+### Remote Included Checksums
 
 **Note: Remote Included checksums work only with transport-http, they do NOT work with transport-wagon!**
 
@@ -87,6 +107,11 @@
 count, since many repository services along Maven Central emits the reference checksums within
 the artifact response itself (as HTTP headers). Hence, we are able to get the artifact and
 reference "expected" checksum using only one HTTP round-trip.
 
+Related configuration keys:
+* `aether.connector.basic.smartChecksums` to enable or disable Remote Included checksums.
+
+The Remote Included checksums support several "strategies" to extract checksums from HTTP response
+headers.
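+
+To illustrate what the connector does here, a standalone sketch (plain JDK `java.net.http`, not
+Resolver's actual implementation; whether the header is present depends on the service fronting the
+repository) that peeks at the `x-checksum-sha1` header of a Maven Central artifact response:
+
+```java
+import java.net.URI;
+import java.net.http.HttpClient;
+import java.net.http.HttpRequest;
+import java.net.http.HttpResponse;
+
+class IncludedChecksumSketch {
+    public static void main(String[] args) throws Exception {
+        URI artifact = URI.create("https://repo.maven.apache.org/maven2/"
+                + "com/google/guava/guava/31.0.1-jre/guava-31.0.1-jre.jar");
+        HttpRequest request = HttpRequest.newBuilder(artifact)
+                .method("HEAD", HttpRequest.BodyPublishers.noBody())
+                .build();
+        HttpResponse<Void> response = HttpClient.newHttpClient()
+                .send(request, HttpResponse.BodyHandlers.discarding());
+        // the expected checksum travels as a header of the artifact response itself,
+        // so no second request for "guava-31.0.1-jre.jar.sha1" is needed
+        response.headers().firstValue("x-checksum-sha1")
+                .ifPresent(sha1 -> System.out.println("included SHA-1: " + sha1));
+    }
+}
+```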
+
 
 #### Sonatype Nexus 2
 
@@ -100,21 +125,30 @@
 Emitted by: Sonatype Nexus2 only.
 
 Maven Central emits headers `x-checksum-sha1` and `x-checksum-md5` along with artifact response.
 Google GCS on the other hand uses `x-goog-meta-checksum-sha1` and `x-goog-meta-checksum-md5`
-headers. Resolver will detect these and use their value.
+headers. Resolver will detect all of these and use their values.
 
 Emitted by: Maven Central, GCS, some CDNs and probably more.
 
+### Remote External Checksums
+
+These are the "classic" checksums, existing since Maven 1. They are laid down in the layout of the
+remote repository, next to the payload file (i.e. "lib.jar" and its checksum "lib.jar.sha1"). While
+they are the oldest kind of checksums Resolver supports, their shortcoming is that most often only
+SHA-1 and MD5 are produced; basically, the consumer is tied to just those checksum algorithms that the
+remote repository provides. Similarly, given that both the payload and the checksum come from the same
+origin, unless that origin is trusted (like Maven Central is), this may be seen as a risk.
+
 ## Trusted Checksums
 
-All the "expected" checksums discussed above are transport bound, they are all
+All the "expected" checksums discussed above are used in transport only: they are all
 about URLs, HTTP requests and responses, or require Transport related API elements.
-Trusted checksums is a SPI component that is able to deliver "expected" checksums
+`TrustedChecksumsSource` is an SPI component that is able to deliver "expected" checksums
 for given Artifact, without use of any transport API element. In other words, this
-API is not bound to transport.
+API is not bound to transport, but is generic.
 
-Since they map almost on-to-one into transport "Provided Checksum" strategy, resolver provides
+Since they map almost one-to-one onto the transport "Provided Checksums" strategy, Resolver provides an
 implementation that delegates Provided to Trusted checksums (makes Provided and Trusted
 checksums equivalent, transport-wise).
@@ -124,11 +158,19 @@
 Trusted Checksums is ArtifactResolver post-processing. This new functionality,
 at the cost of checksum calculation overhead, is able to validate all the resolved
 artifacts against Trusted Checksums, thus, making sure that all resolved artifacts
 are "validated" with some known (possibly even cryptographically strong) checksum
-provided by user. This new feature may become handy in cases when user does not trust the local
-repository, as it may be shared with some other unknown or even untrusted party.
+provided by the user. This new feature may come in handy in cases when the user cannot trust the local
+repository, as it may be shared with some other unknown or even untrusted parties.
+
+Moreover, using the Trusted Checksums resolver post-processor, one can "record" the checksums, for
+example when executing in a known "pristine" and safe environment, and then distribute the produced
+checksums within the organization for reuse.
 
 The Trusted Checksums provide two source implementations out of the box.
 
+Related configuration keys:
+* `aether.trustedChecksumsSource.*`
+* `aether.artifactResolver.postProcessor.trustedChecksums.*`
+
 ### Summary File Trusted Checksums Source
 
 The summary file source uses single file that is in GNU coreutils compatible format: each
@@ -144,5 +186,6 @@ Each summary file contains information for single checksum algorithm, represente
 This source mimics Maven local repository layout, and stores checksums in similar layout
 as Maven local repository stores checksums in local repository.
 
-Hare, just like Maven local repository, the sparse directory can contain multiple algorithm checksums,
+Here, just like Maven local repository, the sparse directory can contain multiple algorithm checksums,
 as they are coded in checksum file path (the extension).
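+
+For illustration, a minimal sketch (plain JDK; `HexFormat` needs Java 17+; the exact line layout of
+the summary file is assumed here to match GNU coreutils `sha1sum` output, checksum then two spaces
+then a name) that prints such a checksum line for a given file:
+
+```java
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.security.MessageDigest;
+import java.util.HexFormat;
+
+class ChecksumLineSketch {
+    public static void main(String[] args) throws Exception {
+        Path file = Path.of(args[0]); // e.g. a JAR produced in a known "pristine" environment
+        byte[] sha1 = MessageDigest.getInstance("SHA-1").digest(Files.readAllBytes(file));
+        // GNU coreutils format: hex checksum, two spaces, file name
+        System.out.println(HexFormat.of().formatHex(sha1) + "  " + file);
+    }
+}
+```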
+
diff --git a/src/site/markdown/remote-repository-filtering.md b/src/site/markdown/remote-repository-filtering.md
index 4b62abf4a..f911cab2a 100644
--- a/src/site/markdown/remote-repository-filtering.md
+++ b/src/site/markdown/remote-repository-filtering.md
@@ -82,6 +82,10 @@ directory. It will be referred to in this document with `${filterBasedir}` place
 To explicitly set filter basedir, use following setting: `-Daether.remoteRepositoryFilter.${filterName}.basedir=somePath`,
 where "somePath" can be relative path, then is resolved from local repository root, or absolute path, then is used as is.
 
+Since Maven 3.9.x you can use an expression like `${session.rootDirectory}/.mvn/rrf/` to store filter
+data along with your sources, as `session.rootDirectory` becomes an absolute path pointing to the root
+directory of your project (where the `.mvn` directory usually is).
+
 ### The Prefixes Filter
 
 The "prefixes" named filter relies on a file containing a list of "repository prefixes" available from a given repository.
diff --git a/src/site/site.xml b/src/site/site.xml
index 9ff7b73c2..46df91749 100644
--- a/src/site/site.xml
+++ b/src/site/site.xml
@@ -33,6 +33,7 @@ under the License.
+
+