Commit 14e86cc

More fingerprint and metadata comments.
1 parent cd396f3 commit 14e86cc

2 files changed, +76 -31 lines changed

src/cargo/core/compiler/context/compilation_files.rs

+16 -1
@@ -13,14 +13,20 @@ use crate::core::compiler::{CompileMode, CompileTarget, Unit};
 use crate::core::{Target, TargetKind, Workspace};
 use crate::util::{self, CargoResult};
 
-/// The `Metadata` is a hash used to make unique file names for each unit in a build.
+/// The `Metadata` is a hash used to make unique file names for each unit in a
+/// build. It is also used for symbol mangling.
+///
 /// For example:
 /// - A project may depend on crate `A` and crate `B`, so the package name must be in the file name.
 /// - Similarly a project may depend on two versions of `A`, so the version must be in the file name.
+///
 /// In general this must include all things that need to be distinguished in different parts of
 /// the same build. This is absolutely required or we override things before
 /// we get a chance to use them.
 ///
+/// It is also used for symbol mangling, because if you have two versions of
+/// the same crate linked together, their symbols need to be differentiated.
+///
 /// We use a hash because it is an easy way to guarantee
 /// that all the inputs can be converted to a valid path.
 ///
@@ -39,6 +45,15 @@ use crate::util::{self, CargoResult};
 /// more space than needed. This makes not including something in `Metadata`
 /// a form of cache invalidation.
 ///
+/// You should also avoid anything that would interfere with reproducible
+/// builds. For example, *any* absolute path should be avoided. This is one
+/// reason that `RUSTFLAGS` is not in `Metadata`, because it often has
+/// absolute paths (like `--remap-path-prefix` which is fundamentally used for
+/// reproducible builds and has absolute paths in it). Also, in some cases the
+/// mangled symbols need to be stable between different builds with different
+/// settings. For example, profile-guided optimizations need to swap
+/// `RUSTFLAGS` between runs, but need to keep the same symbol names.
+///
 /// Note that the `Fingerprint` is in charge of tracking everything needed to determine if a
 /// rebuild is needed.
 #[derive(Copy, Clone, Hash, Eq, PartialEq, Ord, PartialOrd)]
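
The doc comment above describes two properties of the `Metadata` hash: it must separate units that coexist in one build, and it must stay stable when `RUSTFLAGS` change. A minimal sketch of that idea follows. It is not Cargo's implementation: `UnitKey`, `metadata_suffix`, and `output_file_name` are hypothetical names, the `lib{name}-{hash}.rlib` layout is only illustrative, and the standard library's `DefaultHasher` stands in for whatever hasher Cargo actually uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for the inputs that describe one unit.
struct UnitKey<'a> {
    name: &'a str,
    version: &'a str,
    /// Carried along for the compiler invocation, but never hashed.
    rustflags: &'a [&'a str],
}

impl Hash for UnitKey<'_> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Name and version must be distinguished within a single build.
        self.name.hash(state);
        self.version.hash(state);
        // `rustflags` is intentionally skipped: it may contain absolute
        // paths, and the suffix has to survive a flag swap between runs.
    }
}

/// Hash the key into a fixed-width hex string, which is always a valid
/// path component regardless of what the inputs contained.
fn metadata_suffix(key: &UnitKey<'_>) -> String {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// Illustrative file name layout (an assumption, not Cargo's exact scheme).
fn output_file_name(key: &UnitKey<'_>) -> String {
    format!("lib{}-{}.rlib", key.name, metadata_suffix(key))
}

fn main() {
    let v1 = UnitKey { name: "a", version: "1.0.0", rustflags: &[] };
    let v2 = UnitKey { name: "a", version: "2.0.0", rustflags: &[] };
    let v1_pgo = UnitKey {
        name: "a",
        version: "1.0.0",
        rustflags: &["-Cprofile-generate"],
    };

    // Two versions of the same crate get different file names.
    println!("{}", output_file_name(&v1));
    println!("{}", output_file_name(&v2));
    // Changing only the flags leaves the name untouched.
    println!("{} (flags: {:?})", output_file_name(&v1_pgo), v1_pgo.rustflags);
}
```

In this sketch, leaving a field out of the `Hash` impl is what keeps the file name (and, by extension, anything derived from the hash) stable across flag changes; detecting that a rebuild is needed is left to a separate mechanism, as the comment says of `Fingerprint`.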

src/cargo/core/compiler/fingerprint.rs

+60 -30
@@ -5,23 +5,30 @@
 //! (needs to be recompiled) or "fresh" (it does not need to be recompiled).
 //! There are several mechanisms that influence a Unit's freshness:
 //!
-//! - The `Metadata` hash isolates each Unit on the filesystem by being
-//!   embedded in the filename. If something in the hash changes, then the
-//!   output files will be missing, and the Unit will be dirty (missing
-//!   outputs are considered "dirty").
-//! - The `Fingerprint` is another hash, saved to the filesystem in the
-//!   `.fingerprint` directory, that tracks information about the inputs to a
-//!   Unit. If any of the inputs changes from the last compilation, then the
-//!   Unit is considered dirty. A missing fingerprint (such as during the
-//!   first build) is also considered dirty.
-//! - Whether or not input files are actually present. For example a build
-//!   script which says it depends on a nonexistent file `foo` is always rerun.
-//! - Propagation throughout the dependency graph of file modification time
-//!   information, used to detect changes on the filesystem. Each `Fingerprint`
-//!   keeps track of what files it'll be processing, and when necessary it will
-//!   check the `mtime` of each file (last modification time) and compare it to
-//!   dependencies and output to see if files have been changed or if a change
-//!   needs to force recompiles of downstream dependencies.
+//! - The `Fingerprint` is a hash, saved to the filesystem in the
+//!   `.fingerprint` directory, that tracks information about the Unit. If the
+//!   fingerprint is missing (such as the first time the unit is being
+//!   compiled), then the unit is dirty. If any of the fingerprint fields
+//!   change (like the name of the source file), then the Unit is considered
+//!   dirty.
+//!
+//!   The `Fingerprint` also tracks the fingerprints of all its dependencies,
+//!   so a change in a dependency will propagate the "dirty" status up.
+//!
+//! - Filesystem mtime tracking is also used to check if a unit is dirty.
+//!   See the section below on "Mtime comparison" for more details. There
+//!   are essentially two parts to mtime tracking:
+//!
+//!   1. The mtime of a Unit's output files is compared to the mtime of all
+//!      its dependencies' output file mtimes (see `check_filesystem`). If any
+//!      output is missing, or is older than a dependency's output, then the
+//!      unit is dirty.
+//!   2. The mtime of a Unit's source files is compared to the mtime of its
+//!      dep-info file in the fingerprint directory (see `find_stale_file`).
+//!      The dep-info file is used as an anchor to know when the last build of
+//!      the unit was done. See the "dep-info files" section below for more
+//!      details. If any input files are missing, or are newer than the
+//!      dep-info, then the unit is dirty.
 //!
 //! Note: Fingerprinting is not a perfect solution. Filesystem mtime tracking
 //! is notoriously imprecise and problematic. Only a small part of the
@@ -33,6 +40,12 @@
 //!
 //! ## Fingerprints and Metadata
 //!
+//! The `Metadata` hash is a hash added to the output filenames to isolate
+//! each unit. See the documentation in the `compilation_files` module for
+//! more details. NOTE: Not all output files are isolated via filename hashes
+//! (like dylibs), but the fingerprint directory always has the `Metadata`
+//! hash in its directory name.
+//!
 //! Fingerprints and Metadata are similar, and track some of the same things.
 //! The Metadata contains information that is required to keep Units separate.
 //! The Fingerprint includes additional information that should cause a
@@ -69,10 +82,11 @@
 //!
 //! When deciding what should go in the Metadata vs the Fingerprint, consider
 //! that some files (like dylibs) do not have a hash in their filename. Thus,
-//! if a value changes, only the fingerprint will detect the change. Fields
-//! that are only in Metadata generally aren't relevant to the fingerprint
-//! because they fundamentally change the output (like target vs host changes
-//! the directory where it is emitted).
+//! if a value changes, only the fingerprint will detect the change (consider,
+//! for example, swapping between different features). Fields that are only in
+//! Metadata generally aren't relevant to the fingerprint because they
+//! fundamentally change the output (like target vs host changes the directory
+//! where it is emitted).
 //!
 //! ## Fingerprint files
 //!
@@ -378,19 +392,35 @@ pub fn prepare_target<'a, 'cfg>(
 
     // Clear out the old fingerprint file if it exists. This protects when
     // compilation is interrupted leaving a corrupt file. For example, a
-    // project with a lib.rs and integration test:
+    // project with a lib.rs and integration test (two units):
     //
-    // 1. Build the integration test.
-    // 2. Make a change to lib.rs.
-    // 3. Build the integration test, hit Ctrl-C while linking (with gcc).
+    // 1. Build the library and integration test.
+    // 2. Make a change to lib.rs (NOT the integration test).
+    // 3. Build the integration test, hit Ctrl-C while linking. With gcc, this
+    //    will leave behind an incomplete executable (zero size, or partially
+    //    written). NOTE: The library builds successfully, it is the linking
+    //    of the integration test that we are interrupting.
     // 4. Build the integration test again.
     //
-    // Without this line, then step 4 will think the integration test is
-    // "fresh" because the mtime of the output file is newer than all of its
-    // dependencies. But the executable is corrupt and needs to be rebuilt.
-    // Clearing the fingerprint ensures that Cargo never mistakes it as
-    // up-to-date until after a successful build.
+    // Without the following line, then step 3 will leave a valid fingerprint
+    // on the disk. Then step 4 will think the integration test is "fresh"
+    // because:
+    //
+    // - There is a valid fingerprint hash on disk (written in step 1).
+    // - The mtime of the output file (the corrupt integration executable
+    //   written in step 3) is newer than all of its dependencies.
+    // - The mtime of the integration test fingerprint dep-info file (written
+    //   in step 1) is newer than the integration test's source files, because
+    //   we haven't modified any of its source files.
+    //
+    // But the executable is corrupt and needs to be rebuilt. Clearing the
+    // fingerprint at step 3 ensures that Cargo never mistakes a partially
+    // written output as up-to-date.
     if loc.exists() {
+        // Truncate instead of delete so that compare_old_fingerprint will
+        // still log the reason for the fingerprint failure instead of just
+        // reporting "failed to read fingerprint" during the next build if
+        // this build fails.
         paths::write(&loc, b"")?;
     }
 
