Commit e46ca84

package: canonicalize tar headers for crate packages

Currently, when reading a file from disk, we include several pieces of data from the on-disk file, including the user and group names and IDs, the device major and minor, the mode, and the timestamp. This means that our archives differ between systems, sometimes in unhelpful ways. In addition, most users probably did not intend to share information about their user and group settings, operating system and disk type, and umask. While these aren't huge privacy leaks, cargo doesn't use them when extracting archives, so there's no value to including them.

Since using consistent data means that our archives are reproducible and don't leak user data, both of which are desirable features, let's canonicalize the header to strip out identifying information.

We set the user and group information to 0 and root, since that's the only user that's typically consistent among Unix systems. Setting these values doesn't create a security risk since tar can't change the ownership of files when it's running as a normal unprivileged user.

Similarly, we set the device major and minor to 0. There is no useful value here that's portable across systems, and it does not affect extraction in any way.

We also set the timestamp to the same one that we use for generated files. This is probably the biggest loss of relevant data, but considering that cargo doesn't otherwise use it and honoring it makes the archives unreproducible, we canonicalize it as well.

Finally, we canonicalize the mode of an item we're storing by looking at the executable bit and using mode 755 if it's set and mode 644 if it's not. We already use 644 as the default for generated files, and this is the same algorithm that Git uses to determine whether a file should be considered executable. The tests don't test this case because there's no portable way to create executable files on Windows.

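
The effect described in the message can be sketched without the `tar` crate. The snippet below is a minimal illustration, not the code from this commit: `EntryMeta` is a hypothetical stand-in for the header fields the commit touches, and `canonical_mode` and `canonicalize` are illustrative names. It shows that two entries recorded on different machines become identical once every machine-specific field is overwritten.

```rust
/// Hypothetical stand-in for the tar header fields this commit canonicalizes.
/// (The real change operates on `tar::Header`; this struct is an assumption
/// made so the sketch needs only the standard library.)
#[derive(Debug, Clone, PartialEq, Eq)]
struct EntryMeta {
    username: String,
    groupname: String,
    uid: u64,
    gid: u64,
    device_major: u32,
    device_minor: u32,
    mode: u32,
    mtime: u64,
}

/// Collapse any mode to 0o755 or 0o644 by looking only at the
/// owner-executable bit (0o100), as the commit message describes.
fn canonical_mode(mode: u32) -> u32 {
    if mode & 0o100 != 0 {
        0o755
    } else {
        0o644
    }
}

/// Overwrite every field that varies between machines with a fixed value.
/// `time` plays the role of the timestamp already used for generated files.
fn canonicalize(meta: &mut EntryMeta, time: u64) {
    meta.username = "root".to_string();
    meta.groupname = "root".to_string();
    meta.uid = 0;
    meta.gid = 0;
    meta.device_major = 0;
    meta.device_minor = 0;
    meta.mode = canonical_mode(meta.mode);
    meta.mtime = time;
}
```

Because only the executable bit survives from the original mode, a umask of 002 on one machine and 077 on another no longer produces different archives.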
1 parent 436b9eb commit e46ca84

2 files changed, +63 -0 lines changed

src/cargo/ops/cargo_package.rs

+20
@@ -484,6 +484,23 @@ fn timestamp() -> u64 {
         .as_secs()
 }
 
+fn canonicalize_header(header: &mut Header) {
+    // Let's not include information about the user or their system here.
+    header.set_username("root").unwrap();
+    header.set_groupname("root").unwrap();
+    header.set_uid(0);
+    header.set_gid(0);
+    header.set_device_major(0).unwrap();
+    header.set_device_minor(0).unwrap();
+
+    let mode = if header.mode().unwrap() & 0o100 != 0 {
+        0o755
+    } else {
+        0o644
+    };
+    header.set_mode(mode);
+}
+
 fn tar(
     ws: &Workspace<'_>,
     ar_files: Vec<ArchiveFile>,
@@ -524,6 +541,8 @@ fn tar(
                     format!("could not learn metadata for: `{}`", disk_path.display())
                 })?;
                 header.set_metadata(&metadata);
+                header.set_mtime(time);
+                canonicalize_header(&mut header);
                 header.set_cksum();
                 ar.append_data(&mut header, &ar_path, &mut file)
                     .chain_err(|| {
@@ -540,6 +559,7 @@ fn tar(
                 header.set_mode(0o644);
                 header.set_mtime(time);
                 header.set_size(contents.len() as u64);
+                canonicalize_header(&mut header);
                 header.set_cksum();
                 ar.append_data(&mut header, &ar_path, contents.as_bytes())
                     .chain_err(|| format!("could not archive source file `{}`", rel_str))?;

tests/testsuite/package.rs

+43
@@ -6,8 +6,10 @@ use cargo_test_support::registry::{self, Package};
 use cargo_test_support::{
     basic_manifest, cargo_process, git, path2url, paths, project, symlink_supported, t,
 };
+use flate2::read::GzDecoder;
 use std::fs::{self, read_to_string, File};
 use std::path::Path;
+use tar::Archive;
 
 #[cargo_test]
 fn simple() {
@@ -1917,3 +1919,44 @@ src/main.rs
         ))
         .run();
 }
+
+#[cargo_test]
+fn reproducible_output() {
+    let p = project()
+        .file(
+            "Cargo.toml",
+            r#"
+            [project]
+            name = "foo"
+            version = "0.0.1"
+            authors = []
+            exclude = ["*.txt"]
+            license = "MIT"
+            description = "foo"
+        "#,
+        )
+        .file("src/main.rs", r#"fn main() { println!("hello"); }"#)
+        .build();
+
+    // Timestamp is arbitrary and is the same used by git format-patch.
+    p.cargo("package")
+        .env("SOURCE_DATE_EPOCH", "1000684800")
+        .run();
+    assert!(p.root().join("target/package/foo-0.0.1.crate").is_file());
+
+    let f = File::open(&p.root().join("target/package/foo-0.0.1.crate")).unwrap();
+    let decoder = GzDecoder::new(f);
+    let mut archive = Archive::new(decoder);
+    for ent in archive.entries().unwrap() {
+        let ent = ent.unwrap();
+        let header = ent.header();
+        assert_eq!(header.mode().unwrap(), 0o644);
+        assert_eq!(header.uid().unwrap(), 0);
+        assert_eq!(header.gid().unwrap(), 0);
+        assert_eq!(header.mtime().unwrap(), 1000684800);
+        assert_eq!(header.username().unwrap().unwrap(), "root");
+        assert_eq!(header.groupname().unwrap().unwrap(), "root");
+        assert_eq!(header.device_major().unwrap().unwrap(), 0);
+        assert_eq!(header.device_minor().unwrap().unwrap(), 0);
+    }
+}
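
The commit message notes that the executable case is left untested because executable files cannot be created portably on Windows. As a rough, Unix-only sketch of how the mode rule could be checked against a real file, the helper below (a hypothetical name, not part of this commit or of cargo's test suite) creates a file, sets its permissions, reads the mode back from disk, and applies the same `0o100` test. It exercises only the mode rule and does not run `cargo package`.

```rust
/// Hypothetical helper: create `path`, give it `mode`, read the mode back
/// from disk, and return the canonical mode (0o755 or 0o644) that the
/// executable-bit rule would choose. Unix-only because it relies on
/// `std::os::unix::fs::PermissionsExt`.
#[cfg(unix)]
fn canonical_mode_on_disk(path: &std::path::Path, mode: u32) -> std::io::Result<u32> {
    use std::fs::{self, File};
    use std::os::unix::fs::PermissionsExt;

    File::create(path)?;
    // `set_permissions` performs a chmod, so the umask does not apply here.
    fs::set_permissions(path, fs::Permissions::from_mode(mode))?;
    // On Unix the returned mode also carries file-type bits; the mask below
    // looks only at the owner-executable bit.
    let on_disk = fs::metadata(path)?.permissions().mode();
    fs::remove_file(path)?;

    Ok(if on_disk & 0o100 != 0 { 0o755 } else { 0o644 })
}
```

A test built on this idea would need a `#[cfg(unix)]` gate so that it is skipped on Windows.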
