
Commit 3b34b1c

Add blob split and splice API (#282)
Depending on the software project, large binary artifacts may need to be downloaded from or uploaded to the remote CAS; examples are executables with debug information, large libraries, or even whole file system images. Such artifacts generate a lot of traffic when downloaded or uploaded. The blob split API makes it possible to split such an artifact into chunks on the remote side, fetch only those parts that are missing locally, and finally assemble the requested blob locally from its chunks. The blob splice API makes it possible to split such an artifact into chunks locally, upload only those parts that are missing remotely, and finally splice the requested blob remotely from its chunks. Since only the binary differences from the last download/upload are fetched or uploaded, the blob split and splice APIs can save significant network traffic between server and client.
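The split-based download round trip can be sketched with in-memory stand-ins for the remote CAS and the client's local chunk store. All names here (`remote_cas`, `local_chunks`, `split_blob`, `download_via_split`) are illustrative, not part of the API, and fixed-size chunking is used only to keep the sketch short; the API text calls for content-defined chunking so that chunks stay reusable across blob versions:

```python
import hashlib

def digest(data: bytes) -> str:
    """Hex SHA-256, standing in for a REAPI Digest."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical in-memory stand-ins for the remote CAS and a local chunk cache.
remote_cas: dict[str, bytes] = {}
local_chunks: dict[str, bytes] = {}

def split_blob(blob_digest: str, chunk_size: int = 4) -> list[str]:
    """Server side: split a blob, store the chunks in the CAS, and return
    the ordered list of chunk digests (the SplitBlob contract)."""
    blob = remote_cas[blob_digest]
    chunk_digests = []
    for i in range(0, len(blob), chunk_size):
        chunk = blob[i:i + chunk_size]
        remote_cas[digest(chunk)] = chunk
        chunk_digests.append(digest(chunk))
    return chunk_digests

def download_via_split(blob_digest: str) -> bytes:
    """Client side: fetch only the missing chunks, then concatenate them
    in the order given by the digest list."""
    chunk_digests = split_blob(blob_digest)      # SplitBlob RPC stand-in
    for d in chunk_digests:
        if d not in local_chunks:                # fetch only what is missing
            local_chunks[d] = remote_cas[d]      # ByteStream read stand-in
    blob = b"".join(local_chunks[d] for d in chunk_digests)
    assert digest(blob) == blob_digest           # recommended client-side check
    return blob
```

On a second download of a similar blob, any chunk already in `local_chunks` is skipped, which is where the traffic saving comes from.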
1 parent 9a0af1d commit 3b34b1c

File tree

1 file changed: +199 -0

build/bazel/remote/execution/v2/remote_execution.proto

@@ -430,6 +430,106 @@ service ContentAddressableStorage {
   rpc GetTree(GetTreeRequest) returns (stream GetTreeResponse) {
     option (google.api.http) = { get: "/v2/{instance_name=**}/blobs/{root_digest.hash}/{root_digest.size_bytes}:getTree" };
   }
+
+  // Split a blob into chunks.
+  //
+  // This call splits a blob into chunks, stores the chunks in the CAS, and
+  // returns a list of the chunk digests. Using this list, a client can check
+  // which chunks are locally available and fetch only the missing ones. The
+  // desired blob can be assembled by concatenating the fetched chunks in the
+  // order of the digests from the list.
+  //
+  // This rpc can be used to reduce the amount of data that must be downloaded
+  // to obtain a large blob from the CAS if chunks from earlier downloads of a
+  // different version of this blob are locally available. For this procedure
+  // to work properly, blobs need to be split in a content-defined way, rather
+  // than with fixed-size chunking.
+  //
+  // If a split request is answered successfully, a client can expect the
+  // following guarantees from the server:
+  // 1. The blob chunks are stored in the CAS.
+  // 2. Concatenating the blob chunks in the order of the digest list returned
+  //    by the server results in the original blob.
+  //
+  // Servers are free to implement this functionality, but they need to
+  // declare whether they support it or not by setting the
+  // [CacheCapabilities.blob_split_support][build.bazel.remote.execution.v2.CacheCapabilities.blob_split_support]
+  // field accordingly.
+  //
+  // Clients are free to use this functionality; it is merely an optimization
+  // to reduce download traffic when fetching large blobs from the CAS.
+  // However, clients must first check the server capabilities to determine
+  // whether the server supports blob splitting.
+  //
+  // Hints:
+  //
+  // * Clients are recommended to verify that the digest of the blob
+  //   assembled from the fetched chunks matches the requested blob digest.
+  //
+  // * Since the generated chunks are stored as blobs, they are subject to
+  //   the same lifetimes as other blobs. However, their lifetimes are
+  //   extended if they are part of the result of a split blob request.
+  //
+  // * When blob splitting and splicing are used at the same time, the
+  //   clients and the server should agree out of band upon a chunking
+  //   algorithm used by all parties to benefit from each other's chunk data
+  //   and avoid unnecessary data duplication.
+  //
+  // Errors:
+  //
+  // * `NOT_FOUND`: The requested blob is not present in the CAS.
+  // * `RESOURCE_EXHAUSTED`: There is insufficient disk quota to store the
+  //   blob chunks.
+  rpc SplitBlob(SplitBlobRequest) returns (SplitBlobResponse) {
+    option (google.api.http) = { get: "/v2/{instance_name=**}/blobs/{blob_digest.hash}/{blob_digest.size_bytes}:splitBlob" };
+  }
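The comments above require content-defined chunking so that chunk boundaries survive insertions and deletions in new blob versions. A toy content-defined chunker, a sketch only: real implementations typically use a proper rolling hash (e.g. FastCDC-style schemes), whereas `cdc_chunks` below recomputes a CRC over a trailing window at every byte, and all parameters are illustrative:

```python
import zlib

def cdc_chunks(data: bytes, window: int = 16, mask: int = 0x3F,
               min_size: int = 8, max_size: int = 64) -> list[bytes]:
    """Cut a chunk whenever the CRC of the trailing window has its low
    `mask` bits zero, subject to minimum and maximum chunk sizes.
    O(n * window) because the CRC is recomputed per byte; fine for a sketch."""
    chunks, start = [], 0
    for i in range(len(data)):
        size = i - start + 1
        if size < min_size:
            continue                      # enforce minimum chunk size
        h = zlib.crc32(data[max(start, i - window + 1):i + 1])
        if (h & mask) == 0 or size >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])       # trailing remainder
    return chunks
```

Because boundaries depend only on local content, inserting bytes near the front of a blob shifts at most the chunks around the edit, so later chunks keep their digests and need not be re-fetched.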
+
+  // Splice a blob from chunks.
+  //
+  // This is the complementary operation to the
+  // [ContentAddressableStorage.SplitBlob][build.bazel.remote.execution.v2.ContentAddressableStorage.SplitBlob]
+  // function, handling the chunked upload of large blobs to save upload
+  // traffic.
+  //
+  // If a client needs to upload a large blob and is able to split it into
+  // chunks in such a way that reusable chunks are obtained, e.g., by means
+  // of content-defined chunking, it can first determine which parts of the
+  // blob are already available in the remote CAS, upload the missing chunks,
+  // and then use this API to instruct the server to splice the original blob
+  // from the remotely available blob chunks.
+  //
+  // Servers are free to implement this functionality, but they need to
+  // declare whether they support it or not by setting the
+  // [CacheCapabilities.blob_splice_support][build.bazel.remote.execution.v2.CacheCapabilities.blob_splice_support]
+  // field accordingly.
+  //
+  // Clients are free to use this functionality; it is merely an optimization
+  // to reduce upload traffic when uploading large blobs to the CAS. However,
+  // clients must first check the server capabilities to determine whether
+  // the server supports blob splicing.
+  //
+  // Hints:
+  //
+  // * In order to ensure data consistency of the CAS, the server will verify
+  //   that the digest of the spliced result matches the digest provided in
+  //   the request and will reject the splice request if this check fails.
+  //
+  // * When blob splitting and splicing are used at the same time, the
+  //   clients and the server should agree out of band upon a chunking
+  //   algorithm used by all parties to benefit from each other's chunk data
+  //   and avoid unnecessary data duplication.
+  //
+  // Errors:
+  //
+  // * `NOT_FOUND`: At least one of the blob chunks is not present in the
+  //   CAS.
+  // * `RESOURCE_EXHAUSTED`: There is insufficient disk quota to store the
+  //   spliced blob.
+  // * `INVALID_ARGUMENT`: The digest of the spliced blob differs from the
+  //   provided expected digest.
+  rpc SpliceBlob(SpliceBlobRequest) returns (SpliceBlobResponse) {
+    option (google.api.http) = { post: "/v2/{instance_name=**}/blobs:spliceBlob" body: "*" };
+  }
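The splice flow and the server-side digest check described above can be sketched with the same kind of in-memory stand-ins. `splice_blob` plays the hypothetical server handler, `upload_via_splice` the client; `remote_cas`, the exceptions standing in for the `NOT_FOUND` and `INVALID_ARGUMENT` error codes, and the fixed-size chunking are all assumptions for illustration:

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

remote_cas: dict[str, bytes] = {}   # hypothetical in-memory remote CAS

def splice_blob(expected_digest: str, chunk_digests: list[str]) -> str:
    """Server side: concatenate the chunks in order, verify the result
    against the expected digest, and store the spliced blob."""
    for d in chunk_digests:
        if d not in remote_cas:
            raise LookupError("NOT_FOUND: missing chunk " + d)
    blob = b"".join(remote_cas[d] for d in chunk_digests)
    if sha256_hex(blob) != expected_digest:
        raise ValueError("INVALID_ARGUMENT: spliced digest mismatch")
    remote_cas[expected_digest] = blob
    return expected_digest           # computed digest of the spliced blob

def upload_via_splice(blob: bytes, chunks: list[bytes]) -> str:
    """Client side: upload only the chunks the server is missing, then
    ask the server to splice the blob from them."""
    digests = [sha256_hex(c) for c in chunks]
    for c, d in zip(chunks, digests):
        if d not in remote_cas:      # FindMissingBlobs stand-in
            remote_cas[d] = c        # ByteStream write stand-in
    return splice_blob(sha256_hex(blob), digests)
```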
 }
 
 // The Capabilities service may be used by remote execution clients to query
@@ -1837,6 +1937,91 @@ message GetTreeResponse {
   string next_page_token = 2;
 }
 
+// A request message for
+// [ContentAddressableStorage.SplitBlob][build.bazel.remote.execution.v2.ContentAddressableStorage.SplitBlob].
+message SplitBlobRequest {
+  // The instance of the execution system to operate against. A server may
+  // support multiple instances of the execution system (with their own
+  // workers, storage, caches, etc.). The server MAY require use of this
+  // field to select between them in an implementation-defined fashion,
+  // otherwise it can be omitted.
+  string instance_name = 1;
+
+  // The digest of the blob to be split.
+  Digest blob_digest = 2;
+
+  // The digest function of the blob to be split.
+  //
+  // If the digest function used is one of MD5, MURMUR3, SHA1, SHA256,
+  // SHA384, SHA512, or VSO, the client MAY leave this field unset. In
+  // that case the server SHOULD infer the digest function using the
+  // length of the blob digest hashes and the digest functions announced
+  // in the server's capabilities.
+  DigestFunction.Value digest_function = 4;
+}
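The inference rule in the `digest_function` comment (match on hash length, disambiguate via the server's announced functions) can be sketched as follows. The hex-length table is an assumption for illustration, not taken from this diff; note that MD5 and MURMUR3 collide at 32 hex characters, which is exactly why the announced functions must make the match unambiguous:

```python
# Assumed hex-digest lengths per digest function (illustrative).
HEX_LENGTHS = {
    "MD5": 32, "MURMUR3": 32, "SHA1": 40, "SHA256": 64,
    "SHA384": 96, "SHA512": 128, "VSO": 66,
}

def infer_digest_function(hash_hex: str, announced: set[str]) -> str:
    """Infer the digest function from the hash length, restricted to the
    functions the server announced in its capabilities; fail if ambiguous."""
    candidates = {f for f, n in HEX_LENGTHS.items()
                  if n == len(hash_hex) and f in announced}
    if len(candidates) != 1:
        raise ValueError("cannot infer digest function unambiguously")
    return candidates.pop()
```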
+
+// A response message for
+// [ContentAddressableStorage.SplitBlob][build.bazel.remote.execution.v2.ContentAddressableStorage.SplitBlob].
+message SplitBlobResponse {
+  // The ordered list of digests of the chunks into which the blob was
+  // split. The original blob is assembled by concatenating the chunk data
+  // according to the order of the digests given by this list.
+  repeated Digest chunk_digests = 1;
+
+  // The digest function of the chunks.
+  //
+  // If the digest function used is one of MD5, MURMUR3, SHA1, SHA256,
+  // SHA384, SHA512, or VSO, the client MAY leave this field unset. In
+  // that case the server SHOULD infer the digest function using the
+  // length of the blob digest hashes and the digest functions announced
+  // in the server's capabilities.
+  DigestFunction.Value digest_function = 2;
+}
+
+// A request message for
+// [ContentAddressableStorage.SpliceBlob][build.bazel.remote.execution.v2.ContentAddressableStorage.SpliceBlob].
+message SpliceBlobRequest {
+  // The instance of the execution system to operate against. A server may
+  // support multiple instances of the execution system (with their own
+  // workers, storage, caches, etc.). The server MAY require use of this
+  // field to select between them in an implementation-defined fashion,
+  // otherwise it can be omitted.
+  string instance_name = 1;
+
+  // Expected digest of the spliced blob.
+  Digest blob_digest = 2;
+
+  // The ordered list of digests of the chunks which need to be concatenated
+  // to assemble the original blob.
+  repeated Digest chunk_digests = 3;
+
+  // The digest function of the blob to be spliced as well as of the chunks
+  // to be concatenated.
+  //
+  // If the digest function used is one of MD5, MURMUR3, SHA1, SHA256,
+  // SHA384, SHA512, or VSO, the client MAY leave this field unset. In
+  // that case the server SHOULD infer the digest function using the
+  // length of the blob digest hashes and the digest functions announced
+  // in the server's capabilities.
+  DigestFunction.Value digest_function = 4;
+}
+
+// A response message for
+// [ContentAddressableStorage.SpliceBlob][build.bazel.remote.execution.v2.ContentAddressableStorage.SpliceBlob].
+message SpliceBlobResponse {
+  // Computed digest of the spliced blob.
+  Digest blob_digest = 1;
+
+  // The digest function of the spliced blob.
+  //
+  // If the digest function used is one of MD5, MURMUR3, SHA1, SHA256,
+  // SHA384, SHA512, or VSO, the client MAY leave this field unset. In
+  // that case the server SHOULD infer the digest function using the
+  // length of the blob digest hashes and the digest functions announced
+  // in the server's capabilities.
+  DigestFunction.Value digest_function = 2;
+}
+
 // A request message for
 // [Capabilities.GetCapabilities][build.bazel.remote.execution.v2.Capabilities.GetCapabilities].
 message GetCapabilitiesRequest {
@@ -2056,6 +2241,20 @@ message CacheCapabilities {
   // [BatchUpdateBlobs][build.bazel.remote.execution.v2.ContentAddressableStorage.BatchUpdateBlobs]
   // requests.
   repeated Compressor.Value supported_batch_update_compressors = 7;
+
+  // Whether blob splitting is supported for the particular server/instance.
+  // If yes, the server/instance implements the specified behavior for blob
+  // splitting and a meaningful result can be expected from the
+  // [ContentAddressableStorage.SplitBlob][build.bazel.remote.execution.v2.ContentAddressableStorage.SplitBlob]
+  // operation.
+  bool blob_split_support = 9;
+
+  // Whether blob splicing is supported for the particular server/instance.
+  // If yes, the server/instance implements the specified behavior for blob
+  // splicing and a meaningful result can be expected from the
+  // [ContentAddressableStorage.SpliceBlob][build.bazel.remote.execution.v2.ContentAddressableStorage.SpliceBlob]
+  // operation.
+  bool blob_splice_support = 10;
 }
 
 // Capabilities of the remote execution system.
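Since both RPCs are optional, clients are expected to gate their use on the capability flags. A minimal sketch of that gating, using a hypothetical client stub (`FakeCAS`, `read`, `split_blob`, and `download_blob` are illustrative stand-ins for the real RPC plumbing):

```python
from types import SimpleNamespace

class FakeCAS:
    """Hypothetical client stub; read()/split_blob() stand in for the
    ByteStream read and SplitBlob RPCs."""
    def __init__(self, blobs: dict, splits: dict):
        self.blobs, self.splits = blobs, splits
    def read(self, d: str) -> bytes:
        return self.blobs[d]
    def split_blob(self, d: str) -> list:
        return self.splits[d]

def download_blob(cas, capabilities, blob_digest: str) -> bytes:
    """Use the split path only when the server announces support for it;
    otherwise fall back to a plain blob read."""
    if getattr(capabilities, "blob_split_support", False):
        chunk_digests = cas.split_blob(blob_digest)
        return b"".join(cas.read(d) for d in chunk_digests)
    return cas.read(blob_digest)
```

Either path yields the same bytes; the capability check only decides whether the chunked transfer optimization is attempted.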
