Gzip file created with async compression not decodable #135

Open
thmang82 opened this issue Jan 12, 2022 · 3 comments

@thmang82

Hello,

I am trying to create a stream-compressed gzip file on the fly while receiving chunks of data.
My issue is that the resulting file cannot be decoded by gzip/gunzip :-(

I stripped it down to a test that just encodes "test", so I could compare the output against a Node.js-based encoding that works. But I do not understand why there is a difference, or whether this is really a bug inside the library or an issue with how I use the async file I/O.

Here is my sample code:

use tokio::fs::File;
use tokio::io::AsyncWriteExt;
use async_compression::tokio_02::write::GzipEncoder;

let file = File::create("test.txt.gz").await?;
let mut writer = GzipEncoder::new(file);
writer.write("test".as_bytes()).await?;

Result:

xxd test.txt.gz
00000000: 1f8b 0800 0000 0000 00ff 2a49 2d2e 0100  ..........*I-...
00000010: 0000 ffff                                ....

This file is not decodable! Gunzip says it is corrupt!

When I create the file via this small node.js script

var zlib = require('zlib');
var fs = require('fs');

var gz = zlib.createGzip();
gz.pipe(fs.createWriteStream("test_node.txt.gz"));
gz.write("test");
gz.end();

the resulting file has the following content:

00000000: 1f8b 0800 0000 0000 0013 2b49 2d2e 0100  ..........+I-...
00000010: 0c7e 7fd8 0400 0000                      .~......

The issue is not the missing checksum and file length:
I added the checksum and length via the gzip-header crate, and the file is still not decodable via gunzip.

The interesting part seems to be the encoded stream itself; it differs between async-compression and the working Node.js output:
RUST: 2a49 2d2e 0100 0000 ffff
Node: 2b49 2d2e 0100

Why are there 4 trailing bytes 0000 ffff?
And why is the first byte different?

@Nemo157
Member

Nemo157 commented Jan 25, 2022

I see two issues with your code:

  1. You call write without checking the amount it returns; each call to write may not consume the entire input, so you should be using write_all. (There is a Clippy lint that will warn about this for you.)
  2. You never call shutdown to finish the output stream; this is where the gzip trailer is written.

Fixing those issues, I see that it gives correct output:

use tokio::io::AsyncWriteExt;
use async_compression::tokio::write::GzipEncoder;

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let mut writer = GzipEncoder::new(Vec::new());
    writer.write_all("test".as_bytes()).await?;
    writer.shutdown().await?;
    tokio::io::stdout().write_all(&writer.into_inner()).await?;
    Ok(())
}
> cargo run | gunzip
   Compiling foo v0.1.0 (/tmp/tmp.xrCjHsQa0S/foo)
    Finished dev [unoptimized + debuginfo] target(s) in 0.77s
     Running `/home/nemo157/.cargo/shared-target/debug/foo`
test

(the encoded stream is still different, but maybe gzip has multiple valid encodings of the same data 🤷)

@feikesteenbergen

I ran into the same issue; the above piece of code would be great as an example on docs.rs!

@dmk978

dmk978 commented Aug 5, 2022

writer.shutdown().await?;

But there is the same issue with the following code:

let mut file = tokio::fs::File::create("file.lzma").await.unwrap();
let mut compr = async_compression::tokio::write::LzmaEncoder::new(file);
let mut out = "test data to compress".as_bytes();
tokio::io::copy(&mut out, &mut compr).await.unwrap();

It gives no panics, but:

$ lzma -t file.lzma
lzma: file.lzma: Unexpected end of input

Without the compressor, tokio::io::copy works fine and the file is complete. Of course, it fails in the same way with any other compressor.

P.S.: I just discovered that tokio::io::copy does not consume compr, so calling compr.shutdown() after the copy really helps.
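For reference, a minimal sketch of the corrected version of the snippet above (assuming async-compression's lzma feature is enabled); the only change is the extra shutdown call, which finishes the compressed stream:

use tokio::io::AsyncWriteExt;
use async_compression::tokio::write::LzmaEncoder;

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let file = tokio::fs::File::create("file.lzma").await?;
    let mut compr = LzmaEncoder::new(file);
    let mut out = "test data to compress".as_bytes();
    // copy feeds all the data through the encoder...
    tokio::io::copy(&mut out, &mut compr).await?;
    // ...but only shutdown finishes the stream and writes the end-of-stream marker
    compr.shutdown().await?;
    Ok(())
}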
