Gzip file created with async compression not decodable #135

Open
thmang82 opened this issue Jan 12, 2022 · 3 comments

@thmang82

Hello,

I am trying to create a stream-compressed gzip file on the fly while receiving chunks of data.
My issue is that the resulting file cannot be decoded by gzip/gunzip :-(

I stripped it down to a test that just encodes "test", so I could compare the output against a Node.js-based encoding that works. But I do not understand why there is a difference, or whether this is really a bug inside the library or an issue with how I use the async file I/O.

Here is my sample code:

use tokio::fs::File;
use tokio::io::AsyncWriteExt;
use async_compression::tokio_02::write::GzipEncoder;

let file = File::create("test.txt.gz").await?;
let mut writer = GzipEncoder::new(file);
writer.write("test".as_bytes()).await?;

Result:

xxd test.txt.gz
00000000: 1f8b 0800 0000 0000 00ff 2a49 2d2e 0100  ..........*I-...
00000010: 0000 ffff                                ....

This file is not decodable! Gunzip says it is corrupt!

When I create the file via this small node.js script

var zlib = require('zlib');
var fs = require('fs');

var gz = zlib.createGzip();
gz.pipe(fs.createWriteStream("test_node.txt.gz"));
gz.write("test");
gz.end();

the resulting file has the following content:

00000000: 1f8b 0800 0000 0000 0013 2b49 2d2e 0100  ..........+I-...
00000010: 0c7e 7fd8 0400 0000                      .~......

The issue is not the missing checksum and file length:
I added the checksum and length via the gzip-header crate, and the file is still not decodable via gunzip.

The interesting part seems to be the encoded stream itself; it differs between async-compression and the working Node.js output:
RUST: 2a49 2d2e 0100 0000 ffff
Node: 2b49 2d2e 0100

Why are there 4 trailing bytes 0000 ffff?
And why is the first byte different?

@Nemo157
Member

Nemo157 commented Jan 25, 2022

I see two issues with your code:

  1. You call write without checking the amount it returns; each call to write may not consume the entire input, so you should be using write_all. (There is a Clippy lint that will warn about this for you.)
  2. You never call shutdown to finish the output stream; this is where the gzip trailer is written.

Fixing those issues, I see that it gives correct output:

use tokio::io::AsyncWriteExt;
use async_compression::tokio::write::GzipEncoder;

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let mut writer = GzipEncoder::new(Vec::new());
    writer.write_all("test".as_bytes()).await?;
    writer.shutdown().await?;
    tokio::io::stdout().write_all(&writer.into_inner()).await?;
    Ok(())
}
> cargo run | gunzip
   Compiling foo v0.1.0 (/tmp/tmp.xrCjHsQa0S/foo)
    Finished dev [unoptimized + debuginfo] target(s) in 0.77s
     Running `/home/nemo157/.cargo/shared-target/debug/foo`
test

(the encoded stream is still different, but maybe gzip has multiple valid encodings of the same data 🤷)

@feikesteenbergen

I ran into the same issue; the above piece of code would be great as an example on docs.rs!

@dmk978

dmk978 commented Aug 5, 2022

writer.shutdown().await?;

But there is the same issue with the following code:

let mut file = tokio::fs::File::create("file.lzma").await.unwrap();
let mut compr = async_compression::tokio::write::LzmaEncoder::new(file);
let mut out = "test data to compress".as_bytes();
tokio::io::copy(&mut out, &mut compr).await.unwrap();

It gives no panics, but:

$ lzma -t file.lzma
lzma: file.lzma: Unexpected end of input

Without the compressor, tokio::io::copy works fine and the file is complete. Of course, it fails in the same way with any other compressor.

P.S.: I just discovered that tokio::io::copy does not consume compr, so calling compr.shutdown() after the copy really helps.
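For reference, a minimal sketch of the corrected version of the snippet above (assuming async-compression's lzma feature is enabled); the only change is the extra shutdown call, which finishes the compressed stream:

use tokio::io::AsyncWriteExt;
use async_compression::tokio::write::LzmaEncoder;

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let file = tokio::fs::File::create("file.lzma").await?;
    let mut compr = LzmaEncoder::new(file);
    let mut out = "test data to compress".as_bytes();
    // copy feeds all the data through the encoder...
    tokio::io::copy(&mut out, &mut compr).await?;
    // ...but only shutdown finishes the stream and writes the end-of-stream marker
    compr.shutdown().await?;
    Ok(())
}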
