Add character set on ZipInputStream class? #64

Closed
jacktang opened this issue Sep 16, 2019 · 4 comments
Labels
new-feature New feature or request

Comments

@jacktang

Hello,

How about adding a character set option to ZipInputStream, as java.util.zip.ZipInputStream has, so that we can handle character sets other than UTF-8?
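For reference, the JDK class mentioned here takes the charset as a constructor argument: `java.util.zip.ZipInputStream(InputStream, Charset)` (with the matching `ZipOutputStream(OutputStream, Charset)`), both available since Java 7. The sketch below uses only those JDK APIs to build an archive in memory with non-UTF-8 entry names and read them back; the class and method names are illustrative, and ISO-8859-1 is assumed as a stand-in for a legacy code page because every Java runtime is required to support it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class JdkZipCharsetDemo {
    // Build an in-memory archive whose entry names are encoded with `charset`.
    static byte[] buildZip(Charset charset, String... names) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes, charset)) {
            for (String name : names) {
                zos.putNextEntry(new ZipEntry(name));
                zos.write(new byte[] {1, 2, 3}); // placeholder entry content
                zos.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray(); // stream is closed, so the central directory is written
    }

    // Read entry names back, decoding them with the charset passed to the
    // java.util.zip.ZipInputStream(InputStream, Charset) constructor.
    static List<String> listNames(byte[] zip, Charset charset) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip), charset)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        // ISO-8859-1 stands in for a non-UTF-8 code page here.
        byte[] zip = buildZip(StandardCharsets.ISO_8859_1, "caf\u00e9.txt");
        System.out.println(listNames(zip, StandardCharsets.ISO_8859_1));
    }
}
```

Reading the same archive with the charset that wrote it recovers the original names; using a different charset on the reading side is exactly the situation this request is about.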

@srikanth-lingala srikanth-lingala self-assigned this Sep 17, 2019
@LeeYoung624
Contributor

I'm a little confused. Please feel free to let me know if I'm wrong.
ZipInputStream reads the contents of each zipped entry through its read method, which returns the data as a raw byte array (byte[]). Those bytes can then be decoded with any charset via new String(bytes, CHARSET). Sample code:

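import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.nio.charset.StandardCharsets;
import net.lingala.zip4j.io.inputstream.ZipInputStream;

public class ReadZipEntries {
  public static void main(String[] args) throws Exception {
    try (ZipInputStream zis = new ZipInputStream(new FileInputStream("/root/test.zip"))) {
      byte[] readBuffer = new byte[1024];
      int readLen;
      while (zis.getNextEntry() != null) {
        ByteArrayOutputStream content = new ByteArrayOutputStream();
        // read() returns the number of bytes read, or -1 at the end of the entry
        while ((readLen = zis.read(readBuffer)) != -1) {
          content.write(readBuffer, 0, readLen);
        }
        // Decode the whole entry at once so multi-byte characters are not split across reads
        System.out.println(new String(content.toByteArray(), StandardCharsets.UTF_8));
      }
    }
  }
}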

Why do we need a method to set the charset for ZipInputStream?

@srikanth-lingala
Owner

@LeeYoung624 I think what Jack meant was the charset used to decode the file name and file comment in each LocalFileHeader.

@jacktang According to the zip specification, only two charsets are supported in the zip format: CP437 and UTF-8. This makes it easy for zip files to be exchanged between different tools, libraries, and applications without knowing the source charset explicitly (zip headers do not store the charset, only a flag indicating whether it is UTF-8 or not). Zip4j was designed to follow these default zip format expectations.
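The header flag referred to above is bit 11 of the general purpose bit flag (the "language encoding flag" in PKWARE's APPNOTE): when it is set, the file name and comment are UTF-8; when it is clear, the spec says CP437. A minimal sketch, not Zip4j's code, of how a reader might pick the charset for a name; the class, method names, and the `legacyCharset` parameter are hypothetical, the last one modelling the option requested in this issue for archives whose tools actually wrote a local code page instead of CP437.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ZipNameDecoding {
    // Bit 11 of the general purpose bit flag: name and comment are UTF-8.
    static final int UTF8_FLAG = 1 << 11;

    // UTF-8 if the flag is set, otherwise the caller-supplied legacy charset
    // (CP437 per the spec, but some tools write the system code page instead).
    static Charset charsetFor(int generalPurposeFlag, Charset legacyCharset) {
        return (generalPurposeFlag & UTF8_FLAG) != 0 ? StandardCharsets.UTF_8 : legacyCharset;
    }

    // Decode the raw file-name bytes taken from a local or central header.
    static String decodeName(byte[] rawName, int generalPurposeFlag, Charset legacyCharset) {
        return new String(rawName, charsetFor(generalPurposeFlag, legacyCharset));
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0xC3, (byte) 0xA9}; // "\u00e9" encoded as UTF-8
        // IBM437 (CP437) is not one of Java's required charsets, so check for it first.
        Charset legacy = Charset.isSupported("IBM437")
                ? Charset.forName("IBM437") : StandardCharsets.ISO_8859_1;
        System.out.println(decodeName(raw, UTF8_FLAG, legacy)); // flag set: decoded as UTF-8
        System.out.println(decodeName(raw, 0, legacy));         // flag clear: decoded as legacy charset
    }
}
```

Because the flag only distinguishes "UTF-8" from "not UTF-8", the second case cannot be decoded correctly unless the reader is told which legacy charset the archive really used.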

However, there have been a lot of requests on this topic (most of them for the ZipFile API). I will analyse this and, if feasible, add this feature.

@srikanth-lingala srikanth-lingala added the new-feature New feature or request label Sep 18, 2019
@jacktang
Author

@srikanth-lingala Yes, the file name and comment depend on the correct character set. I appreciate your time! 🍺

srikanth-lingala pushed a commit that referenced this issue Sep 29, 2019
* enable charset selection for ZipFile and InputStream

* add test cases for charset specification in ZipFile and ZipInputStream

* charset selection for ZipFile and ZipOutputStream

enable charset selection for ZipFile(output APIs) and ZipOutputStream

* add testcase for charset in ZipOutputStream

* utf-8 bit flag should not be set when charset is specified

* change the type of charset: from String to Charset

* charset is not allowed to be null

* Merge with master and adjust code

* Minor cleanup
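Two of the commit notes above, "utf-8 bit flag should not be set when charset is specified" and "charset is not allowed to be null", describe rules on the writing side. A hypothetical sketch of one reading of those rules, not the code from the commit: it assumes the UTF-8 flag (bit 11) is claimed only when the chosen charset really is UTF-8, and rejects a null charset up front. Class and method names are illustrative.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Objects;

public class HeaderNameEncoding {
    // Bit 11 of the general purpose bit flag: name and comment are UTF-8.
    static final int UTF8_FLAG = 1 << 11;

    // Hypothetical helper: only mark the header as UTF-8 when the bytes really are UTF-8.
    static int generalPurposeFlagFor(Charset charset) {
        Objects.requireNonNull(charset, "charset must not be null");
        return StandardCharsets.UTF_8.equals(charset) ? UTF8_FLAG : 0;
    }

    // Hypothetical helper: encode a file name or comment with the chosen charset.
    static byte[] encodeName(String name, Charset charset) {
        Objects.requireNonNull(charset, "charset must not be null");
        return name.getBytes(charset);
    }

    public static void main(String[] args) {
        Charset charset = StandardCharsets.ISO_8859_1;
        byte[] raw = encodeName("caf\u00e9.txt", charset);
        System.out.println(raw.length + " bytes, flag=" + generalPurposeFlagFor(charset)); // prints "8 bytes, flag=0"
    }
}
```

Leaving the flag clear for non-UTF-8 names keeps other readers from decoding those bytes as UTF-8, at the cost that they must know the charset out of band.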
@srikanth-lingala
Owner

Feature added in v2.2.1, released today.


3 participants