Jul 26, 2011
 

This post is a sneak preview of a new feature in TrueZIP 7.3: The ability to add entries to a ZIP file quickly by appending them to its end rather than performing a full update. This feature is the equivalent of a multi-session disc (CD, DVD etc.) for ZIP files and can significantly improve the overall performance of a TrueZIP application.

Motivation

By default, TrueZIP is configured to produce the smallest possible archive files. This is achieved by applying the following strategy:

  1. Select the maximum compression ratio in the archive drivers.
  2. Perform an archive update (see below) upon any of the following events:
    • An existing archive entry is going to get overwritten with new contents, or
    • an existing archive entry is going to get updated with new meta data.

This strategy performs an archive update whenever required to avoid writing redundant archive entry contents or meta data to the resulting archive file. An archive update is basically a copy operation in which all archive entries that haven’t been written yet are copied from the input archive file to the output archive file.
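To make the cost of an archive update concrete, here is a minimal sketch of what a full update amounts to. It is written against the JDK's java.util.zip package, not TrueZIP, and the class and method names are made up for illustration. Replacing a single entry means streaming every other entry from the input archive into a new output archive. For brevity, copied entries are recompressed and their timestamps are not preserved.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Illustrative sketch of a full "archive update" with java.util.zip (not TrueZIP):
// to replace one entry, all other entries are copied into a new ZIP image.
public final class FullUpdateSketch {

    static byte[] update(byte[] zip, String name, byte[] contents) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zip));
             ZipOutputStream out = new ZipOutputStream(bytes)) {
            out.setLevel(Deflater.BEST_COMPRESSION); // smallest output, like the default strategy
            // Write the new contents of the target entry first.
            out.putNextEntry(new ZipEntry(name));
            out.write(contents);
            out.closeEntry();
            // Copy every other entry from input to output: this loop is what
            // makes the I/O cost proportional to the size of the whole archive.
            byte[] buf = new byte[8192];
            for (ZipEntry e; (e = in.getNextEntry()) != null; ) {
                if (e.getName().equals(name)) continue; // superseded, not copied
                out.putNextEntry(new ZipEntry(e.getName()));
                for (int r; (r = in.read(buf)) != -1; ) out.write(buf, 0, r);
                out.closeEntry();
            }
        } // closing the ZipOutputStream writes the central directory
        return bytes.toByteArray();
    }

    // Reads all entries into a name -> text map (for demonstration only).
    static Map<String, String> read(byte[] zip) throws IOException {
        Map<String, String> entries = new LinkedHashMap<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zip))) {
            for (ZipEntry e; (e = in.getNextEntry()) != null; )
                entries.put(e.getName(), new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = new byte[0]; // an empty input simply has no entries
        for (String n : new String[] {"a.txt", "b.txt", "c.txt"})
            zip = update(zip, n, ("contents of " + n).getBytes(StandardCharsets.UTF_8));
        zip = update(zip, "b.txt", "new".getBytes(StandardCharsets.UTF_8));
        // {b.txt=new, c.txt=contents of c.txt, a.txt=contents of a.txt}
        System.out.println(read(zip));
    }
}
```

Note how every call touches all existing entries even though only one of them changes.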

However, while this strategy produces the smallest possible archive files, it may yield poor performance if the number and size of the archive entries to create or update are small compared to the total size of the resulting archive file: When an archive update is performed, the overall amount of I/O data is in the order of $O(n \cdot s_e) = O(s_a)$, where $n$ is the total number of archive entries, $s_e$ is the average size of these archive entries (including content and meta data), and $s_a \approx n \cdot s_e$ is the total size of the resulting archive file.

How To Append Entries To ZIP Files

Therefore, as of TrueZIP 7.3, you can change this strategy by setting the FsOutputOption.GROW output option preference when writing archive entry contents or updating their meta data. When set, this output option preference allows archive files to grow by appending new or updated archive entries to their end instead of performing archive updates.
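The idea behind appending can be illustrated on the raw ZIP format. The following is a hypothetical sketch, not TrueZIP's implementation, and the class and method names are made up. It reads the old central directory via the End Of Central Directory record, appends the new entry (stored uncompressed) after all existing bytes, then appends a new central directory and End Of Central Directory record. It assumes a ZIP file without a comment and without ZIP64 extensions, and it compares entry names as UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.zip.CRC32;

// Hypothetical sketch of appending to a ZIP file (not TrueZIP code): existing
// bytes are never rewritten; a new entry plus a fresh central directory are
// appended. Assumes no ZIP comment and no ZIP64; the new entry is STORED.
public final class ZipAppendSketch {

    private static final ByteOrder LE = ByteOrder.LITTLE_ENDIAN;

    static void append(Path zip, String name, byte[] data) throws IOException {
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        try (RandomAccessFile f = new RandomAccessFile(zip.toFile(), "rw")) {
            // 1. Read the 22-byte End Of Central Directory record at the end.
            ByteBuffer end = ByteBuffer.allocate(22).order(LE);
            f.seek(f.length() - 22);
            f.readFully(end.array());
            if (end.getInt(0) != 0x06054b50)
                throw new IOException("ZIP comment or ZIP64 not supported by this sketch");
            int cdSize = end.getInt(12);
            long cdOffset = end.getInt(16) & 0xffffffffL;

            // 2. Read the old central directory; drop a record with the same
            //    name, because the appended entry supersedes it.
            ByteBuffer cd = ByteBuffer.allocate(cdSize).order(LE);
            f.seek(cdOffset);
            f.readFully(cd.array());
            ByteArrayOutputStream kept = new ByteArrayOutputStream();
            int count = 0;
            for (int p = 0; p < cdSize; ) {
                int n = cd.getShort(p + 28) & 0xffff;
                int len = 46 + n + (cd.getShort(p + 30) & 0xffff) + (cd.getShort(p + 32) & 0xffff);
                String old = new String(cd.array(), p + 46, n, StandardCharsets.UTF_8);
                if (!old.equals(name)) { kept.write(cd.array(), p, len); count++; }
                p += len;
            }

            // 3. Append a local file header and the uncompressed data.
            CRC32 crc = new CRC32();
            crc.update(data);
            long lfhOffset = f.length();
            ByteBuffer lfh = ByteBuffer.allocate(30 + nameBytes.length).order(LE);
            lfh.putInt(0x04034b50).putShort((short) 20).putShort((short) 0x0800) // version, UTF-8 flag
               .putShort((short) 0).putShort((short) 0).putShort((short) 0x21)   // STORED, 1980-01-01 00:00
               .putInt((int) crc.getValue()).putInt(data.length).putInt(data.length)
               .putShort((short) nameBytes.length).putShort((short) 0).put(nameBytes);
            f.seek(lfhOffset);
            f.write(lfh.array());
            f.write(data);

            // 4. Append the new central directory: kept records + one new record.
            long newCdOffset = f.getFilePointer();
            ByteBuffer rec = ByteBuffer.allocate(46 + nameBytes.length).order(LE);
            rec.putInt(0x02014b50).putShort((short) 20).putShort((short) 20).putShort((short) 0x0800)
               .putShort((short) 0).putShort((short) 0).putShort((short) 0x21)
               .putInt((int) crc.getValue()).putInt(data.length).putInt(data.length)
               .putShort((short) nameBytes.length).putShort((short) 0).putShort((short) 0) // name, extra, comment
               .putShort((short) 0).putShort((short) 0).putInt(0)                       // disk, attributes
               .putInt((int) lfhOffset).put(nameBytes);
            f.write(kept.toByteArray());
            f.write(rec.array());

            // 5. Append a new End Of Central Directory record.
            ByteBuffer eocd = ByteBuffer.allocate(22).order(LE);
            eocd.putInt(0x06054b50).putShort((short) 0).putShort((short) 0)
                .putShort((short) (count + 1)).putShort((short) (count + 1))
                .putInt(kept.size() + rec.capacity()).putInt((int) newCdOffset).putShort((short) 0);
            f.write(eocd.array());
        }
    }
}
```

The original bytes, including the old central directory, stay in place as dead data. Readers that locate the central directory from the end of the file, such as java.util.zip.ZipFile, see only the new one, whereas a sequential reader such as java.util.zip.ZipInputStream stops at the old central directory and misses the appended entries.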

You can set this output option preference in the global configuration as follows:

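import de.schlichtherle.truezip.file.TApplication;
import de.schlichtherle.truezip.file.TConfig;
import de.schlichtherle.truezip.fs.FsOutputOption;

class MyApplication extends TApplication {

    @Override
    protected void setup() {
        // Without a preceding TConfig.push(), this returns the global
        // configuration.
        TConfig config = TConfig.get();
        // Set FsOutputOption.GROW to append to rather than reassemble an
        // archive file.
        config.setOutputPreferences(
                config.getOutputPreferences().set(FsOutputOption.GROW));
    }

    ...
}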

Of course, you can also set this output option preference on a case-by-case basis as follows:

// We are going to append "entry" to "archive.zip".
TFile file = new TFile("archive.zip/entry");

// First, push a new current configuration on the inheritable thread local
// stack.
TConfig config = TConfig.push();
try {
    // Set FsOutputOption.GROW to append to rather than reassemble the
    // archive file.
    config.setOutputPreferences(
            config.getOutputPreferences().set(FsOutputOption.GROW));

    // Now use the current configuration and append the entry to the archive
    // file even if it's already present.
    TFileOutputStream out = new TFileOutputStream(file);
    try {
        // Do some output here.
        ...
    } finally {
        out.close();
    }
} finally {
    // Pop the current configuration off the inheritable thread local stack.
    config.close();
}

Archive Driver Support

Note that whether this output option preference is supported depends on the archive file system driver. If it’s not supported, it is silently ignored and the driver falls back to the default strategy of performing an archive update whenever required to avoid writing redundant archive entry data. Currently, the situation is as follows:

  • The drivers of the module TrueZIP Driver ZIP fully support this output option preference, so it’s available for EAR, JAR, WAR etc.
  • The drivers of the module TrueZIP Driver ZIP.RAES only allow redundant archive entry contents and meta data. You cannot append to an existing ZIP.RAES file, however.
  • The drivers of the module TrueZIP Driver TAR only allow redundant archive entry contents. You cannot append to an existing TAR file, however.

Performance Considerations

Returning to the performance discussion from above, let’s assume that an application updates $u$ of the $n$ existing archive entries with new contents. Again, the average entry size including contents and meta data is $s_e$. With FsOutputOption.GROW set, the overall amount of I/O data is only in the order of $O(u \cdot s_e + n)$ rather than $O(n \cdot s_e) = O(s_a)$. The additional $n$ term results from reading the central directory at the end of the ZIP file and appending an updated version of it to the end again. As you can see, the total size of the archive file $s_a$ no longer appears in the formula, which is a major performance improvement if $u$ is significantly smaller than $n$.
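A rough I/O model makes the difference between the two formulas tangible. The sketch below counts bytes under simplifying assumptions that are illustrative only and not measurements of TrueZIP: every entry occupies se bytes, every central directory record occupies c bytes, and copying an entry means reading and writing it. The class name is made up.

```java
// Rough I/O model (in bytes) of the two strategies; illustrative assumptions
// only: all entries have size se, all central directory records have size c.
public final class GrowCostModel {

    /** Default strategy: write u new entries, copy the other n - u entries
     *  (read + write), then write a central directory for n entries. */
    static long updateIo(long n, long u, long se, long c) {
        if (u == 0) return 0; // nothing to update, the archive stays untouched
        return u * se + 2 * (n - u) * se + n * c;
    }

    /** GROW: append u new entries, read the old central directory and
     *  append a new one for n entries; no existing entry is copied. */
    static long growIo(long n, long u, long se, long c) {
        if (u == 0) return 0;
        return u * se + 2 * n * c;
    }

    public static void main(String[] args) {
        long n = 10_000, se = 100 * 1024, c = 64; // about 1 GB of entries
        for (long u : new long[] {0, 10, n}) {
            System.out.printf("u=%d: update=%d bytes, grow=%d bytes%n",
                    u, updateIo(n, u, se, c), growIo(n, u, se, c));
        }
    }
}
```

With these numbers and u = 10, the model yields 2,047,616,000 bytes of I/O for the archive update versus 2,304,000 bytes with GROW. For u = n, GROW comes out slightly more expensive because of the extra central directory read, in line with the second corner case below.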

Following are some corner cases where using the GROW preference may not pay off:

  1. If $u = 0$, then there is no update to the archive file at all, so using the GROW preference makes no difference.
  2. If $u \approx n$, e.g. when updating all archive entries, then the result is $O(u \cdot s_e + n) \approx O(n \cdot s_e + n) = O(s_a + n)$, which is a minor performance decrease because of writing the updated central directory. It also roughly doubles the size of the archive file because almost every archive entry is now stored twice. The latter may be irrelevant if $n$ is small.
  3. If $s_e$ is very small, e.g. when writing empty archive entries, then the result is $O(u \cdot s_e + n) \approx O(u + n)$. The resulting archive file may then contain more archive entry meta data than content, especially because of the updated central directory.
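The size side of corner cases 2 and 3 can be sketched with a back-of-the-envelope model. The assumptions are illustrative and the class name is made up: every entry occupies se bytes including its local meta data, every central directory record occupies c bytes, and the u updated entries replace existing ones rather than adding new names.

```java
// Rough archive-size model for GROW mode; illustrative assumptions only,
// not TrueZIP code.
public final class GrowSizeModel {

    /** Size of an archive holding n entries, written in a single session. */
    static long initialSize(long n, long se, long c) {
        return n * se + n * c;
    }

    /** Size after updating u entries in GROW mode: the u superseded entries
     *  and the old central directory remain in the file as dead bytes, and a
     *  new central directory for the n live entries is appended. */
    static long grownSize(long n, long u, long se, long c) {
        if (u == 0) return initialSize(n, se, c); // nothing appended
        return initialSize(n, se, c) + u * se + n * c;
    }

    public static void main(String[] args) {
        long n = 1_000, c = 64;
        // Corner case 2: u = n roughly doubles the archive file.
        System.out.println(grownSize(n, n, 4_096, c) + " vs " + initialSize(n, 4_096, c));
        // Corner case 3: nearly empty entries (se = 40, little more than a
        // local header) make the appended data almost pure meta data.
        System.out.println(grownSize(n, 10, 40, c) + " vs " + initialSize(n, 40, c));
    }
}
```

For 1,000 entries of 4 KiB, updating all of them gives 8,320,000 bytes instead of 4,160,000, exactly double in this model. For ten updates of nearly empty entries, the appended 64,400 bytes consist almost entirely of central directory records.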

  10 Responses to “Appending entries to ZIP files with TrueZIP 7.3”

  1. Thank you very much for a great library and great support.

    This is the one and only library for easily reading, writing and modifying ZIP file systems.

  2. I very much like your library 7.3 for appending to ZIP files, but there is still some complexity…

    I found a new ZIP library called CHILKAT. It is not under the GNU license, so I don’t use it, but it is very simple to use: there is one method (QuickAppend) which directly appends content to an existing ZIP file, see here: http://www.example-code.com/java/zip_appendFilesToExistingZip.asp.

    Can you make something like this?

    • You just need to call TFile.cp_rp() to get the same effect.

    • OK, thanks for the reply…

      I tried all of your code above for appending to a ZIP file,

      but I got the following exception:

      Exception caught after invoking slot
      java.util.ServiceConfigurationError: de.schlichtherle.truezip.fs.spi.FsDriverService: Provider de.schlichtherle.truezip.fs.archive.zip.ZipDriverService could not be instantiated: java.util.ServiceConfigurationError: No provider available for class de.schlichtherle.truezip.socket.spi.IOPoolService.
      at java.util.ServiceLoader.fail(Unknown Source)
      at java.util.ServiceLoader.access$100(Unknown Source)
      at java.util.ServiceLoader$LazyIterator.next(Unknown Source)
      at java.util.ServiceLoader$1.next(Unknown Source)
      at de.schlichtherle.truezip.fs.sl.FsDriverLocator$Boot.(FsDriverLocator.java:70)
      at de.schlichtherle.truezip.fs.sl.FsDriverLocator.get(FsDriverLocator.java:52)
      at de.schlichtherle.truezip.file.TArchiveDetector.(TArchiveDetector.java:119)
      at de.schlichtherle.truezip.file.TArchiveDetector.(TArchiveDetector.java:99)
      at de.schlichtherle.truezip.file.TArchiveDetector.(TArchiveDetector.java:74)
      at de.schlichtherle.truezip.file.TConfig.(TConfig.java:316)
      at de.schlichtherle.truezip.file.TConfig.(TConfig.java:204)
      at de.schlichtherle.truezip.file.TConfig$Global.(TConfig.java:526)
      at de.schlichtherle.truezip.file.TConfig.get(TConfig.java:257)
      at de.schlichtherle.truezip.file.TFile.(TFile.java:470)
      at de.schlichtherle.truezip.file.TFile.(TFile.java:450)
      at com.java.MainWindow.importActionTriggered(MainWindow.java:190)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
      at java.lang.reflect.Method.invoke(Unknown Source)
      at com.java.Main.main(Main.java:14)
      Caused by: java.util.ServiceConfigurationError: No provider available for class de.schlichtherle.truezip.socket.spi.IOPoolService.
      at de.schlichtherle.truezip.socket.sl.IOPoolLocator$Boot.(IOPoolLocator.java:93)
      at de.schlichtherle.truezip.socket.sl.IOPoolLocator.get(IOPoolLocator.java:66)
      at de.schlichtherle.truezip.fs.archive.zip.ZipDriver.(ZipDriver.java:105)
      at de.schlichtherle.truezip.fs.archive.zip.ZipDriver.(ZipDriver.java:93)
      at de.schlichtherle.truezip.fs.archive.zip.ZipDriverService.(ZipDriverService.java:59)
      at java.lang.Class.forName0(Native Method)
      at java.lang.Class.forName(Unknown Source)

      I could not understand what the problem is,
      even though I put all the code properly in a try/catch block.

      Have I made a mistake somewhere? Sorry for my bad English!

  3. […] The TrueZIP Kernel has been improved a lot in order to improve its reliability, robustness and scalability. As a new feature, it now supports appending to archive files. This means that you can choose to append entries to existing ZIP files when using the TrueZIP Driver ZIP. This is particularly useful if a new ZIP entry is small compared to the total size of its ZIP file and improves the overall performance a lot. To use this feature, have a look at this blog post. […]

  4. […] bug fix, some improvements and a new major feature for updating really large archive files: The GROW Output Option Preference. This feature is otherwise also known as archive-appending mode and is […]

  5. […] Appending Entries To ZIP Files With TrueZIP 7.3 […]
