2013-12-08

How to make an existing 7-Zip archive self-extracting

This blog post explains how to make an existing 7-Zip (.7z) archive self-extracting, i.e. how to convert it to an executable which can extract its own contents.

A 7-Zip self-extracting archive is just a concatenation of the extractor code and a regular .7z archive. So to make an archive self-extracting, one needs to get the extractor code file, append the regular .7z archive file, and (for Unix) make it executable.
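
The concatenation described above can be sketched as a small shell function. This is only an illustration: the function name make_sfx and the file names in the usage line are assumptions, not part of 7-Zip.

```shell
# make_sfx EXTRACTOR ARCHIVE OUTPUT
# Hypothetical helper: writes the extractor code followed by the .7z archive
# to OUTPUT, then marks OUTPUT as executable (needed on Unix only).
make_sfx() {
  cat "$1" "$2" > "$3" && chmod +x "$3"
}
```

For example, make_sfx 7zCon.sfx FILE.7z FILE.run would produce FILE.run, assuming 7zCon.sfx is an extractor built for the system where FILE.run will be executed.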

Similarly, it's easy to remove the extractor code, by looking for the 6-byte 7-Zip header ("7z\274\257\047\034") in the first 1 megabyte of the self-extracting file, and removing everything before it. The Perl script replace_7z_sfx can be used to automate this. It can add, remove or replace the extractor code.
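
A minimal sketch of the removal step, for readers who want to see the idea without the Perl script. It is not the replace_7z_sfx implementation: the function name strip_sfx is made up here, and the sketch assumes GNU grep built with -P (PCRE) support and a tail that accepts -c +N.

```shell
# strip_sfx INPUT OUTPUT
# Hypothetical helper: finds the first 7-Zip header in the first 1 MB of
# INPUT and copies everything from that header onwards to OUTPUT.
strip_sfx() {
  # grep -b -o prints "BYTE_OFFSET:match" for each match; keep the first offset.
  strip_sfx_offset=$(head -c 1048576 "$1" |
    LC_ALL=C grep -obUaP '7z\xBC\xAF\x27\x1C' |
    head -n 1 | LC_ALL=C cut -d: -f1) || true
  if [ -z "$strip_sfx_offset" ]; then
    echo "strip_sfx: no 7-Zip header found in the first 1 MB of $1" >&2
    return 1
  fi
  # tail -c +N is 1-based, the offset reported by grep is 0-based.
  tail -c +"$((strip_sfx_offset + 1))" "$1" > "$2"
}
```

For example, strip_sfx FILE.run FILE.7z would write the plain archive to FILE.7z. The sketch trusts the first match, so it would give a wrong result if the extractor code itself happened to contain the same 6 bytes.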

If you want to create the self-extracting archive from scratch, use 7z a -sfxEXTR FILE.7z ..., replacing EXTR with the name of the file containing the extractor code.

How to get the extractor code for Windows

Please note that the extractor code is specific to the operating system. That is, the self-extracting archive will run only on the system its extractor code was built for. It doesn't matter on which system you create the archive, though.

Get precompiled extractor code from http://pts.50.hu/files/sfx_for_7zip9.20/.

For comparison, the i386 Linux RAR 4.2.0 extractor code is dynamically linked (needs the libc), 137068 bytes uncompressed, and 61800 bytes when compressed with upx --ultra-brute. The rar executable is statically linked.

How the extractor code files were generated

The Windows extractors were taken from the Windows 32-bit .exe from the 7-Zip download page (direct link for version 9.20): files 7z.sfx (GUI) and 7zCon.sfx (console). The download is itself a self-extracting .7z archive, so it could be extracted with the Linux 7z tool from the p7zip-full Ubuntu package.

The Mac OS X console extractor was taken from the Rudix 7-Zip package for Mac OS X 10.6: file 7zCon.sfx. Extracting that file needed several steps:

$ wget -nv -O p7zip-9.20.1-1.pkg \
    http://rudix-snowleopard.googlecode.com/files/p7zip-9.20.1-1.pkg
$ file p7zip-9.20.1-1.pkg
p7zip-9.20.1-1.pkg: xar archive - version 1
$ 7z x p7zip-9.20.1-1.pkg p7zipinstall.pkg/Payload
$ file p7zipinstall.pkg/Payload
p7zipinstall.pkg/Payload: gzip compressed data, from Unix
$ 7z x -so p7zipinstall.pkg/Payload >payload.cpio
$ file payload.cpio 
payload.cpio: ASCII cpio archive (pre-SVR4 or odc)
$ 7z x payload.cpio
$ ls -l usr/local/lib/p7zip/7zCon.sfx
-rw-r--r-- 1 pts pts 521680 Aug  6 20:09 usr/local/lib/p7zip/7zCon.sfx
$ cp -a usr/local/lib/p7zip/7zCon.sfx 7zCon.sfx
$ upx --ultra-brute 7zCon.sfx
$ ls -l 7zCon.sfx
-rw-r--r-- 1 pts pts 143360 Aug  6 20:09 7zCon.sfx

As the steps above show, 7-Zip is useful well beyond .7z files: here it extracted a xar archive, gzip-compressed data and a cpio archive.

The Linux extractor was compiled from the p7zip sources with a gcc-4.1.2 cross-compiler targeting i386, producing a statically linked Linux binary using uClibc, which was then compressed with upx --ultra-brute. It is interesting to enumerate which g++ compiler flags were used to further reduce the file size:

  • -Os: Optimize for generated code size.
  • -fno-rtti: Disable RTTI: run-time type identification.
  • (-fno-exceptions): Disable exception handling. We couldn't use it, because the p7zip code base uses exceptions.
  • -fno-stack-protector: Disable protection of the stack against buffer overflows. Saves a few bytes per function call location.
  • -ffunction-sections -fdata-sections: Create a separate section for each symbol. (By default a section is created per source file, and all symbols within a source file are put in the same section.)
  • -Wl,--gc-sections: Remove unused sections from the resulting binary. This is implemented by ld --gc-sections. This is a very important flag (in combination with the other section flags above), because it removes all unused functions and methods.
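
Put together, the flags above could be passed to the compiler roughly as follows. This is a sketch, not the actual build script: the cross-compiler name i386-uclibc-g++ and the source file list are hypothetical, and -static is assumed here only because the binary is described as statically linked.

```shell
# Size-reducing compiler flags from the list above (-fno-exceptions is left
# out because the p7zip code base uses exceptions).
CXXFLAGS="-Os -fno-rtti -fno-stack-protector -ffunction-sections -fdata-sections"
# Linker flags: --gc-sections drops the unused per-symbol sections.
# -static is an assumption, see the note above.
LDFLAGS="-static -Wl,--gc-sections"

# Hypothetical invocation, left commented out because the compiler name and
# the source files are placeholders:
# i386-uclibc-g++ $CXXFLAGS $LDFLAGS -o 7zCon.sfx *.cpp
```

The per-symbol section flags only pay off together with -Wl,--gc-sections: the compiler flags split the code into small sections, and the linker flag is what actually discards the unused ones.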

See also the shell script which does all the download, compilation and compression of the Linux 7zCon.sfx.
