2014-04-10

Announcing ssweb: single-shot webserver in Python

This blog post announces ssweb, a single-shot HTTP server library in Python 2.x. Single-shot means that the webserver accepts only a single incoming HTTP connection, handles it, and then exits immediately.

Such a server can be used as a redirect target in the web-based OAuth authentication protocol, to receive the freshly generated OAuth token. In this case the program is a command-line tool running an HTTP server for a short amount of time, to make the passwordless login more convenient for the user, without having to manually copy-paste the token. If it doesn't sound useful, then it's most probably not useful for you right now.

Example usage:

$ wget -O ssweb.py http://raw.githubusercontent.com/pts/ssweb/master/ssweb.py
$ python ssweb.py
URL: http://127.0.0.1:40464/foo
{'res_body': 'Shot.', 'req_head': 'GET /foo HTTP/1.0\r\nHost: 127.0.0.1:40464\r\nUser-Agent: Python-urllib/1.17\r\n\r\n', 'client_addr': ('127.0.0.1', 40026), 'res_head': 'HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\nContent-Length: 5\r\n\r\n', 'req_body': ''}
'Shot.'
$ python ssweb.py x
Please visit URL: http://127.0.0.1:59872/foo
... (after visiting the URL in Firefox)
{'res_body': 'Shot.', 'req_head': 'GET /foo HTTP/1.1\r\nHost: 127.0.0.1:59872\r\nUser-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:28.0) Gecko/20100101 Firefox/28.0\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\nAccept-Language: en-US,en;q=0.5\r\nAccept-Encoding: gzip, deflate\r\nConnection: keep-alive\r\n\r\n', 'client_addr': ('127.0.0.1', 50702), 'res_head': 'HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\nContent-Length: 5\r\n\r\n', 'req_body': ''}
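ssweb itself is written in Python 2 and its code isn't reproduced here. To illustrate the single-shot idea, here is a minimal POSIX sketch in C++ (the function names are hypothetical, not ssweb's API). It listens on 127.0.0.1 on a kernel-chosen port, accepts exactly one connection, reads the request head, sends a fixed text/plain response like the Shot. body above, and closes the listening socket so that no second request can be served.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <netinet/in.h>
#include <string>
#include <sys/socket.h>
#include <unistd.h>

// Creates a TCP socket listening on 127.0.0.1 with a kernel-chosen free port
// (like the random ports in the URLs above). Stores the port in *port and
// returns the socket, or -1 on error.
int ListenOnLoopback(int *port) {
  const int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) return -1;
  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr.sin_port = 0;  // 0: let the kernel pick a free port.
  socklen_t len = sizeof addr;
  if (bind(fd, reinterpret_cast<sockaddr *>(&addr), sizeof addr) != 0 ||
      listen(fd, 1) != 0 ||
      getsockname(fd, reinterpret_cast<sockaddr *>(&addr), &len) != 0) {
    close(fd);
    return -1;
  }
  *port = ntohs(addr.sin_port);
  return fd;
}

// Accepts exactly one connection, reads the request head (up to and
// including the empty line), sends a fixed text/plain response, closes both
// sockets and returns the request head. Request bodies are not read.
std::string ServeSingleShot(int listen_fd, const std::string &res_body) {
  const int conn = accept(listen_fd, nullptr, nullptr);
  close(listen_fd);  // Single shot: no further connections are accepted.
  if (conn < 0) return "";
  std::string req;
  char buf[4096];
  std::string::size_type end;
  while ((end = req.find("\r\n\r\n")) == std::string::npos) {
    const ssize_t n = read(conn, buf, sizeof buf);
    if (n <= 0) break;  // EOF or error before the end of the head.
    req.append(buf, static_cast<std::string::size_type>(n));
  }
  const std::string res =
      "HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\nContent-Length: " +
      std::to_string(res_body.size()) + "\r\n\r\n" + res_body;
  for (std::string::size_type done = 0; done < res.size(); ) {
    const ssize_t n = write(conn, res.data() + done, res.size() - done);
    if (n <= 0) break;
    done += static_cast<std::string::size_type>(n);
  }
  close(conn);
  return end == std::string::npos ? req : req.substr(0, end + 4);
}
```

The listening socket is closed right after accept() rather than at the end, which is what makes the server single-shot even if the handler takes a while.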

2014-04-04

How to convert all YouTube links from http:// to https:// in Firefox history

This blog post explains how to convert the protocol from http:// to https:// in YouTube URLs in the browsing history of a Mozilla Firefox profile. YouTube has recently started redirecting logged-in users from http:// to https:// . This conversion is useful in the following situation:

  • The user uses his Firefox browsing history to track which YouTube videos he has already seen.
  • The user disables web site colors (e.g. using the PrefBar Firefox extension) so by looking at the color of a YouTube video link he can immediately see if he has visited that video page or not.
  • The user uses some Greasemonkey scripts to normalize YouTube video links on web pages (e.g. to remove &feature=related), so the normalized URLs will be added to the Firefox browsing history.
  • Bonus: The user uses the LinkVisitor Firefox extension to mass-toggle the visited status of URLs without actually visiting them in Firefox.

To do the protocol conversion from http:// to https:// , close Firefox and run this on Linux (without the leading $ sign):

$ sqlite3 ~/.mozilla/firefox/*.default/places.sqlite '
    UPDATE OR IGNORE moz_places   SET url="https" || SUBSTR(url, 5) WHERE url LIKE "http://youtube.com/%" OR url LIKE "http://www.youtube.com/%";
    UPDATE OR IGNORE moz_favicons SET url="https" || SUBSTR(url, 5) WHERE url LIKE "http://youtube.com/%" OR url LIKE "http://www.youtube.com/%";
    ANALYZE;
    VACUUM;
'

The ANALYZE and VACUUM statements are optional; they just speed up future Firefox access to these tables.
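SQLite's SUBSTR is 1-based, so SUBSTR(url, 5) starts at the fifth character and drops exactly the four characters http. The following C++ sketch (the function name is hypothetical) mirrors what the UPDATE statements do to a single URL. Two caveats: SQLite's LIKE ignores ASCII case while this sketch compares case-sensitively, and OR IGNORE makes SQLite skip, instead of aborting on, any row whose update would violate a uniqueness constraint (e.g. if url is declared unique and the https:// version is already present).

```cpp
#include <cassert>
#include <string>

// Hypothetical helper mirroring one row of the UPDATE above:
//   SET url = "https" || SUBSTR(url, 5)
//   WHERE url LIKE "http://youtube.com/%" OR url LIKE "http://www.youtube.com/%"
// Unlike SQLite's LIKE, this prefix test is case-sensitive.
std::string ToHttpsIfYouTube(const std::string &url) {
  static const char *const kPrefixes[] = {"http://youtube.com/",
                                          "http://www.youtube.com/"};
  for (const char *prefix : kPrefixes) {
    const std::string p(prefix);
    if (url.compare(0, p.size(), p) == 0) {
      // SUBSTR(url, 5) is 1-based: it keeps everything from the 5th
      // character on, i.e. it drops the 4 characters "http".
      return "https" + url.substr(4);
    }
  }
  return url;  // Rows not matching the WHERE clause are left unchanged.
}
```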

The SQLite UPDATE queries above can be run on the relevant places.sqlite file on Mac OS X or Windows as well. (This blog post doesn't provide explicit instructions on how to run those queries there. Use Google web search or ask on Stack Overflow to figure out how.)

2014-01-19

How to run custom code before and after main in GCC?

This blog post explains how to register C or C++ code to be run at process startup (i.e. just before *main*) and process exit (e.g. when *main* returns or when *exit*(...) is called). Code to be run at loading and unloading of shared libraries and Linux kernel modules is not covered in this post.

The behavior described in this post has been tested with gcc-4.1 ... gcc-4.8 and clang-3.0 ... clang-3.4. Older versions of GCC and Clang may behave differently.

The behavior described in this post has been tested with (e)glibc-2.7 ... (e)glibc-2.15 and uClibc-0.9.30.1 ... uClibc-0.9.33. Earlier versions and other libc implementations may behave differently. For example, dietlibc-0.33 doesn't execute any of the registered code (so the example below prints just MAIN; MYATEX2; MYATEX1).

The new way (available since gcc-2.7) is declaring a function with __attribute__((constructor)) (this will make it run at process startup) and declaring a function with __attribute__((destructor)) (this will make it run at process exit). The double underscores around __attribute__ are there to prevent GCC warnings when compiling standard C (or C++) code. Example:

#include <unistd.h>

__attribute__((constructor)) static void myctor(void) {
  (void)!write(2, "HI\n", 3);
}

__attribute__((destructor)) static void mydtor(void) {
  (void)!write(2, "BYE\n", 4);
}

Upon specifying one of these attributes, GCC (or Clang) appends the function pointer to the section .ctors or .dtors, respectively (newer toolchains may use .init_array and .fini_array instead). (Run objdump -x prog to see if these sections are present.) The libc initialization and exit code runs all functions in these sections. These registered functions run in a well-defined order (see below), but only within the same translation unit (C or C++ source file). The order in which the translation units are processed is undefined.
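Within a translation unit, GCC 4.3 and later (and, as far as I know, Clang) also accept an optional priority argument, constructor(N) or destructor(N): constructors with a smaller N run earlier, destructors with a smaller N run later, and priorities 0 to 100 are reserved for the implementation. The following sketch (hypothetical names) records the order in which two prioritized constructors run before main():

```cpp
#include <cassert>
#include <cstring>

// Zero-initialized before any constructor function runs, so the
// constructors below can safely append to it.
static char g_ctor_trace[16];

static void RecordCtor(const char *tag) {
  std::strncat(g_ctor_trace, tag,
               sizeof g_ctor_trace - std::strlen(g_ctor_trace) - 1);
}

// Defined first, but runs second: 102 is a larger priority number than 101.
__attribute__((constructor(102))) static void SecondCtor(void) {
  RecordCtor("B");
}

// Runs first. Priorities 0..100 are reserved for the implementation.
__attribute__((constructor(101))) static void FirstCtor(void) {
  RecordCtor("A");
}
```

By the time main() starts, g_ctor_trace contains "AB", regardless of the order in which the two functions are defined.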

Please note that the process exit functions are not always called: for example, if the process is terminated by a signal (sent either by another process or by itself, e.g. by calling abort()), or if the process calls _exit(...) (with an underscore), then none of the process exit functions are called.

Please note that it's possible to register more process exit functions at runtime, by calling atexit(3) or on_exit(3).
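The difference between exit(...) and _exit(...) can be checked with a small sketch, assuming a POSIX system with fork(2): a forked child registers two handlers with atexit(3) and then leaves either through exit(0) or through _exit(0), while the parent collects what the handlers wrote through a pipe. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int g_trace_fd = -1;  // Write end of the pipe, used in the child only.
static void ExitHandler1(void) { (void)!write(g_trace_fd, "1", 1); }
static void ExitHandler2(void) { (void)!write(g_trace_fd, "2", 1); }

// Runs a child process which registers ExitHandler1 and then ExitHandler2
// with atexit(3), then terminates with exit(0), or with _exit(0) if
// use_underscore_exit is true. Returns what the handlers wrote, in order.
std::string ChildExitTrace(bool use_underscore_exit) {
  int fds[2];
  if (pipe(fds) != 0) return "pipe failed";
  std::fflush(nullptr);  // Don't let the child flush our stdio buffers again.
  const pid_t pid = fork();
  if (pid < 0) {
    close(fds[0]);
    close(fds[1]);
    return "fork failed";
  }
  if (pid == 0) {  // Child.
    close(fds[0]);
    g_trace_fd = fds[1];
    std::atexit(ExitHandler1);
    std::atexit(ExitHandler2);
    if (use_underscore_exit) _exit(0);  // Skips all process exit functions.
    std::exit(0);  // Runs the handlers, most recently registered first.
  }
  close(fds[1]);  // Parent: keep only the read end, so read() sees EOF.
  std::string trace;
  char buf[16];
  ssize_t n;
  while ((n = read(fds[0], buf, sizeof buf)) > 0) trace.append(buf, n);
  close(fds[0]);
  waitpid(pid, nullptr, 0);
  return trace;
}
```

ChildExitTrace(false) returns "21" (handlers run in reverse order of registration), and ChildExitTrace(true) returns an empty string.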

C++ static initialization is equivalent to __attribute__((constructor)):

#include <unistd.h>
#include <string>

static int get_answer() {
  (void)!write(1, "GA\n", 3);
  return 42;
}
 
/* The call to get_answer works in C++, but it doesn't work in C, because
 * the value of myanswer is not a compile-time constant.
 */
int myanswer = get_answer();
std::string hello("World");  /* Registers both a constructor and destructor. */
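Within a single translation unit, C++ guarantees that such dynamically initialized variables are initialized in the order of their definitions (GCC and Clang do this before main()); across translation units the order stays unspecified. A minimal sketch with hypothetical names that makes the within-unit order observable:

```cpp
#include <cassert>

// Zero-initialized at load time, before any dynamic initialization runs.
static int g_init_counter;

static int NextInitId() { return ++g_init_counter; }

// Dynamic initialization: both calls happen before main(), and, because
// both variables are defined in the same translation unit, in this order.
int first_id = NextInitId();
int second_id = NextInitId();
```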

There is an older alternative for registering process startup and exit functions: adding code to the body of the _init function in the .init section and to the body of the _fini function in the .fini section. The headers of these functions are defined in crti.o and the footers are defined in crtn.o (both are part of the libc; use e.g. objdump -d .../crti.o to disassemble them). GCC itself uses this registration mechanism in crtbegin.o to register __do_global_dtors_aux and in crtend.o to register __do_global_ctors_aux.

It is possible to use this older registration alternative in your C or C++ code, but it's a bit inconvenient. Here are some helper macros which make it easy:

/* Usage: DEFINE_INIT(my_init1) { ... }
 * Defines function my_init1 which will be called at startup, before main().
 * As a side effect, defines `static void name() { ... }'.
 */
#define DEFINE_INIT(name) \
    static void name(void); \
    /* If we declared this static, it wouldn't get called. */ \
    __attribute__((section(".trash"))) void __INIT_HELPER__##name(void) { \
      static void (* volatile f)(void) = name; \
      __asm__ __volatile__ (".section .init"); \
      f(); \
      __asm__ __volatile__ (".section .trash"); \
    } \
    static void name(void)

/* Usage: DEFINE_FINI(my_fini1) { ... }
 * Defines function my_fini1 which will be called at process exit.
 * As a side effect, defines `static void name() { ... }'.
 */
#define DEFINE_FINI(name) \
    static void name(void); \
    /* If we declared this static, it wouldn't get called. */ \
    __attribute__((section(".trash"))) void __FINI_HELPER__##name(void) { \
      static void (* volatile f)(void) = name; \
      __asm__ __volatile__ (".section .fini"); \
      f(); \
      __asm__ __volatile__ (".section .trash"); \
    } \
    static void name(void)

For your reference, here are the corresponding much simpler macros for __attribute__((constructor)) and __attribute__((destructor)):

/* Usage: DEFINE_CONSTRUCTOR(my_init1) { ... }
 * Defines function my_init1 which will be called at startup, before main().
 * As a side effect, defines `static void name() { ... }'.
 */
#define DEFINE_CONSTRUCTOR(name) \
    __attribute__((constructor)) static void name(void)

/* Usage: DEFINE_DESTRUCTOR(my_fini1) { ... }
 * Defines function my_fini1 which will be called at process exit.
 * As a side effect, defines `static void name() { ... }'.
 */
#define DEFINE_DESTRUCTOR(name) \
    __attribute__((destructor)) static void name(void)

It is possible to use the old and the new registration mechanisms at the same time. Here is sample code which uses both, as well as C++ static initialization, atexit and on_exit.

#include <string.h>
#include <unistd.h>
#include <stdlib.h>

#ifdef __cplusplus
class C {
 public:
  C(const char *msg): msg_(msg) {
    (void)!write(1, "+", 1);  (void)!write(1, msg_, strlen(msg_));
  }
  ~C() {
    (void)!write(1, "-", 1);  (void)!write(1, msg_, strlen(msg_));
  }
 private:
  const char *msg_;
};
#endif

DEFINE_INIT(myinit1) { (void)!write(1, "MYINIT1\n", 8); }
DEFINE_CONSTRUCTOR(myctor1) { (void)!write(1, "MYCTOR1\n", 8); }

#ifdef __cplusplus
static int get_answer(const char *msg) {
  (void)!write(1, msg, strlen(msg));
  return 42;
}
C myobj1("MYOBJ1\n");
int myanswer1 = get_answer("ANSWER1\n");
C myobj2("MYOBJ2\n");
int myanswer2 = get_answer("ANSWER2\n");
#endif

DEFINE_INIT(myinit2) { (void)!write(1, "MYINIT2\n", 8); }
DEFINE_CONSTRUCTOR(myctor2) { (void)!write(1, "MYCTOR2\n", 8); }
DEFINE_FINI(myfini1) { (void)!write(1, "MYFINI1\n", 8); }
DEFINE_DESTRUCTOR(mydtor1) { (void)!write(1, "MYDTOR1\n", 8); }
DEFINE_FINI(myfini2) { (void)!write(1, "MYFINI2\n", 8); }
DEFINE_DESTRUCTOR(mydtor2) { (void)!write(1, "MYDTOR2\n", 8); }
static void myatex1() { (void)!write(1, "MYATEX1\n", 8); }
static void myatex2() { (void)!write(1, "MYATEX2\n", 8); }
static void myonexit(int exitcode, void *arg) {
  const char *msg = (const char*)arg;
  (void)exitcode;
  (void)!write(1, msg, strlen(msg));
}

int main(int argc, char **argv) {
  (void)argc; (void)argv;
  atexit(myatex1);
  on_exit(myonexit, (void*)"MYONEX1\n");
  (void)!write(1, "MAIN\n", 5);
  atexit(myatex2);
  on_exit(myonexit, (void*)"MYONEX2\n");
  return 0;
}

The order in which these run is not intuitive. Here is the output:

MYINIT1
MYINIT2
+MYOBJ1
ANSWER1
+MYOBJ2
ANSWER2
MYCTOR2
MYCTOR1
MAIN
MYONEX2
MYATEX2
MYONEX1
MYATEX1
-MYOBJ2
-MYOBJ1
MYDTOR1
MYDTOR2
MYFINI1
MYFINI2

Please note that gcc-4.3 and below run MYDTOR1 and MYDTOR2 in the opposite order. All other compilers tested (listed above) use exactly this order. The order is libc-independent: newer compiler versions with the same libc produced a different order, while other libc versions with the same compiler version kept the order intact. Please note again that the order is undefined across translation units (C or C++ source files).

2014-01-15

Announcing mplaylist: Audio playlist player using mplayer, with checkpointing

This blog post is the formal announcement of mplaylist, an audio playlist player using mplayer, with checkpointing.

mplaylist is a Python script which can play audio playlists (.m3u files), remembering the current playback position (file and time) even when killed, so it will resume playback at the proper position upon restart. The playback position is saved as an .m3u.pos file next to the .m3u file. mplaylist uses mplayer for playing the audio files.

mplaylist needs Python and a Unix system with mplayer installed. (It may be easy to port to Windows, but this has not been tried.) Download the script directly from here. There is no GUI: you have to start mplaylist from the command line, in a terminal window.

The reason why I wrote mplaylist is that I needed the following features and I couldn't easily find an audio player for Ubuntu which had all of them:

  • It supports multiple playlists.
  • It remembers the playback position (file and time) for each playlist.
  • Preferably, it remembers playback position even when the process is killed.
  • It lets the user adjust playback speed, without changing the pitch.

mplaylist supports all these features. Checkpointing (i.e. remembering the playback position) works even if both the mplayer and mplaylist processes are killed with kill -9 (SIGKILL). If you have a journaling filesystem with block device barriers properly set up, checkpointing also works if you unplug the power cable.
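The post doesn't say how mplaylist makes its .m3u.pos file crash-safe. One common technique (an assumption here, not necessarily what mplaylist does) is to write the new position to a temporary file, fsync() it, and then rename() it over the old file: rename(2) within one filesystem replaces the target atomically, so after kill -9 the file holds either the old or the new position, never a half-written one. The function name and the position format below are hypothetical; for full power-loss durability of the rename itself, the containing directory would typically also be fsync()ed, which this sketch omits.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <fcntl.h>
#include <string>
#include <unistd.h>

// Hypothetical crash-safe checkpoint writer: writes contents to path + ".tmp",
// fsync()s it, then atomically renames it over path.
bool WriteCheckpoint(const std::string &path, const std::string &contents) {
  const std::string tmp = path + ".tmp";
  const int fd = open(tmp.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd < 0) return false;
  std::string::size_type done = 0;
  while (done < contents.size()) {
    const ssize_t n = write(fd, contents.data() + done, contents.size() - done);
    if (n <= 0) {
      close(fd);
      unlink(tmp.c_str());
      return false;
    }
    done += static_cast<std::string::size_type>(n);
  }
  // Make sure the data is on disk before the rename makes it visible.
  if (fsync(fd) != 0) {
    close(fd);
    unlink(tmp.c_str());
    return false;
  }
  if (close(fd) != 0) {
    unlink(tmp.c_str());
    return false;
  }
  return std::rename(tmp.c_str(), path.c_str()) == 0;
}
```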

Please note that mplaylist is not only for music files. It works excellently for playing back (series of) audio books and (series of) talks.

2014-01-11

How to prevent YouTube from using HTTPS

This blog post explains how to configure your web browser (Mozilla Firefox or Google Chrome) to prevent YouTube from redirecting from the http:// protocol to https://. The instructions below work whether or not you are logged in to YouTube.

YouTube started doing this in the last couple of months, and some browser extensions do it as well. Please note that HTTPS gives you more privacy than HTTP (e.g. against governments and internet service providers spying on you), so please think carefully about whether you want to revert to HTTP on YouTube.

Test the protocol: Type youtube.com into your address bar, make sure https:// doesn't show up while typing, and press Enter. Wait for the page to load. If you can't see https:// added to the beginning of the address, and you don't see a lock icon on the left side of the address, then you're done; stop here.

If you have the Disconnect browser extension installed, disable it. (You may want to enable or reconfigure it later, after finishing these steps.) If Firefox asks for a browser restart, then restart it. Test the protocol.

If you have the YouTube Center browser extension or the corresponding Greasemonkey script installed, configure it by unticking the Use secure protocol checkbox. Test the protocol.

Remove (delete) all your YouTube cookies. In Chrome, copy-paste chrome://chrome/settings/content to the address bar, press Enter, click on the All cookies and site data... button, search for youtube, make sure that nothing unrelated shows up, and click on the Remove all button. In Firefox, open Edit / Preferences / Privacy / remove individual cookies, search for youtube.com, and click on the Remove all cookies button. Test the protocol.

If you're using Firefox on Linux, remove YouTube from the secure site table. To do it, exit from Firefox, and run the following command in a terminal window (without the leading $):

$ sqlite3.static ~/.mozilla/firefox/*.default/permissions.sqlite "DELETE FROM moz_hosts WHERE type LIKE 'sts%' AND host LIKE '%youtube.com'"

If you get an error message and you don't know how to fix it, or you are using Firefox on an operating system other than Linux, you can run the same DELETE FROM ... SQL query (the text between the double quotes above, without the quotes) using the SQLite Manager Firefox extension. Test the protocol.

Test the protocol. If it is still redirecting to https://, then take notes which of your browser extensions are enabled, disable all your browser extensions, and restart the browser. Test the protocol. If it's not redirecting anymore, then enable your browser extensions one-by-one, and figure out which one is the culprit. (There may be multiple ones.) Keep the culprit disabled or change its settings.

If it is still redirecting with all your extensions disabled, then this howto can't help you; try to find a solution on the web, and/or ask a question on webapps.stackexchange.com. Don't forget to re-enable your browser extensions.

Some anecdotes: on Firefox, deleting the cookies solved the problem for me, and on Chrome, disabling Disconnect solved the problem for me.

2014-01-10

How to remove almost all files from a Git repository

This blog post explains how to remove all files (including their history) from a Git repository, except for files in a whitelist. This can be useful to split a Git repository into two smaller repositories.

This can lead to data loss, so make sure you have a backup of the repository. Also read the basics about rewriting history and git filter-branch first.

Here is the command which keeps only the files foo and bar/baz (type it without the leading $):

$ (export KEEP="$(echo 'foo'; echo 'bar/baz')";
  NL="$(echo;echo x)"; export NL="${NL%x}"; git filter-branch -f \
  --index-filter 'X="$IFS"; IFS="$NL";
  set -- $(git ls-files | grep -vFx "$KEEP");
  IFS="$X"; test $# -gt 0 &&
  git rm --cached --ignore-unmatch -- "$@"; :' --prune-empty HEAD)

This needs a Bourne-compatible shell, so it won't work out-of-the-box in the Windows command-line, but it will work on most modern Unix systems.

This looks unnecessarily complex, elaborate and bloated, but all the little tricks are necessary to make it work with files with funny characters in their names and with all modern Bourne-compatible shells. (Only filenames containing a newline or an apostrophe (') won't work.)
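The core of the index filter is git ls-files | grep -vFx "$KEEP": -F makes each line of KEEP a fixed string instead of a regular expression, -x makes it match only whole lines (whole paths), and -v inverts the match, so the result is every tracked path not listed in KEEP. Splitting on $NL (instead of the default IFS) keeps paths containing spaces intact, and test $# -gt 0 skips git rm when there is nothing to remove. A small C++ model of just the selection step, with a hypothetical function name:

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical model of `git ls-files | grep -vFx "$KEEP"`: returns the
// paths which are not exactly equal to one of the newline-separated entries
// in keep. These are the paths the index filter passes to git rm --cached.
std::vector<std::string> PathsToRemove(const std::vector<std::string> &ls_files,
                                       const std::string &keep) {
  std::set<std::string> keep_set;  // -F -x: exact, whole-path matches only.
  std::istringstream in(keep);
  for (std::string line; std::getline(in, line); ) keep_set.insert(line);
  std::vector<std::string> to_remove;
  for (const std::string &path : ls_files) {
    if (keep_set.count(path) == 0) to_remove.push_back(path);  // -v
  }
  return to_remove;
}
```

Note that bar/baz2 and foo.c are removed even though they start with a kept path: that's what -x guarantees.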

To keep empty commits, omit the --prune-empty flag.

Please note that if the files you are interested in were renamed, then this command doesn't recognize their old names: you have to enumerate the old pathnames explicitly to keep them.

To do it the other way round, i.e. to keep all files except foo and bar/baz, do this:

$ (export KEEP="$(echo 'foo'; echo 'bar/baz')";
  NL="$(echo;echo x)"; export NL="${NL%x}"; git filter-branch -f \
  --index-filter 'X="$IFS"; IFS="$NL"; set -- $KEEP;
  IFS="$X"; test $# -gt 0 &&
  git rm --cached --ignore-unmatch -- "$@"; :' --prune-empty HEAD)

2014-01-05

A short file size comparison of small libc implementations for Linux

This blog post gives a short executable file size comparison when the same statically linked, i386 ELF executable was compiled with various small (tiny) libc implementations for Linux.

TL;DR diet libc produces the smallest executables.

Compiler used: GCC 4.6.3 in Ubuntu Precise.

libc implementations used:

All file sizes are the size of statically linked, Linux i386 ELF, stripped executable, except for source file (where it is the size of a .c source file) and dynamic (where it is the size of a dynamic executable of the same kind).

The source file size reducing compiler flags and tricks in this blog post were used. The programs used dynamic memory allocation (malloc(3), free(3), realloc(3)), system call I/O (e.g. read(2) and write(2)), but none of the printf*(3) functions or stdio.

Compilation results for clang_trampoline.c:

  • source file: 37889 bytes
  • diet libc: 15176 bytes
  • dynamic: 17644 bytes
  • musl: 22420 bytes
  • uClibc: 22580 bytes
  • static: 709120 bytes

Compilation results for xstatic.c:

  • source file: 30410 bytes
  • diet libc: 12316 bytes
  • dynamic: 13516 bytes
  • musl: 18992 bytes
  • uClibc: 19412 bytes
  • static: 705024 bytes

Interesting observation: the diet libc version is smaller than the dynamic version. That's because linking against dynamic shared libraries has its own overhead (e.g. symbol table, PLT) in the executable.

Announcing pts-xstatic: A tool for creating small, statically linked Linux i386 executables with any compiler

This blog post announces pts-xstatic, a convenient wrapper tool for compiling and creating portable, statically linked Linux i386 executables. It works on Linux i386 and Linux x86_64 host systems. It wraps an existing compiler (GCC or Clang) of your choice, and it links against uClibc and the other base libraries included in the pts-xstatic binary release.

See the most recent README for all details.

C compilers supported: gcc-4.1 ... gcc-4.8, clang-3.0 ... clang-3.3. C++ compilers supported: g++ and clang++ corresponding to the supported C compilers. Compatible uClibc C and C++ headers (.h) and precompiled static libraries (e.g. libc.a, libz.a, libstdc++.a) are also provided by pts-xstatic. To minimize system dependencies, pts-xstatic can compile with pts-clang (for both C and C++), which is portable, and you can install it as non-root.

As an alternative to pts-xstatic, if you want a tiny, self-contained (single-file) C compiler for Linux i386, please take a look at pts-tcc. With pts-xstatic, you can create faster and smaller statically linked executables, with the compiler of your choice.

As an alternative to pts-xstatic and uClibc, see diet libc and its diet tool (an alternative to the xstatic tool), with which you can create even smaller binaries.

Motivation

  1. Available uClibc GCC toolchain binary releases are very old, e.g. the i686 release contains gcc-4.1.2 compiled on 2009-04-11.
  2. With uClibc Buildroot, the uClibc version is tied to a specific GCC version. It's not possible to compile with your favorite preinstalled C or C++ compiler version, and link against your favorite uClibc version. pts-xstatic makes this possible.
  3. libstdc++ is not easily available for uClibc, and it's a bit cumbersome to compile. pts-xstatic contains a precompiled version.

Minimum installation

If you want to try pts-xstatic quickly, without root access, without installing any dependencies, and without changing any settings, this is the easiest way:

$ cd /tmp
$ rm -f pts-xstatic-latest.sfx.7z
$ wget http://pts.50.hu/files/pts-xstatic/pts-xstatic-latest.sfx.7z
$ chmod +x pts-xstatic-latest.sfx.7z
$ ./pts-xstatic-latest.sfx.7z -y  # Creates the pts-xstatic directory.
$ rm -f pts-clang-latest.sfx.7z
$ wget http://pts.50.hu/files/pts-clang/pts-clang-latest.sfx.7z
$ chmod +x pts-clang-latest.sfx.7z
$ ./pts-clang-latest.sfx.7z -y  # Creates the pts-clang directory.
$ cat >hw.c <<'END'
#include <stdio.h>
int main(void) {
  return !printf("Hello, %s!\n", "World");
}
END
$ pts-xstatic/bin/xstatic pts-clang/bin/clang -s -O2 -W -Wall hw.c && ./a.out
Hello, World!
$ strace -e open ./a.out
Hello, World!
$ file a.out
a.out: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, stripped
$ ls -l a.out
-rwxr-xr-x 1 pts pts 16888 Jan  2 23:17 a.out
Compare the file size with statically linking against regular (e)glibc:
$ gcc -static -m32 -o a.big -s -O2 -W -Wall hw.c && ./a.big
Hello, World!
$ strace -e open ./a.big
Hello, World!
$ file a.big
a.big: ELF 32-bit LSB executable, Intel 80386, version 1 (GNU/Linux), statically linked, for GNU/Linux 2.6.24, BuildID[sha1]=0x37284f286ffeecdb7ac5d77bfa83ade4310df098, stripped
$ ls -l a.big
-rwxr-xr-x 1 pts eng 684748 Jan  2 23:20 a.big

FYI with diet libc, the generated a.out file is only 8668 bytes long.

See full installation instructions in the most recent README.

Does pts-xstatic create portable executables?

pts-xstatic creates portable, statically linked, Linux ELF i386 executables, linked against uClibc. By default, these executables don't need any external file (not even the file specified by argv[0], not even the /proc filesystem) to run. NSS libraries (the code needed for e.g. getpwent(3) (getting info of Unix system users) and gethostbyname(3) (DNS resolution)) are also included. The executables also work on FreeBSD in Linux mode if the operating system (OS/ABI) field in the ELF header is changed from SYSV to Linux.
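The OS/ABI field is byte EI_OSABI (index 7) of the e_ident array at the start of the ELF header; SYSV is 0 and Linux is 3. On FreeBSD this is normally changed with the brandelf(1) tool. Purely as an illustration, here is a hypothetical helper that does the same byte patch on an in-memory copy of an executable, using the constants from glibc's <elf.h>:

```cpp
#include <cassert>
#include <cstring>
#include <elf.h>
#include <vector>

// Hypothetical helper: changes the OS/ABI byte of an in-memory ELF image from
// ELFOSABI_SYSV (0) to ELFOSABI_LINUX (3). Returns true if the image is
// branded Linux afterwards, false if it isn't ELF or has another OS/ABI.
bool BrandElfAsLinux(std::vector<unsigned char> *image) {
  if (image->size() < EI_NIDENT ||
      std::memcmp(image->data(), ELFMAG, SELFMAG) != 0) {
    return false;  // Too short, or no "\177ELF" magic.
  }
  unsigned char &osabi = (*image)[EI_OSABI];
  if (osabi != ELFOSABI_SYSV) return osabi == ELFOSABI_LINUX;
  osabi = ELFOSABI_LINUX;
  return true;
}
```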

As an alternative to pts-xstatic: gcc -static (or clang -static) doesn't provide real portability, because for calls such as getpwent(3) (getting info of Unix system users) and gethostbyname(3) (DNS resolution), glibc loads files such as libnss_compat.so, libnss_dns.so. On the target system those libraries may be incompatible with your binary, so you may get a segfault or unintended behavior. pts-xstatic solves this, because it uses uClibc.

It can be useful to embed locale files, gconv libraries, arbitrary data and configuration files needed by the program. Neither `gcc -static', pts-xstatic nor statifier can do it, but Ermine can. Ermine is not free software, but you can get a free-of-charge time-limited trial, and you can ask for a discount for noncommercial use. See all details here, and give it a try!

More info

See the most recent README for full installation instructions, usage details, full feature list etc.

2014-01-02

How to detect integer overflow in C and C++ addition and subtraction

This blog post explains how to detect integer overflow (and underflow) in C and C++ addition and subtraction, and it also gives example code.

Overflow (or underflow, we use these terms interchangeably) occurs when the result of an arithmetic operation cannot be represented as an integer of the same type (and size) as the operands. For unsigned addition, overflow indicates that the result is too large. For unsigned subtraction, overflow indicates that the result is negative. For signed addition and subtraction, overflow indicates that the result is either too small or too large.

When chaining additions, it's useful to compute the sum x + y + c, where c is the carry bit (either 0 or 1) resulting from the previous, less significant addition. Similarly, when chaining subtractions, it's useful to compute the difference x - y - c, where c is the borrow bit (either 0 or 1) resulting from the previous, less significant subtraction.

The freely available Chapter 2 (Basics) of the book Hacker's Delight has a detailed and informative subsection about overflow processing. The formulas presented below are based on formulas in that section. Please read the entire section of the book for a detailed explanation and more formulas (which are useful in other environments).

One simple observation is that signed addition overflows iff the two operands (x and y) have the same sign, but the sum has a different sign. Based on similar observations we can devise the following formulas:

  • signed x + y + c overflows iff this is negative: ((x+y+c)^x)&((x+y+c)^y)
  • signed x + y + c overflows iff this is negative: z&(((x^z)+y+c)^~y) after z=(x^~y)&((1<<sizeof(x)*8-1)) (no temporary overflow)
  • signed x - y - c overflows iff this is negative: ((x-y-c)^x)&((x-y-c)^~y)
  • signed x - y - c overflows iff this is negative: z&(((x^z)-y-c)^y) after z=(x^y)&((1<<sizeof(x)*8-1)) (no temporary overflow)
  • unsigned x + y + c overflows iff the most significant bit of this is set: (x&y)|((x|y)&~(x+y+c))
  • unsigned x - y - c overflows iff the most significant bit of this is set: (~x&y)|((~x|y)&(x-y-c))
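To make the formulas concrete, here is a sketch for 32-bit operands (the function names are hypothetical, and this is not the author's header from Github). It evaluates the first, third, fifth and sixth formulas with the arithmetic done in uint32_t, whose wraparound is well-defined, so no signed overflow can occur while checking. c must be 0 or 1; each function returns 1 on overflow and 0 otherwise.

```cpp
#include <cassert>
#include <cstdint>

// Signed x + y + c: overflow iff the sum's sign differs from both x and y.
static inline int SignedAddOverflows(int32_t x, int32_t y, int32_t c) {
  const uint32_t ux = x, uy = y, s = ux + uy + static_cast<uint32_t>(c);
  return static_cast<int>((((s ^ ux) & (s ^ uy)) >> 31) & 1);
}

// Signed x - y - c: overflow iff the difference's sign differs from x's sign
// and equals y's sign.
static inline int SignedSubOverflows(int32_t x, int32_t y, int32_t c) {
  const uint32_t ux = x, uy = y, d = ux - uy - static_cast<uint32_t>(c);
  return static_cast<int>((((d ^ ux) & (d ^ ~uy)) >> 31) & 1);
}

// Unsigned x + y + c: the most significant bit of the formula is the carry out.
static inline int UnsignedAddOverflows(uint32_t x, uint32_t y, uint32_t c) {
  return static_cast<int>((((x & y) | ((x | y) & ~(x + y + c))) >> 31) & 1);
}

// Unsigned x - y - c: the most significant bit of the formula is the borrow.
static inline int UnsignedSubOverflows(uint32_t x, uint32_t y, uint32_t c) {
  return static_cast<int>((((~x & y) | ((~x | y) & (x - y - c))) >> 31) & 1);
}
```

For example, SignedAddOverflows(INT32_MAX, 0, 1) is 1 because of the carry bit alone, and UnsignedSubOverflows(5, 6, 0) is 1 because 5 - 6 is negative.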

Please note that none of the formulas above contain branches, so the CPU pipeline doesn't have to be flushed in order to compute them. To convert the sign bit (i.e. negativity) to a bool (0 or 1), shift it down like this: (int)(((((x+y+c)^x)&((x+y+c)^y))>>(sizeof(x)*8-1))&1).

Please note that in standard C and C++ the result of signed addition and subtraction is undefined (!) if an overflow occurs (unsigned arithmetic, on the other hand, is well-defined and wraps around). The GCC flags -fwrapv and -fno-strict-overflow disable this undefined behavior. But since our code can't be sure whether it's compiled with these flags enabled, we must use an overflow-detection formula in which no temporary overflow occurs. Such formulas are also given above. Another option is casting the operands to the corresponding unsigned type, adding them as unsigned (which wraps around, keeping only as many least significant bits as fit), and then casting the result back to signed. To do so, we must add these explicit casts in x+y+c and x-y-c in the signed formulas above. These casts can get tricky if we don't know the type of the operands, because there is no overloaded generic cast in C (which e.g. casts int to unsigned and long long to unsigned long long).

See the final code on Github. It can be included as a .h file in C and C++ code. It works with GCC 4.1 and above and Clang 3.0 and above. It uses the GCC extension typeof (also works in Clang) and it uses function overloading in C++ for the generic unsigned cast. In C, it uses the GCC extension __builtin_choose_expr for this cast. It also uses statement expressions in macro bodies to declare temporary variables, to avoid evaluating the macro arguments more than once.

Further reading:

  • About the C11 _Generic selections (for implementing overloaded functions and macros) in this blog post.
  • P99, a huge macro library for C99 (C dialects earlier than C11).
  • An article about proper overflow detection in all C and C++ arithmetic operations. Overflow detection is much harder to do correctly than you might think. The article contains many incorrect naïve implementations, and also the correct (complicated) implementations. Read it, it's worth it!

2013-12-31

How to implement in-place set intersection in C++

This blog post shows and explains C++ source code implementing in-place set intersection, i.e. removing each element from a set (or another sorted container) *bc which is not also a member of container ac.

The std::set_intersection function template in the <algorithm> header of the C++ standard template library writes all elements of the intersection to a separate output (e.g. a new set). This can be too slow and a waste of memory if one of the inputs is not needed afterwards. In this case an in-place intersection is desired instead, but unfortunately such a function template is not part of the C++ standard template library.

Here is a simple in-place implementation which looks up each element of *bc in ac, and removes (erases) it from *bc if not found:

#include <set>

// Remove elements from bc which are missing from ac.
//
// The time required is proportional to log(ac.size()) * bc->size(), so it's
// faster than IntersectionUpdate if ac is large compared to bc.
template<typename Input, typename Output>
static void IntersectionUpdateLargeAc(
    const std::set<Input> &ac, std::set<Output> *bc) {
  const typename std::set<Input >::const_iterator a_end = ac.end();
  const typename std::set<Output>::const_iterator b_end = bc->end();
  for (typename std::set<Output>::iterator b = bc->begin(); b != b_end; ) {
    if (ac.find(*b) == a_end) {  // Not found.
      // Not a const_iterator, erase wouldn't accept it until C++11.
      const typename std::set<Output>::iterator b_old = b++;
      bc->erase(b_old);  // erase doesn't invalidate b.
    } else {
      ++b;
    }
  }
}

Removing from *bc above is a bit tricky, because we don't want to invalidate the iterator b. In C++11 erase returns a new iterator, which is just after the removed elements, but we don't use that just to be backwards-compatible. Instead of that we make use of the fact that iterators to the non-removed elements are kept intact for set, multiset and list, so we create the temporary iterator b_old, which will be invalidated, but b remains valid.

We need the typename keyword in local variable declarations, because they have a dependent type (i.e. a type whose identifier is within another type specified by a template parameter.)

The time complexity is O(log(as) · bs), where as = ac.size() and bs = bc->size(), so it is fast if ac is large compared to *bc. For example, when as = 3^k and bs = k, then it's O(k^2).
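As mentioned above, since C++11 std::set::erase returns the iterator following the erased element, which makes the b_old trick unnecessary. A sketch of the lookup-based variant written that way (the name IntersectionUpdateLargeAc11 is hypothetical; it requires C++11 or later):

```cpp
#include <cassert>
#include <set>

// Remove elements from *bc which are missing from ac, in
// O(log(ac.size()) * bc->size()) time, like IntersectionUpdateLargeAc.
template <typename Input, typename Output>
void IntersectionUpdateLargeAc11(const std::set<Input> &ac,
                                 std::set<Output> *bc) {
  for (auto b = bc->begin(); b != bc->end(); ) {
    if (ac.find(*b) == ac.end()) {  // Not found in ac.
      b = bc->erase(b);  // Returns the iterator following the erased element.
    } else {
      ++b;
    }
  }
}
```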

As an alternative, we could iterate over the two sets in increasing (ascending) order at the same time, similarly to the merge step of mergesort (as implemented by std::merge), but dropping elements from *bc if there is no corresponding element in ac. One possible implementation:

#include <set>

// Remove elements from bc which are missing from ac.
//
// The time required is proportional to ac.size() + bc->size().
template<typename Input, typename Output>
static void IntersectionUpdate(
    const std::set<Input> &ac, std::set<Output> *bc) {
  typename std::set<Input>::const_iterator a = ac.begin();
  const typename std::set<Input>::const_iterator a_end = ac.end();
  typename std::set<Output>::iterator b = bc->begin();
  const typename std::set<Output>::iterator b_end = bc->end();
  while (a != a_end && b != b_end) {
    if (*a < *b) {
      ++a;
    } else if (*a > *b) {
      const typename std::set<Output>::iterator b_old = b++;
      bc->erase(b_old);  // erase doesn't invalidate b.
    } else {  // Elements are equal, keep them in the intersection.
      ++a;
      ++b;
    }
  }
  bc->erase(b, b_end);  // Remove remaining elements in bc but not in ac.
}

The time complexity of IntersectionUpdate above is O(as + bs), which is faster than IntersectionUpdateLargeAc unless ac is much larger than *bc. For example, when as = 3^k and bs = k, it's O(3^k + k), so IntersectionUpdateLargeAc (O(k^2)) is faster.
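If building a new set is acceptable instead of updating *bc in place, the standard std::set_intersection algorithm (in <algorithm>) performs the same kind of linear merge. A sketch, with the made-up name IntersectionCopy:

```cpp
#include <algorithm>
#include <iterator>
#include <set>

// Returns the elements present in both sets. Both inputs are traversed once
// in ascending order; since the output arrives sorted, each hinted insert at
// result.end() is amortized constant time.
template<typename T>
static std::set<T> IntersectionCopy(const std::set<T> &ac,
                                    const std::set<T> &bc) {
  std::set<T> result;
  std::set_intersection(ac.begin(), ac.end(), bc.begin(), bc.end(),
                        std::inserter(result, result.end()));
  return result;
}
```

The in-place versions above avoid allocating a second container.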

Example usage of both (just to see if they compile):

int main(int, char**) {
  std::set<int> a, b;
  IntersectionUpdateLargeAc(a, &b);
  IntersectionUpdate(a, &b);
  return 0;
}

It's natural to ask if these function templates can be generalized to C++ containers other than set. They take advantage of the input being sorted, so let's consider sorted std::vector, sorted std::list and std::multiset in addition to std::set. To avoid the complexity of having to distinguish keys from values, let's ignore std::map and std::multimap.

The generalization of IntersectionUpdateLargeAc from set to multiset is trivial: no code change is necessary, because std::multiset::find returns any matching element, which is good enough for us. However, in IntersectionUpdate the ++a; in the branch for equal elements must be removed: without the removal, subsequent occurrences of the same value in *bc would be removed if ac contains this value only once. No other code change is needed. It is tempting to introduce a loop in the (*a > *b) branch, to remove all equal values at once:

for (;;) {
  const typename Output::iterator b_old = b++;
  const bool do_break = b == b_end || *b_old != *b;
  bc->erase(b_old);  // erase doesn't invalidate b.
  if (do_break) break;
}

However, this change is not necessary, because subsequent equal values in *bc would be removed in subsequent iterations of the outer loop.
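To see the difference concretely, here's a self-contained sketch of the merge loop restricted to std::multiset<int>. The advance_a_on_match parameter is hypothetical and only there to compare the variant with ++a against the one without it.

```cpp
#include <set>

// Merge-based intersection update on multiset<int>. With
// advance_a_on_match == true it behaves like the std::set version (with ++a);
// with false it's the multiset-safe version (without ++a).
static void IntersectionUpdateDemo(const std::multiset<int> &ac,
                                   std::multiset<int> *bc,
                                   bool advance_a_on_match) {
  std::multiset<int>::const_iterator a = ac.begin();
  const std::multiset<int>::const_iterator a_end = ac.end();
  std::multiset<int>::iterator b = bc->begin();
  const std::multiset<int>::iterator b_end = bc->end();
  while (a != a_end && b != b_end) {
    if (*a < *b) {
      ++a;
    } else if (*a > *b) {
      const std::multiset<int>::iterator b_old = b++;
      bc->erase(b_old);  // erase doesn't invalidate b.
    } else {  // Equal: keep *b.
      if (advance_a_on_match) ++a;
      ++b;
    }
  }
  bc->erase(b, b_end);  // Remove remaining elements in bc but not in ac.
}
```

With ac = {1, 2} and *bc = {1, 1, 2}, the variant with ++a drops the second 1 (leaving {1, 2}), while the variant without it keeps {1, 1, 2}.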

Here are the full generalized implementations:

#if __cplusplus >= 201103 || __GXX_EXPERIMENTAL_CXX0X__
#include <type_traits>
#endif

// Remove elements from bc which are missing from ac. Supported containers for 
// bc: list (only if sorted), vector (only if sorted), set, multiset. Supported
// containers for ac: set, multiset.
//
// The time required is proportional to log(ac.size()) * bc->size(), so it's
// faster than IntersectionUpdate if ac is large compared to bc.
template<typename Input, typename Output>
static void IntersectionUpdateLargeAc(const Input &ac, Output *bc) {
#if __cplusplus >= 201103 || __GXX_EXPERIMENTAL_CXX0X__
  // We could use std::is_convertible (both ways) instead of std::is_same.
  static_assert(std::is_same<typename Input::value_type,
                             typename Output::value_type>::value,
                "the containers passed to IntersectionUpdateLargeAc() need to "
                "have the same value_type");
#endif
  const typename Input::const_iterator a_end = ac.end();
  const typename Output::const_iterator b_end = bc->end();
  for (typename Output::iterator b = bc->begin(); b != b_end; ) {
    if (ac.find(*b) == a_end) {  // Not found.
      // Not a const_iterator, erase wouldn't accept it until C++11.
      const typename Output::iterator b_old = b++;
      bc->erase(b_old);  // erase doesn't invalidate b.
    } else {
      ++b;
    }
  }
}

// Remove elements from bc which are missing from ac. Supported containers for 
// ac and bc: list (only if sorted), vector (only if sorted), set, multiset.
template<typename Input, typename Output>
static void IntersectionUpdate(const Input &ac, Output *bc) {
#if __cplusplus >= 201103 || __GXX_EXPERIMENTAL_CXX0X__
  static_assert(std::is_same<typename Input::value_type,
                             typename Output::value_type>::value,
                "the containers passed to IntersectionUpdate() need to "
                "have the same value_type");
#endif
  typename Input::const_iterator a = ac.begin();
  const typename Input::const_iterator a_end = ac.end();
  typename Output::iterator b = bc->begin();
  // Can't be a const_iterator, similarly to b_old.
  const typename Output::iterator b_end = bc->end();
  while (a != a_end && b != b_end) {
    if (*a < *b) {
      ++a;
    } else if (*a > *b) {
      const typename Output::iterator b_old = b++;
      bc->erase(b_old);  // erase doesn't invalidate b.
    } else {  // Elements are equal, keep it in the intersection.
      // Don't do ++a, in case *bc contains duplicates (e.g. it's a multiset).
      ++b;
    }
  }
  bc->erase(b, b_end);  // Remove remaining elements in bc but not in ac.
}

These work as expected for set, multiset and sorted list. They also don't require the two containers to be of the same kind. For C++0x and C++11, an extra static_assert prints a helpful, compact error message if the value types differ.

However, the generic IntersectionUpdate doesn't work correctly when *bc is a vector. The code compiles, but std::vector::erase invalidates all iterators at or after the erased element, so after bc->erase(b_old) both b and b_end are invalid. (std::vector::erase does return an iterator to the element after the removed one, even before C++11, so b = bc->erase(b) and using bc->end() instead of b_end would fix the invalidation.) But even then the algorithm would be slower than necessary, because std::vector::erase moves each element behind the erased one, so the time complexity would be O(as + bs^2). To speed it up, let's swap each element to be kept towards the front of the vector, and do the actual removal of the remaining elements with a single erase at the end of the function:

#if __cplusplus >= 201103 || __GXX_EXPERIMENTAL_CXX0X__
#include <type_traits>
#endif
#include <algorithm>  // std::iter_swap.
#include <vector>

// More specialized overload for vector output (function templates can't be
// partially specialized, but the compiler prefers this overload for vectors).
template<typename Input, typename T>
static void IntersectionUpdate(const Input &ac, std::vector<T> *bc) {
#if __cplusplus >= 201103 || __GXX_EXPERIMENTAL_CXX0X__
  static_assert(std::is_same<typename Input::value_type, T>::value,
                "the containers passed to IntersectionUpdate() need to "
                "have the same value_type");
#endif
  typename Input::const_iterator a = ac.begin();
  const typename Input::const_iterator a_end = ac.end();
  typename std::vector<T>::iterator b = bc->begin();
  const typename std::vector<T>::iterator b_end = bc->end();
  // Elements kept in the intersection are swapped to [bc->begin(), b_keep),
  // in their original (sorted) order. Elements from b_keep to bc->end() are
  // removed (erased) right before the function returns. We defer their
  // removal, so the whole update takes O(as + bs) time.
  typename std::vector<T>::iterator b_keep = bc->begin();
  while (a != a_end && b != b_end) {
    if (*a < *b) {
      ++a;
    } else if (*a > *b) {
      ++b;  // Not in ac: leave it behind b_keep, it will be erased.
    } else {  // Elements are equal, keep them in the intersection.
      // Don't do ++a, in case *bc contains duplicates.
      std::iter_swap(b_keep, b);  // Works even if swapping with itself.
      ++b_keep;
      ++b;
    }
  }
  bc->erase(b_keep, bc->end());  // Remove elements in bc but not in ac.
}
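The generic IntersectionUpdateLargeAc above runs into the same problem when *bc is a vector: bc->erase(b_old) invalidates b and b_end, and each erase shifts the rest of the vector. A sketch of a vector overload using the standard erase-remove idiom (std::remove_if followed by a single erase) avoids both. The NotIn predicate is a hypothetical helper, written as a class rather than a lambda so it also compiles before C++11.

```cpp
#include <algorithm>  // std::remove_if.
#include <set>
#include <vector>

// Predicate: true if value is missing from the container ac.
template<typename Input>
class NotIn {
 public:
  explicit NotIn(const Input &ac) : ac_(&ac) {}
  template<typename T>
  bool operator()(const T &value) const {
    return ac_->find(value) == ac_->end();
  }
 private:
  const Input *ac_;
};

// Overload for vector output. std::remove_if moves the kept elements to the
// front (preserving their order), then one erase drops the tail, so the time
// is proportional to log(ac.size()) * bc->size().
template<typename Input, typename T>
static void IntersectionUpdateLargeAc(const Input &ac, std::vector<T> *bc) {
  bc->erase(std::remove_if(bc->begin(), bc->end(), NotIn<Input>(ac)),
            bc->end());
}
```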

Once we have the generic implementation and the vector overload in the same file, the C++ compiler takes care of choosing the right (most specialized) one depending on whether *bc is a vector or not. So all these work now:

#include <list>
#include <set>
#include <vector>

int main(int, char**) {
  std::set<int> s;
  std::multiset<int> ms;
  std::vector<int> v;
  // std::list<unsigned> l;  // Won't work in C++0x and C++11.
  std::list<int> l;
  IntersectionUpdate(s, &ms);
  IntersectionUpdate(ms, &v);
  IntersectionUpdate(v, &l);
  IntersectionUpdate(l, &s);
  IntersectionUpdateLargeAc(s, &ms);
  IntersectionUpdateLargeAc(ms, &v);
  // IntersectionUpdateLargeAc(v, &l);  // v is not good as ac.
  // IntersectionUpdateLargeAc(l, &s);  // l is not good as ac.
  IntersectionUpdateLargeAc(s, &l);
  IntersectionUpdateLargeAc(ms, &s);
  return 0;
}
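The overload selection described above can be checked in isolation. This sketch uses a hypothetical WhichOverload pair with the same parameter shapes as IntersectionUpdate, but returning a string instead of doing any work:

```cpp
#include <list>
#include <string>
#include <vector>

// Generic template: matches any pair of container types.
template<typename Input, typename Output>
static std::string WhichOverload(const Input &, Output *) {
  return "generic";
}

// More specialized overload for vector output. Partial ordering of function
// templates makes the compiler prefer it whenever the output is a vector.
template<typename Input, typename T>
static std::string WhichOverload(const Input &, std::vector<T> *) {
  return "vector";
}
```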

The full source code is available on GitHub.

2013-12-26

Tiny 7z archive extractors and SFX for Linux and Windows

This blog post contains links to the tiniest programs that can extract 7z (7-Zip) archives on Linux and Windows i386. These programs can also be used as SFX (self-extracting archives) by appending a .7z file to them.

Please see my previous post about self-extracting 7z archives. In that post the smallest Linux extractor is 118 kB, and the smallest Windows console extractor (the official one) is 149 kB. The Linux extractor is based on the p7zip C++ sources, statically linked against uClibc and compiled with some GCC flags optimized for small binary size.

The smallest known Windows SFX extractor, 7zS2Con.sfx, can be found in 7z922_extra.7z. It's 27648 bytes (27 kB). It's based on the official ANSI C 7-Zip extractor sources, i.e. the C subdirectory in 7z922.tar.bz2. The fundamental differences between this smallest SFX and the normal Windows SFX are the implementation language (C vs C++) and the feature set (e.g. larger memory requirements, not all algorithms supported, encryption not supported; see the Limitations section below). See also this original forum topic for a discussion about features and limitations.

I've compiled the ANSI C 7-Zip extractor using uClibc and the size-optimized GCC flags, fixed some bugs, added some features (e.g. mtime, permissions, symlink, SFX mode), and compressed the binary using UPX. The result is the smallest known extractor (and SFX extractor) for Linux: 7z9.22LinuxI386ConTiny.sfx, 24116 bytes (24 kB, smaller than the Windows extractor, linked using diet libc). The sources are available as a Git repository. The sources are optimized for Linux (they work on both i386 and amd64), but they should compile and work on other Unix systems as well.

Features of the tiny Linux extractor

  • Small (the Linux statically linked binary is less than 40 kB).
  • Can be used to create an SFX (self-extracting) binary by prepending it to a 7z archive. (Same as the `7z -sfx' flag.)
  • It supports file and directory attributes (i.e. it calls chmod(2)).
  • It sets the mtime (i.e. it calls utimes(2)).
  • It can extract symlinks.
  • Has a command-line syntax compatible with the regular console SFX binaries.
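The attribute, mtime and symlink features above boil down to a few POSIX calls. The following is a minimal sketch, not the extractor's actual source; RestoreFileAttributes and RestoreSymlink are hypothetical names, and setting the access time equal to the mtime is an assumption made for simplicity.

```cpp
#include <sys/stat.h>  // chmod, mode_t.
#include <sys/time.h>  // utimes, struct timeval.
#include <unistd.h>    // symlink.
#include <ctime>

// Applies stored Unix permission bits and modification time to a file or
// directory that already exists on disk. Returns true on success.
static bool RestoreFileAttributes(const char *path, mode_t mode,
                                  time_t mtime) {
  if (chmod(path, mode & 07777) != 0) return false;
  struct timeval times[2];
  times[0].tv_sec = mtime;  // Access time: reuse the mtime (assumption).
  times[0].tv_usec = 0;
  times[1].tv_sec = mtime;  // Modification time.
  times[1].tv_usec = 0;
  return utimes(path, times) == 0;
}

// Recreates a symlink entry; target is the stored link contents. The target
// doesn't need to exist.
static bool RestoreSymlink(const char *target, const char *path) {
  return symlink(target, path) == 0;
}
```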

Limitations of the tiny extractors (Windows and Linux)

  • It supports only: LZMA, LZMA2, BCJ, BCJ2, COPY.
  • It keeps an uncompressed version of each file in RAM.
  • It decompresses solid 7z blocks (a block can be the whole 7z archive) to RAM. So the user who runs the SFX installer must have free RAM of at least the unpacked size of the largest solid 7z block (in the simplest case, the unpacked size of the whole 7z archive).
  • The Windows extractor overwrites files without asking.
  • It always extracts to the current directory.
  • It does not support (and may misbehave for) encryption in archives.

2013-12-25

How to make smaller C and C++ binaries

This blog post presents several techniques to make the binaries resulting from C or C++ compilation smaller with GCC (or Clang). Please note that almost all techniques are tradeoffs, i.e. a smaller binary can be slower and harder to debug. So don't use the techniques blindly before understanding the tradeoffs.

The recommended GCC (and Clang) flags:

  • Use -s to strip debug info from the binary (and don't use -g).
  • Use -Os to optimize for output file size. (This will make the code run slower than with -O2 or -O3.)
  • Use -m32 to compile a 32-bit binary. 32-bit binaries are smaller than 64-bit binaries because pointers are shorter.
  • In C++, use -fno-exceptions if your code doesn't use exceptions.
  • In C++, use -fno-rtti if your code doesn't use RTTI (run-time type identification) or dynamic_cast.
  • In C++, use -fvtable-gc to let the linker know about and remove unused virtual method tables.
  • Use -fno-stack-protector .
  • Use -fomit-frame-pointer (this may make the code larger on amd64).
  • Use -ffunction-sections -fdata-sections -Wl,--gc-sections . Without this all code from each needed .o file will be included. With this only the needed code will be included.
  • For i386, use -mpreferred-stack-boundary=2 .
  • For i386, use -falign-functions=1 -falign-jumps=1 -falign-loops=1 .
  • In C, use -fno-unwind-tables -fno-asynchronous-unwind-tables . Out of these, -fno-asynchronous-unwind-tables makes the larger difference (can be several kilobytes).
  • Use -fno-math-errno, and don't check the errno after calling math functions.
  • Try -fno-unroll-loops, sometimes it makes the file smaller.
  • Use -fmerge-all-constants.
  • Use -mfpmath=387 -mfancy-math-387 to make floating point computations shorter.
  • If you don't need double precision, but float precision is enough, use -fshort-double -fsingle-precision-constant .
  • If you don't need IEEE-conformant floating point calculations, use -ffast-math .
  • Use -Wl,-z,norelro for linking, which is equivalent to ld -z norelro .
  • Use -Wl,--hash-style=gnu for linking, which is equivalent to ld --hash-style=gnu . You may also try =sysv instead of =gnu, sometimes it's smaller by a couple of bytes. The goal here is to avoid =both, which is the default on some systems.
  • Use -Wl,--build-id=none for linking, which is equivalent to ld --build-id=none .
  • Get more flags from the Os list in diet.c of diet libc, for about 15 architectures.
  • Don't use these flags: -pie, -fpie, -fPIE, -fpic, -fPIC. Some of these are useful in shared libraries, so enable them only when compiling shared libraries.

Other ways to reduce the binary size:

  • Run strip -S --strip-unneeded --remove-section=.note.gnu.gold-version --remove-section=.comment --remove-section=.note --remove-section=.note.gnu.build-id --remove-section=.note.ABI-tag on the resulting binary to strip even more unneeded parts. This replaces the gcc -s flag with even more aggressive stripping.
  • If you are using uClibc or diet libc, then additionally run strip --remove-section=.jcr --remove-section=.got.plt on the resulting binary.
  • If you are using uClibc or diet libc with C or C++ with -fno-exceptions, then additionally run strip --remove-section=.eh_frame --remove-section=.eh_frame_ptr on the resulting binary.
  • After running strip ... above, also run sstrip on the binary. Download sstrip from ELF Kickers (http://www.muppetlabs.com/~breadbox/software/elfkickers.html), and compile it yourself. Or get the 3.0a binary from here.
  • In C++, avoid STL. Use C library functions instead.
  • In C++, use as few template types as possible (i.e. code with vector<int> and vector<unsigned> is twice as long as the code with vector<int> only).
  • In C++, give each of your non-POD (plain old data) classes an explicit constructor, destructor, copy-constructor and assignment operator, and implement them outside the class, in the .c file.
  • In C++, move constructor, destructor and method bodies outside the class, in the .c file.
  • In C++, use fewer virtual methods.
  • Compress the binary using UPX. For small binaries, use upx --brute or upx --ultra-brute . For large binaries, use upx --lzma . If you have large initialized arrays in your code, make sure you declare them const, otherwise UPX won't compress them.
  • Compress the used libraries using UPX.
  • If you use static linking (e.g. gcc -static), use uClibc (most convenient way: pts-xstatic) or diet libc (most convenient way: the included diet tool) or musl (most convenient way: the included musl-gcc tool) instead of glibc (GNU C library).
  • Make every function static, create a .c file which includes all other .c files, and compile that with gcc -W -Wall. Remove all code which the compiler reports as unused. Last time this saved about 9.2 bytes per function for me.
  • Don't use __attribute__((regparm(3))) on functions, it tends to make the code larger.
  • If you have several binaries and shared libraries, consider unifying the binaries into a single one (using symlinks and distinguishing in main with argv[0]), and moving the library code to the binary. This is useful, because the shared libraries use position-independent code (PIC), which is larger.
  • If it's feasible, rewrite your C++ code as C. Once it's C, it doesn't matter if you compile it with gcc or g++.
  • If your binary is already less than 10 kilobytes, consider rewriting it in assembly, and generating the ELF headers manually, see the tiny ELF page for inspiration.
  • If your binary is already less than 10 kilobytes, and you don't use any libc functions, use a linker script to generate tiny ELF headers. See the tarball with the linker script.
  • Drop the --hash-style=... flag passed to ld by gcc. To do so, pass the -Bmydir flag to gcc, and create the executable mydir/ld, which drops these flags and calls the real ld.
  • See more flags and ideas in this answer.
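The argv[0] trick from the list above (as used by multi-call binaries) can be sketched like this. ExtractMain, ListMain and the tool names are hypothetical placeholders, and their return values are dummy exit codes.

```cpp
#include <cstring>

// Hypothetical entry points; in a real multi-call binary these would be the
// former main() functions of the separate programs.
static int ExtractMain(int argc, char **argv) {
  (void)argc; (void)argv;
  return 10;  // Dummy exit code for illustration.
}
static int ListMain(int argc, char **argv) {
  (void)argc; (void)argv;
  return 20;  // Dummy exit code for illustration.
}

// Returns the part of path after the last '/'.
static const char *BaseName(const char *path) {
  const char *slash = std::strrchr(path, '/');
  return slash ? slash + 1 : path;
}

// Dispatches on the name the binary was invoked as, e.g. via symlinks named
// extract and list pointing to the single binary.
static int MultiCallMain(int argc, char **argv) {
  const char *name = BaseName(argv[0]);
  if (std::strcmp(name, "extract") == 0) return ExtractMain(argc, argv);
  if (std::strcmp(name, "list") == 0) return ListMain(argc, argv);
  return 1;  // Unknown name.
}
```

The real entry point would then be int main(int argc, char **argv) { return MultiCallMain(argc, argv); }.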

2013-12-12

Announcing pts-clang: A portable LLVM Clang C and C++ compiler release running on Linux i386 and Linux x86_64

This blog post announces pts-clang, a portable LLVM Clang C and C++ compiler release running on Linux i386 and Linux x86_64.

pts-clang is a portable Linux i386 version of the clang tool of the LLVM Clang compiler, version 3.3. C and C++ compilation is supported; other frontends (such as Objective C) were not tested. The tool runs on Linux i386 and Linux amd64 systems. It's libc-independent: all code is statically linked into the binary. It also contains a statically linked linker (ld), so installing binutils is not necessary.

See all details in the most recent README.

Trying it out

If you don't have root access and you don't have the libc headers (e.g. in the libc6-dev package, sample file /usr/include/stdio.h) installed, you can still try pts-clang, with pts-xstatic. See the details in this blog post.

If you don't have /usr/include/stdio.h, install the libc development package:

$ sudo apt-get install libc6-dev

If you want to compile both i386 and amd64 binaries, and you have a recent Linux distribution (e.g. Ubuntu Precise), you can install the libc6-dev package for both architectures:

$ sudo apt-get install libc6-dev:i386 libc6-dev:amd64

Download and try pts-clang like this:

$ cd /tmp
$ rm -f pts-clang-latest.sfx.7z
$ wget http://pts.50.hu/files/pts-clang/pts-clang-latest.sfx.7z
$ chmod +x pts-clang-latest.sfx.7z
$ ./pts-clang-latest.sfx.7z -y  # Creates the pts-clang directory.
$ cat >hw.c <<'END'
#include <stdio.h>
int main(void) {
  return !printf("Hello, %s!\n", "World");
}
END
$ pts-clang/bin/clang -s -O2 -W -Wall hw.c
$ ./a.out
Hello, World!

Use clang++ for compiling C++ code, but for that you have to install one of the libstdc++...-dev packages first.

Does pts-clang create portable executables?

By default (without the -xstatic or -xermine flags), the executables created by pts-clang are just as portable as those generated by gcc or clang. They are dynamically linked (unless -static is specified), thus they depend on the system libraries (e.g. /lib/libc.so.6).

If the -static flag is specified, then the executable becomes statically linked, but this doesn't provide real portability, because for calls such as getpwent(3) (getting info of Unix system users) and gethostbyname(3) (DNS resolution), glibc loads files such as libnss_compat.so, libnss_dns.so. On the target system those libraries may be incompatible with your binary, so you may get a segfault or unintended behavior. pts-xstatic solves this, because it uses uClibc.

If the -xstatic flag is specified, pts-xstatic is used to create a portable statically linked, Linux i386 executable, linked against uClibc.

If the -xermine flag is specified, Ermine is used to pack library and other dependencies into a single, portable executable. Ermine is a Linux ELF portable executable creator: it takes a dynamically linked ELF executable, discovers its dependencies (e.g. dynamic libraries, NSS libraries), and builds a portable, statically linked ELF executable containing all the dependencies. See the features, licensing information and get Ermine from here. The result can be even more portable than -xstatic, because Ermine can pack locale files, gconv libraries etc. Not all the packing is automatic: use -xermine,... to specify packing flags to Ermine.

Portability improvements

  • pts-clang is portable (libc-independent): all shipped binaries are statically linked. (The clang binary is packed with Ermine: the file is a statically linked executable which contains a dynamically linked executable and its dependencies (e.g. libc) within itself.) The system libraries are not used for running the compiler (but they are used for linking the output file, except when -xstatic is specified).
  • A statically linked linker (ld, GNU gold) binary is provided, so GNU binutils is not a requirement for compilation on the host system.
  • Some other optional, statically linked binutils tools (ar, ranlib and strip) are also provided for convenience in the pts-static-binu binary release, see more info in its README. These tools can be used for auxiliary tasks such as building static libraries.

Because of these portability improvements, it's easy to run pts-clang in a chroot environment.

C++11 and C++0x compatibility

Please note that even though Clang 3.3 supports C++11, much of that is implemented in the C++ standard library (GCC's libstdc++ or Clang's libc++) header files, and no attempt is made in pts-clang to provide the most up-to-date C++ standard library. With -xstatic, an old libstdc++ (the one from gcc-4.4.3) is provided, and without -xstatic the system's default libstdc++ will be used, which can be older than C++11.

Author, copyright and recompilation

The binaries here were created by Péter Szabó, using existing LLVM Clang and uClibc cross compiler and other binaries, and writing some custom trampoline code. See the details in the GitHub repository.

All software mentioned in this blog post is free software and open source, except for Ermine.

Thanks to Ermine

The author of pts-clang is grateful and says thank you to the author of Ermine, who has provided a free-of-charge Ermine license, using which the portable clang.bin binary was created from the official Clang binary release (which is libc-dependent).

If you want to create portable Linux executables (and you don't care too much about file size), give Ermine a try! It's the most comfortable, easy-to-use, and comprehensive tool available.

Installation instructions for the old version (v1)

Download the old version (v1) from here: http://pts.szit.bme.hu/files/pts-clang/pts-clang-xstatic-bin-3.3-linux-i386-v1.sfx.7z . You can extract it with 7z (in the p7zip package), but you can also make it executable and run it, because it's a self-extracting archive.

Clang itself is a cross-compiler, so it can generate object files for many architectures and operating systems (see the -target flag), but you also need a system-specific linker (not included) to build binaries or libraries. In newer versions (v2 and later), a copy of the GNU gold linker is also included.

This release introduces a non-standard command-line flag -xstatic which enables the Linux i386 target with static linking using the bundled uClibc library. The required .h and .a files, as well as a portable GNU ld linker binary are also included for the use of this flag. In newer versions (v2 and later) the files required by -xstatic are available in a separate download, see the details in this blog post.

Installation instructions for the even older version (v0)

Download version v0 from here: pts-clang-xstatic-bin-3.3-linux-i386.sfx.7z. You can extract it with 7z (in the p7zip package), but you can also make it executable and run it, because it's a self-extracting archive.

More info

See the most recent README for full installation instructions, usage details, full feature list etc.