11. General Guidelines

11.1. Types and Pointers

Type sizes:
Never make any assumptions about the size of a given type, especially pointers [1]. Statements such as x &= 0177770 make implicit use of the size of x. If the intention is to clear the lowest three bits, then it is best to use x &= ~07. The first alternative will also clear the high-order 16 bits if x is 32 bits wide.
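
The difference between the two masks can be sketched as follows. The helper names clear_low3_fixed and clear_low3_portable are hypothetical and used only for illustration; the operand is taken to be an unsigned long, which is at least 32 bits wide.

```c
/* Hypothetical helper: the octal mask silently assumes a 16-bit operand. */
unsigned long clear_low3_fixed(unsigned long x)
{
   return x & 0177770;               /* also clears every bit above bit 15 */
}

/* Hypothetical helper: the mask is built from the operand's own type. */
unsigned long clear_low3_portable(unsigned long x)
{
   return x & ~(unsigned long)07;    /* clears only the lowest three bits */
}
```

The cast is applied before the complement so that the mask is as wide as the operand, whatever that width happens to be.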

Byte ordering:
There are two possibilities for byte ordering: little-endian and big-endian architectures. This problem is illustrated by the code below:

   long int str[2] = {0x41424344, 0x0}; /* ASCII "ABCD" */
   printf ("%s\n", (char *)&str);

A little-endian machine (e.g., a VAX) will print "DCBA", whereas a big-endian machine (e.g., one based on an MC68000 microprocessor) will print "ABCD". (As a side note, there is also PDP-endian, which would print "BADC", followed by many smileys.)

Note: The example will only function correctly if a long int is 32 bits wide. Although not portable, it serves well as an example for the given problem.
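
One way to avoid depending on the host's byte order is to build and read multi-byte values one byte at a time with shifts. The sketch below assumes 8-bit bytes; the names put_be32 and get_be32 are hypothetical and do not come from the original text.

```c
/* Stores the low 32 bits of v into buf[0..3], most significant byte first. */
void put_be32(unsigned char *buf, unsigned long v)
{
   buf[0] = (unsigned char)((v >> 24) & 0xFF);
   buf[1] = (unsigned char)((v >> 16) & 0xFF);
   buf[2] = (unsigned char)((v >> 8) & 0xFF);
   buf[3] = (unsigned char)(v & 0xFF);
}

/* Rebuilds the value from four bytes stored most significant byte first. */
unsigned long get_be32(const unsigned char *buf)
{
   return ((unsigned long)buf[0] << 24) |
          ((unsigned long)buf[1] << 16) |
          ((unsigned long)buf[2] << 8) |
           (unsigned long)buf[3];
}
```

Because the shifts operate on values rather than on memory layout, put_be32 applied to 0x41424344 produces the bytes 'A', 'B', 'C', 'D' in that order on any host.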

Alignment constraints:
Beware of alignment constraints when allocating memory and using pointers. Some architectures restrict the addresses at which certain operands may be placed (that is, to addresses of the form 2^k * E, where E is an integer and k > 0). Code such as

   char *s = "bla"; /* allocated by compiler */
   int  *v = (int *)s;

would most probably fail if the alignment constraints of int types are more strict than those of char types (the usual case for RISC architectures). The code would not fail due to alignment constraints if the memory indicated by s had been allocated by malloc and friends.
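
When an int has to be read from an arbitrary byte address, copying the bytes into a properly aligned object sidesteps the problem. This is a minimal sketch; read_int_unaligned is a hypothetical name, and the bytes are assumed to have been written by the same program on the same machine.

```c
#include <string.h>

/* Reads an int stored at a possibly misaligned byte address by copying
   it into a local int, which the compiler aligns correctly. */
int read_int_unaligned(const char *bytes)
{
   int value;

   memcpy(&value, bytes, sizeof value);
   return value;
}
```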

Pointer formats [1]
Pointers to objects may have the same size but different formats. This is illustrated by the code below:

   int *p = (int *) malloc(...); ... free(p);

This code may malfunction in architectures where int * and char * have different representations because free expects a pointer of the latter type.

Pointers to different types of objects may have different sizes as well. For instance, there are platforms where a char * is larger than an int * or where a pointer to a function will not fit in, e.g., char * or void * (although such cross-assignments work on many platforms, void * is only guaranteed to be large enough to hold a pointer to any data object). Therefore, it is not portable to assign to an object of type void * a pointer to a function. Pointers to functions are further discussed below.

Pointers to functions
If you need a generic function pointer, then use void(*)(void). Be sure to cast the pointer back to the original type before using it. That is, the type signature of the function pointer at the point that the function is called must exactly match the type signature at the point at which the function is defined.
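
A minimal sketch of this round trip is shown below. The typedef generic_fn and the functions add_ints and call_as_binary_int are hypothetical names chosen for the example.

```c
/* Generic function pointer type, as recommended above. */
typedef void (*generic_fn)(void);

/* An ordinary function whose address will be stored generically. */
int add_ints(int a, int b)
{
   return a + b;
}

/* Casts the generic pointer back to its original type before the call.
   The caller must guarantee that f really points to an int(int, int). */
int call_as_binary_int(generic_fn f, int a, int b)
{
   int (*typed)(int, int) = (int (*)(int, int))f;

   return (*typed)(a, b);
}
```

A call such as call_as_binary_int((generic_fn)add_ints, 2, 3) stores the pointer generically and restores the exact type only at the point of the call.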

For example, it is not possible to (portably) use varargs functions (17) (that is, functions that take a variable number of arguments) and fixed-argument functions interchangeably, even if the overlapping types match (that is, even if the first n arguments to the fixed-argument function are the same as the first n arguments to the varargs function). For instance, a function that is declared as having an integer as the first argument and an optional (integer) second argument cannot be called as a function that takes two integer arguments. Similarly, varargs functions of various type signatures cannot be interchanged. Such type cheating will break on systems that use different conventions for calling fixed-argument and varargs functions and on systems that use different conventions for passing the fixed and varargs parts of the argument lists.

As a corollary, it is necessary that the definitions of external variadic functions be available at the point of their usage, e.g., library functions such as printf.

Pointer operators [1]
Only the operators == and != are defined for all pointers of a given type. The remaining comparison operators (<, <=, >, and >=) can only be used when both operands point into the same array or to the first element after the array. The same applies to arithmetic operators on pointers. (18)

NULL pointer:
Never redefine the NULL symbol. The NULL symbol should always be the constant zero. A null pointer of a given type will always compare equal to the constant zero, whereas comparison with a variable with value zero or to some non-zero constant has implementation-defined behavior. (In other words, the constant zero has two meanings.)

A null pointer of a given type will always convert to a null pointer of another type if implicit or explicit conversion is performed. (See 'Pointer Operators' above.)

The contents of a null pointer may be anything the implementor wishes, and dereferencing it may cause strange things to happen....

11.2. Compiler Differences

11.2.1. Conversion Rules

In arithmetic expressions, integral types may be converted in two ways: unsigned-preserving or value-preserving. In the unsigned-preserving model, chars, shorts, and bit-fields are converted to unsigned int or signed int if the original types have the modifiers unsigned or signed, respectively.

The Standard determines that the value-preserving model must be used, meaning that a value of a narrow unsigned type is promoted to signed int, or simply int, if int can represent all the values of the original type; otherwise it is converted to unsigned int. (See section 3.2 of the Standard.)

The following example illustrates the problem. On a machine with a 16-bit short int, and 32-bit int, the code fragment

   unsigned short int x = 1;
   if (x < -1) printf ("unsigned-preserving");
   else printf ("value-preserving");

prints unsigned- or value-preserving accordingly. Plenty of other examples can be derived, such as initializing x with 2^15 and using the predicate (x*x*2 > 0). The expression x*x*2 would probably result in the same bit pattern in both models but would cause arithmetic overflow in the value-preserving model.
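
One way to get the same answer under either model is to convert both operands explicitly to a type that can hold every value of both. The sketch below assumes that long can represent all values of unsigned short, which holds on the 16-bit short machine described above; less_than_signed is a hypothetical name.

```c
/* Compares an unsigned short with an int by value. Both operands are
   converted explicitly, so the result does not depend on whether the
   compiler is unsigned-preserving or value-preserving. */
int less_than_signed(unsigned short x, int y)
{
   return (long)x < (long)y;
}
```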

11.2.2. Compiler Limitations

In practice, one runs into unstated compiler limitations much too frequently.

11.2.3. ANSI C

The Standard has introduced new features and made current practice official but, as we all know, not many compilers conform to it. Among the features that are not yet widely supported, we mention only a few here:

Constant suffixes:
Many compilers allow for suffixes to be appended to constants, such as 10L to indicate a long constant. The Standard allows further typing of constants, such as 10UL to indicate an unsigned long constant. However, multiple suffixes are not supported by many compilers.

New types:
Besides the type void * which is mentioned in the next section, the Standard has introduced the type long double.

Variadic functions:
Variadic functions, as defined by the Standard, differ significantly from <varargs.h>. Besides the ellipsis notation, it is required by the Standard that the first argument be identified and that <stdarg.h> be used instead (see section 7.7). Therefore, it is not possible to define a variadic function which takes no arguments.
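
A minimal sketch of a Standard variadic function follows. The function sum_ints is hypothetical; its fixed first argument tells it how many int arguments come after it.

```c
#include <stdarg.h>

/* Adds up count int arguments. The named parameter count is required,
   because va_start needs the last fixed argument as its anchor. */
int sum_ints(int count, ...)
{
   va_list ap;
   int total = 0;
   int i;

   va_start(ap, count);
   for (i = 0; i < count; i++)
      total += va_arg(ap, int);
   va_end(ap);
   return total;
}
```

The ellipsis in the declaration is what lets the compiler choose the right calling convention, which is why the declaration must be visible wherever the function is called.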

11.2.4. Miscellaneous

char types:
When char types are used in expressions, most implementations will treat them as unsigned but there are many others that treat them as signed (e.g., VAX C and HP-UX). It is advisable to always cast chars when they are used in arithmetic expressions.
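
The hazard is easiest to see when a char is used as an array index. In this sketch, count_bytes is a hypothetical helper that tallies how often each byte value occurs in a string; without the cast, a char with the high bit set would become a negative index on implementations where char is signed.

```c
#include <limits.h>

/* Tallies the bytes of the string s into counts, which must have
   UCHAR_MAX + 1 elements. The cast makes the index non-negative
   whether plain char is signed or unsigned. */
void count_bytes(const char *s, unsigned long *counts)
{
   while (*s != '\0') {
      counts[(unsigned char)*s]++;
      s++;
   }
}
```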

Initialization:
Do not rely on the initialization of auto variables and of memory returned by malloc. In particular, since not all NULL pointers are represented by a bit pattern of all-zeroes, it is good practice to always initialize pointers appropriately.

The calloc library function returns an area of memory that has been cleared to zero. Although this can be used to initialize arrays and structs on many architectures, not all architectures represent NULL pointers internally with a zero bit-pattern. Similarly, it is not safe to assume that all architectures represent the floating-point constant 0.0 using a zero bit-pattern.
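
The explicit alternative can be sketched as below. The struct node and the function new_nodes are hypothetical; the point is only that each member is assigned its own zero value rather than relying on an all-zero bit pattern.

```c
#include <stdlib.h>

struct node {
   struct node *next;
   double weight;
   int id;
};

/* Allocates n nodes and initializes every member by assignment, so the
   pointer and the double get their proper zero representations. */
struct node *new_nodes(size_t n)
{
   struct node *nodes = (struct node *)malloc(n * sizeof *nodes);
   size_t i;

   if (nodes == NULL)
      return NULL;
   for (i = 0; i < n; i++) {
      nodes[i].next = NULL;
      nodes[i].weight = 0.0;
      nodes[i].id = 0;
   }
   return nodes;
}
```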

The semantics of many library functions differ from system to system. Also, the specifications of some library functions have been changed in the ANSI C Standard. For example, realloc is now required to behave like malloc when called with a NULL argument; formerly, many implementations would dump core if handed NULL.

Bit fields:
Some compilers, e.g., VAX C, require that bit fields within structs be of type int or unsigned. Furthermore, the upper bound on the length of the bit field may differ among different implementations.

sizeof:

void and void *:
Some very old compilers do not recognize void [sic]. Although required by the Standard, some compilers recognize void but fail to recognize void *. The following code might prove useful:

#if __STDC__
#  define  HAS_VOIDP
#endif
#ifdef HAS_VOIDP
   typedef void *voidp;
#else
   typedef char *voidp;
#endif

Functions as arguments:
When calling functions passed as arguments, always dereference the pointer. In other words, if f is a pointer to a function, use (*f)() instead of simply (f)(), because some compilers may not recognize the latter.

String constants:
Do not modify string constants since many implementations place them in read-only memory. Furthermore, that is what the Standard requires -- and that is how a constant should behave!

Note: In a statement such as 'char *s = "string";', "string" is a string constant, whereas in 'char s[] = "string";' it is not, and it is legal to modify s.

struct comparisons:
Some compilers might allow for structs to be compared for equality or inequality. Such an extension is not included in the Standard (meaning it is not portable).
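
The portable alternative is to compare the members one by one. A minimal sketch, with a hypothetical struct point and function point_equal:

```c
struct point {
   int x;
   int y;
};

/* Returns 1 if both points have equal members, 0 otherwise. */
int point_equal(const struct point *a, const struct point *b)
{
   return a->x == b->x && a->y == b->y;
}
```

Comparing the raw bytes of two structs is not a safe substitute, since any padding between members may hold arbitrary values.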

Initialization of aggregates:
Some compilers cannot initialize auto aggregate types. Statements such as:

{
   typedef struct {double x, y;} Interval;
   Interval range = {0.0,0.0};
   ...
}

are not allowed by some compilers unless the modifier static is used or if range has file scope. Although declaring all such variables static would handle most situations, the most portable solution is to add code that performs the initialization.
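
Initialization by code might look like the following sketch. The function init_interval is a hypothetical helper; it takes a pointer rather than returning the struct by value.

```c
typedef struct {double x, y;} Interval;

/* Sets both members by assignment, which every compiler accepts for
   auto variables, unlike a brace-enclosed initializer. */
void init_interval(Interval *range, double x, double y)
{
   range->x = x;
   range->y = y;
}
```

A block would then declare "Interval range;" and call init_interval(&range, 0.0, 0.0) before first use.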

Nested comments:
Nested comments were never allowed in the C language, but they are allowed by some compilers. Nested comments are used by some to comment out source code containing comments. However, the same effect can be obtained using an #if 0 and #endif pair.
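
For illustration, the sketch below disables a statement that carries its own comment. The function answer is hypothetical.

```c
/* The disabled code contains a comment of its own, so it could not be
   wrapped in another comment; the preprocessor skips it instead. */
int answer(void)
{
#if 0
   /* old behaviour, kept for reference */
   return 0;
#endif
   return 42;
}
```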

Shift operators:
When shifting signed ints right, the vacated bits might be filled with zeroes or with copies of the sign bit. When shifting unsigned ints right, the vacated bits are always filled with zeroes.
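
If sign-propagating behaviour is needed regardless of the compiler, it can be expressed using shifts of non-negative values only. This is a sketch; shift_right_arithmetic is a hypothetical name, and n is assumed to be smaller than the number of bits in a long.

```c
/* Shifts v right by n bits, rounding toward negative infinity, without
   ever right-shifting a negative value. */
long shift_right_arithmetic(long v, unsigned int n)
{
   if (v >= 0)
      return v >> n;
   /* -(v + 1) is non-negative and cannot overflow, even for LONG_MIN. */
   return -((-(v + 1)) >> n) - 1;
}
```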

Division and remainder:
When both operands are non-negative, then the remainder is non-negative and smaller than the divisor; if not, it is guaranteed only that the absolute value of the remainder is smaller than the absolute value of the divisor.
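
Code that needs a non-negative remainder for a possibly negative dividend can normalize the result itself. In this sketch, mod_floor is a hypothetical helper and the divisor b is assumed to be positive.

```c
/* Returns a modulo b in the range 0 .. b-1, for b > 0, whichever sign
   the implementation gives to the result of a % b. */
int mod_floor(int a, int b)
{
   int r = a % b;

   if (r < 0)
      r += b;
   return r;
}
```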

11.3. Files

11.3.1. General Guidelines

Remember that not all operating systems share Unix's simple notion of a file as a stream of bytes. MS-DOS, for instance, has text files and binary files; it is important to open files in the correct mode. VMS has many different file types and each file is viewed as being a collection of structured records.

MS-DOS provides a "poor man's" implementation of pipes and redirection. It does not expand wildcards, however. The user must do the wildcard expansion using findfirst and findnext. Under VMS, the user must also expand wildcards, and parse argv for redirection directives manually.

Different operating systems use widely different syntax to specify pathnames. This is a potential source of problems. Some compilers may provide run-time pathname translation to translate between Unix syntax and the host's syntax.

11.3.2. Source Files

11.4. Miscellaneous

System dependencies:
Isolate system-dependent code in separate modules and use conditional compilation.
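
As a small sketch of this approach, the fragment below confines one system dependency, the pathname separator, to a single macro. The macro PATH_SEPARATOR and the function path_basename are hypothetical, the tests on _WIN32 and __MSDOS__ are assumptions about which symbols a given compiler predefines, and systems with a different pathname syntax altogether (such as VMS) would need their own branch.

```c
#include <string.h>

/* The only system-dependent part: which character separates
   directory names in a pathname. */
#if defined(_WIN32) || defined(__MSDOS__)
#  define PATH_SEPARATOR '\\'
#else
#  define PATH_SEPARATOR '/'
#endif

/* Returns a pointer to the part of path after the last separator,
   or path itself if it contains no separator. */
const char *path_basename(const char *path)
{
   const char *last = strrchr(path, PATH_SEPARATOR);

   return (last != NULL) ? last + 1 : path;
}
```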

Utilities:
Utilities for compiling and linking, such as Make, considerably simplify the task of moving an application from one environment to another. Even better, use Imake, since Makefiles themselves are not very portable. Imake is distributed with the X Window System by MIT. One of the authors of this document has used it extensively with very good results.

Many of the tools and libraries that one takes for granted on Unix, such as lex, yacc, curses, sed, awk, and the various shells, are often not available on other operating systems. Public-domain versions of most of the useful tools are available at many archive sites. However, the so-called copyleft restrictions on many of these programs may prove to be problematic to some would-be porters.

Name space pollution:
Minimize the number of global symbols in the application. One of the benefits is the lower probability that any conflicts will arise with system-defined functions.

Character sets:
Do not assume that the character set is ASCII. If the character set in question is not [American] English, then other characters will also be alphabetic, and their lexicographic ordering will not necessarily have any relationship to their positions within the character set. If the character set is Asian, then "characters" may be of type wchar_t, not char, and will, in general, require two or more bytes of storage each. The library string functions should be capable of handling these correctly. Code that iterates through arrays of chars may need to be changed to handle multibyte characters correctly.

If the program's messages are likely to be translated into other languages, take care to modularize the code for easy translation. Consider keeping all text in a "language" file. Be aware that carefully formatted reports and printing routines may need major surgery.

Binary Data:
Great care must be taken when reading and writing binary data. For example, a file of floating-point numbers in binary format written by machine A is unlikely to be usable on machine B.
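
One common way around the problem is to exchange such data as text. The sketch below uses hypothetical helpers format_double and parse_double; 17 significant digits are assumed to be enough to reproduce the value exactly, which holds for IEEE 754 doubles but is an assumption for other formats.

```c
#include <stdio.h>
#include <stdlib.h>

/* Writes v as decimal text into buf, which must hold at least 32 chars. */
void format_double(char *buf, double v)
{
   sprintf(buf, "%.17g", v);
}

/* Converts decimal text produced by format_double back to a double. */
double parse_double(const char *s)
{
   return strtod(s, (char **)NULL);
}
```

Text is larger and slower to process than raw binary, but it does not depend on the byte order or floating-point layout of the machine that wrote it.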

11.5. Writing Portable Code

Write code under the assumption that it will be ported to many strange machines. It is considerably easier to port code to a new environment when the code has been written with porting in mind, than it is to "retrofit" portability.

One school of thought advocates "Port early, port often." That is, whenever the code reaches a certain level of stability on the development system, port it to other systems. This method has the advantage that portability problems are discovered early, and the possible disadvantage that potentially far more time could be spent in porting than would be the case if the code were just ported once, when complete.

Code in ANSI C whenever possible. Many of the extensions -- prototypes, stronger type-checking, etc. -- enhance portability. The more widely ANSI C is used, the quicker it will gain acceptance. Of course, this may not be an option if the code must be ported to platforms without ANSI C compilers. The short-term solution is to use the various tricks discussed in Recommended C Style and Coding Standards [1] and elsewhere; the long-term solution is to force vendors to release ANSI C compilers for their systems. Alternatively, a converter such as protoize (available via anonymous FTP from prep.ai.mit.edu) can convert between ANSI and non-ANSI programs.

Make complete, correct declarations; don't let parameters default to int. Include all of the necessary header files. Declare functions with no return value as void. Check the results of system calls.

Use lint. Programs that fail to pass lint quietly will undoubtedly be difficult to port. Compile code with as many different compilers as possible with all warnings enabled.

Recommended C Style and Coding Standards [1] has more to say about this.


17. There is a difference between variadic functions defined by the Standard and the pre-Standard varargs as defined by varargs.h which is still widely used. Here we are referring to the former, and the differences between both are explored in section 11.2.3.

18. One of the reasons for these rules is that in some architectures, pointers are represented as a pair of values and only equality is a well-defined operator for arbitrary pairs of values. The other operators are only well-defined when one of the values of both pairs is guaranteed to match, in which case the situation is analogous to "ordinary" architectures.

19. Programs that generate other programs, e.g., yacc, can generate, for instance, very large switch statements.

