long int str[2] = {0x41424344, 0x0}; /* ASCII "ABCD" */
printf ("%s\n", (char *)&str);
A little-endian machine (e.g., a VAX) will print "DCBA", whereas a big-endian machine (e.g., one based on an MC68000 microprocessor) will print "ABCD". (As a side note, there is also PDP-endian, which would print "BADC", followed by many smileys.)
Note: The example only behaves as described if long int is 32 bits wide (that is, if sizeof(long int) is 4 on a machine with 8-bit bytes). Although not portable, it serves well as an example for the given problem.
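A sketch of how the same question can be asked without relying on the width of long int: inspect the object one unsigned char at a time, and, when bytes must leave the machine, write them in a fixed order with shifts. The names host_byte_order, store_be32, and the enum constants are invented for this illustration, and the code assumes 8-bit bytes.

```c
#include <stddef.h>

/* Hypothetical names, chosen for this illustration only. */
enum byte_order { ORDER_LITTLE, ORDER_BIG, ORDER_OTHER };

/* Classify the host byte order by looking at how 0x01020304 is laid
 * out in memory. Assumes 8-bit bytes, so unsigned long has >= 4 bytes. */
enum byte_order host_byte_order(void)
{
    unsigned long probe = 0x01020304UL;
    const unsigned char *b = (const unsigned char *)&probe;
    size_t n = sizeof probe;

    /* Little-endian: the least significant byte comes first. */
    if (b[0] == 0x04 && b[1] == 0x03 && b[2] == 0x02 && b[3] == 0x01)
        return ORDER_LITTLE;
    /* Big-endian: the four low-order bytes sit at the end, in order. */
    if (b[n - 1] == 0x04 && b[n - 2] == 0x03 &&
        b[n - 3] == 0x02 && b[n - 4] == 0x01)
        return ORDER_BIG;
    return ORDER_OTHER;            /* e.g., PDP-endian */
}

/* Store the low 32 bits of v most-significant byte first, whatever
 * the host byte order happens to be. */
void store_be32(unsigned char out[4], unsigned long v)
{
    out[0] = (unsigned char)((v >> 24) & 0xFF);
    out[1] = (unsigned char)((v >> 16) & 0xFF);
    out[2] = (unsigned char)((v >> 8) & 0xFF);
    out[3] = (unsigned char)(v & 0xFF);
}
```

Because store_be32 works on values rather than on memory layout, store_be32(buf, 0x41424344UL) leaves the bytes 0x41 0x42 0x43 0x44 in buf on any host.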
char *s = "bla"; /* allocated by compiler */
int *v = (int *)s;
would most probably fail if the alignment constraints of int types are more strict than those of char types (the usual case for RISC architectures). The code would not fail due to alignment constraints if the memory indicated by s had been allocated by malloc and friends.
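One way to sidestep the alignment problem is to copy the bytes into a properly aligned object instead of converting the pointer. The following is a minimal sketch; read_int_unaligned is a hypothetical helper name, and the bytes are assumed to have been written by the same program on the same machine (so byte order is not an issue).

```c
#include <string.h>

/* Hypothetical helper: fetch an int from a buffer that may not be
 * aligned for int. memcpy copies byte by byte, so it places no
 * alignment requirement on src. */
int read_int_unaligned(const char *src)
{
    int value;

    memcpy(&value, src, sizeof value);
    return value;
}
```

The destination, value, is an ordinary int object and is therefore always suitably aligned.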
int *p = (int *) malloc(...);
...
free(p);
This code may malfunction in architectures where int * and char * have different representations because free expects a pointer of the latter type.
Pointers to different types of objects may have different sizes as well. For instance, there are platforms where a char * is larger than an int * or where a pointer to a function will not fit in, e.g., char * or void * (although such cross-assignments work on many platforms, void * is only guaranteed to be large enough to hold a pointer to any data object). Therefore, it is not portable to assign to an object of type void * a pointer to a function. Pointers to functions are further discussed below.
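When a function pointer must be stored generically, a function pointer type can serve as the container instead of void *. ANSI C allows a pointer to a function of one type to be converted to a pointer to a function of another type and back again. The sketch below relies on that rule; generic_fn, add_one, stash_fn, and call_stashed are names invented for the example.

```c
/* Hypothetical generic container type for any function pointer. */
typedef void (*generic_fn)(void);

static int add_one(int x)
{
    return x + 1;
}

/* Hand out add_one disguised as a generic function pointer. */
generic_fn stash_fn(void)
{
    return (generic_fn)add_one;
}

/* Convert back to the real type before calling; calling through the
 * generic type directly would not be portable. */
int call_stashed(generic_fn f, int arg)
{
    int (*typed)(int) = (int (*)(int))f;

    return typed(arg);
}
```

The caller is responsible for remembering the real type of each stored pointer.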
For example, it is not possible to (portably) use varargs functions (17) (that is, functions that take a variable number of arguments) and fixed-argument functions interchangeably, even if the overlapping types match (that is, even if the first n arguments to the fixed-argument function are the same as the first n arguments to the varargs function). For instance, a function that is declared as having an integer as the first argument and an optional (integer) second argument cannot be called as a function that takes two integer arguments. Similarly, varargs functions of various type signatures cannot be interchanged. Such type cheating will break on systems that use different conventions for calling fixed-argument and varargs functions and on systems that use different conventions for passing the fixed and varargs parts of the argument lists.
As a corollary, it is necessary that the definitions of external variadic functions be available at the point of their usage, e.g., library functions such as printf.
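As an illustration, here is a small variadic function written with the ANSI <stdarg.h> facilities. The function sum_ints is hypothetical. Callers in other source files would need its prototype, int sum_ints(int count, ...);, in scope so that the compiler can apply the variadic calling convention.

```c
#include <stdarg.h>

/* Hypothetical example: add up `count` arguments, each of type int. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;
    int i;

    va_start(ap, count);           /* count is the last fixed argument */
    for (i = 0; i < count; i++)
        total += va_arg(ap, int);  /* fetch the next int argument */
    va_end(ap);
    return total;
}
```

A call such as sum_ints(3, 1, 2, 3) yields 6. Declaring the same function elsewhere as int sum_ints(int, int) and calling it through that declaration is exactly the kind of type cheating described above.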
A null pointer of a given type will always convert to a null pointer of another type if implicit or explicit conversion is performed. (See 'Pointer Operators' above.)
The contents of a null pointer may be anything the implementor wishes, and dereferencing it may cause strange things to happen....
In arithmetic expressions, integral types may be converted in two ways: unsigned-preserving or value-preserving. In the unsigned-preserving model, chars, shorts, and bit-fields are converted to unsigned int or signed int if the original types have the modifiers unsigned or signed, respectively.
The Standard requires the value-preserving model: a value of an unsigned type narrower than int is promoted to int (that is, signed int) if int can represent all the values of the original type; otherwise it is converted to unsigned int. (See section 3.2 of the Standard.)
The following example illustrates the problem. On a machine with a 16-bit short int, and 32-bit int, the code fragment
unsigned short int x = 1;
if (x < -1) printf ("unsigned-preserving");
else printf ("value-preserving");
prints unsigned- or value-preserving accordingly. Plenty of other examples can be derived, such as initializing x with 2^15 and using the predicate (x*x*2 > 0). The expression x*x*2 would probably result in the same bit pattern in both models but would cause arithmetic overflow in the value-preserving model.
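Code that must give the same answer under both models can make the intended conversion explicit. The sketch below is one possible approach; less_than_signed is a hypothetical name, and it assumes that long can represent every value of unsigned short (true when short is 16 bits, since long is at least 32).

```c
/* Compare as signed long so that the result does not depend on which
 * promotion model the compiler applies to unsigned short.
 * Returns 1 if x is less than limit, 0 otherwise. */
int less_than_signed(unsigned short x, long limit)
{
    return (long)x < limit;
}
```

With this helper, less_than_signed(1, -1L) is 0 under either model, because both operands are already of type long when the comparison takes place.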
In practice, one runs into unstated compiler limitations far too frequently:
The Standard has made current practice official, but as we all know, not many compilers conform to the Standard. Among the features that are not yet widely supported, we mention only a few:
The calloc library function returns an area of memory that has been cleared to zero. Although this can be used to initialize arrays and structs on many architectures, not all architectures represent NULL pointers internally with a zero bit-pattern. Similarly, it is not safe to assume that all architectures represent the floating-point constant 0.0 using a zero bit-pattern.
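A safer pattern is to allocate with malloc and assign each member explicitly, so that the compiler supplies the correct representation of NULL and 0.0. The following is a minimal sketch; struct node and new_nodes are invented for the example, and the multiplication is not checked for overflow.

```c
#include <stdlib.h>

struct node {                      /* hypothetical record type */
    struct node *next;
    double weight;
    int id;
};

/* Allocate n nodes and initialize every member by assignment rather
 * than relying on an all-bits-zero fill. Returns NULL on failure. */
struct node *new_nodes(size_t n)
{
    struct node *a = (struct node *)malloc(n * sizeof *a);
    size_t i;

    if (a == NULL)
        return NULL;
    for (i = 0; i < n; i++) {
        a[i].next = NULL;          /* the implementation's null pointer */
        a[i].weight = 0.0;         /* the implementation's floating zero */
        a[i].id = 0;
    }
    return a;
}
```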
The semantics of many library functions differ from system to system. Also, the specifications of some library functions have been changed in the ANSI C Standard. For example, realloc is now required to behave like malloc when called with a NULL argument; formerly, many implementations would dump core if handed NULL.
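Code that must also run on pre-ANSI libraries can hide the difference behind a small wrapper. The sketch below shows one way to do it; portable_realloc is a hypothetical name.

```c
#include <stdlib.h>

/* Behave like ANSI realloc even on libraries whose realloc cannot
 * cope with a NULL pointer: fall back to malloc in that case. */
void *portable_realloc(void *p, size_t size)
{
    if (p == NULL)
        return malloc(size);
    return realloc(p, size);
}
```

Callers use the wrapper exactly as they would use realloc, including checking the result for NULL.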
#if __STDC__
# define HAS_VOIDP
#endif
#ifdef HAS_VOIDP
typedef void *voidp;
#else
typedef char *voidp;
#endif
Note: In a statement such as char *s = "string"; the text "string" is a string constant, whereas in char s[] = "string"; it merely initializes the array s, and it is legal to modify s.
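The distinction matters for any routine that writes through its argument. As a sketch, the hypothetical function upcase below modifies a string in place, so it must be given an array (or other writable storage), never a pointer to a string constant.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: convert a string to upper case in place.
 * The caller must pass writable storage. */
void upcase(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char)toupper((unsigned char)*s);
}
```

Given char s[] = "string"; the call upcase(s) is safe. Given char *s = "string"; the same call attempts to modify a string constant and may crash.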
{
typedef struct {double x, y;} Interval;
Interval range = {0.0,0.0};
...
}
are not allowed by some compilers unless the modifier static is used or if range has file scope. Although declaring all such variables static would handle most situations, the most portable solution is to add code that performs the initialization.
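Initialization by code might look like the sketch below. The function interval_init is a hypothetical helper, and the typedef repeats the one from the example so that the fragment stands alone.

```c
typedef struct { double x, y; } Interval;

/* Set both members by assignment, which every compiler accepts for
 * automatic variables, instead of using an aggregate initializer. */
void interval_init(Interval *r, double lo, double hi)
{
    r->x = lo;
    r->y = hi;
}
```

Inside a block, one would then write Interval range; followed by interval_init(&range, 0.0, 0.0);.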
Remember that not all operating systems share Unix's simple notion of a file as a stream of bytes. MS-DOS, for instance, has text files and binary files; it is important to open files in the correct mode. VMS has many different file types and each file is viewed as being a collection of structured records.
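For data that is not text, the ANSI mode string "rb" asks fopen for a binary stream. On Unix it makes no difference, but on systems such as MS-DOS it prevents translation of line terminators. The sketch below counts the bytes in a file this way; count_bytes is a hypothetical name.

```c
#include <stdio.h>

/* Count the bytes in a file, opening it in binary mode so that no
 * end-of-line translation takes place. Returns -1 if the file
 * cannot be opened. */
long count_bytes(const char *path)
{
    FILE *fp = fopen(path, "rb");
    long n = 0;

    if (fp == NULL)
        return -1L;
    while (getc(fp) != EOF)
        n++;
    fclose(fp);
    return n;
}
```

Opened in text mode instead, a file containing carriage return and line feed pairs could report a smaller count on some systems.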
MS-DOS provides a "poor man's" implementation of pipes and redirection. It does not expand wildcards, however. The user must do the wildcard expansion using findfirst and findnext. Under VMS, the user must also expand wildcards, and parse argv for redirection directives manually.
Different operating systems use widely different syntax to specify pathnames. This is a potential source of problems. Some compilers may provide run-time pathname translation to translate between Unix syntax and the host's syntax.
Many of the tools and libraries that one takes for granted on Unix, such as lex, yacc, curses, sed, awk, and the various shells, are often not available on other operating systems. Public-domain versions of most of the useful tools are available at many archive sites. However, the so-called copyleft restrictions on many of these programs may prove to be problematic to some would-be porters.
If the program's messages are likely to be translated into other languages, take care to modularize the code for easy translation. Consider keeping all text in a "language" file. Be aware that carefully formatted reports and printing routines may need major surgery.
Write code under the assumption that it will be ported to many strange machines. It is considerably easier to port code to a new environment when the code has been written with porting in mind than it is to "retrofit" portability.
One school of thought advocates "Port early, port often." That is, whenever the code reaches a certain level of stability on the development system, port it to other systems. This method has the advantage that portability problems are discovered early, and the possible disadvantage that potentially far more time could be spent in porting than would be the case if the code were just ported once, when complete.
Code in ANSI C whenever possible. Many of the extensions -- prototypes, stronger type-checking, etc. -- enhance portability. The more widely ANSI C is used, the quicker it will gain acceptance. Of course, this may not be an option if the code must be ported to platforms without ANSI C compilers. The short-term solution is to use the various tricks discussed in Recommended C Style and Coding Standards [1] and elsewhere; the long-term solution is to force vendors to release ANSI C compilers for their systems. Alternatively, a converter such as protoize (available via anonymous FTP from prep.ai.mit.edu) can convert between ANSI and non-ANSI programs.
Make complete, correct declarations; don't let parameters default to int. Include all of the necessary header files. Declare functions with no return value as void. Check the results of system calls.
Use lint. Programs that fail to pass lint quietly will undoubtedly be difficult to port. Compile code with as many different compilers as possible with all warnings enabled.
Recommended C Style and Coding Standards [1] has more to say about this.
18. One of the reasons for these rules is that in some architectures, pointers are represented as a pair of values and only equality is a well-defined operator for arbitrary pairs of values. The other operators are only well-defined when one of the values of both pairs is guaranteed to match, in which case the situation is analogous to "ordinary" architectures.
19. Programs that generate other programs, e.g., yacc, can generate, for instance, very large switch statements.