|
|
|
|
||
|
DEFINITION MODULE DMFloatEnv; (******************************************************************* Module DMFloatEnv ('Dialog Machine' DM_V3.0) Copyright (c) 1991-2011 by Andreas Fischlin and ETH Zurich. Purpose Management of the floating point environment of a computer, such as the SANE (Standard Apple Numerics Environment) environment on the Macintosh, to control dynamically the trapping, examination of accrued exceptions or the clearing and raising of exceptions. Remarks The IEEE Standard 754 specifies number formats and the arithmetic model for the Binary Floating-Point Arithmetic for reals, i.e. the Modula-2 elementary number types REAL (single precision) and LONGREAL (double precision). This standard defines also what happens under special conditions, in particular when a mathematically invalid situation arises such as a division by zero. According to the IEEE standard the behavior of all floating-point arithmetic under normal as well as exceptional conditions should also be user controllable. This module provides such means. Macintosh implementation using MacMETH: -------------------------------------- This implementation supports SANE while using the 'Dialog Machine' with MacMETH. If you partly or fully bypass SANE, e.g. by using the so-called Fink arithmetic for single-precision reals, then the use of this module will have limited or no effect. Note that the MacMETH floating point computations are only partially executed by means of SANE. Note also that the computation of mathematical functions by a module such as DMMathLib depends on the particular module variant currently in use (visible when using the Finder's "File -> Get Info" menu command). Depending on the variant currently having the default name, i.e. only extension .OBM, its implementation may or may not use SANE, depending on the original compilation and the type of the real expressions it uses internally. The following general rules apply: o Direct computations (not controllable via this module) - This holds for all expressions of type REAL compiled with the standard compiler (Compile) and while having flag 'alwaysSANE' in section "SANE" of the User.profile turned off during execution (uses the so called Fink-arithmetic). All expressions of type REAL which are computed directly do bypass SANE. They can't be controlled by this module. The big advantage of direct computations is that they are several orders faster than their SANE counterparts on older 680x0 machines without coprocessor. Direct computations conform to the IEEE floating-point standard 754 for single precision reals (32 bit, type REAL). Note, however, relations, e.g. in IF statements and mathematical operations beyond the basic operations are always handled by SANE, regardless of the current settings and the compiler used. Be also aware of the module DMMathLib which you use. This module is available in at least two different variants, i.e. one uses always SANE, the other does NOT use SANE to compute transcendental functions, i.e. trigonometric and logarithmic functions. The User.Profile and the naming of the object code files determine the use of these modules (MacMETH's dynamic linking loader uses that module which has the default extension .OBM). Be aware that any disabling of halts (exceptions) when using direct computations is not possible and this module is of no use to control the behavior of such computations. At most you may redirect eventual halt messages (aborts) resulting from encountered exceptions by means of module DMMessages. In summary: To gain full control over all REAL computations by this module use SANE only (User.Profile 'alwaysSANE' on, only SANE variants of DMMathLib, DMMathLF etc.) Note, all of above holds for natively booted Classic MacOS as well as Classic under OS X. o SANE computations (controllable via this module) - This holds for all expressions of type LONGREAL (regardless of current settings in the User.Profile) and any functions based on SANE such as relations, implicit conversions, e.g. assignment of LONGREAL expressions to a REAL variable (or vice versa), trigonometric, and logarithmic functions based on SANE, e.g. SANE variant of DMMathLib. Expressions of type LONGREAL are always computed by means of SANE, regardless of the used compiler. All real expressions (regardless of type single (REAL) or double precision (LONGREAL)) are computed by means of SANE if the compiler Compile20 is used. In this case it is of course possible to fully control the floating point environment by means of this module (see below). All computations adhere at least to the IEEE floating-point standard 754. For more details see the SANE manual (reference below). IBM PC implementation: --------------------- - to be completed - Unix (Sun) implementation: ------------------------- Uses module IEEEControl from ecp Modula-2. The latter is a DEFINITION for C and requires on older Unix systems the mathematical library libm.a (SunOS 4.x). On newer systems the library is named libsunmath.a. These libraries are only optionally available and are normally only available as part of a Sun language package, i.e. from one of the so-called SPARCworks compiler products (WorkShop). All procedures behave like those on the Macintosh except that some routines do not work at all and that the the default environment is initialized by this module by a call to SetEnvironment(DMFloatDefaultEnv). This is in contrast to the Macintosh implementation, where this module is optional and the default environment is initially controlled via the User.Profile only (see MacMETH manual for details). From this difference follows that this module belongs to the kernel of the BatchDM of RASS (on the Macintosh this module is optional). Moreover, due to hardware limitations of SPARC based machines, it is not possible to control the rounding precision. P1 Macintosh implementation (PPC): ---------------------------------- Using the P1 Modula-2 compiler and language system on the Macintosh under Mac OS X allows to control the behavior of all real operations, i.e. single-precision as well as double-precision real arithmetics, by this module. Note however, the floating point arithmetic on this architecture is based on the so-called PowerPC Numerics and uses no longer SANE (for differences see e.g. the document "SANE versus PowerPC Numerics" available from Apple's developer web site). As a consequence the precision of these operations is no longer the same as under SANE, where all operations were internally computed in the so-called extended format, i.e. with 80 bits, regardless of the target format, may it be only single (32 bit) or double-precision (64 bit). From the theoretical viewpoint, that poses some problems and can be considered a drawback (see e.g. Kahan, 2000). However, all modern CPU architectures unfortunately seem to disregard such concerns. Note also, that the P1 compiler evaluates any real expression with 64 bits, regardless whether it is programmed as a single-precision (type REAL,, 32 bit IEEE Standard) or double-real precision (type LONGREAL, 64 bit IEEE Standard) expression. Only the result of a real expression is then converted to the format of the target variable to which the expression's value is assigned. This means no conversion takes place when the value is assigned to a variable of type LONGREAL, a conversion to the single-precision format takes place if the value is assigned to a variable of type REAL. P1 Macintosh implementation (Intel): ---------------------------------- This implementation uses again a different approach that is actually very close to what Sun SPARC systems accomplish. Internal precision of single precision reals is confined to 32 bit, which affects the accuracy of the results, yet increases the portability of the results among the various platforms. For more details on the actual behavior of the routines provided from this module see the explanations provided with each routine. Note, towards the end of this definition module you find also many details on the actual storage format of floating point numbers as defined by the IEEE 754 standard and as generally used for the elementary Modula-2 data types REAL (single precision) and LONGREAL (double precision). References: IEEE Standard for Binary Floating-Point Arithmetic, ANSI/IEEE Std 754-1985 (IEEE 754) Apple(R) Apple Numerics Manual (SANE Manual), Second Edition, For all Apple II and Macintosh(R) computers, 1988. Addison Wesley Publishing Company, Inc. Reading, MA a.o. 294 pp. ISBN 0-201-17738-2. See in particular p. 162ff., 226ff., p. 57ff. Goldberg, D., 1991. What every computer scientist should know about floating-point arithmetic. ACM Comput. Surveys, 23(1): 5-48. <http://dx.doi.org/10.1145/103162.103163> Go059 Kahan, W., 2000. Marketing versus mathematics and other ruminations on the design of floating-point arithmetic. p. 48. URL: http://www.cs.berkeley.edu/~wkahan/MktgMath.pdf Ka077 http://docs.sun.com This module belongs to the 'Dialog Machine'. Programming o Design Andreas Fischlin 27/08/1991 o Implementation Andreas Fischlin 27/08/1991 ETH Zurich Systems Ecology CHN E 35.1 Universitaetstrasse 16 8092 Zurich SWITZERLAND URLs: <mailto:RAMSES@env.ethz.ch> <http://www.sysecol.ethz.ch> <http://www.sysecol.ethz.ch/SimSoftware/RAMSES> Last revision of definition: 25/07/2011 AF *******************************************************************) (**************************************) (*##### Halts and Exceptions #####*) (**************************************) CONST (* Exception codes used to denote individual exceptions when calling the routines EnableHalt, DisableHalt, HaltEnabled, ExceptionPending, RaiseException, and ClearException: *) invalid = 0; (* invalid operands *) underflow = 1; (* result > MIN(positive REAL) > 0 *) overflow = 2; (* result < MIN(REAL) OR result > MAX(REAL) *) divideByZero = 3; (* division by zero *) inexact = 4; (* rounding result <> mathematically exact result *) (* see also SANE manual p. 54 - 56 *) TYPE Exception = [invalid..inexact]; PROCEDURE EnableHalt(which: Exception); PROCEDURE DisableHalt(which: Exception); PROCEDURE HaltEnabled(which: Exception): BOOLEAN; PROCEDURE RaiseException(which: Exception); PROCEDURE ClearException(which: Exception); PROCEDURE ExceptionPending(which: Exception): BOOLEAN; (* IMPLEMENTATION RESTRICTION on Macintosh platform using MacMETH: If you currently use the so-called Fink arithmetics ("'alwaysSANE' off" in section "SANE" of the User.Profile) for single precision REAL operations, then above procedures have no effect. With Fink arithmetics it is not possible to disable the following halts: invalid, overflow, and divideByZero. The Fink arithmetics always ignores the exceptions underflow and inexact. *) (************************************) (*##### Rounding Direction #####*) (************************************) TYPE RoundDir = (toNearest, upward, downward, towardZero); (* Possible rounding directions are: round towards nearest (toNearest), round towards +. (upward), or round towards -. (downwoard), or round towards zero (towardZero). Rounding towards nearest returns the representable number nearest to the mathematical result when the mathematical result of an operation lies halfway between two representable numbers. If the mathematical result lies exactly in the middle between the two representable numbers, then mode toNearest specifies round to the nearest even number (final bit is zero). Rounding towards zero is the way many pre-IEEE computers work, and corresponds mathematically to truncating the result. For example, if 2/3 is rounded to 6 decimal digits, the result is .666667 when the rounding mode is round towards nearest, but .666666 when the rounding mode is round towards zero. see also SANE manual p. 52 and 53. The IEEE default rounding direction is round towards nearest. All implementations of this module do not alter the the default. *) PROCEDURE SetRound(r: RoundDir); PROCEDURE GetRound(VAR r: RoundDir); (************************************) (*##### Rounding Precision #####*) (************************************) TYPE RoundPre = (extPrecision, dblPrecision, sglPrecision); (* Rounding precision extended, double, or single. E.g. it may be useful while testing numerical algorithms to set in expressions of type LONGREAL the rounding precision to sglPrecision. The results will then be the same as if the expressions would have been of type REAL, without having to modify the source code and to recompile. On the Macintosh the default is extPrecision (see also SANE manual p. 53). *) PROCEDURE SetPrecision(p: RoundPre); PROCEDURE GetPrecision(VAR p: RoundPre); (* Unix (Sun) implementation: ------------------------- SPARC based machines do not support the use of these routines, since rounding is always done with a predefined precision. See http://docs.sun.com for further information. *) (******************************************) (*##### Full environment control #####*) (******************************************) TYPE FloatEnvironment = BITSET; CONST (* If flag is in the environment the halt is called as soon as an exception is raised (encountered during computations or by calling RaiseException). *) haltIfInvalid = 0; haltIfUnderflow = 1; haltIfOverflow = 2; haltIfDivideByZero = 3; haltIfInexact = 4; IEEEFloatDefaultEnv = FloatEnvironment{}; (* all halts disabled *) (* NOTE: implies also RoundDir = toNearest, RoundPre = extPrecision *) DMFloatDefaultEnv = FloatEnvironment{haltIfInvalid, haltIfOverflow, haltIfDivideByZero}; (* NOTE: implies also RoundDir = toNearest, RoundPre = extPrecision *) (* Macintosh MacMETH implementation: -------------------------------- This module preserves the current floating point environment except for any pending exceptions, which are all cleared. It terminates by resuming the original environment of its callee via a so-called TermProcedure (is executed always), regardless of any changes which have been made in the mean-time. A recommended setting is DMFloatDefaultEnv. To temporarily disable any exceptions, you may use IEEEFloatDefaultEnv. Unix (Sun) implementation: ------------------------- This module overwrites the current floating point environment and sets it initially (by default) to DMFloatDefaultEnv. Note, this module belongs to the kernel of the BatchDM and influences all computations done by RASS. *) PROCEDURE GetEnvironment(VAR e: FloatEnvironment); PROCEDURE SetEnvironment(e: FloatEnvironment); (* The environment holds all settings, i.e. those for halts, rounding direction, and rounding precision. Thus SetEnvironment affects all settings at once and may override some individual settings accomplished by calling routines such as SetPrecision, SetRound, or individual EnableHalt etc. Of course GetEnvironment returns all the current settings also at once. *) PROCEDURE ProcEntry(VAR e: FloatEnvironment); PROCEDURE ProcExit(e: FloatEnvironment); (* ProcEntry makes it possible to temporarily disable any halts, e.g. in order to perform some specific computations such as analyzing results, clearing pending exceptions, and raising specific exceptions. In such a case save first the current environment saveEnv and set temporarily the settings to the default (IEEEFloatDefaultEnv) by calling ProcEntry(saveEnv). Then at the end of your computations resume the saved environment via ProcExit. ProcExit will reestablish the previous settings up to the call of ProcExit. There is also another typical usage useful: E.g if you wish to disable temporarily all halts, and then to resume the previous environment you can first call ProcEntry(saveEnv), do your computations eventually producing exceptions, and finally call SetEnvironment(saveEnv). Note: ProcEntry(saveEnv) is an efficient equivalent for the statement sequence GetEnvironment(saveEnv); ClearException(invalid); ClearException(overflow); ClearException(underflow); ClearException(divideByZero); ClearException(inexact); SetEnvironment(IEEEFloatDefaultEnv); ProcExit(saveEnv) is an efficient equivalent for IF ExceptionPending(invalid) THEN RaiseException(invalid) END; IF ExceptionPending(overflow) THEN RaiseException(overflow) END; IF ExceptionPending(underflow) THEN RaiseException(underflow) END; IF ExceptionPending(divideByZero) THEN RaiseException(divideByZero) END; IF ExceptionPending(inexact) THEN RaiseException(inexact) END; SetEnvironment(saveEnv); For a sample program using these features see also the SANE Manual p.59. *) CONST (* The following constants are only meaningful for the SANE implementation of this module using MacMETH on the Macintosh. They denote individual flags used in the environment word (for details see the SANE manual p.276). *) (* Exception flag is set if an exception has ocurred. However, actual exceptions should be inquired by routine ExceptionPending and can't be raised by changing the corresponding flag in the environment (use RaiseException instead). On the other hand, clearing an exception flag in the environment results normally also in the clearing of an eventually pending exception. *) flagIfInvalid = 8; flagIfUnderflow = 9; flagIfOverflow = 10; flagIfDivideByZero = 11; flagIfInexact = 12; (****************************************) (*##### Floating Point Numbers #####*) (****************************************) (* IEEE 754 Floating Point Numbers ------------------------------- Any real number represented as a floating point number consists of a sign (s) part (just a bit, negative if set), an exponent (e), and a significant or fractional part (f). If f0 is set and e <> 0 the number is normalized. If f0 is clear and e = 0, the number is denormalized or subnormal (single < 1.17549435E-38 = 00800000H; double < 2.2250738585072014E-308 = 0010000000000000H). If both e and f are 0, the number is 0. Any + single number with e > 07F7, i.e. e = 07F8, or if - > 0FF7, i.e. 0FF8 is not a number, i.e. a NaN. Double the limits are 07FE and 0FFE, i.e. e of a double NaN is 07FF and 0FFF, respectively. A NaN with bit f0 (see V) set is signaling, otherwise it is quiet. A NaN with f = 0 is called INF; ABS(INF) > ABS(MAX(REAL)) or ABS(MAX(LONGREAL)). A NaN with f1..fn set contains a code. Interpreted are only 8 bits, i.e. f7..f14 to define a code. Thus there are 255 different NaN's [NaN(001).. NaN(255)] apart from INF (+INF and -INF are not NaNs in the strict sense). 98.. bits in number pattern, most significant bits to the left, least significant to the right b0.. bytes in number pattern w0.. words in number pattern V - denotes bit which defines signaling status if number is NaN (clear ~ signaling, set ~ quiet) nn - NaN code (bit f8..f15 or f37..f44, respectively) Extra bits f0..f7 and f16..f21 or f0..f36 and f45..f50, respectively, are ignored by this implementation while coding NaN's (in accordance with SANE practices), however some floating processing (e.g. on Sun) make use of all fraction bits to encode a NaN srix. hardware independent index to 16-bit word . in a single real SRIX. hardware independent index to 32-bit word . in a single real drix. hardware independent index to 16-bit word . in a double real DRIX. hardware independent index to 32-bit word . in a double real REAL = IEEE 754/854 single precision real numbers, SIZE = 4 bytes, 32 bits bits s = 1, e = 8 (e0-e7), f = 23 (f0-f22) (without hidden bit, i.e. p = 24) 3 2 1 1 0987654 3 2109876 54321098 76543210 s e V f _|_______ _|_______ ________ ________| [-3.40282347E+38..+3.40282347E+38] b0 ^ b1 ^ b2 ^ b3 [ 0FF7FFFFFH .. 07F7FFFFFH ] ........w0......... .......w1........ nnnnnnnn ......srix0........ ......srix1...... .................SRIX0............... LONGREAL = IEEE 754/854 double precision real numbers, SIZE = 8 bytes, 64 bits bits s = 1, e = 11 (e0-e10), f = 52 (f0-f51) (without hidden bit, i.e. p = 53) 6 5 4 3 2 1 3 2109876 5432 1098 76543210 98765432 10987654 32109876 54321098 76543210 s e V f _|_______ ____|____ ________ ________ ________ ________ ________ ________| b0 ^ b1 ^ b2 ^ b3 ^ b4 ^ b5 ^ b6 ^ b7 ........w0......... .......w1........ .......w2........ .......w3........ nnnnn nnn ......drix0........ .....drix1....... .....drix2....... .....drix3....... .................DRIX0............... ...............DRIX1............... [-1.7976931348623157E+308..+1.7976931348623157E+308] [ 0FFEFFFFFFFFFFFFFH .. 07FEFFFFFFFFFFFFFH ] Note: Actual memory mapping, i.e. sequence by which bytes, words (16 bit), double words (32 bit), are arranged in memory, differ in a hardware dependent fashion. Intel CPU's use e.g. little-endian memory mapping, SPARC (Sun) or Power PC (Macintosh) don't. Thus array mapped indices of the words w0, w1, w2, w3 differ accordingly (see types SingleReal and DoubleReal). Furthermore, for many real constant assignments, e.g. in binary format, it is widespread to make a double word interpretation of the real number, which calls for an alternative set of indices. Again, this time in case of LONGREAL numbers only, the indices vary depending on memory mapping technique. Consequently, this module uses internally variable array indices, which adjust themselves to the hardware on which this code is executed. Convention: Lower case indices names, i.e. SR (single real): srix0 ~ w0, srix1 ~ w1, DR (double real): drix0 ~ w0, drix1 ~ w1, drix2 ~ w2, drix3 ~ w3 are used for single word interpretations. Capitalized indices names, i.e. SR (single real): SRIX0 ~ {w0,w1}, DR (double real): DRIX0 ~ {w0,w1}, DRIX1 ~ {w2,w3} are used for double word (32 bit) interpretations). The advantage is that these indices can be used independently of underlying hardware to access a particular part of a real number by interpreting the indices as indexing the logical structure of real numbers as described above. E.g. the sign bit can be expected to be always in singleReal.aSW[srix0] or doubleReal.aSW[drix0], respectively, and singleReal.aDW[SRIX0] or doubleReal.aDW[DRIX0], respectively. *) PROCEDURE REALMachineEpsilon(): REAL; (* REALMachineEpsilon returns the actual machine epsilon for the Modula-2 elementary data type REAL used to implement the single precision floating point numbers according to the IEEE 754 standard for floating point arithmetics. The returned machine epsilon characterizes the accuracy of single precision calculations as computed (roundoff to one) on the current platform. Note that the machine epsilon is an important value, since it represents the theoretically to be expected upper limit for relative errors of floating point computations (e.g. Goldberg, 1991). Theoretically this value should be = 2^-p in round to nearest (even) mode (on most platforms the default), where p is the number of binary digits used for the significand (mantissa) including the hidden bit, i.e. p is 24 = 23 (stored) + 1 (hidden) (e.g. Goldberg, 1991, Go59, p.8 equation 2 and p. 16). However, since some platforms such as the PPC Macintosh platform may use internally more than p bits when evaluating floating point expressions, the value returned by REALMachineEpsilon may actually deviate from the theoretically expected one, i.e. be considerably smaller. For instance on a PPC using SANE (Standard Apple Numerics Environment) using the internal IEEE 754 extended 80 bit format, p = 63+1(hidden). Thus the machine epsilon can get as low as 2^64 = 5.42101086242752217003726400434970855712890625E-20 (~REAL(01F800000H) = DRFromHex(03BF00000H,00000000H)). Thus the rules of thump are: For maximum precision, use the value returned by REALMachineEpsilon, for maximum portability among platforms use routine TheoreticalREALMachineEpsilon (see below). Note the values returned by any of those routines differ with current rounding mode (RoundDir, see also routine SetRound). By default the rounding mode is set to toNearest (round to even mode). Make sure you set the rounding mode consistently with the intended use of the obtained machine epsilon as obtained from this routine. *) PROCEDURE TheoreticalREALMachineEpsilon(): REAL; (* TheoreticalREALMachineEpsilon() returns in the default rounding mode toNearest (even) 2^-p = 2^-24 = 5.9604644775390625E-8 (~REAL(33800000H) = DRFromHex(3E700000, 00000000H)). In all other rounding modes the machine epsilon is twice as big, i.e. 2*2^-p = 1.1920928955078125E-7 (~REAL(34000000H) = DRFromHex(3E800000, 00000000H)); for extended format with p = 63+1: 1.08420217248550443400745280086994171142578125E-19 (~REAL(020000000H) = DRFromHex(03C000000H, 00000000H)). *) PROCEDURE LONGREALMachineEpsilon(): LONGREAL; (* For general explanations see above. For the double precision floating point numbers of Modula-2 elementary data type LONGREAL p is 53 = 52 (stored) +1 (hidden) in accordance with the IEEE 754 standard for floating point arithmetics. Note similar to what was explained above for type REAL, routine LONGREALMachineEpsilon may return a very samll value if internally more bits are used than the double precision floating point format requires, e.g. if the extended format using 80 bits is used. It should also be noted that in such cases, in particular when SANE (see above) is used, LONGREALMachineEpsilon can return precisely the same small value as REALMachineEpsilon. *) PROCEDURE TheoreticalLONGREALMachineEpsilon(): LONGREAL; (* TheoreticalLONGREALMachineEpsilon returns in the default rounding mode toNearest (even) 2^-p = 2^-53 = 1.1102230246251565404236316680908203125E-16 (~ DRFromHex(3CA00000H, 00000000H) = REAL(25000000H)). In towardZero rounding mode the machine epsilon is twice as big, i.e. 2*2^-p = 2.220446049250313080847263336181640625E-16 (~ DRFromHex(3CB00000H, 00000000H) = REAL(25800000H)). *) PROCEDURE DRFromHex(dw0,dw1: LONGCARD): LONGREAL; (* Define a LONGREAL value precisely resorting to a hexadecimal notation given in big endian form. E.g. Pi/4 = DRFromHex(03FE921FBH, 054442D18H) or e = DRFromHex(04005BF0AH, 08B145769H). Any decimal binary conversion problems or internal varying precision problems are avoided, since completely circumvented by this coercion technique. This routine is aware of endianess and does the coercion correctly regardless what platform it is currently running on! For single percision floating point numbers of type REAL you can use type coercion such as REAL(hexnumber) and you obtain correct results regardless of endianness. Use procedure BytesToHexString from module DMConversions to convert REAL or LONGREAL numbers to their hex string counterpart. Note this routine is only provided for numerical experimentation to ensure precise real values, which are not always obtainable when using real constants. E.g. 0.1 is not exactly representable and when testing floating point arithmetics for accuracy it may become necessary to resort to assign floating point numbers using this routine. *) PROCEDURE MachineIsBigEndian(): BOOLEAN; (* Returns whether host is big endian (needed only for byte-wise conversion of integers). IMPLEMENTATION RESTRICTION: Middle endian is not detected and assumed not to be present, i.e. if this routine returns FALSE, this means littel endian has been detected. *) END DMFloatEnv.
|
||
|
|
|