On RISCs the Gforth engine is very close to optimal; i.e., it is usually impossible to write a significantly faster threaded-code engine.
On register-starved machines like the 386 architecture processors,
improvements are possible, because gcc does not utilize the registers as well
as a human, even with explicit register declarations; e.g., Bernd Beuster
wrote a Forth system fragment in assembly language and hand-tuned it for the
486; this system is 1.19 times faster on the Sieve benchmark on a 486DX2/66
than Gforth compiled with gcc-2.6.3 with -DFORCE_REG. The situation has
improved with gcc-2.95 and gforth-0.4.9; now the most important virtual
machine registers fit in real registers (and we can even afford to use the TOS
optimization), resulting in a speedup of 1.14 on the Sieve over the earlier
results. Dynamic superinstructions provide another speedup (but only around a
factor of 1.2 on the 486).
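The TOS optimization mentioned above keeps the top-of-stack item in a variable (ideally a machine register) instead of in memory. The following C fragment is only a sketch of that idea, not Gforth's engine code; the function names and the stack layout (sp[0] is the top item, deeper items at higher addresses) are assumptions made for the illustration.

```c
/* Sketch only: hypothetical helpers, not taken from the Gforth sources. */

/* Whole stack in memory: each "+" needs two loads and one store.
   Applies "+" n-1 times to the n topmost cells (n >= 1). */
long sum_plain(long *sp, int n)
{
    while (--n > 0) {
        long a = sp[0];   /* load top of stack */
        long b = sp[1];   /* load second item */
        sp++;             /* drop one cell */
        sp[0] = a + b;    /* store the new top of stack */
    }
    return sp[0];
}

/* Top of stack cached in `tos`: each "+" needs one load and no store.
   Whether `tos` really ends up in a register is up to the compiler. */
long sum_tos(long *sp, int n)
{
    long tos = *sp++;     /* fetch the top item once */
    while (--n > 0)
        tos += *sp++;     /* "+": load the second item, keep result in tos */
    return tos;
}
```

Both functions compute the same result; the second one only pays off when the compiler has a register to spare for tos, which is why the optimization depends on the register allocation discussed above.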
The potential advantage of assembly language implementations is not
necessarily realized in complete Forth systems: We compared Gforth-0.5.9
(direct threaded, compiled with gcc-2.95.1 and -DFORCE_REG) with Win32Forth
1.2093 (newer versions are reportedly much faster), LMI’s NT Forth (Beta, May
1994) and Eforth (with and without peephole (aka pinhole) optimization of the
threaded code); all these systems were written in assembly language. We also
compared Gforth with three systems written in C: PFE-0.9.14 (compiled with
gcc-2.6.3 with the default configuration for Linux: -O2 -fomit-frame-pointer
-DUSE_REGS -DUNROLL_NEXT), ThisForth Beta (compiled with gcc-2.6.3 -O3
-fomit-frame-pointer; ThisForth employs peephole optimization of the threaded
code) and TILE (compiled with make opt). We benchmarked Gforth, PFE,
ThisForth and TILE on a 486DX2/66 under Linux. Kenneth O’Heskin kindly
provided the results for Win32Forth and NT Forth on a 486DX2/66 with similar
memory performance under Windows NT. Marcel Hendrix ported Eforth to Linux,
then extended it to run the benchmarks, added the peephole optimizer, ran the
benchmarks and reported the results.
We used four small benchmarks: the ubiquitous Sieve; bubble-sorting and matrix multiplication, which come from the Stanford integer benchmarks and were translated into Forth by Martin Fraeman (we used the versions included in the TILE Forth package, but with bigger data set sizes); and a recursive Fibonacci number computation for benchmarking calling performance. The following table shows the time taken for the benchmarks scaled by the time taken by Gforth (in other words, it shows the speedup factor that Gforth achieved over the other systems).
relative       Win32-    NT       eforth       This-
time     Gforth Forth Forth eforth  +opt   PFE Forth  TILE
sieve      1.00  2.16  1.78   2.16  1.32  2.46  4.96 13.37
bubble     1.00  1.93  2.07   2.18  1.29  2.21        5.70
matmul     1.00  1.92  1.76   1.90  0.96  2.06        5.32
fib        1.00  2.32  2.03   1.86  1.31  2.64  4.55  6.54
You may be quite surprised by the good performance of Gforth when compared
with systems written in assembly language. One important reason for the
disappointing performance of these other systems is probably that they are
not written optimally for the 486 (e.g., they use the lods instruction). In
addition, Win32Forth uses a comfortable, but costly method for relocating the
Forth image: like cforth, it computes the actual addresses at run time,
resulting in two address computations per NEXT (see Image File Background).
The speedup of Gforth over PFE, ThisForth and TILE can be easily explained with the self-imposed restriction of the latter systems to standard C, which makes efficient threading impossible (however, the measured implementation of PFE uses a GNU C extension: see Defining Global Register Variables in GNU C Manual). Moreover, current C compilers have a hard time optimizing other aspects of the ThisForth and the TILE source.
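To illustrate why standard C gets in the way of efficient threading, the following sketch contrasts a dispatch loop in standard C with one that uses the GNU C labels-as-values extension. It is a toy virtual machine written for this illustration, not code from Gforth, PFE, ThisForth or TILE; the opcode names, the cell type and the fixed buffer sizes are assumptions, and the second function compiles only with compilers that support GNU C.

```c
/* Sketch only: a hypothetical three-instruction VM. */
enum { OP_LIT, OP_ADD, OP_HALT };

/* Illustrative program: push 2, push 3, add, stop. */
const long demo[] = { OP_LIT, 2, OP_LIT, 3, OP_ADD, OP_HALT };

/* Standard C: every instruction returns to one central switch,
   so all dispatches share a single indirect branch. */
long run_switch(const long *code)
{
    long stack[16];
    long *sp = stack;
    for (;;) {
        switch (*code++) {
        case OP_LIT:  *sp++ = *code++;      break;
        case OP_ADD:  sp--; sp[-1] += sp[0]; break;
        case OP_HALT: return sp[-1];
        }
    }
}

/* A cell of threaded code holds either a code address or an operand. */
typedef union cell { void *addr; long n; } cell;

/* GNU C: labels as values let each instruction end with its own
   indirect jump (NEXT) straight to the next instruction's code.
   Assumes len <= 32 cells for this sketch. */
long run_threaded(const long *prog, int len)
{
    static void *const labels[] = { &&lit, &&add, &&halt };
    cell thread[32];
    cell *ip;
    long stack[16];
    long *sp = stack;
    int i;

    /* Translate opcodes into code addresses (the threaded code). */
    for (i = 0; i < len; i++) {
        long op = prog[i];
        thread[i].addr = labels[op];
        if (op == OP_LIT) {          /* copy the inline operand */
            i++;
            thread[i].n = prog[i];
        }
    }

    ip = thread;
#define NEXT goto *(ip++)->addr
    NEXT;
lit:  *sp++ = (ip++)->n;    NEXT;
add:  sp--; sp[-1] += sp[0]; NEXT;
halt: return sp[-1];
#undef NEXT
}
```

Calling run_switch(demo) and run_threaded(demo, 6) both yield 5; the difference lies only in how control gets from one instruction to the next.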
The performance of Gforth on 386 architecture processors varies widely with
the version of gcc used. E.g., gcc-2.5.8 failed to allocate any of the
virtual machine registers into real machine registers by itself and would not
work correctly with explicit register declarations, giving a significantly
slower engine (on a 486DX2/66 running the Sieve) than the one measured above.
Note that there have been several releases of Win32Forth since the release presented here, so the results presented above may have little predictive value for the performance of Win32Forth today (results for the current release on an i486DX2/66 are welcome).
In Translating Forth to Efficient C by M. Anton Ertl and Martin Maierhofer (presented at EuroForth ’95), an indirect threaded version of Gforth is compared with Win32Forth, NT Forth, PFE, ThisForth, and several native code systems; that version of Gforth is slower on a 486 than the version used here. You can find a newer version of these measurements at https://www.complang.tuwien.ac.at/forth/performance.html. You can find numbers for Gforth on various machines in Benchres.