- There are more factorial algorithms than you learned at school
-- or at university - because your professors do not know them either, at least
before they have visited this site. ;-))
- Do not use the 'naive factorial'! This ubiquitous dull algorithm.
Though immensely popular in Computer Science to demonstrate the notion of a
recursive function, it is by no means a sane way to compute n!.
- If you work with large factorials, use the
This formula is perhaps all you need in real life. But here we are concerned
with the exact computation of n!.
- What factorial algorithm should we choose for daily use with
moderate values of n? Or for inclusion in a software library with potentially
large values of n? As a starting point, let us look at the performance ranking
for n = 2048007 of the top 10 algorithms relative to the 'PrimeSchoenhage' algorithm.
Performance-Profile for n=2048007!
The smaller the rank the better,
values p<=1 indicate excellent,
values p<=2 indicate good performance.
- Only algorithms with a ranking < 2 qualify for further consideration
as candidates for a software implementation. Nevertheless the other ones
are certainly worth an algorithmic study and analysis for their own sake!
- All algorithms can be divided into two categories: those
which are based on the prime factorization of n! and those which are not. Those
which are - we will call them the 'prime algorithms' - are much more complex
than the others, which we will call the 'simple algorithms'. The prime algorithms
need the support of a prime sieve (a good implementation of the sieve of Eratosthenes
will do). Further they are more demanding on resources, since they need quite
a bit of memory for allocating the arrays of prime factors.
- Let's look first at the performance of the 'prime algorithms'.
These are the fastest known algorithms to compute n!. Thus they should be chosen
for inclusion in a software library where potentially large values of n have
to be computed. Three algorithms score approximately equal: 'PrimeSwingLuschny',
'PrimeVardi' and 'PrimeSchoenhage'. Although my personal preference is
for 'PrimeSwingLuschny' (because of its simplicity and theoretical background),
I propose to test all three of them in your computing environment and then choose the
most appropriate one.
- Let us consider now the category of 'simple' algorithms,
those which do not make use of prime factorization. Here the situation is quite
clear: only the algorithm 'SplitRecursive' achieves good results. So this algorithm
is clearly the choice to be taken. (Added
in 2010: This is no longer true as
stated. In any case compare with the new implementation of the simple Swing
algorithm which is superior on many systems.) If only moderate values of n (say n < 1000000)
are to be computed and a small time penalty (a factor < 2) is tolerable this
algorithm might also be considered as an alternative to the more complex prime
algorithms in the indicated range.
- Let's add a final word of caution. All the above considerations
are valid only if the computations are based on an arithmetical library which
provides an asymptotical fast multiplication routine like the Karatsuba multiplication
or even better a fft-based multiplication like the Schoenhage multiplication.
Such arithmetical libraries are freely available on the internet, look out for
them. If, however, you use a dull library which has implemented only the 'school
algorithm' for multiplication then the efficiency of all algorithms breaks down
dramatically and also their relative ranking might turn out completely different!
Regrettably the Java class 'BigInteger' included in the Java standard distribution
is one of those dull libraries. >:-( If you compute with Java you
had better take the Apfloat library of Mikko Tommila.
- In my computing environment the algorithm 'PrimeSwing(Luschny)'
was the fastest in the category 'prime algorithms' and the 'Split(Recursive)'
algorithm had no rival in the category 'simple algorithms'. Thus the jury declares:
Tempora mutantur and conclusions and recommendations also... (Postscript 2008)
- With the arrival of parallel computing on the desktop some
of the above comments are to be revisited. The algorithms discussed were adapted
to take advantage of the multiple core processor units which became standard
in the years 2007/2008. One of the nice observations is that both the PrimeSwing
algorithm as well as the SplitRecursive algorithm can be easily and efficiently
revised for parallel computing providing improved performance without introducing
many of the complexities of today’s concurrent programming models.
- Let us look at the performance of the top 9 algorithms as
it stands now. The next table displays the computing time of n! for n = 10,000,000
in seconds on a 2-core machine and the ranking relative to the 'PrimeSwing'
|N! where N = 10,000,000
- We see three new algorithms appearing in the list: 'ParallelPrimeSwing',
'ParallelPrimeSplit' and 'ParallelSplit'. And for the
first time the author saw the factorial of 10 millions computed in less than
one minute -- a personal computing record ;-) Thus the recommendation of today
just adds the parallel attribute to the winners of 2004:
There now are better implementations of the simple Swing algorithm which outperform
the Split algorithm on most systems. Look here.
[top of page (2004 recommendations)]