Recommendations (2004)

- There are more factorial algorithms than you learned at school -- or at university - because your professors do not know them either, at least before they have visited this site. ;-))
- Do not use the 'naive factorial'! This ubiquitous dull algorithm. Though immensely popular in Computer Science to demonstrate the notion of a recursive function, it is by no means a sane way to compute n!.
- If you work with large factorials, use the
Stirling approximation.
This formula is perhaps all you need in real life. But here we are concerned
with the
*exact*computation of n!. - What factorial algorithm should we choose for daily use with moderate values of n? Or for inclusion in a software library with potentially large values of n? As a starting point, let us look at the performance ranking for n = 2048007 of the top 10 algorithms relative to the 'PrimeSchoenhage' algorithm.

Algorithm | Rank |

PrimeSwingLuschny | 0,9 |

PrimeVardi | 0,9 |

PrimeSchoenhage | 1,0 |

PrimeSwingList | 1,0 |

PrimeSwingCache | 1,7 |

SplitRecursive | 1,8 |

PrimeLeenstra | 2,1 |

PrimeBorwein | 2,4 |

Hyper | 2,7 |

SquareDifferenceProd | 2,7 |

The smaller the rank the better,

values p<=1 indicate excellent,

values p<=2 indicate good performance.

- Only algorithms with a ranking < 2 qualify for further consideration as candidates for a software implementation. Nevertheless the other ones are certainly worth an algorithmic study and analysis for their own sake!
- All algorithms can be divided into two categories: those which are based on the prime factorization of n! and those which are not. Those which are - we will call them the 'prime algorithms' - are much more complex than the others, which we will call the 'simple algorithms'. The prime algorithms need the support of a prime sieve (a good implementation of the sieve of Eratosthenes will do). Further they are more demanding on resources, since they need quite a bit of memory for allocating the arrays of prime factors.
- Let's look first at the performance of the 'prime algorithms'. These are the fastest known algorithms to compute n!. Thus they should be chosen for inclusion in a software library where potentially large values of n have to be computed. Three algorithms score approximately equal: 'PrimeSwingLuschny', 'PrimeVardi' and 'PrimeSchoenhage'. Although my personal preference is for 'PrimeSwingLuschny' (because of its simplicity and theoretical background), I propose to test all three of them in your computing environment and then choose the most appropriate one.
- Let us consider now the category of 'simple' algorithms, those which do not make use of prime factorization. Here the situation is quite clear: only the algorithm 'SplitRecursive' achieves good results. So this algorithm is clearly the choice to be taken. (Added in 2010: This is no longer true as stated. In any case compare with the new implementation of the simple Swing algorithm which is superior on many systems.) If only moderate values of n (say n < 1000000) are to be computed and a small time penalty (a factor < 2) is tolerable this algorithm might also be considered as an alternative to the more complex prime algorithms in the indicated range.
- Let's add a final word of caution. All the above considerations are valid only if the computations are based on an arithmetical library which provides an asymptotical fast multiplication routine like the Karatsuba multiplication or even better a fft-based multiplication like the Schoenhage multiplication. Such arithmetical libraries are freely available on the internet, look out for them. If, however, you use a dull library which has implemented only the 'school algorithm' for multiplication then the efficiency of all algorithms breaks down dramatically and also their relative ranking might turn out completely different! Regrettably the Java class 'BigInteger' included in the Java standard distribution is one of those dull libraries. >:-( If you compute with Java you had better take the Apfloat library of Mikko Tommila.
- In my computing environment the algorithm 'PrimeSwing(Luschny)' was the fastest in the category 'prime algorithms' and the 'Split(Recursive)' algorithm had no rival in the category 'simple algorithms'. Thus the jury declares:

Heavy Weight Champion |
PrimeSwing |

Light Weight Champion |
SplitRecursive |

- With the arrival of parallel computing on the desktop some of the above comments are to be revisited. The algorithms discussed were adapted to take advantage of the multiple core processor units which became standard in the years 2007/2008. One of the nice observations is that both the PrimeSwing algorithm as well as the SplitRecursive algorithm can be easily and efficiently revised for parallel computing providing improved performance without introducing many of the complexities of today’s concurrent programming models.
- Let us look at the performance of the top 9 algorithms as it stands now. The next table displays the computing time of n! for n = 10,000,000 in seconds on a 2-core machine and the ranking relative to the 'PrimeSwing' algorithm.

N! where N = 10,000,000 | sec | ranking | ||

ParallelPrimeSwing | 57,9 | 0,91 | ||

ParallelPrimeSplit | 59,5 | 0,94 | ||

PrimeSwing | 63,5 | 1,00 | ||

PrimeSchoenhage | 66,5 | 1,05 | ||

PrimeVardi | 65,9 | 1,04 | ||

PrimeSwingList | 66,7 | 1,05 | ||

PrimeSplit | 64,1 | 1,01 | ||

ParallelSplit | 104,1 | 1,64 | ||

Split | 139,2 | 2,19 |

- We see three new algorithms appearing in the list: 'ParallelPrimeSwing', 'ParallelPrimeSplit' and 'ParallelSplit'. And for the first time the author saw the factorial of 10 millions computed in less than one minute -- a personal computing record ;-) Thus the recommendation of today just adds the parallel attribute to the winners of 2004:

Heavy Weight Parallel Champion |
ParallelPrimeSwing |

Light Weight Parallel Champion |
ParallelSplit |

There now are better implementations of the simple Swing algorithm which outperform the Split algorithm on most systems. Look here.