An implementation of the factorial with the prime swing algorithm in
Sage.
For a pure Python implementation you will need additionally a function
*prime_range(a, b)* which returns the prime numbers in the range [a, b).

```
def factorialPS(n):
small_swing = [
1,1,1,3,3,15,5,35,35, 315, 63, 693, 231, 3003, 429, 6435, 6435,
109395,12155,230945,46189,969969,88179,2028117,676039,16900975,
1300075,35102025,5014575,145422675,9694845,300540195,300540195 ]
def swing(n):
if n < 33: return small_swing[n]
sqrtn = n.sqrt()
factors = prime_range(n // 2 + 1, n + 1)
for prime in prime_range(3, sqrtn + 1):
p, q = 1, n
while True:
q //= prime
if q == 0: break
if q & 1 == 1:
p *= prime
if p > 1: factors.append(p)
R = prime_range(sqrtn + 1, n // 3 + 1)
factors += filter(lambda x: (n // x) & 1 == 1, R)
return mul(factors)
def odd_factorial(n):
if n < 2: return 1
return (odd_factorial(n//2)**2)*swing(n)
if n == 0: return 1
if n < 0 : return 0
if n < 20: return mul(range(2, n + 1))
N, bits = n, n
while N != 0:
bits -= N & 1
N >>= 1
return odd_factorial(n)*2**bits
```

The current (0.7.3) SymPy implementation of the factorial function uses the prime swing function and is similar to the above function. However comparing the SymPy implementation (dismantled of the framework) with the above implementation in a Sage worksheet shows:

timeit("sympy_factorial(1000000)", number=10) timeit("factorialPS(1000000)", number=10) 10 loops, best of 3: 2.02 s per loop 10 loops, best of 3: 1.31 s per loop

So what is the main difference? The computation of the product

L_product = R_product = 1 for prime in sieve.primerange(n//2 + 1, n + 1): L_product *= prime for prime in primes: R_product *= prime return L_product*R_product

in the SymPy version is replaced here by a single *mul(factors)*.
The reason why this is more efficient is that an iterated product (a product of the form
*for p in P: r *= p*) is much less efficient than a product of the form *mul(list)*,
provided mul is implemented carefully, for example as a recursive product balancing the factors.

As already said, the implementation and the benchmark shown here were executed in the Sage environment. The same implementation executed with the SymPy functions does not show the speed-up observed with Sage. It remains unclear what the reason is. A standalone implementation of the algorithm in pure Python can be found here. Executed in the Sage environment it gives the following timings:

timeit("factorialPy(1000000)", number=10) 10 loops, best of 3: 1.75 s per loop

The idea of the swing factorial can also be implemented without prime-factorization (by computing the swing numbers by recursion). This leads to a highly sophisticated (albeit short) algorithm which is shown below. Benchmarks show that this is also the fastest algorithm known to compute the factorials without prime-factorization (leaving parallelization apart).

But the most surprising feature of this algorithm is that it makes heavy use of *division*
— something which does not come to ones mind contemplating the factorial in a naive way.

```
def factorialS(n):
smallOddSwing = [
1,1,1,3,3,15,5,35,35, 315, 63, 693, 231, 3003, 429, 6435, 6435,
109395,12155,230945,46189,969969,88179,2028117,676039,16900975,
1300075,35102025,5014575,145422675,9694845,300540195,300540195 ]
smallOddFactorial = [
1,1,1,3,3,15,45,315, 315, 2835, 14175, 155925, 467775, 6081075,
42567525, 638512875, 638512875 ]
def product(m, len):
if len == 1: return m
if len == 2: return m * (m - 2)
hlen = len >> 1
return product(m - hlen * 2, len - hlen) * product(m, hlen)
def oddFactorial(n):
sqrOddFact = Integer(1)
if n < 17:
oddFact = smallOddFactorial[n]
sqrOddFact = smallOddFactorial[n//2]
else:
(sqrOddFact, oldOddFact) = oddFactorial(n//2)
if n < 33:
oddSwing = smallOddSwing[n]
else:
len = (n - 1) // 4
if (n % 4) != 2: len += 1
high = n - ((n + 1) & 1)
oddSwing = product(high, len) // oldOddFact
oddFact = (sqrOddFact**2) * oddSwing
return (oddFact, sqrOddFact)
if n == 0: return 1
if n < 0: return 0
N, bits = n, n
while N != 0:
bits -= N & 1
N >>= 1
F = oddFactorial(n)
return F[0]*2**bits
```

In the same computing environment as above (Sage worksheet) we see this time:

timeit("factorialS(1000000)", number=10) 10 loops, best of 3: 2.31 s per loop

Compare this to SymPy's prime-based implementation in 2.02 s.