Project Euler: basic number theory functionsBear Giles | February 13, 2011
This is another Project Euler-inspired post. Everything above the break is covered in problem descriptions but not put in a single place (as far as I know). Below the break I’ll discuss some implementation details.
Much of number theory concerns the studies of primes and divisors of composite numbers. The PE problems start to scratch the surface of this field (no pun intended!) but I feel that the problem descriptions don’t give you a broader context to see how seemingly unrelated problems are connected.
Here’s a brief list of the various functions with solutions for the composite numbers 28 and 3150. The links go to sites with significantly more information and should not be followed unless you want to peek “into the back of the book”. My usual approach is to solve the problem myself first, even if I can’t do any better than brute force, then follow the links for insights into how the problem relates to others and for more efficient implementations.
|function||x = 28||x = 3150|
|prime factorization (A052369 for largest prime factor)||22 * 7||2 * 32 * 52 * 7|
|ω(n)||number of unique prime factors (A001221)||2||4|
|Ω(n)||total number of factors (A001222)||3||6|
|gpf(n)||greatest prime factor (A006530)||7||7|
|proper divisors||1, 2, 4, 7, 14||1, 2, 3, 5, 6, 7, 9, 10, 14, 15, 18, 21, 25, 30, 35, 42, 45, 50, 63, 70, 75, 90, 105, 126, 150, 175, 210, 225, 315, 350, 450, 525, 630, 1050, 1575|
|d(n), σ0(n), τ(n)||number of divisors (including n) (A000005).||6||36|
|σ(n), σ1(n)||divisor function: sum of divisors (including n) (A000203)||56||9672|
|s(n) ≡ σ(n) – n||restricted divisor function: sum of proper divisors (A001065)||28||6522|
|Σ(n) ≡ σ(n)/n, σ-1(n)||abundancy (A017665 and A017666)||2||1612/525|
|σ() – 2n||abundance (A033880)||0||3372|
|φ(n)||Euler’s totient function (A000010)|
|μ(n)||Möbius function (A008683)|
|π(n)||prime counting function (A000720)|
(The sequence numbers come from Online Encyclopedia of Integer Sequences (OEIS) and are also known as Sloane’s numbers. The OEIS site has additional information about the sequences. You can also use this number when searching for more information.)
Here are the definitions of various functions seen on PE problems. The title are links to Wolfram MathWorld pages with a lot more information. You should be careful before following that link since it may provide more information than you want.
Hyperfect numbers are a generalization of perfect numbers.
Amicable Pairs and Chains
A pair of numbers is called an amicable pair if s(n) = m and s(m) = n and n ¬eq; m. An amicable chain goes through more additional terms.
Some people also use the term “friendly pair” for these numbers but this usage is discouraged since there is a different (but related) definition for them. See following item.
Amicable pairs can also be defined in terms of the aliquot sequence.
A pair of numbers is called a friendly pair if &Sigma(n) = m and Sigma(m) = n and n ¬eq; m. An amicable chain goes through more additional terms.
A solitary number is a number that does not have any friends.
Quoting mathworld: Sociable numbers are numbers that result in a periodic aliquot sequence, where an aliquot sequence is the sequence of numbers obtained by repeatedly applying the restricted divisor function s(n) = σ(n) – n to n and σ(n) is the usual divisor function.
If the period of the aliquot cycle is 1, the number is called a perfect number. If the period is 2, the two numbers are called an amicable pair. In general, if the period is t >= 3, the number is called sociable of order . For example, 1264460 is a sociable number of order four since its aliquot sequence is 1264460, 1547860, 1727636, 1305184, 1264460, ….
Quoting mathworld: An integer m such that if p | m, then p2 | m, is called a powerful number. There are an infinite number of powerful numbers, and the first few are 1, 4, 8, 9, 16, 25, 27, 32, 36, 49, … (Sloane’s A001694). Powerful numbers are always of the form a2b3 for a, b >= 1.
Quoting mathworld: An Achilles number is a positive integer that is powerful (in the sense that each prime factor occurs with exponent greater than one) but imperfect (in the sense that the number is not a perfect power). The first few are 72, 108, 200, 288, 392, 432, … (Sloane’s A052486).
Quoting mathworld: An n-digit number that is the sum of the nth powers of its digits is called an n-narcissistic number. It is also sometimes known as an Armstrong number, perfect digital invariant (Madachy 1979), or plus perfect number. Narcissistic numbers therefore generalize these “unappealing” numbers to other powers (Madachy 1979, p. 164). (Sloane’s A005188)
It is clear that it is the prime factorization of a number is an extremely important piece of information. Below the break I’ll discuss practical details of two aproaches: the Sieve of Eratosthenes and the Pollard rho factorization method and its variant Brent’s factorization method.
The Sieve of Eratosthenes
The Sieve of Eratosthenes does not directly factor numbers. Instead it is a highly efficient mechanism for identifying prime numbers that can then be used in trial division by “small” primes. As always there are tradeoffs – it may be faster to use other techniques than trial division after you’ve tried the first thousand primes. On the other hand even modest computers (with proper software) can quickly calculate the functions given above for over a billion numbers.
The basic sieve is straightforward. Create an array of boolean values at least as large as the largest prime you seek. Set all values to true. Starting at the second position (corresponding to 2), if the value is true then set every n-th following value to false.
That is, at i=2 we see the value is true so we set every second following value (at 4, 6, 8, …) to false.
At i=3 we see the value is true so we set every third following value (at 6, 9, 12, …) to false.
At i=4 we see the value is false so we take no further action.
At i=5 we see the value is true so we set every fifth following value (at 10, 15, 20, …) to false.
You only need to do this for values up to the square root of the size of the array. Any larger composite number would have already been marked off.
There are various optimizations to this algorithm. The most obvious are 1) using bit-mapped integers instead of ‘bool’ values (which are often implemented using a full character), 2) treating 2 as a special case and only storing odd values, and 3) also treating 3 as a special case since all primes greater than 3 are either 1 or 5 modulo 6.
On the other hand we can get more bang for our buck if we store more than a single boolean value.
First option: replace the boolean value with a counter that’s initialized to zero. If a counter is zero then increment every i-th value following it. This will give you ω(i) – the number of distinct prime factors.
Second option: replace the boolean value with an integer initialized to zero. If a value is zero then set every i-th value following it to ‘i’. This will give you the greatest prime factor. You can then quickly factor these composite numbers by doing the division and looking into the array to get the next greatest prime factor. Repeat until you reach the final prime.
One major drawback to the Sieve of Eratosthenes is that you must create a list of a finite size. What if you want a theoretically unlimited list of prime numbers? Euler’s Sieve is an approach that can be used by functional programming languages to create an unlimited list of primes.
Historically people have either rebuilt the sieve on demand or stored it as a simple flat file. Free relational databases such as postgresql and mysql existed but required system administration skills to install and configure.
The advent of memory-based relational databases (with such as sqlite, derby and h2) has changed that since they need, at most, a few files.
Once you have a relational database and a sieve it is straightforward to calculate the sequences listed in the table. Many of the PE problems are then reduced to simple database queries.
Pollard’s factorization method
Trial division by a billion prime numbers can be slow… and will still only guarantee results when all factors are no larger than the biggest prime number in your table. There are better approaches.
The first is Pollard’s rho factorization method, also known as his Monte Carlo factorization method.
This approach is based on the sequence
xn+1 = xn2 + a (mod n)
In fact nearly any polynomial function (with a few exceptions) can be used.
This sequence will eventually fall into a cycle. (Quick proof: it is deterministic and there are a finite number of values, specifically 0 to n-1). This cycle will have a length on the order of sqrt(n) if n is prime, sqrt(p) if p is one of the factors of n. This cycle is can be very long so it’s important to verify that n is prime (e.g., using a probabilistic test) before using this algorithm.
In Pollard’s algorithm you calculate xn and x2n and then determine the greatest common denominator between n and |x2n – xn|. Any value greater than 1 is a factor of n. There is no guarantee that the factor is prime. If the gcd is equal to n then the number is prime. (I’m not sure how you would get to this state.)
This algorithm does not always succeed on composite numbers and must be repeated with a different constant.
Brent’s factorization method and other variants
Brent’s factorization method is similar’s to Pollard’s but chooses different values to compare. It is easiest to refer to the Wikipedia page: http://en.wikipedia.org/wiki/Cycle_detection#Brent.27s_algorithm.
Finally we have the quadratic sieve algorithm. It is not the fastest known algorithm for large numbers but it is much simpler than the faster algorithms and is still the fastest known for numbers under 10^100 or so.
A good project would be implementing this algorithm on a small cloud, e.g., at Amazon or Rackspace.