Archive

For February, 2011

Project Euler: database notes

No Comments

This is a followup to an item in the last post. It’s all general pointers and nothing’s specific to any question.

Know your tools. Using a relational database on a home computer to hold billions of values in a OEIS sequence is probably not a good idea. My gut instinct is that you should be able to use 10k records in an in-memory database (e.g., h2) and up to 100k records in a ‘real’ database (e.g., oracle, sql server, postgresql or mysql). The in-memory estimate is probably too small since these are small records so you won’t chew up much memory. These aren’t hard limits, of course, but just my instinct of where you will start to need to pay attention to database issues beyond ‘how do I get to my data?’.

A second issue is how you store the data. Simple sequences are trivial to store. Collections, e.g., sets of amicable numbers, should be stored using a standard one-to-many approach instead of creating one row with a bunch of columns. It’s more work up front but much more useful later.

Don’t do individual inserts. An individual query may only take 1-2 ms but if you’re adding hundreds of thousands or even millions of records that time adds up. Every serious database allows you to do a bulk load from an external file, e.g., the “COPY” instruction in postgresql. Use it.

Be careful here. There are tools like dbunit that allow you to specify bulk data but as I recall it’s ultimately implemented as individual insert statements. You want to use your database’s tools, not a third party that might not be as efficient.

Use bulk load format compatable with Excel et al. Yes, you can read the file directly. But it’s a pain and you might miss what’s tripping up your database. E.g., are you properly escaping quotes?

In practice this means a tab- or comma-separated values (CSV) file. This is such a widely used format that it should be a safe bet for all databases.

Something to keep in mind is that there are file limitations. E.g., Excel files are limited to a bit over 65k records. Other databases may restrict the number of records in a bulk insert, or they may have a limitation on the total number of bytes. You may need to split your bulk inserts into multiple files.

Create solid file creation applications. By that I mean that the algorithms should be well-documented and self-validating. E.g., if you know that every third value should be even then you should verify that before writing the file. It’s a good way to document what you know in addition to being a Really Good Idea in general.

Capture what you learn. A lot of PE problems will create (or at least use) new OEIS sequences. Add them to your database.

Follow conventions. This is easy – if your data is something with an OEIS sequence then make sure your sequence matches. This is mostly knowing what value to start with, e.g., starting at index 2.

Finally, don’t be afraid to use external resources. You definitely want to do the PE problem first. Sometimes there’s only a few more values, e.g., the list of perfect numbers, and it makes sense to just list them explicitly after you’ve finished the PE problem. You never know when you’ll need those values later.


Implementation note: the H2 database (among others?) supports read-only tables loaded directly from CSV files within a zip file. This is a clean way to support an in-memory database – you’ll need to pregenerate the zip file but once you have it you can quickly create an in-memory relational database containing the OEIS sequences and more without requiring anything more than one jar file and one external data file.


UPDATE 2/28/2011: I goofed. The legacy format for Excel (.xls) was limited to 65k records but the new format (.xlsx – Excel 2007) has a limit of 1,048,576 rows. I was thrown off since the Java library I use most often only supports the older format so I’m acutely aware of its limitations.

Project Euler: basic number theory functions

No Comments

This is another Project Euler-inspired post. Everything above the break is covered in problem descriptions but not put in a single place (as far as I know).  Below the break I’ll discuss some implementation details.

Much of number theory concerns the studies of primes and divisors of composite numbers. The PE problems start to scratch the surface of this field (no pun intended!) but I feel that the problem descriptions don’t give you a broader context to see how seemingly unrelated problems are connected.

Here’s a brief list of the various functions with solutions for the composite numbers 28 and 3150. The links go to sites with significantly more information and should not be followed unless you want to peek “into the back of the book”. My usual approach is to solve the problem myself first, even if I can’t do any better than brute force, then follow the links for insights into how the problem relates to others and for more efficient implementations.

function x = 28 x = 3150
prime factorization (A052369 for largest prime factor) 22 * 7 2 * 32 * 52 * 7
ω(n) number of unique prime factors (A001221) 2 4
Ω(n) total number of factors (A001222) 3 6
gpf(n) greatest prime factor (A006530) 7 7
proper divisors 1, 2, 4, 7, 14 1, 2, 3, 5, 6, 7, 9, 10, 14, 15, 18, 21, 25, 30, 35, 42, 45, 50, 63, 70, 75, 90, 105, 126, 150, 175, 210, 225, 315, 350, 450, 525, 630, 1050, 1575
d(n), σ0(n), τ(n) number of divisors (including n) (A000005). 6 36
σ(n), σ1(n) divisor function: sum of divisors (including n) (A000203) 56 9672
s(n) ≡ σ(n) – n restricted divisor function: sum of proper divisors (A001065) 28 6522
Σ(n) ≡ σ(n)/n, σ-1(n) abundancy (A017665 and A017666) 2 1612/525
σ() – 2n abundance (A033880) 0 3372
φ(n) Euler’s totient function (A000010)
μ(n) Möbius function (A008683)
π(n) prime counting function (A000720)

(The sequence numbers come from Online Encyclopedia of Integer Sequences (OEIS) and are also known as Sloane’s numbers. The OEIS site has additional information about the sequences. You can also use this number when searching for more information.)

Here are the definitions of various functions seen on PE problems. The title are links to Wolfram MathWorld pages with a lot more information. You should be careful before following that link since it may provide more information than you want.

Perfect Numbers

A perfect number is one where s(n) = n. It is deficient (A005100) if s(n) < n and abundant (A005101) if s(n) > n.

Related terms are almost perfect number (A00079) and quasiperfect number (none known).

Hyperfect numbers are a generalization of perfect numbers.

Aspiring numbers are not perfect numbers but lead to one by following the aliquot sequence (A063769).

Amicable Pairs and Chains

A pair of numbers is called an amicable pair if s(n) = m and s(m) = n and n &noteq; m. An amicable chain goes through more additional terms.

Some people also use the term “friendly pair” for these numbers but this usage is discouraged since there is a different (but related) definition for them. See following item.

Amicable pairs can also be defined in terms of the aliquot sequence.

A related term is quasiamicable pairs (also bethrothed numbers and reduced amicable pairs). (A005276)

Friendly Numbers

A pair of numbers is called a friendly pair if &Sigma(n) = m and Sigma(m) = n and n &noteq; m. An amicable chain goes through more additional terms.

A solitary number is a number that does not have any friends.

Socialable Numbers

Quoting mathworld: Sociable numbers are numbers that result in a periodic aliquot sequence, where an aliquot sequence is the sequence of numbers obtained by repeatedly applying the restricted divisor function s(n) = σ(n) – n to n and σ(n) is the usual divisor function.

If the period of the aliquot cycle is 1, the number is called a perfect number. If the period is 2, the two numbers are called an amicable pair. In general, if the period is t >= 3, the number is called sociable of order . For example, 1264460 is a sociable number of order four since its aliquot sequence is 1264460, 1547860, 1727636, 1305184, 1264460, ….

Powerful Number

Quoting mathworld: An integer m such that if p | m, then p2 | m, is called a powerful number. There are an infinite number of powerful numbers, and the first few are 1, 4, 8, 9, 16, 25, 27, 32, 36, 49, … (Sloane’s A001694). Powerful numbers are always of the form a2b3 for a, b >= 1.

Archilles Number

Quoting mathworld: An Achilles number is a positive integer that is powerful (in the sense that each prime factor occurs with exponent greater than one) but imperfect (in the sense that the number is not a perfect power). The first few are 72, 108, 200, 288, 392, 432, … (Sloane’s A052486).

Narcissistic Number

Quoting mathworld: An n-digit number that is the sum of the nth powers of its digits is called an n-narcissistic number. It is also sometimes known as an Armstrong number, perfect digital invariant (Madachy 1979), or plus perfect number. Narcissistic numbers therefore generalize these “unappealing” numbers to other powers (Madachy 1979, p. 164). (Sloane’s A005188)

Prime factorization

It is clear that it is the prime factorization of a number is an extremely important piece of information. Below the break I’ll discuss practical details of two aproaches: the Sieve of Eratosthenes and the Pollard rho factorization method and its variant Brent’s factorization method.

Continue reading this entry »

Project Euler: Fibonacci Sequences

No Comments

Project Euler (http://projecteuler.net/) is an interesting site for somebody who wants to develop their mathematical skills or keep them sharp. It’s really cool that (at the introductory level) there’s usually a brute force approach in a classic procedural language like Java, a cleaner brute force approach in a functional language, and a much cleaner solution if you use a bit of math. You can learn a lot by exploring the last approach.

There’s one problem though – there’s not enough information to know what to search for if you don’t already know what to search for. You can get that information in the comments from others once you solve the problem via brute force methods but that requires you to solve the problem via brute force solutions. That’s not always practical.

Google is another approach but the links often give you the answer without providing insight into how to get there. That’s great if you only want to answer the question, not so great if you want to learn a bit more math.

Some sites provide a bit more insight into the problem. I’ll try to add my own feeble contributions. There will be a brief insight above the break and a fuller discussion below the break. Unfortunately this won’t help much if you’re reading an RSS feed. Sorry.

The first issue is Fibonacci sequences.  That is, slightly rephrased from the Project Euler definition, the sequence defined by:

  1. F(0) = 0
  2. F(1) = 1
  3. F(n) = F(n-1) + F(n-2) where n > 1
F(0) = 0
F(1) = 1
F(n) = F(n-1) + F(n-2) where n > 1

(The other common definition sets the first two values to 1 but the results are equivalent.)

So what insight do we have regarding the sum of even terms?…

Think back to elementary school math.  We  all learned a very simple rule:

  • the sum of two even numbers is even.
  • the sum of an even and odd number is odd.
  • the sum of two odd numbers is even.

This means that a sequence of sums like the Fibonacci can have only two possible forms:

  • all even numbers
  • one even number, two odd numbers, one even number, repeat….

If follows that a sum of even Fibonacci terms will only need to consider every third value.

Continue reading this entry »

Blue Taste Theme created by Jabox