
Notes: 38 Cache, Sieve


Cache #

On my old laptop:

  • 4 cores
  • Each core has 64kB L1 cache + 256kB L2 cache
  • All cores share 8MB of L3 cache

Why am I mentioning this today?

  • Accessing data in cache is much faster than accessing data not in cache.
  • The cache stores the stuff that has been accessed most recently.
  • Smaller caches are faster than big ones.
  • A BitSet with one bit for every integer up to 100 million takes 100,000,000 / 8 bytes = 12.5 MB of RAM.

So let’s consider two access patterns:

  • For each prime, mark each multiple.
    • This sweeps across the whole ~12.5 MB bit set once for each prime.
    • That is too big for the 8 MB L3 cache, so the bits you’re marking will essentially never be in cache.
  • For every block of a million integers, go through every prime and mark its multiples in that block (sketched in code after this list).
    • A million integers takes a million bits, or 125 kB — that fits in the 256 kB L2 cache.
    • It’s plausible for L2 cache to be 20 times faster than main memory.
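
Here is one way the blocked pattern might look in Java — a minimal sketch, assuming the primes up to sqrt(n) have already been found separately; the class name, method name, and the smallPrimes parameter are illustrative, not from the notes:

    import java.util.BitSet;
    import java.util.List;

    public class BlockedMarking {
        // Mark composites block by block so the working set stays inside L2 cache.
        // Assumes smallPrimes already holds every prime up to sqrt(n).
        static BitSet markComposites(int n, List<Integer> smallPrimes) {
            final int BLOCK = 1_000_000;                  // one million bits ~ 125 kB
            BitSet composite = new BitSet(n + 1);
            for (int lo = 0; lo <= n; lo += BLOCK) {
                int hi = Math.min(lo + BLOCK, n + 1);     // this block covers [lo, hi)
                for (int p : smallPrimes) {
                    // first multiple of p inside the block, but never below p*p
                    long start = Math.max((long) p * p, ((lo + (long) p - 1) / p) * p);
                    for (long m = start; m < hi; m += p) {
                        composite.set((int) m);
                    }
                }
            }
            return composite;                             // a set bit means "composite"
        }
    }

Each block is swept completely before moving on, so the bits it touches stay resident in L2 cache for the whole pass over the primes.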

For example, my old laptop had these cache specs:

    Main Memory:
        Size: 16GB of DDR3 2133
        Latency: 60ns = 240 cycles
        Throughput: 20 GB/s
    L3 Cache:
        Size: 8 MB (shared)
        Latency: 11ns = 44 cycles
        Throughput: > 200 GB/s
    L2 Cache:
        Size: 256 KB (per core)
        Latency: 3ns = 12 cycles
        Throughput: > 400 GB/s
    L1 Data Cache:
        Size: 32 KB (per core)
        Latency: 1ns = 4 cycles
        Throughput: > 1000 GB/s

Sieve of Eratosthenes #

https://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html

Let’s build this as a sequential program.

  • BitSet means one bit per integer.
  • Sieve means marking every composite number.
  • This requires marking the multiples of every prime up to sqrt(n), as sketched below.
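
A minimal sequential sketch using java.util.BitSet (the class and method names here are just for illustration):

    import java.util.BitSet;

    public class Sieve {
        // Sequential Sieve of Eratosthenes: one bit per integer,
        // where a set bit means "composite".
        static BitSet sieve(int n) {
            BitSet composite = new BitSet(n + 1);
            for (int p = 2; (long) p * p <= n; p++) {
                if (!composite.get(p)) {
                    // p is prime: mark every multiple of p from p*p up to n
                    for (long m = (long) p * p; m <= n; m += p) {
                        composite.set((int) m);
                    }
                }
            }
            return composite;   // i >= 2 is prime iff !composite.get(i)
        }

        public static void main(String[] args) {
            BitSet composite = sieve(100);
            for (int i = 2; i <= 100; i++) {
                if (!composite.get(i)) System.out.print(i + " ");
            }
            System.out.println();   // prints the 25 primes up to 100
        }
    }

Note that this version marks multiples prime by prime, i.e. the first, cache-unfriendly access pattern from the Cache section above.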