Sieve of Eratosthenes

Challenge Inside! : Find out where you stand! Try quiz, solve problems & win rewards!

Learn via video course

DSA Problem Solving for Interviews using Java
DSA Problem Solving for Interviews using Java
By Jitender Punia
Free
star4.9
Enrolled: 1000
DSA Problem Solving for Interviews using Java
DSA Problem Solving for Interviews using Java
Jitender Punia
Free
4.9
icon_usercirclecheck-01Enrolled: 1000
Start Learning

Overview

Sieve of Eratosthenes is a simple and ancient algorithm used to find the prime numbers up to any given limit. It is one of the most efficient ways to find small prime numbers.

Scope

  • This article tells about the working of the Sieve of Eratosthenes.
  • Optimizations of Sieve of Eratosthenes.
  • Implementation of Sieve of Eratosthenes.
  • Complexity of Sieve of Eratosthenes.




Takeaways

  • Complexity of Sieve of Eratosthenes
    • Time complexity - O(nlog(log(n))O(n * log(log(n))

Introduction to Sieve of Eratosthenes

Prime numbers are on multiple occasions, the primary focus for mathematicians. They play major roles in converting fractions, factoring numbers or algebraic equations etc. Did you know that cryptography also uses the concept of prime numbers? Only with the help of prime numbers, it is ensured that no one, but the intended recipient can see the digits of your credit card number when you enter your information on a website! Well, if primes are so important, then what is an efficient way of finding them?

Sieve of Eratosthenes, or more commonly known as just "Sieve" is an algorithm that can be used for finding all the prime numbers ranging from 1 to n, where n is the number till where you want to find prime numbers.

Now that you have some motivation to learn the Sieve of Eratosthenes, let's take a hit at it!

What is Sieve of Eratosthenes?

As mentioned above, it is an algorithm used for finding all the prime numbers in a given range from 1 to a given n, the number till where you want to find the primes.

What is the traditional method of finding primes though? Well, say you want to find all the prime numbers till 50, you might take every number from 1 to 50 into consideration to check if it is a prime. If the current number is a prime, then you add it to the result. Taking every number into consideration, and then finding its factors to check if it is a prime is definitely a cumbersome process.

Let's see how the Sieve algorithm makes the process much easier.

Working of Sieve of Eratosthenes

In this algorithm, we start with the number 2, and iteratively mark the multiples of each prime number as composite (= not prime). That way, we know which numbers are primes, and which numbers aren't (as they will be marked as not prime). For now, let's not go into the code, let's just observe the algorithm.

Here are the steps that we will follow :

  1. List out all integers starting from 2 till n
  2. Declare a variable p, that stores the value of the current prime number and initialize it to 2.
  3. Cross out all the multiples of the current variable p - 2p, 3p, 4p etc.
  4. Next step is to find the next smallest number that isn't crossed out as it is definitely a prime, and store it in p. We already crossed out the multiples of p, so all the remaining numbers that have not been crossed out, definitely do not have p as a factor. So the next smallest number after p, too, does not have p as a factor and hence is prime, just like p. If there is no such number, then stop. Since we started off with p = 2, the next non-crossed number is 3, so now p = 3.
  5. Repeat from step 3
  6. All the unmarked numbers remaining are the prime numbers in the range 1- n.

Example of Sieve of Eratosthenes

Let's find out all the prime numbers from 1 - 20 using the above method.

First step is to list out all the numbers.

2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

We start with p = 2, and cancel out all the numbers that are multiples of p.

2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

We have now cancelled out all the numbers that are multiples of p, and reached the end of the list. We now increment p, to the next smallest unmarked number. p = 3.

2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

The next smallest unmarked number is 5, p = 5. We can see that all the multiples of 5 have already been marked. Another observation to make here is that we need not move any further to check for p = 7, p = 11 etc, they are already marked! This means that we have successfully crossed out all the composite ( non-prime) numbers, and all the unmarked numbers are clearly primes. This list of primes is our final result.

Let's take another example and run our algorithm on it. This time, we'll take n to be a larger number, say 50.

2345678910
11121314151617181920
21222324252627282930
31323334353637383940
41424344454647484950

Just like before, let's start with the number two, and cancel out all the multiples of 2.

2345678910
11121314151617181920
21222324252627282930
31323334353637383940
41424344454647484950

Do you see how the numbers that we have to check for primes have reduced to almost half?

We now start crossing out the multiples of the next un crossed number, that is 3.

2345678910
11121314151617181920
21222324252627282930
31323334353637383940
41424344454647484950

The same way, we cross all multiples of 5.

2345678910
11121314151617181920
31323334353637383940
41424344454647484950

The next number that we have to check and cross is 7, and it is the last number. Why? If you observe, after 7, like in our previous example, the next number that we will check will be 11, and all the multiples of 11 have already been crossed out!

2345678910
11121314151617181920
21222324252627282930
31323334353637383940
41424344454647484950

Whatever numbers that now remain uncrossed in our table, are the prime numbers between 1 and 50.

Now that we have understood the complete Sieve of Eratosthenes, let's take a look at some code while reiterating the algorithm.

Pseudocode for Sieve of Eratosthenes

Input :

Integer n, greater than 1

Output :

All prime numbers in the range 1 - n

Explanation:

First, as we read, we must create an array that stores boolean values. If the value in the boolean array at say index 5 is True, it means that "5" is a prime number. If the value at say index 9 is false, it means that "9" is not a prime number. Initially, we mark all values as true, and mark False if we get to know a number isn't prime. We then declare a variable i, which loops over 2 - n (n is the number till where we need to find primes).

Now, for every value of i, if A\[i] is true (i.e. i is prime), we declare another variable j, and initialize it with value = 2 * i (since every value encountered by j will be marked false, we cannot initialize it to i, as that will be marked false too). It then loops from 2*i till n, marking every multiple of i, except i as false.

This process is continued till our i pointer has reached n. Once we're through with that, every composite number, i.e. non prime number has been marked false in our boolean array and every number that has value True in array A, is a prime number. At the end, we simply return all the numbers that have been marked true

Optimizations of Sieve of Eratosthenes

While discussing the examples, and working of the sieve algorithm, we observed multiple things and found ways to optimize the sieve algorithm. Let's summarize them here.

1. Sieving till root

Our initial idea in the sieve of eratosthenes was to cross out every multiple of every uncrossed number in the array. But let's go back to our first example, finding all primes between 1 and 20. After crossing out all the multiples of 2, and 3, we didn't really need to check further!

The last number that we needed to check was 3. Why? The numbers that we will be checking would either be primes, or crossed out because they are multiples of smaller primes because 5 * 5 = 25 > 20! What we mean by that is, if the square of the current number is greater than our limit (the number till which we're finding primes) then we need not cancel out the multiples of that number, as they would have already been cancelled out by previous primes.

Still unclear why?

Let's take a number n. It will have minimum 2 factors, some a and b if it is not prime. We can write it as:

n = a * b

a and b are two factors of n, and here n is represented as the product of it's two factors. For example, 2 and 8 are two factors of 16, and 16 can be represented as 2 * 8.

We know for a fact that for the product of a and b to be equal to n, both the numbers cannot be greater than sqrt(n), i.e. like in our example, two factors of 16 cannot be both greater than 4. Can you find any factors of 16 whose product is 16, but are both greater than 4? No!

If they are, the product a * b would become greater than sqrt(n) * sqrt(n) = n. Whenever we factorize n, at least one of the factors of n must be smaller than sqrt(n). If we are unable to find the factors of n that are less than or equal to the square root of n, then n must be a prime number.

The most common implementation of the algorithm is sieving till root, and toward the end of this article, we're going to implement it in code, in various languages.

2. Sieving by Odd numbers Only

Another observation while we're crossing out numbers in the sieve of eratosthenes, is that after crossing out multiples of 2, the next number is 3, and it's first multiple (6) is already crossed out. We can start from the next multiple which is 9 and not very surprisingly 32 . All the multiples of 3 are 32 + 3, 32 + 6, and so on - 9, 12, 15, 18, 21…

Did you notice that 12, 18, etc are already crossed out? They're the even numbers we crossed out while crossing multiples of 2! We can simply start looking at the odd multiples of 3, starting from 3 squared. Hence we cross out 32 + 6, 32 + 12 and so on!

Let's also look at the example where we were finding primes till n = 50. Now, we know that we only need to Sieve till the root of 50, from the point 1 - Sieving till root that we just discussed above (because the limit number won't have any factors that wouldn't have already been crossed out by smaller primes beyond its square root).

So we know that we only need to cross out primes till probably 8, because 8 * 8 = 64 > 50. We still need to check till 7, because 7 * 7 = 49 which is less than 50. Now, we have a limit, out loop will not run from i = 2 to i = n, but rather i = 2 to i = 8.

To further reduce extra work, just a while back, we established that all the even multiples of any prime number would have already been crossed out when we marked the multiples of 2 as False. So, we only need to now cross out the odd multiples of the primes starting from 3. Of course, the first multiple of 3 is 6, but since 6 is even, we have already crossed it out, and we can start from the next multiple (odd multiple) 9. Again, after 9, 12 is marked, so we move to 15 and mark it False. Once we are done with the process of marking the odd multiples of 3, let's move to the next number that is marked True - 5.

The first multiple of 5 - 10 has already been marked (even multiple). The next multiple, 15 is marked too, and so is 20! So we just need to start marking from 25? 25 is also the square of 5, and yes!

Were you able to observe the pattern here as well? We only need to start marking numbers starting from their square, continuing on to their odd multiples.

So everytime you encounter an uncrossed number, you can simply start crossing all the odd multiples, starting from it's square, as observed above.

How about we look at some pseudocode for this?

Algorithm:

Sieve - with optimizations - sieving till root + sieving by odd numbers

Input: A number n (till where we want to find primes)

Output: All prime numbers in the range 1 - n

3. Memory Consumption and Speed of Operations in the Sieve of Eratosthenes

To store the crossed and uncrossed numbers, we use a boolean vector (list / array of boolean values). We know that a vector of boolean values, uses n bits of memory. Now, what we must know is that a vector of boolean values is not just a regular vector containing a series of boolean values, rather, it is an optimized version of a vector of type T.

How is it optimized? It only consumes N / 8 bytes of memory. In modern times the architecture of processors supports better performance on bytes than on bits, since they do not have direct access to bits. In a large contiguous memory the bits are stored underneath the boolean vector. The bits are accessed in blocks of few bytes and extracted / set using bit operations like masking and shifting. Due to this, when you read and write bits with a boolean vector, a vector of character seems faster.

In simple implementations of the Sieve, using a vector of boolean values is faster. The speed with which you can load the data into cache is the only thing you are limited by and hence the less usage of memory gives a big advantage.

Sounds too technical and difficult? Don't worry, the crux of the story is that a boolean vector is about 1.4 to 1.7 times faster and more efficient than a character vector.

4. Segmented Sieve

Why do we need another kind of sieve? Well, to begin with, even though the simple sieve(sieve of eratosthenes) looks good, it has a couple problems. Let's consider 'n' to be extremely large.

  1. The boolean array of size 'n' might not fit in the memory
  2. Since the algorithm manipulates only single elements, it "walks" along the memory many times. Hence making it non cache friendly.

So what is the idea of Segmented Sieve?

As the name suggests, "segmented", we divide the range of numbers into different segments, or say a certain size and then find out the primes in each segment one by one using a simple sieve. For example, we have numbers from 1 - 20, and we want to find primes in this range. What we will do is, divide the complete range into small segments (to solve the memory and cache problem). Say we divide it into 4 segments of 5 numbers each - 1 to 5, 6 to 10, 11 to 15 and 16 - 20. This way, in small segments, it will be easier to find the primes.

Back to when we have larger numbers, initially, in the first segment that we cut out, we use the simple sieve to find all primes from 1 to sqrt(n) instead of taking really small segments like 5 numbers. (The size of this segment is all the numbers between 1 and sqrt(n)).

Let's take it step by step on the first segment (all numbers between 1 and sqrt(n)):

  1. We create an array of primes[], and store the values of all the prime numbers between 1 and sqrt(n), by making the use of the simple sieve algorithm, except our limit number is not n, it is sqrt(n) because we're only finding primes of one segment.
  2. Since we need primes from 1 - n, we divide the numbers into segments that are almost of size sqrt(n). (our first one was 1 - sqrt(n), the next will be sqrt(n) - 2*sqrt(n) because in every segment we must have sqrt(n) number of numbers).
  3. For every segment, we call the smallest number = low, and the largest number in the segment = high. Follow the steps for every segment with low and high: a. We create an array mark\[high - low + 1]. The space required for this would be just O(k) where k is the number of elements in that segment. b. Remember the primes we found in step 1? We iterate through those, and mark the multiples of those primes in the segment from low to high.

So basically, we have been performing the basic sieve, except the only difference is that we do it in segments, or parts so as to solve the memory problem.

Our problem with the sieve of Eratosthenes algorithm was the O(n) space that was used, while in the segmented sieve, we only use O(sqrt(n)) space because the only values stored in every array are of size - sqrt(n) i.e. we are storing only sqrt(n) number of elements in the array. However, the time complexity remains unchanged because the segmented sieve does all the same operations as a regular sieve, and hence it's time complexity is the same as a simple sieve. Time complexity of the sieve is discussed in further detail a few sections below.

Let's look at some pseudocode for the segmented sieve. We have already written the pseudocode for the simple sieve algorithm, hence, the function "simple sieve" in the following code is just that.

Algorithm: Segmented Sieve

Input: As mentioned above, the input is a number "low" and a number "high" which denote the start and end points of the current segment in consideration.

Output: All primes in range low - high (all the prime numbers in current segment)

And that's it!

Code for Sieve of Eratosthenes

In the following codes, we have used the optimizations of the sieve of eratosthenes, by only sieving till the root, and crossing out multiples of numbers starting from their squares. All codes here are derived from the pseudocode written under "Sieving by odd numbers", so if you feel like you haven't understood the codes below, go ahead and re-read the sieve optimizations.

C++

Python :

Take a look at some Java code for reference :

Java

C#

JavaScript

Php

Time Complexity

The time complexity of the Sieve of Eratosthenes algorithm is O(nlog log n). How? Let's analyze it!

To go further, let's take a number n, so that we can find all the prime numbers between 1 and n. According to the simple Sieve algorithm, we must cross out the multiples of every prime number we discover starting from 2.

First things first, we are assuming that the time complexity of marking any number as composite (not prime) is constant, i.e. O(1). So if we calculate the number of times the loop runs, we can say it is : n / 2 + n / 3 + n / 5 + n / 7 + ... p

Here in the above pattern, n/2 = all the numbers in range 1 - n divisible by 2, n/3 = all the numbers in range 1 - n divisible by 3 and so on till the last prime number in the range 1 - n that is p. Summing up all the multiples of the prime numbers is the total number of times the loop runs.

From this above equation, we can take 'n' common.

n * (1 / 2 + 1 / 3 + 1 / 5 + 1 / 7 + ... p)

Do you see the pattern in the fractions? It's a harmonic progression of the primes. To understand more about harmonic progression of the primes, and the rest of the mathematics of the proof, refer to the Harmonic series and Taylor series expansion. If you wouldn't like to do so, don't worry! Proof of the time complexity of the Sieve algorithm is optional.

Coming back to our equation, this is the result of the harmonic progression for the sum of primes.

Once we place this back to our original equation, we get:

O(nlog(log(n))O(n * log(log(n))

There! You're now a pro at the sieve!

Conclusion

  • The simple sieve of eratosthenes is an algorithm that is used to find prime numbers in the range 1 to a given n.
  • In the sieve of Eratosthenes algorithm, we maintain a boolean vector of numbers from 1 - n, and mark composite numbers as False.
    • This is done by taking the smallest numbers starting from 2, and then marking it's multiples as False (not prime) and so on.
  • There are 3 optimizations to the simple sieve of Eratosthenes algorithm:
    • Sieving till the root
    • Sieving by Odd numbers
    • Segmented Sieve
  • Time complexity of the simple sieve of Eratosthenes is O(nlog(log(n)))O(n*log(log(n))) while that of the segmented sieve is the same, as segmented sieve essentially aims to solve the memory problem created by the simple sieve of Eratosthenes.
  • The space complexity of the Simple Sieve algorithm is O(n) and segmented sieve reduces it to O(sqrt(n)).
  • The sieve algorithm is used in number theory and very commonly in cryptography.