Although any integer number has a unique decomposition into a product
of primes, finding the prime factors is believed to be a hard problem.
In fact, the security of our online transactions rests on the
assumption that factoring integers with a thousand or more digits is
practically impossible. This assumption has been challenged in 1995
when Peter Shor proposed a polynomial-time quantum algorithm for the
factoring problem. Shor’s algorithm is arguably the most dramatic
example of how the paradigm of quantum computing changed our
perception of which problems should be considered tractable. In this
section we briefly summarize some basic facts about factoring,
highlight main ingredients of the Shor’s algorithm, and illustrate how
it works using a toy factoring problem.

Suppose our task is to factor an integer \(N\) with \(d\) decimal
digits. The brute force algorithm goes through all primes \(p\) up to
\(\sqrt{N}\) and checks whether \(p\) divides \(N\). In the worst case,
this would take time roughly \(\sqrt{N}\) which is exponential in the
number of digits \(d\). A more efficient algorithm known as the
quadratic sieve attempts to construct integers \(a,b\) such that
\(a^2-b^2\) is a multiple of \(N\). Once such \(a,b\) are found, one
checks whether \(a\pm b\) have common factors with \(N\). The
quadratic sieve method has asymptotic runtime exponential in
\(\sqrt{d}\). The most efficient classical factoring algorithm known as
general number field sieve achieves an asymptotic runtime exponential in
\(d^{1/3}\). The exponential runtime scaling limits applicability of
the classical factoring algorithms to numbers with a few hundred digits
with the world record being \(d=232\) (which took roughly 2,000 CPU
years). In contrast, Shor’s factoring algorithm has runtime
*polynomial* in \(d\). The version of the algorithm described below due
to Alexey Kitaev requires roughly \(10d\) qubits and has runtime
roughly \(d^3\).

** **

It has been known to mathematicians since 1970’s that factoring becomes easy if one can solve another hard problem: find a period of the modular exponential function. The period finding problem is defined as follows. Given integers \(N\) and \(a\), find the smallest positive integer \(r\) such that \(a^r-1\) is a multiple of \(N\). The number \(r\) is called the period of \(a\) modulo \(N\). Recall that in modular arithmetics the remainder of a division \(a/N\) is called the value of \(a\) modulo \(N\) and denoted \(a\pmod{N}\). For example, \(1=16=91 \pmod{15}\). Thus the period of \(a\) modulo \(N\) is the smallest positive integer \(r\) such that \(a^r=1{\pmod N}\). For example, suppose \(N=15\) and \(a=7\). Then

that is, \(7\) has period \(4\) modulo \(15\). Note that computing the higher powers of \(7\) would give rise to a periodic sequence: \(7^{x+4}=7^x\pmod{15}\) for any integer \(x\). Thus \(r=4\) is the period of the modular exponential function \(7^x\). In general the period finding problem is well-defined if \(N\) and \(a\) are co-prime (have no common factors).

Assume for a moment that we are given a period finding machine that takes as input co-prime integers \(N,a\) and outputs the period of \(a\) modulo \(N\). Let us show how to use the machine to find all prime factors of \(N\). For simplicity, assume that \(N\) has only two distinct prime factors:

First, pick a random integer \(a\) between \(2\) and \(N-1\) and compute the greatest common divisor gcd:math:(N,a). This can be done very efficiently using Euclid’s algorithm. If we are lucky, \(N\) and \(a\) have some common prime factors in which case gcd:math:(N,a) equals \(p_1\) or \(p_2\), so we are done. From now on let us assume that gcd:math:(N,a)=1, that is, \(N\) and \(a\) are co-prime. Let \(r\) be the period of \(a\) modulo \(N\) computed by the machine. Repeat the above steps with different random choices of \(a\) until \(r\) is even. It can be shown that a significant fraction of all integers \(a\) has even period, see Table 1 for examples, so on average one needs only a few repetitions. At this point we have found some pair \(r,a\) such that \(r\) is even and \(r\) is the smallest integer such that \(a^r-1\) is a multiple of \(N\). Let us use the identity

The above shows that \(a^{r/2}-1\) is not a multiple of \(N\)
(otherwise the period of \(a\) would be \(r/2\)). Assume for a moment
that \(a^{r/2}+1\) is not a multiple of \(N\). Then neither of the
integers \(a^{r/2}\pm 1\) is a multiple of \(N\), but their product
is. This is possible only if \(p_1\) is a prime factor of
\(a^{r/2}-1\) and \(p_2\) is a prime factor of \(a^{r/2}+1\) (or vice
verse). Thus we can find \(p_1\) and \(p_2\) by computing
gcd:math:(N,a^{r/2}pm 1), see Table 1 for examples. In the remaining
``unlucky” case when \(a^{r/2}+1\) is a multiple of \(N\) we give up
and try a different integer \(a\). For example, \(a=14\) is the only
unlucky integer in Table 1. In general, it can be shown that the
unlucky integers \(a\) are not too frequent, so on average only two
calls to the period finding machine are sufficient to factor \(N\).

Let us now show that a quantum computer can efficiently simulate the
period finding machine. As in the case of theDeutsch-Jozsa
algorithm,
we shall exploit quantum parallelism and constructive interference to
determine whether a complicated function has a certain global property
that cannot be learned by evaluating the function only at a few
points. However, instead of detecting the property of being a balanced
function, we seek to detect and measure periodicity of the modular
exponentiation function. The fact that interference makes it easier to
measure periodicity should not come as a big surprise. After all,
physicists routinely use scattering of electromagnetic waves and
interference measurements to determine periodicity of physical objects
such as crystal lattices. Likewise, Shor’s algorithm exploits
interference to measure periodicity of arithmetic objects.

Suppose we are given co-prime integers \(a,N\). Our goal is to compute the
period of \(a\) modulo \(N\), that is, the smallest positive integer
\(r\) such that \(a^r=1\pmod{N}\). The basic idea is to construct a
unitary operator \(U_a\) that implements the modular
multiplication function \(x\to ax \pmod{N}\). It can be shown that
eigenvalues of \(U_a\) are closely related to the period of \(a\).
Namely, each eigenvalue of \(U_a\) has a form \(e^{i\phi}\), where
\(\phi=2\pi k/r\) for some integer \(k\). Furthermore, as we saw in
the previous section, eigenvalues of certain unitary operators can be
measured efficiently using the phase estimation
algorithm. Unfortunately, inferring \(r\) from the measured eigenvalues
of \(U_a\) is only possible if the eigenvalues are measured *exactly*
(or with an exponentially small precision). For example, factoring a
1,000-digit number would require measuring the eigenvalue of \(U_a\)
with a precision \(10^{-2000}\). Such accuracy cannot be achieved by a
direct application of the phase estimation algorithm as this would
require too large pointer system. Here comes the main trick: we shall
estimate the eigenvalue of \(U_a\) by applying the phase estimation
algorithm to a family of unitary operators \(U_b\) with
\(b=a,a^2,a^4,a^8\) etc. We stop at \(b=a^{2^p}\) with \(2^p\approx
N^2\). Why does it work? The first observation is that all operators
\(U_b\) are integer powers of \(U_a\). Namely, if \(b=a^t\) then
\(U_b=(U_a)^t\). This implies that the operators \(U_b\) have the
same eigenvectors. In particular, eigenvalues of the entire family
\(U_b\) can be measured simultaneously. Second, implementing \(U_b\)
is as easy as implementing \(U_a\) - one just need to precompute the
powers \(b=a,a^2,a^4,a^8,\ldots \pmod{N}\) by the repeated squaring
method. Finally, even if the eigenvalues of \(U_b\) are measured with a
poor precision (say 10%), each squaring of \(a\) reduces the error in
the estimated eigenvalue of \(U_a\) by a factor \(1/2\). Indeed,
consider an eigenvector of \(U_a\) with an eigenvalue
\(e^{i\phi}\). If \(b=a^2\) then the eigenvalue of \(U_b\) is
\(e^{2i\phi}\). If \(b=a^4\) then the eigenvalue of \(U_b\) is
\(e^{4i\phi}\) etc. Thus we can estimate
\(\phi,2\phi,4\phi,\ldots,2^p\phi\) with a constant precision (say
10%). We shall see that this is enough to estimate \(\phi\) with a
precision roughly \(2^{-p}\). For example, one can achieve a precision
\(10^{-2000}\) by a sequence of less than \(10^6\) lousy measurements
of \(U_b\) with an error 10%. Furthermore, it can be shown that
estimating a few randomly picked eigenvalues \(\phi=2\pi k/r\) with a
precision less than \(1/N^2\) is enough to determine the period \(r\)
exactly (the idea is to find the best rational approximation to the
estimate of \(k/r\) using continued fractions).

In order to use the phase estimation algorithm we need to construct a
quantum circuit implementing the modular multiplication operator. By
analogy with classical algorithms that can link standard library
functions, a quantum algorithm is allowed to call classical subroutines,
for example a subroutine for computing the modular multiplication.
Importantly, before such classical subroutines are incorporated into a
quantum circuit, they must be transformed into a *reversible
form.*More precisely, a quantum algorithm can call a classical
subroutine only if it is compiled into a sequence of reversible logical
gates such as CNOT or Toffoli gate (in particular, the number of input
and output wires in each gate must be the same). The subroutine is
allowed to use a scratch memory similar to local variables used by the
standard library functions. However, once the subroutine is completed,
the scratch memory must be totally clean (say, all zeros). The reason is
that a quantum algorithm operates on coherent superpositions of
different classical states. Leaving information about the inputs or the
outputs in the scratch memory could potentially destroy quantum
coherence and prevent the algorithm from seeing interference between
different states. Since the notion of reversible classical circuits
plays an important role in the Shor’s algorithm and many other quantum
algorithms, below we briefly discuss methods for constructing such
circuits.

An important insight made in 1973 by our IBM colleague Charles Bennett
is that any classical computation can be transformed into a reversible
form. How does it work? Suppose \(f(x)\) represents some classical
computation that takes as input \(n\)-bit strings \(x\) and outputs
\(m\)-bit strings \(f(x)\). The first observation is that the answer
\(f(x)\) can be computed without erasing any intermediate data if we
are allowed to use some extra memory. Indeed, let us write down an
algorithm for computing \(f(x)\) and compile it into a sequence of
elementary logical gates such as AND, OR, etc. For concreteness,
assume that each gate has two input wires and one output wire. Let
\(L\) be the total number of gates. We shall extend the \(n\)-bit
memory storing the input \(x\) by adding \(L\) bits initialized by
zeros. These extra bits will serve as a scratch memory for storing
intermediate data. We shall write the output of the \(i\)-th gate to
the \(i\)-th bit of the scratch memory and keep the values of
the input bits. Once the computation is completed, the final answer
\(f(x)\) is contained in some designated output register within the
scratch memory. The remaining part of the scratch memory contains some
“garbage” bit string \(g(x)\) (intermediate data). Below we
illustrate how it works for the example when \(f(x)\) computes the
3-bit Majority function.

At this point the circuit is reversible as a whole, but its individual gates are still irreversible. The next step is to transform each gate into a reversible form. Consider as an example the AND gate with input wires \(a,b\) and output wire \(c\) such that \(c=a\wedge b\). Let us define its reversible version R-AND. One of the output wires of R-AND must carry the output bit \(c\) of the standard AND gate. To avoid losing information, R-AND must have at least two other output wires (note that in the case \(c=0\) there are three possible input strings: \(ab=00,01,10\)). The simplest version of R-AND has three input wires and three output wires as shown below.

Here \(d\) is a dummy input wire and \(\oplus\) denotes XOR operation (addition modulo two). The gate expects to receive inputs with \(d=0\) in which case \(c=a\wedge b\). If \(d=1\) then the output data bit if flipped. Note that all inputs of R-AND can be computed from its outputs since \(d=c\oplus (a\wedge b)\). Thus R-AND indeed acts reversibly (technically, R-AND realizes a permutation on the set of 3-bit strings). Note also that R-AND coincides with the Toffoli gate. The same construction can be applied to any other gate with two input wires and one output wire. Namely, if a gate F computes some Boolean function \(c=F(a,b)\) then its reversible version R-F would map inputs \(a,b,d\) to outputs \(a,b,c\) where \(c=d\oplus F(a,b)\), see below. Note that applying R-F twice implements the identity gate, that is, R-F coincides with its own inverse.

Suppose the original circuit is described by a sequence of \(L\) gates \(F_1,\ldots,F_L\). Replace each gate \(F_i\) by its reversible version \(G_i=R\)-\(F_i\) constructed above. We shall connect the dummy input wire of \(G_i\) and its output wire \(c\) to the \(i\)-th bit of the scratch memory such that the gate always receives inputs with \(d=0\). The new circuit has \(n+L\) input and \(n+L\) output wires and is composed from reversible \(3\)-bit gates. The final state generated by the circuit can be written as \(x,g(x),f(x)\), where \(f(x)\) is the final answer stored in the output register somewhere within the scratch memory and \(g(x)\) represents ``garbage” (intermediate data). Here we assumed that the scratch memory is initially clean (all zeros). Thus we have constructed a reversible circuit that maps \(x,0^L\) to \(x,g(x),f(x)\). The final step is to get rid of the garbage \(g(x)\) without erasing any information (which would render the circuit irreversible). A solution is to copy the answer \(f(x)\) to a clean ancillary register of \(m\) bits and then ``uncompute” \(f(x)\) by applying the circuit backwards in time. Below we sketch how this works.

Ignoring for simplicity all ancillary bits that are initialized and returned in the zero state, we obtained a reversible circuit on \(n+m\) bits that maps input strings \(x,y\) to output strings \(x,y\oplus f(x)\). In the special case when the \(f(x)\) is invertible one can use similar tricks to construct a reversible circuit that maps input strings \(x\) to output strings \(f(x)\). In practice, one would never use the method described above since it requires too large scratch memory. Several optimization techniques for constructing reversible circuits have been proposed (such as uncomputing partial results more often and reusing scratch memory bits).

Suppose now that \(f(x)=ax\pmod{N}\) is the modular multiplication function. Let \(n\) be the number of binary digits in \(N\). Using \(n\)-bit strings to represent integers modulo \(N\), one can implement \(f(x)\) by a classical circuit \(U_a\) composed of 3-bit reversible gates with \(n\) input and output wires, as described above. The circuit \(U_a\) may also use ancillary bits that are initialized and returned in the 0 state. The state-of-the-art implementation would require roughly \(n^2\) gates and roughly \(2n\) ancillary bits. For simplicity, below we shall often ignore the ancillary bits. Let us convert \(U_a\) to a quantum circuit by replacing each classical gate with its quantum counterpart. This is possible because, by construction, each gate of \(U_a\) implements some permutation on the set of input bit strings \(000,001,\ldots,111\). The corresponding quantum gate implements the same permutation on the set of basis states \(|000\rangle,|001\rangle,\ldots,|111\rangle\). We obtained a quantum circuit \(U_a\) acting on a register of \(n\) qubits that maps a basis state \(|x\rangle\) to \(|f(x)\rangle\). An example for \(f(x)=7x\pmod{15}\) is shown below. Period finding algorithm requires modular multiplication circuits \(U_b\) for \(b=a,a^2,a^4,\ldots,a^{2^p} \pmod{N}\), where \(2^p\approx N^2\).

This quantum circuit implements \(U_7\) (see Markov and Saeedi 2012)

Let \(U=U_a\) be the modular multiplication operator. At this point we know how to construct a quantum circuit implementing \(U\) as well as repeated squares of \(U\) such as \(U^2,U^4,U^8\), etc. We also know that eigenvalues of \(U\) reveal information about the period of \(a\) modulo \(N\). The final step is to measure the eigenvalues. For that we shall need a controlled version of \(U\). A controlled unitary operator is a quantum analogue of classical conditional statements such as if-then-else. We already saw examples of controlled quantum gatesearlier in the tutorial. In general, suppose \(U\) is a quantum circuit acting on \(n\) qubits. A controlled version of \(U\) is a unitary operator acting on a larger system control+target, where control is a single qubit and target is a register of \(n\) qubits. Controlled-\(U\) applies \(U\) to the target register if the control qubit is \(|1\rangle\) state and does nothing if the control qubit is \(|0\rangle\).

Like their classical counterparts, controlled quantum operations are
used in almost any quantum algorithm. We note that if \(U\) can be
realized by a short quantum circuit then so does
controlled-\(U\). Indeed, one can take the circuit realizing \(U\) and
replace each gate by its controlled version (with the same control
qubit). The main distinction from the classical if-then-else construct
is that the controlled qubit can be in a superposition of state
\(\alpha|0\rangle +\beta|1\rangle\). One could say that in the
quantum world two branches of a conditional statement can be executed
“at the same time”. Consider now a special case when the target
register is prepared in some state \(\psi\) which is an eigenvector
of \(U\), that is \(U|\psi\rangle=e^{i\phi}
|\psi\rangle\). Then the only difference between the two branches
of the controlled-\(U\) operation is the phase shift \(e^{i\phi}\).
In other words, the control qubit gets mapped from
\(\alpha|0\rangle+\beta|1\rangle\) to \(\alpha|0\rangle
+e^{i\phi}\beta |1\rangle\), while the target register remains in
the state \(\psi\). Thus we can describe that the action of
controlled-\(U\) on the composite system control+target by a
single-qubit phase shift gate \(P\) acting on the control qubit.

Below we focus on what happens with the control qubit only (keeping in mind that it is part of the larger system control+target). We shall measure the eigenvalue \(e^{i\phi}\) using a pair of phase estimation circuits shown below.

One can easily check that the probability of observing the measurement
outcome \(0\) is \(0.5(1+\cos{(\phi)})\) for the first circuit and
\(0.5(1-\sin{(\phi)})\) for the second circuit. One should keep in
mind that \(P\) represents the controlled-\(U\) operator, so the
circuit extracts information about the phase \(\phi\) by measuring
interference between two branches of controlled-\(U\) where one branch
accumulates a phase factor \(e^{i\phi}\) and the other branch
accumulates no phase. By repeating each circuit several time and
collecting the measurement statistics we can estimate the
probabilities which gives us an estimate \(\phi\). For concreteness,
assume that we are willing to perform at most 100 measurements. Then
the statistical error in our estimate of \(\phi\) is roughly 10%.

To factor a number \(N\) with 1,000 decimal digits the phase \(\phi\)
has to be estimated with a very high precision \(\epsilon \sim 1/N^2
\sim 10^{-2000}\). To this end we shall perform the phase estimation
for a family of unitary operators \(U^t\), where \(t=1,2,4,8\) etc. We
stop at \(t=2^p\) such that \(2^p\approx 1/\epsilon\). Recall that
we can efficiently implement \(U^t\) for very large values of \(t\)
by classically computing \(b=a^{t}\pmod{N}\) and using the identity
\(U^t=(U_a)^t=U_b\). Since all operators \(U^t\) have the same
eigenvector \(\psi\), we can do all phase estimations with the same
target register (initialized in the eigenvector
\(|\psi\rangle\)). For simplicity, let us assume that the phase
estimations are performed sequentially in which case only one control
qubit is needed. The controlled-\(U^2\) operator gives rise to a phase
shift \(P^2\) by angle \(2\phi\) on the control qubit. Thus we can
estimate \(2\phi\) with a precision 10% by performing roughly 100
measurements. This gives an estimate of \(\phi\) with a precision
5%. More precisely, since the phase \(\phi\) lives on the unit
circuit, we get a pair of candidate angles \(\phi'\) and
\(\phi''=\phi'+\pi\) such that one of them approximates \(\phi\)
with a precision 5% and the other is very far from \(\phi\)
(approximately by \(\pi\)). However, we have already estimated
\(\phi\) itself with a precision 10%. This is enough to select one of
the candidate angles \(\phi'\) and \(\phi''\). Applying this
argument inductively several times shows that estimating
\(\phi,2\phi,\ldots,2^p\phi\) with a constant precision (say,
10%) is enough to estimate \(\phi\) with a precision roughly
\(2^{-p}\sim \epsilon\). Overall we would need approximately
\(M=100\log_2{(1/\epsilon)}\sim 10^6\) measurements which
translates to \(10^6\) controlled modular multiplication operators. In
general, \(M\) scales as \(\log{(N)}\) with some extra factors doubly
logarithmic in \(N\). Since each controlled modular multiplication
operator requires a quantum circuit of size \(\log^2{(N)}\), the
overall complexity of the factoring algorithm scales as
\(\log^3{(N)}\sim d^3\).

We have not explained yet how to initialize the target register in the eigenvector of \(U\). Fortunately, all eigenvectors are equally good for our purposes: we are not interested in any particular eigenvalue but rather want to measure a random eigenvalue drawn from the uniform distribution. Thus one can initialize the target register in an arbitrary state that has equal weight on each eigenvector of \(U\). For example, one can choose the initial state as the basis vector \(|0\ldots01\rangle\) encoding the integer \(x=1\).