# Energy-Efficient Single Flux Quantum Technology

Oleg A. Mukhanov, Senior Member, IEEE

Abstract—Figures of merit connecting processing capabilities with power dissipated (OpS/Watt, Joule/bit, etc.) are becoming dominant factors in choosing technologies for implementing the next generation of computing and communication network systems. Superconductivity is viewed as a technology capable of achieving higher energy efficiencies than other technologies. Static power dissipation of standard RSFQ logic, associated with dc bias resistors, is responsible for most of the circuit power dissipation. In this paper, we review and compare different superconductor digital technology approaches and logic families addressing this problem. We present a novel energy-efficient single flux quantum logic family, ERSFQ/eSFQ. We also discuss energy-efficient approaches for output data interface and overall cryosystem design.

*Index Terms*—low power, power efficient, RSFQ, SFQ, cryogenic, exascale, ballistic, switching energy, voltage regulator.

## I. INTRODUCTION

The power consumption of integrated circuits, computers, and communication devices was once far less important than their computational and communication capabilities. This has changed over the last few years, especially as energy prices have increased. Increased computing performance, combined with centralization of computing into data centers, has resulted in the doubling and tripling of data center energy densities. Information and communications technology (ICT) infrastructure already accounts for roughly 3 percent of global electricity usage and the same percentage of greenhouse gas emissions, and this share is expected to double in the next 4-5 years. Similarly, in the mobile communications arena, total wireless base station power consumption is expected to double every 8-10 years. No other industry so vital to the world community is increasing its energy and carbon footprint at such a drastic rate [1].

Studies for the next generation of computers show that the power consumption for a future Exascale supercomputing system, even under optimistic assumptions, would be 100–200 MW. This power is close to that generated by a small power plant; it is not feasible to bring that much power into a building safely. Moreover, the electric bill alone would be more than \$100 million per year. The goal of the Exascale Roadmap is a 20 MW exascale system, or a two orders of magnitude improvement in flops per watt over the technology available today [2]. For 1 exaflops ($10^9$ Gflops), this limits us to < 20 pJ/flop or > 50 Gflop/Watt. Today, the best supercomputer achieves 0.5 - 0.7 Gflop/Watt.
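The roadmap numbers follow from simple division, and a short sketch makes the arithmetic explicit. The function and variable names below are illustrative only; the inputs are the 20 MW budget, the 1 exaflops target, and the 0.5 - 0.7 Gflop/Watt range quoted above.

```python
# Energy and efficiency targets implied by a power-capped exascale system.
EXAFLOPS = 1e18           # 1 exaflops = 10^9 Gflops, expressed in flop/s
ROADMAP_POWER_W = 20e6    # 20 MW target quoted in the text


def energy_per_flop_pj(power_w: float, flops_per_s: float) -> float:
    """Upper bound on the energy per operation, in picojoules."""
    return power_w / flops_per_s * 1e12


def gflops_per_watt(power_w: float, flops_per_s: float) -> float:
    """Lower bound on the efficiency, in Gflop/s per watt."""
    return flops_per_s / 1e9 / power_w


target_pj = energy_per_flop_pj(ROADMAP_POWER_W, EXAFLOPS)
target_eff = gflops_per_watt(ROADMAP_POWER_W, EXAFLOPS)

# Improvement factors relative to the 0.5-0.7 Gflop/W range quoted for today.
improvement = [target_eff / today for today in (0.5, 0.7)]

print(f"{target_pj:.0f} pJ/flop, {target_eff:.0f} Gflop/W")
print("improvement needed: " + ", ".join(f"{x:.0f}x" for x in improvement))
```

The resulting improvement factors of roughly 70 to 100 are what the text summarizes as two orders of magnitude.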

The problem is that the currently dominant CMOS digital technology consumes too much power. According to information theory, the minimum limit for a binary transition energy is given by Shannon-von Neumann-Landauer as $E_{BITMIN} = k_BT \ln 2 \sim 4 \times 10^{-21} \text{J}$ (T = 300 K). However, in modern CMOS chips, the practical switching energy is about $10^6 \times E_{BITMIN}$ in order to meet practical requirements for reliability, speed, drivability, and data communication (local and global interconnect). It is hoped that with advances in localizing interconnects (e.g., with 3D packaging), it would be possible to reduce the power associated with charging long interconnect lines. This would decrease the practical CMOS switching energy to $\sim 10^5 \times E_{BITMIN}$ [3]. Yet, even this value is still too high.

The implication of the high power dissipation of modern CMOS chips goes beyond just the raw energy cost. It limits the microprocessor clock speed, which has been stalled around 4-5 GHz for the last several years. This reflects a compromise between integration density and device switching speed imposed by thermal constraints. Heat removal capacity is singled out as the ultimate limit for CMOS scaling. It is argued that even if entirely different electron transport devices are invented for digital logic, they will not exceed the performance and integration densities obtainable with CMOS [4]. Cryogenic cooling of CMOS or other charge-based device technologies (e.g., SET) is not expected to improve this power dissipation problem. For nanoscale devices, cryogenic operation would even result in lower efficiency [4]. Alternatives to the charge-based physical mechanisms for device operation are required in order to do better than CMOS [4]. In particular, such technologies should enable ballistic signal transfer and therefore not be limited by the power necessary to charge the capacitance of interconnect lines.

Superconductor single flux quantum technology has such an alternative physical mechanism for device operation. It is based on manipulation of magnetic single flux quanta (SFQ) $\Phi_0 = h/2e$ with an energy of $\sim 2 \times 10^{-19}$ J, or $5 \times 10^3\, k_BT \ln 2$ at T = 4 K. Superconducting microstrip lines with low loss and dispersion allow the ballistic transfer of picosecond SFQ signals at speeds close to the speed of light [5].
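The quoted SFQ energy can be checked numerically if the energy scale is taken as $I_c \Phi_0$. Pairing it with $I_c \approx 0.1$ mA, the typical critical current given in Section II, is an assumption of this sketch rather than something stated here; the constants are the exact SI values.

```python
import math

H = 6.62607015e-34           # Planck constant, J*s
E_CHARGE = 1.602176634e-19   # elementary charge, C
K_B = 1.380649e-23           # Boltzmann constant, J/K

PHI0 = H / (2 * E_CHARGE)    # magnetic flux quantum h/2e, in Wb

I_C = 0.1e-3                 # assumed junction critical current, A
T_OPERATING = 4.0            # operating temperature, K

sfq_energy = I_C * PHI0                          # energy scale of one SFQ event, J
thermal_limit = K_B * T_OPERATING * math.log(2)  # k_B T ln 2 at 4 K, J
ratio = sfq_energy / thermal_limit

print(f"Phi0 = {PHI0:.4e} Wb")
print(f"SFQ energy = {sfq_energy:.2e} J = {ratio:.1e} x k_B T ln 2")
```

Under this assumption the SFQ energy comes out near $2 \times 10^{-19}$ J and a few thousand times $k_BT \ln 2$ at 4 K, in line with the figures in the text.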

Manuscript received 3 August 2010. This work was supported in part by the US DoD under contract W911NF-09-C-0036.

O. A. Mukhanov is with HYPRES, Inc., Elmsford, NY 10523 USA (phone: 914-592-1190 extension 7801; fax: 914-347-2239; e-mail: mukhanov@hypres.com).


Low power, high speed, and high sensitivity of superconductor Rapid Single Flux Quantum (RSFQ) technology (see review [6]) have already attracted much attention for digital and mixed signal applications. Many RSFQ circuits (e.g., [7]-[11]) were demonstrated in which the prime focus was on the use of high speed and sensitivity of single flux quantum circuits. The ultra-low power feature of RSFQ was generally not important until very recently. As a result, practical RSFQ circuits were biased using dc current sources consisting of external voltage sources and on-chip bias resistors. Thus, the RSFQ circuit total power was dominated by Joule heating in bias resistors rather than the actual logic gate SFQ switching. Nevertheless, the resultant  $\sim 10^{-6}$ - $10^{-7}$ W/gate power dissipation is still well below CMOS power levels and quite acceptable for today's practical applications such as digital receivers [9].

Future VLSI technologies will require lower power and higher energy efficiency than standard RSFQ circuits provide in order to be relevant to the challenges of Exascale supercomputers and to overcome the limitations of charge-transfer-based technologies such as CMOS. Even today, some specific cryogenic applications require much lower power. These are circuits for readout of cryogenic sensor arrays and peripheral circuits for superconducting quantum bits (qubits). Standard RSFQ circuits with resistor biasing would not satisfy the stringent requirements for thermal budgets at millikelvin stages and for thermal influence on the qubits and sensors.

In this paper, we will examine different approaches to reduce power in RSFQ circuits to their fundamental limits. We will compare earlier efforts and the latest techniques. A novel energy-efficient single flux quantum technology, ERSFQ/eSFQ, will be presented. We will discuss possibilities to achieve the ultimate limits in low power with physically and computationally reversible circuits.

We will also consider a cryogenic system aspect, the energy efficiency of an entire cryocooled system: energy-efficient packaging, output drivers, and interfaces.

## II. RSFQ: STATIC AND DYNAMIC POWER DISSIPATION

## A. Standard RSFQ

Fundamentally, the power dissipation in RSFQ circuits is defined by energy loss during a  $2\pi$  phase slip (SFQ switching) in a Josephson junction. This *Dynamic* power dissipation is defined as  $P_D = I_b \Phi_0 f$ , where  $I_b$  is a bias dc current through the junction (typically  $I_b \sim 0.75 I_c$ ), and f is the SFQ switching frequency. The minimum value of  $I_c$  is limited by the RSFQ cell stability to fluctuations and is normally about 0.1 mA for 4K operation. For a typical RSFQ gate with several SFQ switches per clock cycle and 20 GHz clock,  $P_D \sim 13$  nW/gate.
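As a rough numerical check of the dynamic power estimate, the expression $P_D = I_b \Phi_0 f$ can be evaluated directly. The sketch below takes "several SFQ switches per clock cycle" to mean four, which is an assumption made only to reproduce the order of magnitude; the function name and its parameters are illustrative.

```python
PHI0 = 2.067833848e-15  # magnetic flux quantum h/2e, in Wb


def dynamic_power(i_c: float, f_clock: float, switches_per_cycle: int = 1,
                  bias_fraction: float = 0.75) -> float:
    """P_D = I_b * Phi0 * f, summed over the SFQ switching events per clock cycle."""
    i_b = bias_fraction * i_c
    return switches_per_cycle * i_b * PHI0 * f_clock


# Typical values from the text: I_c ~ 0.1 mA, I_b ~ 0.75 I_c, 20 GHz clock.
per_switch = dynamic_power(i_c=0.1e-3, f_clock=20e9)
# Four switching events per cycle is an assumed stand-in for "several".
per_gate = dynamic_power(i_c=0.1e-3, f_clock=20e9, switches_per_cycle=4)

print(f"per switching junction: {per_switch * 1e9:.1f} nW")
print(f"per gate (4 switches):  {per_gate * 1e9:.1f} nW")
```

With these inputs a single switching junction dissipates about 3 nW, and a four-switch gate lands close to the ~13 nW/gate figure.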

In contrast to CMOS or any other charge-transfer devices, RSFQ power dissipation does not grow dramatically when interconnects are included. The RZ (return-to-zero) SFQ pulse can be transferred ballistically over superconductor passive transmission lines (PTLs). Some additional dynamic power will be dissipated in active Josephson Transmission lines (JTLs) and PTL drivers.

In practical RSFQ circuits, the bulk of dissipated power is *Static* power $P_S$. It is associated with the resistive bias current distribution network which sets the bias current for each gate. It is constant and independent of clock frequency, $P_S = I_b V_b$, where $I_b$ and $V_b$ are the bias current and dc supply voltage, respectively. Normally $V_b$ is conveniently selected to be ~10x more than $I_c R_s$ ($R_s$ is the junction shunt resistance), or ~2.6 mV for typical circuits. This brings $P_S$ for an RSFQ gate to ~800 nW or ~60 $P_D$.
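The static power figure can be reproduced from $P_S = I_b V_b$ once a total gate bias current is chosen. The sketch below assumes four biased junctions per gate at 75 µA each; that count is an assumption that happens to match the quoted ~800 nW, not a value given in the text.

```python
def static_power(i_b_total: float, v_b: float) -> float:
    """P_S = I_b * V_b for a resistor-biased gate."""
    return i_b_total * v_b


I_B_PER_JUNCTION = 0.75 * 0.1e-3   # I_b ~ 0.75 I_c with I_c ~ 0.1 mA, in A
N_BIASED_JUNCTIONS = 4             # assumed number of biased junctions per gate
V_B = 2.6e-3                       # typical dc supply voltage from the text, V
P_D_QUOTED = 13e-9                 # dynamic power per gate quoted in the text, W

p_s = static_power(N_BIASED_JUNCTIONS * I_B_PER_JUNCTION, V_B)
ratio = p_s / P_D_QUOTED

print(f"P_S = {p_s * 1e9:.0f} nW, P_S / P_D = {ratio:.0f}")
```

The bias resistors therefore dissipate about sixty times more than the logic switching they support, independent of whether the gate is processing data.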

Until recently, this huge disproportion between the fundamental and "engineering" portions of the dissipated power was quite acceptable. This is because the total power dissipated in practical integrated circuits with 10,000-15,000 Josephson junctions (e.g., [9], [10]) is just a few mW, which is still very low. However, for future VLSI circuits, this will not be adequate.

## B. Reduced Static Power RSFQ

The first attempts to reduce static power were driven by specific applications with stringent power requirements, such as space applications, readout of millikelvin sensor arrays and, more recently, controllers of qubit circuits.

The reduction of $P_S$ was implemented by lowering the dc supply voltage $V_b$ and the value of the bias resistors. In order to avoid possible SFQ switching crosstalk between neighboring gates, inductances were introduced in series with the bias resistors [12]-[14]. The limits of supply voltage reduction using such LR-biasing are defined by the available operational and timing margins [14], [15]. It was calculated for generic RSFQ circuits that $V_b$ can be reduced to 0.6 mV to achieve $P_S/P_D \sim 10$ at a 30 GHz clock at the expense of a 10% margin decrease. For some simpler circuits such as a ripple counter, $P_S$ can be close to $P_D$ at $V_b = 0.1$ mV and 30 GHz with a similar margin decrease. However, in a practical autocorrelator circuit operating at 7 GHz with $V_b = 0.6$ mV, $P_S/P_D$ was still ~40, although the total power was reduced by a factor of 4 [12]. A 242-junction clock generator circuit biased using the LR-biasing technique was demonstrated at 4.3 µW, achieving a 14x power reduction [14].
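The quoted ratios are roughly what one obtains from a very simple model: if the same bias current enters both $P_S = I_b V_b$ and $P_D = I_b \Phi_0 f$, and every biased junction is assumed to switch once per clock period, then $P_S/P_D \approx V_b / (\Phi_0 f)$. This simplification is an assumption of the sketch below, not the calculation used in [12]-[15].

```python
PHI0 = 2.067833848e-15  # magnetic flux quantum h/2e, in Wb


def static_to_dynamic_ratio(v_b: float, f_clock: float) -> float:
    """P_S / P_D ~ V_b / (Phi0 * f), assuming one switching event per biased
    junction per clock period and the same bias current in both terms."""
    return v_b / (PHI0 * f_clock)


# Operating points taken from the text: (supply voltage in V, clock in Hz).
cases = {
    "standard RSFQ, 2.6 mV, 20 GHz": (2.6e-3, 20e9),
    "LR-biased generic, 0.6 mV, 30 GHz": (0.6e-3, 30e9),
    "ripple counter, 0.1 mV, 30 GHz": (0.1e-3, 30e9),
    "autocorrelator, 0.6 mV, 7 GHz": (0.6e-3, 7e9),
}
ratios = {name: static_to_dynamic_ratio(v, f) for name, (v, f) in cases.items()}

for name, r in ratios.items():
    print(f"{name}: P_S/P_D ~ {r:.1f}")
```

The model also makes clear why the autocorrelator fares worse at the same supply voltage: at a lower clock rate the dynamic power shrinks while the static power stays fixed.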

Overall, the LR-biasing technique can achieve a significant reduction of power dissipation by lowering static power dissipation. However, its applications are limited by the tradeoff between the level of power reduction and the affordable reduction in margins, as well as by the requirement of extra circuit area. Static power will still be dissipated even if a circuit is not actively working, e.g., in a stand-by mode. This is not desirable in some applications such as sensor and qubit readouts.

## III. ELIMINATION OF STATIC POWER

As described above, the static power is associated with the use of a resistor network to distribute dc bias current. It is known that an inductance network can similarly distribute a dc current. However, in RSFQ circuits, the all-superconducting bias distribution network would have to be perfectly balanced at each bias terminal for both phase (the number of $2\pi$ increments related to SFQ switching) and frequency (or average voltage). Therefore, the same number of "1s" and "0s" should pass through all RSFQ cells, which is not possible in conventional RSFQ logic. Otherwise, a phase and average voltage imbalance will develop during the circuit operation and create a parasitic persistent current flowing through superconducting paths in the bias distribution network, skewing the intended gate biases (Fig. 1). In standard RSFQ, the resistor biasing network prevents current redistribution and breaks this phase relationship, so no accumulation of phase difference is possible.

This consideration must also be taken into account for the reduced $P_S$ schemes based on the LR-biasing networks described above. If the L/R time constant is too large compared to the clock period, the average voltage imbalance creates additional currents flowing from higher voltage nodes to lower voltage ones. This will effectively limit the maximum speed of LR-biased RSFQ circuits.

In order to eliminate resistors from the dc current distribution network, <u>one must prevent imbalances of accumulated phase or average voltage at bias terminals and achieve phase and average voltage equalization.</u>



Fig. 1. A data dependent persistent current  $\Delta I$  will flow from GATE1 to GATE2 due to the relative difference in average voltages during circuit operation and phases in static mode depending on the number of SFQs (logic "1"s) passing through these cells.

## A. Dual Rail Scheme

The first attempt to eliminate dc bias resistors, and therefore static power dissipation, was suggested in [16]. This approach was based on a dual-rail scheme in which SFQ pulses are used to code both logic "1" and "0", in contrast to conventional RSFQ logic, in which "1" is coded by the presence and "0" by the absence of an SFQ pulse in a clock time window. In the dual-rail SFQ scheme, Josephson junctions always switch regardless of "0" or "1" and advance their phase by $2\pi$. This equalizes the phases over different gates and prevents the phase imbalance shown in Fig. 1. While $P_S = 0$, $P_D$ is at least twice that of conventional clocked RSFQ.

In order to construct a complete logic family, delay-insensitive asynchronous SFQ cells were proposed. This led to a significant hardware overhead. As an example, a single-junction JTL stage was replaced by a four-junction circuit (cf. Fig. 3 [16]). Overall, the dual-rail logic can avoid phase imbalances across gates. However, the hardware overhead made this approach impractical.

The issue of global current distribution in the inductive biasing networks was not addressed in [16]. The inevitable phase drop in superconducting bias current lines would make achieving the correct current distribution very difficult.

## B. Self-Clocked Complementary Logic (SCCL)

Another SFQ logic family, SCCL, was suggested in [17]. It is based on a two-junction comparator (a complementary junction pair). The comparator is connected to a voltage rail, so the dc current bias is replaced with a voltage bias. Only one of the two junctions switches at the rail voltage, depending on SFQ data. This ensures phase equalization and establishes the same average voltage, equal to the rail voltage, at all bias terminals. This is fairly similar to SAIL logic based on serially connected, voltage-biased SQUIDs [18]. The suggested SCCL logic gates are somewhat analogous to some RSFQ gates; however, no further circuit development was pursued.

In order to provide a controllable movement of SFQ data, two-phase self-clocking is suggested. Besides the voltage rail, two additional current buses are used to provide the current supply and establish a $\Phi_0/2$ phase shift in the voltage rail. To avoid arbitrary current flowing in the voltage rail or in the ground, small resistors are inserted to connect the voltage rail to the current buses. The phase drop in the inductance of the current bus is isolated from the voltage rail, and hence from the logic circuits. Although these resistors can be small, they are still responsible for some small static power dissipation $P_S$.

## C. Reciprocal Quantum Logic

Reciprocal Quantum Logic (RQL) is built on the use of an ac power supply [19]. In this logic, dc bias current is absent and therefore static power dissipation is not present on the chip. However, the multi-phase ac power is terminated (and static power is dissipated) off chip at room temperature. This avoids costly static power dissipation at the 4 K cryogenic stage, but does not completely eliminate it from the system.

All RQL gates are inductively coupled to the ac power line. During logic gate operation, the ac power causes gate junctions to switch twice: first by $2\pi$, then by $-2\pi$, on the positive and negative swings of the ac power, respectively. This resets the phase of the junctions and achieves the necessary phase balance in the circuit. Similarly to the dual-rail scheme, the RQL gate dynamic power $P_D$ is at least twice as large, since circuit junctions switch twice for each ac power cycle (clock period). Since the ac power is always applied, the dynamic power is always dissipated regardless of data. The use of external ac power allows one to avoid the use of an SFQ clock distribution network and reduces the overall circuit junction count.

The logic gates are combinational, and each pipeline stage can accommodate multiple levels of logic. The RQL design methodology is similar to CMOS rather than RSFQ.

The ac power lines are not terminated to ground, and this reduces the effect of ground bounce, which was a significant problem in previous ac-driven voltage-latching superconductor logic families. Still, the multi-phase ac power presents a significant challenge for the design of high-speed VLSI circuits. In particular, the phase control between multi-phase ac lines must be kept with an accuracy of a fraction of a clock period. It is not clear if this would be practically possible in large integrated circuits.

All previous suggestions to eliminate static power described above (dual-rail delay-insensitive logic, SCCL, RQL) required the replacement of the established RSFQ libraries with substantially different logic gates. This would require a major development effort. In some cases, the core advantages of RSFQ logic (e.g., dc power and high speed) were sacrificed. Recently, a new approach has been proposed: energy-efficient RSFQ logic [20] with zero static power dissipation and the elimination of the resistor biasing network. In this approach, all RSFQ logic core advantages, along with the vast established RSFQ circuit libraries, are preserved as much as possible. Until recently, this was deemed impossible in RSFQ circuits without upsetting the circuit operation, since it would lead to phase and average voltage imbalances caused by SFQ data flow. There are two somewhat different implementations: ERSFQ and eSFQ. The difference is mostly in the degree of modification of existing RSFQ gates into their energy-efficient versions.

## A. Junction-based Bias Current Distribution

Similar to the transition from a resistor-based gate interconnect originally used in RSFQ (R for Resistive) [21] to the inductor-junction-based design in present day RSFQ (R for Rapid) [22], Josephson junctions with inductors in series can replace bias resistors as elements setting up the required amount of dc bias current for a logic gate. These bias current junctions  $J_B$  should have a critical current equal to the required bias current  $I_B$  as shown in Fig. 2. As evident from the overdamped junction current-voltage characteristics, such a junction can be an excellent current limiter for  $I = I_{CB}$  (inset in Fig. 2).

The use of Josephson junctions instead of resistors as a bias "current regulator" was also described earlier, although never applied to any reported SFQ circuit implementations [23]. The major difference is that such a junction current regulator was to operate in the highest differential resistance portion of its I-V characteristic. This leads to a non-zero static power dissipation, since the regulator junction will be in a power-dissipative ac mode. As it was stated, the power dissipation of such a current regulator could be very low compared with the dissipation associated with a fixed resistor bias approach [23].

In contrast, in our approach the limiting bias junction normally operates in a zero-voltage, i.e., zero-power dissipation, mode. If the average voltage at the bias terminal $V_{GATE}$ is equal to the voltage at the common node (bias bus) $V_B$, then the bias limiting junction remains in the superconducting (zero-power) state. If the average voltage at the bias terminal $V_{GATE}$ is lower than the voltage at the common node (bias bus) $V_B$, then the bias limiting junction $J_B$ starts to switch at an average voltage of $V_B - V_{GATE}$. This keeps the bias current to a gate at the desired level and prevents excessive current from flowing into the gate. In general, these biasing Josephson junctions automatically generate sufficient voltage to maintain the average voltage at the common node at $V_B$ in each respective branch, maintaining the bias current in each branch close to the critical current of the limiting bias junction [20], [24].
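The two operating cases of the limiting junction can be summarized in a small behavioral sketch. It is not a circuit simulation: it only encodes the rule that $J_B$ develops the average voltage $V_B - V_{GATE}$ when the gate terminal lags the bus, and converts that voltage into an SFQ switching rate with the Josephson relation $V = \Phi_0 f$. The function name and the example frequencies are hypothetical.

```python
PHI0 = 2.067833848e-15  # magnetic flux quantum h/2e, in Wb


def limiter_state(v_bus: float, v_gate: float) -> tuple[float, float]:
    """Return (average voltage across J_B in V, SFQ switching rate of J_B in Hz).

    The junction stays superconducting when the gate terminal matches the bus
    voltage and otherwise switches at the average voltage V_B - V_GATE.
    """
    v_jb = max(0.0, v_bus - v_gate)
    return v_jb, v_jb / PHI0


# Hypothetical example: a bus held at the average voltage of a 20 GHz SFQ stream.
v_bus = PHI0 * 20e9
balanced = limiter_state(v_bus, PHI0 * 20e9)  # gate terminal also averages 20 GHz
lagging = limiter_state(v_bus, PHI0 * 10e9)   # gate terminal averages only 10 GHz

print(f"balanced: {balanced[0] * 1e6:.2f} uV, {balanced[1] / 1e9:.1f} GHz")
print(f"lagging:  {lagging[0] * 1e6:.2f} uV, {lagging[1] / 1e9:.1f} GHz")
```

In the balanced case the limiter contributes no voltage and no dissipation; in the lagging case it makes up the missing phase slips itself.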

The current limiting junctions also play a role in maintaining the phase balance between gates during static periods (e.g., during a stand-by mode) and during *power-up*. During the power-up procedure, bias current should distribute along the bias bus in order to set a correct global bias distribution. However, there is a phase drop in the inductance of the superconducting current bus. Current limiting junctions will automatically switch until this phase drop is compensated and the proper biasing currents are set. The result of the power-up procedure is the establishment of the *approximately correct* global bias distribution with a zero average voltage across any point in the circuit, i.e., maintaining zero static power while the bias currents are flowing to each gate.

## B. Voltage Bias Source

As a result of the powering-up, all bias currents are equal to or somewhat lower (by $\sim \Phi_0 / L_B$) than the required optimal gate bias. The next step is *turning-on* a voltage bias source, i.e., establishing a dc voltage across the bias bus. There is no advantage in having the bias bus voltage higher than that set by the maximum average gate voltage determined by the clock frequency $f_C$, $V_{GATEMAX} = V_B = \Phi_0 f_C$. This also corresponds to the lowest power. Once the bias bus voltage is set, it supplies additional bias current increments to gates up to the levels limited by the critical currents ($I_{CB}$) of the bias limiting junctions. These current increments add to the pre-set bias currents established during the power-up procedure and establish the optimum design bias value for each gate.

The voltage bias source can be formed by a Josephson Transmission Line (JTL) connected via large inductances to the bias bus. An SFQ clock source inputs SFQ pulses to this *Feeding JTL* and sets a dc voltage  $V_B = \Phi_0 f_C$  across the bias bus. This feeding JTL provides the additional bias current increments to gates [20], [24].
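To give a sense of scale, the bus voltage $V_B = \Phi_0 f_C$ can be evaluated for a couple of clock rates. The sketch below uses the 20 GHz clock from Section II and 67 GHz as a second example; the 75 µA branch current is the typical $I_b$ quoted earlier and is used here only for illustration.

```python
PHI0 = 2.067833848e-15  # magnetic flux quantum h/2e, in Wb


def bias_bus_voltage(f_clock: float) -> float:
    """dc voltage set on the bias bus by a feeding JTL clocked at f_clock."""
    return PHI0 * f_clock


v_20ghz = bias_bus_voltage(20e9)
v_67ghz = bias_bus_voltage(67e9)

# Power drawn from the bus by one branch carrying an illustrative 75 uA.
branch_power_20ghz = 75e-6 * v_20ghz

print(f"V_B at 20 GHz: {v_20ghz * 1e6:.1f} uV")
print(f"V_B at 67 GHz: {v_67ghz * 1e6:.1f} uV")
print(f"75 uA branch at 20 GHz: {branch_power_20ghz * 1e9:.2f} nW")
```

The bus therefore sits at tens of microvolts rather than the millivolt-level supply of resistor-biased RSFQ, which is where the power saving originates.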



Fig. 2. A junction-inductive bias current distribution network supported by a dc voltage source. Bias current is set by a serially connected  $L_B$ ,  $J_B$ . Inset: I-V characteristic of a shunted (overdamped) Josephson junction.

## C. ERSFQ – Adaptive Average Voltage Balancing

The above junction-limiting dc bias current distribution with a voltage source supported by the Feeding JTL can be used to deliver current bias to regular RSFQ gates. No redesign of the RSFQ gate equivalent circuits is required in order to implement such energy-efficient RSFQ (ERSFQ) circuits. The only difference from standard RSFQ gates is the replacement of bias resistors with the limiting Josephson junctions and series inductances. Switching of current limiting junctions will compensate for imbalance of average voltages across different bias terminals. This process is automatic and will *adaptively track the changes in the average voltages* and phase accumulation during the circuit operation.

The exact moment of switching of the limiting junctions depends on data content and generally is not synchronous with the clock. Therefore, some variations of bias current are possible, although not desirable. In order to reduce these variations and smooth out transients caused by switching of the limiting junctions, the series inductance $L_B$ (Fig. 2) should be sufficiently large. Each SFQ switching event changes the gate bias current by $\Delta I = \Phi_0 / L_B$. This current change should at least be smaller than the current bias margin of the particular RSFQ gate. In fact, a higher inductance $L_B$ is beneficial in order to minimize circuit timing variations caused by dc bias current variations. Otherwise, these variations will reduce the maximum clock frequency, since the minimum clock period would have to accommodate them.
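The sizing rule $\Delta I = \Phi_0 / L_B$ can be turned around to estimate the smallest acceptable bias inductance. In the sketch below, the 100 pH inductor and the 10% tolerance on a 75 µA bias current are hypothetical example values, not design figures from the demonstrated circuits.

```python
PHI0 = 2.067833848e-15  # magnetic flux quantum h/2e, in Wb


def bias_step(l_b: float) -> float:
    """Bias current change per SFQ switching event of the limiting junction."""
    return PHI0 / l_b


def min_bias_inductance(max_step: float) -> float:
    """Smallest L_B that keeps the per-event current step below max_step."""
    return PHI0 / max_step


step_100ph = bias_step(100e-12)              # hypothetical 100 pH inductor
l_min = min_bias_inductance(0.10 * 75e-6)    # keep the step under 10% of 75 uA

print(f"step with 100 pH: {step_100ph * 1e6:.1f} uA")
print(f"L_B for a 7.5 uA step: {l_min * 1e12:.0f} pH")
```

Inductances of a few hundred picohenries per bias branch are consistent with the observation that the bias inductors take up a noticeable share of the layout.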

Fig. 3 shows an example of an ERSFQ circuit layout [24]: a fragment of a 20-bit binary counter implemented in the HYPRES 4.5 kA/cm<sup>2</sup> fabrication process with four superconductor layers [25]. This circuit was successfully demonstrated at over 67 GHz with large parameter margins. It is evident from the layout that the bias inductors $L_B$ occupy a significant circuit area. It is expected that these inductances can be implemented more compactly with the additional superconductor layers available in future generation fabrication processes.

Standard RSFQ circuits use 1 bias resistor for each 3 junctions on average - 1 resistor per 2 junctions in JTLs and up to 1 resistor per 4 junctions in logic gates. Therefore, it is reasonable to estimate that the transition to ERSFQ from RSFQ cells will increase junction count by ~25%. Additional junctions will be necessary for the feeding JTLs. The overall junction increase can be estimated as ~33-40%.



Fig. 3. A layout fragment of the demonstrated ERSFQ circuit in HYPRES  $4.5 \text{ kA/cm}^2$  fabrication process. The ERSFQ gates are largely equivalent to standard RSFQ gates with the exception of the biasing network.

## D. eSFQ - Synchronous Phase Balancing

The above ERSFQ approach allows us to achieve zero static power dissipation while retaining the conventional RSFQ circuit designs and dc power supply. However, the area of ERSFQ circuits can become larger due to the introduction of sizeable bias inductors and the feeding JTLs. These are necessary to smooth out the bias current variations due to asynchronous SFQ switching of the limiting junctions during circuit operation. As shown below, it is possible to eliminate the need for the large bias inductors by forcing synchronous (at every clock cycle) phase compensation at gate bias terminals. It is also possible to avoid using separate feeding JTLs by combining them with clock JTLs. This is realized in the energy-efficient RSFQ version with *synchronous phase compensation* – eSFQ.

Similar to ERSFQ, the eSFQ approach relies on dc current biasing distributed via current limiting junctions and a voltage bias source. It is worth noting that the large-value inductances $L_B$ are not necessary for biasing the clock JTL network. Generally, this network has the highest average voltage $\Phi_0 f_C$, and its bias limiting junctions never switch during operation. They *only* switch during *powering-up* to compensate the phase drop along the bias bus. The clock JTL network is omnipresent in digital RSFQ circuits and therefore can play the role of a distributed voltage source with $V = \Phi_0 f_C$, similar to the feeding JTL in ERSFQ circuits. Consequently, it is possible to use the clock network as a feeding JTL.

Any RSFQ gate with the same phase (average voltage) at its bias terminals as that of the clock network will not experience switching of the bias limiting junctions during operation and, therefore, will not require large bias inductors. Every clocked RSFQ gate has a decision-making pair: two serially connected Josephson junctions. Every clock cycle, one of the pair junctions makes a $2\pi$ phase slip regardless of data content. As a result, the phase and average voltage across the decision-making pair are always the same as across the junctions in the clocking JTL. Unfortunately, this natural phase balance is not utilized, since the bias terminals for standard RSFQ (and therefore ERSFQ) gates are designed without regard to phase (average voltage) balancing.



Fig. 4. Modification of standard RSFQ gate (DFF) into eSFQ version. The dc current bias terminal is moved to the decision-making pair.

Here in the eSFQ approach, we propose to always introduce the gate current bias via the decision-making pair and thus avoid the necessity for a large bias inductor $L_B$. Fig. 4 presents the schematic of a standard RSFQ gate that is slightly modified to be compatible with resistor-less biasing. This circuit is the D flip-flop (DFF), which permits a data bit to be stored in the cell until it is released by the SFQ clock. In the conventional RSFQ design on the left, the bias current is injected just above junction $J_2$, so the phase and average voltage are data-dependent. The clock line sends an SFQ pulse to the decision-making pair, a series combination of $J_3$ and $J_4$, such that in every case one or the other (but not both) of the junctions switches. Therefore, for a clock input at a rate $f_c$, the voltage at the clock input is $\Phi_0 f_c$. In the eSFQ DFF design on the right, the current bias is inserted instead into the clock line. This permits the circuit to be biased with the same network that biases a clock distribution line, which also has an average voltage of $\Phi_0 f_c$. The bias limiting junction $J_B$ is necessary only to set the gate bias current and may only switch to adjust phase during the powering-up. It is not expected to switch during normal operation.

This change in bias point is not entirely trivial; the detailed parameters of the circuit have to be reoptimized with changes in selection of critical currents and inductor values, in order to maintain a large margin of operation. It will also pre-set a gate into logic "1" after biasing-up, which requires initial clock cycles to reset. Similar changes are possible for most clocked RSFQ logic gates.



Fig. 5. Possible modifications of (a) standard RSFQ data JTL into eSFQ (b) clocked JTL. The dc current bias is applied via the decision-making pair. (c) Supply-free ballistic JTL with unshunted junctions.

More significant changes are required to data transmission circuits. In standard RSFQ, data is transported between clocked gates using asynchronous JTLs, mergers, splitters and PTLs. For the eSFQ implementation, there are several possible options.

First, clocked data transmission can be used. This can be done with a shift-register-type circuit based on 2-junction cells (Fig. 5(b)), which can be designed as a relatively compact circuit [26]. This RSFQ shift register can be transformed into its eSFQ version by a simple replacement of resistors with bias limiting junctions. The unit cell can be easily extended to perform SFQ merging and SFQ splitting functions.

Second, a JTL formed with unshunted Josephson junctions can act as a "supply-free" passive transmission line. It avoids any power dissipation by using *ballistic* motion of the flux quanta [27]. It can ballistically transport SFQs between clocked (i.e., powered) SFQ gates or drivers. In fact, these ballistic JTLs can be used in standard RSFQ and ERSFQ circuits as well.

Finally, regular PTLs can be used with clocked eSFQ PTL drivers. The clocked PTL drivers can bring better data synchronization and can simplify timing.

Other asynchronous circuits, e.g., the toggle flip-flop (TFF), can be made "supply-free" [28], as all biasing is done via adjacent JTLs. Similarly, for the eSFQ implementation, these gates will be biased via the clocked JTLs.

All these different options for data transmission can be combined in one circuit to achieve the best performance while minimizing total circuit power and area.

One can also estimate the increase in the junction count relative to standard RSFQ. Similarly to ERSFQ, eSFQ gates will use ~25% more junctions. While avoiding the junctions used for the feeding JTLs, the eSFQ data lines might use more junctions than those in standard RSFQ. Assuming the use of clocked JTLs, ballistic JTLs, and PTLs for data transmission, the overall circuit junction increase can be estimated as ~33-40%. However, the circuit area is not expected to increase, owing to the avoidance of large bias inductances and feeding JTLs.

In contrast to ERSFQ, the bias limiting junction in eSFQ circuits may switch only during powering up but not during normal circuit operation. Consequently, there will be no additional dynamic power $P_D$ dissipated in eSFQ gates compared to standard RSFQ. A small additional dynamic power (~10%) can be associated with the use of clocked JTLs for data transmission.

Since both ERSFQ and eSFQ use a very similar dc bias distribution network based on limiting junctions, they can be combined in the same circuit to achieve the best integrated circuit area utilization. The use of the feeding JTL is more natural in biasing asynchronous mixed-signal circuits, e.g., analog-to-digital converter modulators. The use of eSFQ circuits should be more effective in digital circuits.

## E. Bias Voltage Regulator: Zero-Power Mode

Since both ERSFQ and eSFQ have a dc bias voltage source with its voltage determined by the SFQ clock, it is possible to actively manage dynamic power dissipation by controlling the SFQ clock network or feeding JTLs. By turning the clock OFF for all or particular circuit sections, one can set $V_B = 0$ there and effectively stop circuit operation. This sets the dynamic power $P_D = 0$ and achieves a "zero-power mode." There are NDRO-type SFQ gates [29], [30], which can be used to switch the SFQ clock propagation ON and OFF at high speed using an SFQ control. The fast SFQ switching enables the realization of a *fast bias voltage regulator* and therefore fast controllable switching between active and sleeping modes.
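The dependence of the bias voltage on the clock can be illustrated with a short numerical sketch. It uses the relation $V_B = \Phi_0 f_C$ listed in Table I and assumes that, with zero static power, the power drawn from the dc bias source is simply $I_B V_B$. The total bias current (1 A) and the clock rate (20 GHz) are hypothetical values chosen only for illustration.

```python
PHI_0 = 2.067833848e-15  # magnetic flux quantum, Wb (V*s)


def bias_voltage(clock_hz: float) -> float:
    """Average dc bias voltage set by the SFQ clock, V_B = Phi_0 * f_C."""
    return PHI_0 * clock_hz


def source_power(bias_current_a: float, clock_hz: float) -> float:
    """Power drawn from the bias source, assumed here to be I_B * V_B."""
    return bias_current_a * bias_voltage(clock_hz)


# Hypothetical operating point: 1 A total bias current, 20 GHz clock.
I_B = 1.0
for label, f_c in (("clock ON ", 20e9), ("clock OFF", 0.0)):
    v_b = bias_voltage(f_c)
    p = source_power(I_B, f_c)
    print(f"{label}: V_B = {v_b * 1e6:.2f} uV, P = {p * 1e6:.2f} uW")
```

With the clock stopped, both the bias voltage and the power drawn from the source evaluate to zero, which is the zero-power mode described above.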

"Zero-power at zero-circuit activity" is particularly valuable for circuits operating in "burst regime," e.g., for detector and qubit readout. These applications would benefit from having the readout circuitry non-dissipative and noiseless until the readout event.

Such a "zero-power mode" is particularly difficult to enable in other technologies. Active power management in CMOS chips includes disconnecting them from the supply or lowering the supply voltage, e.g., for "sleep modes" in SRAM [31]. This zero-power mode does not seem possible in any of the alternative low-power SFQ approaches described in the previous sections, including LR-biased RSFQ, dual-rail delay-insensitive logic, SCCL, and RQL. The latter, in particular, has a relatively high-power ac clock applied to the circuit all the time [19].

Moreover, in ERSFQ and eSFQ circuits, the ability to control the propagation and rate of the SFQ clock opens an opportunity to operate different circuit sections at different bias voltages and thereby further minimize the total circuit power. This would be particularly beneficial for multi-rate circuits, in which there is a high-clock-rate section (e.g., an ADC oversampling comparator) and lower-clock-rate sections (e.g., digital filters) [32]. In order to prevent the higher dc bias voltage from propagating to the lower-clock-rate circuit section, one should have separate bias current buses for each circuit section with a different bias voltage. In this case, each section would have a bias voltage determined only by the clock frequency of that section. Such flexibility in active power management further enhances the power efficiency of ERSFQ/eSFQ circuits.
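The benefit of separate bias buses in a multi-rate circuit can be sketched numerically. The example below assumes that each section draws a power $I_B \Phi_0 f_C$ from its own bus, and, as a simplifying assumption, that a single shared bus would hold every section at the voltage set by the fastest clock. The section names, bias currents, and clock rates are hypothetical.

```python
PHI_0 = 2.067833848e-15  # magnetic flux quantum, Wb

# Hypothetical sections: (name, total bias current in A, clock rate in Hz).
SECTIONS = [
    ("oversampling comparator", 0.05, 30e9),
    ("digital filter", 1.00, 1e9),
]


def separate_bus_power(sections):
    """Each section sits at its own bias voltage V_B = Phi_0 * f_C."""
    return sum(i_b * PHI_0 * f_c for _, i_b, f_c in sections)


def shared_bus_power(sections):
    """One bus: every section sees the voltage set by the fastest clock."""
    v_bus = PHI_0 * max(f_c for _, _, f_c in sections)
    return v_bus * sum(i_b for _, i_b, _ in sections)


p_sep = separate_bus_power(SECTIONS)
p_shared = shared_bus_power(SECTIONS)
print(f"separate buses: {p_sep * 1e6:.2f} uW")
print(f"shared bus:     {p_shared * 1e6:.2f} uW")
```

In this illustration the large, slowly clocked section dominates the bias current, so keeping it at its own lower voltage accounts for most of the saving.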

Separation of current bias buses for different clock rate sections can be matched with the different islands used in the current recycling approach. Current recycling is known to reduce the total bias current by partitioning an RSFQ circuit into serially biased islands, each having an equal bias current, as was demonstrated for a digital filter circuit [33]. An ERSFQ/eSFQ circuit can be partitioned into islands with equal bias currents but different bias voltages in order to achieve the lowest overall power dissipation.
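A minimal sketch of such a partition is given below. It treats the islands as ideal series-connected loads, so the same current flows through each island and the island voltages add; each island voltage is taken as $\Phi_0 f_C$ for that island's clock. The island current and the clock rates are hypothetical values.

```python
PHI_0 = 2.067833848e-15  # magnetic flux quantum, Wb


def recycled_supply(island_current_a, island_clocks_hz):
    """Serially biased islands: one shared current, island voltages add."""
    stack_voltage = sum(PHI_0 * f_c for f_c in island_clocks_hz)
    return island_current_a, stack_voltage


def parallel_current(island_current_a, n_islands):
    """Same islands fed in parallel: the supply current is the sum."""
    return island_current_a * n_islands


# Hypothetical partition: four islands of 0.25 A each at different clocks.
clocks = [30e9, 10e9, 10e9, 1e9]
i_supply, v_stack = recycled_supply(0.25, clocks)
print(f"recycled: {i_supply:.2f} A at {v_stack * 1e6:.1f} uV")
print(f"parallel: {parallel_current(0.25, len(clocks)):.2f} A")
```

The supply current delivered to the chip drops by the number of islands, while each island still operates at the voltage set by its own clock.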

TABLE I COMPARISON OF ENERGY EFFICIENT SFQ CIRCUIT TECHNOLOGIES TO STANDARD RSFQ

| Feature           | LR-biased RSFQ [12]-[14] | Dual-rail [16]  | SCCL [17]    | RQL [19]                  | ERSFQ [20], [23]            | eSFQ [20]                |
|-------------------|--------------------------|-----------------|--------------|---------------------------|-----------------------------|--------------------------|
| $P_S$             | reduced                  | zero            | small        | zero                      | zero                        | zero                     |
| $P_D$             | same                     | ×3              | ×2           | ×2                        | ×2                          | ×1.1                     |
| Zero-power mode   | None                     | None            | None         | None                      | Yes                         | Yes                      |
| Bias distribution | DC current (L, R)        | DC current (L)  | DC voltage   | AC current (transformers) | DC voltage, current (L, JJ) | DC voltage, current (JJ) |
| $V_B$             | less                     | $\Phi_0 f_C$    | $\Phi_0 f_C$ | Higher                    | $\Phi_0 f_C$                | $\Phi_0 f_C$             |
| JJ count          | same                     | ×4              | TBD          | smaller                   | ×1.4                        | ×1.4                     |
| IC area           | slightly larger          | ×4              | TBD          | smaller                   | ×1.4                        | same                     |
| Use of RSFQ gates | RSFQ gates               | new gates       | new gates    | new gates                 | RSFQ gates                  | modified RSFQ gates      |
| Max clock         | lower                    | asynchronous    | same         | lower                     | slightly lower              | same                     |
| Switch speed      | same                     | ×0.7            | ×0.7         | same                      | same                        | same                     |

TBD: To Be Determined. The dual-rail SFQ logic and SCCL were never developed beyond initial preliminary descriptions. Estimates are the author's opinion based on the information available to date.

## V. COMPARISON OF DIFFERENT SFQ APPROACHES

We described several SFQ logic approaches for minimizing the power dissipation in SFQ circuits. They all focus on the reduction or elimination of static power caused by the resistor-based bias distribution method. Table I summarizes the basic features of the described low-power SFQ families as compared to standard RSFQ. Some entries can be estimated quantitatively, while others are given as a qualitative trend. For example, the practical maximum clock frequency in RQL might be lower than that of standard RSFQ due to difficulties in providing a skew-free ac multi-phase clock over multiple parallel microstrip lines. For ERSFQ, the clock frequency will be determined by the value of the bias inductors: for higher $L_B$ values the timing variations will be lower and will not significantly limit the clock frequency. The reduction of switching speed for the dual-rail and SCCL logics is related to inductive shunting of junctions via bias lines.

## VI. ULTIMATE LOW POWER COMPUTATION: REVERSIBLE CIRCUITS

In order to further reduce circuit power dissipation, it is necessary to reduce the dynamic power $P_D$. The information theory thermodynamic limit is $E_{BITMIN} = k_BT \ln 2$. However, it is known that only erasure of information costs energy, and information can be preserved in logically reversible computation, which does not erase information [34]. If such computation is performed sufficiently slowly to maintain physical reversibility, then the energy per operation can be smaller than $k_BT \ln 2$.
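To give a sense of scale, the limit $k_BT \ln 2$ can be evaluated numerically. The sketch below assumes two representative temperatures, 4.2 K (liquid helium) and 300 K (room temperature); these values are chosen for illustration only.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K (exact SI value)


def landauer_limit(temperature_k: float) -> float:
    """Thermodynamic limit per erased bit, E = k_B * T * ln 2, in joules."""
    return K_B * temperature_k * math.log(2)


# Assumed operating temperatures: liquid helium and room temperature.
for t in (4.2, 300.0):
    print(f"T = {t:5.1f} K: k_B T ln 2 = {landauer_limit(t):.3e} J")
```

The limit scales linearly with temperature, so it is about 70 times lower at 4.2 K than at 300 K.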

There have been multiple attempts to build such informationally and physically reversible circuits in different technologies. Superconducting SFQ devices are the most promising candidates for the implementation of reversible circuits [35]-[38]. In contrast to RSFQ, the generation and annihilation of SFQ vortices are avoided, since the energy of these vortices is much higher than the thermodynamic threshold. In other words, SFQs are "recycled" and as a result their total number remains unchanged.

Initial versions of superconducting reversible circuits were based on parametric quantrons or quantum flux parametrons designed for multi-phase ac power [35]-[37]. The ac power has proven to be an intractable obstacle for the implementation of working circuits. Subsequently, a dc-powered version of the parametric quantron was suggested [39].

Recently, dc-powered SFQ circuits were implemented based on a negative-inductance SQUID (nSQUID), i.e., a dc SQUID with negative mutual inductance between the arms of the SQUID loop [38]. An experimental nSQUID circuit was fabricated with the HYPRES 100 A/cm<sup>2</sup> process [25] and tested up to 5 GHz. When running at 50 MHz, the measured power dissipation was $\sim 2k_BT \ln 2$ for an 8-nSQUID shift register circuit [40].

It is possible to operate nSQUID circuits, also called Superconductor Flux Logic (SFL), in reversible or irreversible regimes depending on the clock frequency. The clock frequency should be quite low to achieve the ultimate reversibility. In irreversible mode, the demonstrated circuits remain operational at several GHz clock rates, though dissipating a few or even dozens of $k_BT \ln 2$ per cycle [41].

In practical systems, it would be possible to combine fully reversible SFL modules with SFL circuits working in nonreversible mode (e.g., at a higher clock rate). Since both SFL and eSFQ/ERSFQ circuits are dc-powered, it is also possible to integrate both circuit technologies on the same chip or multichip module in order to achieve the optimal power efficiency. It would also be a natural way to integrate the ultimately low-power, physically and computationally reversible technology into practical applications.

## VII. ENERGY EFFICIENT SYSTEM DESIGN ISSUES

Energy efficiency is of paramount importance in computing system integration. The cooling infrastructure for CMOS-based supercomputers accounts for a substantial portion of the overall system power budget, reaching approximately 50% (industry average) [42]. For cryogenic-technology-based computing systems, the energy efficiency of system integration is critical. Power savings in digital technologies can be lost if the system design is not optimized for low power. The system aspects include the data interface, power delivery, cryopackage, and cryocooler. The optimum placement of various system elements and technologies at different temperature stages, following the hybrid temperatures, hybrid technologies ($ht^2$) integration principle [9], can be used to achieve the lowest overall heat load and overall power consumption for the cryocooler.

## A. Energy-Efficient DC Power Cables

The heat load for the cryocooler must be minimized by preventing heat conduction from room temperature to the lowest-temperature (i.e., the lowest heat capacity) cryocooler stages. High temperature superconductors (HTS) can dramatically lower the overall heat loads at the 4 K stage that supports SFQ circuits by virtue of their lossless transport of DC current and low thermal conductivity. For normal metals, the conflict between Joule heating and thermal conduction by electrons means that there is an optimum resistance for any given current spanning two different temperatures [43]. HTS materials, with zero Joule heating up to about 90 K, break through this optimum barrier. A multi-line flexible HTS tape on a Hastelloy substrate with connectors was successfully demonstrated [44]. The heat leak per HTS line is 1/10th of that for optimized leads made from normal metals for the typical values of bias currents used for RSFQ circuitry. By using different substrates (e.g., flexible yttria-stabilized zirconia (YSZ)), line widths, etc., it is possible to reduce the heat leak even further.

## B. Energy-Efficient Data Cables

Similarly, HTS material can be used to implement high-speed, low-heat-conduction data interconnect cables for the system data link. In contrast with normal-metal data interconnects, the superconducting line pitch can be substantially reduced with negligible signal loss and crosstalk [45]. This reduces the overall cable size and further decreases the cable heat conduction.

The greatest advantage of superconducting data interconnect cables comes from their low loss and low dispersion properties, which allow ballistic transfer of signal pulses up to several tens of gigahertz [5]. For this reason, HTS interconnects were even considered for inter-chip communications between Si and GaAs integrated circuits [45].

For our energy-efficient system integration, the HTS data cables should enable a ballistic transport of low-voltage (~millivolt) output signals over a significant distance (~10 cm) to higher temperature stages at tens of gigahertz. This avoids the use of power-hungry and bulky output drivers at 4K necessary to amplify the output signal and charge the capacitance of normal-metal lines of the output data link. We estimated the optimum data rate for the HTS interconnect cable as ~10-25 Gb/s due to the quadratic HTS surface impedance frequency dependence.

In order to build an HTS data cable with a microstrip line cable geometry, a two-layer HTS thin film fabrication process needs to be developed.

## C. Energy-Efficient Data Link Electronics

Exascale systems will feature massively parallel high-speed data interfaces with thousands of lines. The disconnect between the *sub-millivolt* signal levels of low-power SFQ digital processing circuits and the *sub-volt* signal levels required to drive electro-optical (E/O) devices creates a significant roadblock for energy-efficient system integration. In order to cross this gap, both amplification of the output SFQ data signals and reduction of the input signal levels for E/O elements are needed. Optimizing the data link energy efficiency requires one to determine the optimum signal voltage levels and amplification gains at each temperature level. The lower the temperature level, the more external power is required to cool the amplification stage. The energy efficiency figure of merit $F_{EE}$ can be expressed as

$$1/F_{\rm EE} \sim G_1/T_1 + G_2/T_2 + G_3/T_3, \tag{1}$$

where $G_i$ is the gain of the amplifier located at temperature $T_i$, e.g., $T_i$ = 4 K, 40 K, 70 K, etc. The higher the temperature, the higher the gain that the stage can afford.
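The trade-off expressed by (1) can be explored with a small sketch. It evaluates the right-hand side of (1), up to the unspecified proportionality constant, for two hypothetical gain allocations. The stage gains are illustrative, and the assumption that the overall link gain is the product of the stage gains is ours rather than the paper's.

```python
import math


def inverse_fee(stages):
    """Sum of G_i / T_i over (gain, temperature in K) pairs, as in Eq. (1)."""
    return sum(gain / temp_k for gain, temp_k in stages)


def total_gain(stages):
    """Overall gain, assumed here to be the product of the stage gains."""
    return math.prod(gain for gain, _ in stages)


# Two hypothetical allocations with the same overall gain of 1000.
even_split = [(10.0, 4.0), (10.0, 40.0), (10.0, 70.0)]
warm_heavy = [(2.0, 4.0), (5.0, 40.0), (100.0, 70.0)]

for name, stages in (("even split", even_split), ("warm-heavy", warm_heavy)):
    print(f"{name}: gain = {total_gain(stages):.0f}, "
          f"1/F_EE ~ {inverse_fee(stages):.3f}")
```

Shifting gain from the 4 K stage toward the 70 K stage lowers $1/F_{EE}$ in this example, in line with the statement that higher-temperature stages can afford more gain.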

The lowest-power and fastest data output amplifier (driver) is an SFQ-to-dc converter capable of delivering close to 1 mV in a differential configuration. It is possible to implement this driver as an eSFQ circuit. The estimated dissipated power of such an eSFQ driver is ~500 times lower than that of, e.g., a 16 mV equalizing driver [46]. Clearly, the generation of a higher output voltage at 4K is very costly in power efficiency, even assuming the most optimistic 300 W/W efficiencies for large 4K cryocoolers. Besides the power dissipation, the area of the 16 mV drivers is ~30 times larger than the area of the SFQ-to-dc driver, which is an important parameter for a high-density output interface with hundreds of lines per chip. A 1 mV signal level is sufficient for transmission over a distance of several centimeters to higher temperature stages (e.g., to a 50 K stage) at a high data rate using the HTS interconnect microstrip ribbon cable described above.
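The cost of dissipating power at 4 K can be illustrated with a rough wall-plug estimate. The sketch below uses the 300 W/W cryocooler figure and the ~500 times driver power ratio quoted above; the per-driver dissipation of the eSFQ driver (0.1 µW) and the number of output lines (1000) are hypothetical placeholders, not values from this paper.

```python
COOLING_W_PER_W = 300.0  # from the text: optimistic large 4 K cryocooler
DRIVER_RATIO = 500.0     # from the text: 16 mV driver vs eSFQ driver power


def wall_power(p_driver_4k_w, n_lines, cooling=COOLING_W_PER_W):
    """Room-temperature power needed to remove the drivers' heat at 4 K."""
    return p_driver_4k_w * n_lines * cooling


# Hypothetical values: 0.1 uW per eSFQ driver, 1000 output lines.
P_ESFQ = 1e-7
N_LINES = 1000

p_wall_esfq = wall_power(P_ESFQ, N_LINES)
p_wall_16mv = wall_power(P_ESFQ * DRIVER_RATIO, N_LINES)
print(f"eSFQ drivers:  {p_wall_esfq:.3f} W at the wall")
print(f"16 mV drivers: {p_wall_16mv:.1f} W at the wall")
```

Whatever the actual per-driver dissipation, the factor of ~500 carries through unchanged to the wall-plug power, since the cooling overhead multiplies both cases equally.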

It is important to note that the energy efficiency of low-power drivers will be fully utilized only in combination with the output superconducting interconnect cable, which avoids the higher power otherwise required to charge the capacitance of the output interconnect line.

The maximum extent of HTS interconnect cable is a 77K temperature stage. At this stage, it is necessary to amplify the signal for further transmission over a normal-metal or optical data link. To maximize the data link energy efficiency, it is necessary to develop a low-power E/O converter or modulator capable of working with a few millivolt input signals. It should be capable of cryogenic operation at as low as ~50-77 K. Placing the E/O devices at lower temperatures will be less energy-efficient due to lower heat lift efficiency for the lower temperature stages.

Development of low-power E/O devices with a low input signal of $\sim 1 \text{ mV}$ thus becomes an important topic for the development of energy-efficient cryogenic computing systems. Polarization-modulating VCSELs seem to be a good candidate for these E/O devices [47].

We estimated an optimum energy-efficient configuration for the data link from a cryogenic low-power system to room temperature. This is a 20 Gb/s data link consisting of a 1 mV eSFQ driver at 4K, an HTS ribbon interconnect cable spanning the 4K and 70K temperature stages, a 1 mV-input E/O device at 70K, and a fiber-optic cable spanning from 70K to room temperature.

## VIII. CONCLUSIONS

Digital SFQ technology based on manipulation and ballistic transfer of magnetic flux quanta provides a viable low-power alternative to CMOS and other charge-transfer based device technologies. We reviewed and compared several possible approaches to achieve the fundamental energy consumption in SFQ circuits defined by the SFQ energy of $2 \times 10^{-19}$ J. All of them are focused on the reduction and elimination of static power dissipation related to bias distribution.

A novel energy-efficient zero-static-power SFQ technology, eSFQ, is introduced, which retains all advantages of RSFQ circuits: high speed, dc power, internal memory, etc. This new-generation energy-efficient RSFQ logic family largely retains the extensive existing libraries of RSFQ gates, modifying only the gate biasing by means of the novel junction-based current distribution technique.

The voltage bias regulation, determined by SFQ clock and controlled by SFQ NDRO gates, opens a way to actively manage power dissipation enabling the *zero-power at zero-activity regime*, which can be a valuable feature for many applications including sensor and qubit readout.

As discussed above, ERSFQ/eSFQ circuit technology can achieve an energy per operation on the order of $5 \times 10^3 k_B T \ln(2)$, while CMOS circuits feature $10^6 k_B T \ln(2)$. This is a ~200 times advantage in $k_B T$ units. If we compare the energy per operation on a chip level, i.e., excluding the cryocooling factor, which is valid for power density considerations, the ERSFQ/eSFQ advantage will be of the order of $10^4$ in joules.
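These ratios can be checked with a few lines of arithmetic. The sketch below assumes operating temperatures of 4.2 K for the SFQ circuits and 300 K for CMOS; the multiples of $k_BT \ln 2$ are the ones quoted above.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def energy_per_op(n_kt_ln2: float, temperature_k: float) -> float:
    """Energy per operation in joules for a given multiple of k_B T ln 2."""
    return n_kt_ln2 * K_B * temperature_k * math.log(2)


# Assumed temperatures: 4.2 K for ERSFQ/eSFQ, 300 K for CMOS.
e_sfq = energy_per_op(5e3, 4.2)
e_cmos = energy_per_op(1e6, 300.0)

ratio_kt = 1e6 / 5e3            # advantage in k_B T units
ratio_joules = e_cmos / e_sfq   # advantage in joules on a chip level
print(f"E_SFQ  = {e_sfq:.2e} J")
print(f"E_CMOS = {e_cmos:.2e} J")
print(f"ratio in k_B T units: {ratio_kt:.0f}, in joules: {ratio_joules:.2e}")
```

Under these assumptions the SFQ energy per operation comes out close to $2 \times 10^{-19}$ J, and the chip-level ratio is the $k_BT$-unit ratio multiplied by the temperature ratio of roughly 70, which gives a value of the order of $10^4$.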

Novel ERSFQ and eSFQ technologies are based on relatively mature standard RSFQ, with already demonstrated practical digital ICs. These new circuit technologies compare well in power efficiency and maturity to other prospective *beyond-CMOS* technologies based on carbon nanotubes [48] or nanomagnets [49].

Future improvements in CMOS technology, such as the elimination of long interconnects, can potentially reduce our advantage by ~10 times. Fortunately, the integration of ERSFQ or eSFQ circuits with nSQUID-based SFL circuits could lead to a further dramatic reduction of dissipated power by $10^3$ times [41]. This would create a significant power margin over CMOS and perhaps would create an *imperishable discriminator*, i.e., an advantage unattainable by the competition even with future improvements.

The ultimate low-power computation, although at relatively low speed, can be achieved in logically and physically reversible SFQ circuits. Since both eSFQ/ERSFQ and nSQUIDs are dc-powered and based on similar fabrication processes, it would be practical to combine them on the same chip or a multichip module in order to maximize power efficiency in achieving specific system performance goals.

No matter how high the energy efficiency of the digital circuits described above, the cryogenic nature of SFQ devices requires stringent attention to the energy efficiency of the entire system. Low-heat-leak power delivery to 4K can be handled by HTS cables, whose feasibility has already been proven. The output HTS data interconnect cables will enable the ballistic transmission of ~1 mV signals from 4K energy-efficient eSFQ drivers to ~70K E/O devices or amplifiers.

## ACKNOWLEDGMENT

I would like to thank V. Semenov, D. Kirichenko, A. Kirichenko, A. Kadin, S. Rylov, T. Filippov, R. Webber, D. Gupta, I. Vernik, A. Marquez, L. Craymer, M. Dorojevets, Q. Herr, A. Herr, A. Silver, A. Rylyakov, M. Johnson, S. Sarwana, A. Pan, K. Choquette, F. Bedard, E. Track, R. Hitt for helpful discussions and advice. Discussions and encouragement from M. Manheimer, S. Holmes are appreciated.

## REFERENCES

- [1] S. Ruth, "Green IT more than a three percent solution," *IEEE Internet Computing*, pp. 80-84, July/Aug. 2009.
- [2] A. Geist, "Paving the roadmap to Exascale," *SciDAC Review*, No. 16, 2010 [Online]. Available: http://www.scidacreview.org.
- [3] S. Mukhopadhyay, "Switching energy in CMOS logic: how far are we from physical limit," 2006 [Online]. Available: http://nanohub.org/resources/1250.
- [4] V. V. Zhirnov, R. K. Cavin, J. A. Hutchby, and G. I. Bourianoff, "Limits to binary logic switch scaling—a gedanken model," *Proc. IEEE*, vol. 91, pp. 1934-1939, Nov. 2003.
- [5] R. L. Kautz, "Picosecond pulses on superconducting striplines," J. Appl. Phys., vol. 49, pp. 308-314, Jan. 1978.
- [6] K. Likharev and V. Semenov, "RSFQ logic/memory family: A new Josephson-junction technology for sub-terahertz clock-frequency digital systems," *IEEE Trans. Appl. Supercond.*, vol. 1, pp. 3-28, Mar. 1991.
- [7] W. Chen, A. V. Rylyakov, V. Patel, J. E. Lukens, and K. K. Likharev, "Superconductor digital frequency divider operating up to 750 GHz," *Appl. Phys. Lett.*, vol. 73, pp. 2817-2819, Nov. 1998.
- [8] O. Mukhanov, D. Gupta, A. Kadin, and V. Semenov, "Superconductor analog-to-digital converters," *Proc. of the IEEE*, vol. 92, pp. 1564-1584, Oct. 2004.
- [9] O. A. Mukhanov, D. Kirichenko, I. V. Vernik, T. V. Filippov, A. Kirichenko, R. Webber, V. Dotsenko, A. Talalaevskii, J. C. Tang, A. Sahu, P. Shevchenko, R. Miller, S. B. Kaplan, S. Sarwana, and D. Gupta, "Superconductor Digital-RF receiver systems," *IEICE Trans. Electron.*, vol. E91-C, pp. 306-317, Mar. 2008.
- [10] A. Fujimaki, M. Tanaka, T. Yamada, Y. Yamanashi, H. Park, N. Yoshikawa, "Bit-serial single flux quantum microprocessor CORE," *IEICE Trans. Electron.*, vol. E91-C, pp. 342-349, Mar. 2008.
- [11] Y. Yamanashi, T. Kainuma, N. Yoshikawa, I. Kataeva, H. Akaike, A. Fujimaki, M. Tanaka, N. Takagi, S. Nagasawa, M. Hidaka, "100 GHz demonstrations based on the single-flux-quantum cell library for the 10 kA/cm<sup>2</sup> Nb fabrication process," *IEICE Trans. Electron.*, vol. E93-C, pp. 440-444, Apr. 2010.
- [12] A. Rylyakov, "New design of single-bit all-digital RSFQ autocorrelator," *IEEE Trans. Appl. Supercond.*, vol. 7, pp. 2709-2712, June 1997.

- [13] N. Yoshikawa, Y. Kato, "Reduction of power consumption of RSFQ circuits by inductance-load biasing," *Supercond. Sci. Technol.*, vol. 12, pp. 918-920, Nov. 1999.
- [14] Y. Yamanashi, T. Nishigai, N. Yoshikawa, "Study of LR-loading technique for low-power single flux quantum circuits," *IEEE Trans. Appl. Supercond.*, vol. 17, pp. 150-153, June 2007.
- [15] A. Rylyakov and K. Likharev, "Pulse jitter and timing errors in RSFQ circuits," *IEEE Trans. Appl. Supercond.*, vol. 9, pp. 3539-3544, June 1999.
- [16] S. Polonsky, "Delay insensitive RSFQ circuits with zero static power dissipation," *IEEE Trans. Appl. Supercond.*, vol. 9, pp. 3535-3538, June 1999.
- [17] A. H. Silver, Q. P. Herr, "A new concept for ultra-low power and ultrahigh clock rate circuits," *IEEE Trans. Appl. Supercond.*, vol. 11, pp. 333-336, June 2001.
- [18] S. M. Schwarzbeck, K. Yokoyama, D. Durand, R. Davidheiser, "Operation of SAIL HTS digital circuits near 1 GHz," *IEEE Trans. Appl. Supercond.*, vol. 5, pp. 3176-3178, June 1995.
- [19] Q. P. Herr, "Single flux quantum circuits," US Patent 7 724 020, May 25, 2010. O. T. Oberg, Q. P. Herr, A. G. Ioannidis, A. Y. Herr, "Integrated power divider for superconducting digital circuits," *IEEE Trans. Appl. Supercond.*, to be published.
- [20] O. A. Mukhanov, D. E. Kirichenko, and A. F. Kirichenko, "Low power biasing networks for superconducting integrated circuits," Patent application 61/250,838, Oct. 12, 2009; 12/902,572, Filed 10/12/2010.
- [21] K. K. Likharev, O. A. Mukhanov, and V. K. Semenov, "Resistive Single Flux Quantum logic for the Josephson-junction digital technology," in *SQUID*'85, Berlin, 1985, pp. 1103-1108.
- [22] O. A. Mukhanov, V. K. Semenov, and K. K. Likharev, "Ultimate performance of the RSFQ logic circuits," *IEEE Trans. Magn.*, vol. MAG-23, pp. 759-762, Mar. 1987.
- [23] L. R. Eaton, M. W. Johnson, "Superconducting constant current source," U.S. Patent 7 002 366 B2, Feb. 21, 2006.
- [24] D. E. Kirichenko, A. F. Kirichenko, S. Sarwana, "No static power dissipation biasing of RSFQ circuits," *IEEE Trans. Appl. Supercond.*, submitted for publication.
- [25] HYPRES Design Rules [Online]. Available: http://www.hypres.com.
- [26] O. A. Mukhanov, "RSFQ 1024-bit shift register for acquisition memory," *IEEE Trans. Appl. Supercond.*, vol. 3, pp. 3102-3113, Dec. 1993.
- [27] D. V. Averin, K. Rabenstein, and V. K. Semenov, "Rapid ballistic readout for flux qubits," *Phys. Rev. B*, vol. 73, 094504, 2006.
- [28] S. V. Polonsky, V. K. Semenov, P. Bunyk, A. F. Kirichenko, A. Kidiyarova-Shevchenko, O. A. Mukhanov, P. Shevchenko, D. Schneider, D. Y. Zinoviev, and K. K. Likharev, "New RSFQ Circuits," *IEEE Trans. Appl. Supercond.*, vol. 3, pp. 2566-2577, Mar. 1993.
- [29] O. A. Mukhanov, S. V. Rylov, V. K. Semenov, and S. V. Vyshenskii, "RSFQ logic arithmetic," *IEEE Trans. Magn.*, vol. MAG-25, no. 2, pp. 857-860, Mar. 1989.
- [30] Y. Yamanashi, I. Okawa, N. Yoshikawa, "Design approach of dynamically reconfigurable single flux quantum logic gates," *IEEE Trans. Appl. Supercond.*, to be published.
- [31] K. Zhang, U. Bhattacharya, Z. Chen, F. Hamzaoglu, D. Murry, N. Vallepalli, Y. Wang, B. Zheng, and M. Bohr, "SRAM design on 65-nm CMOS technology with dynamic sleep transistor for leakage reduction," *IEEE J. Solid-State Circuits*, vol. 40, no. 4, pp. 895-901, Apr. 2005.

- [32] A. Inamdar, S. Rylov, A. Talalaevskii, A. Sahu, S. Sarwana, D. Kirichenko, I. Vernik, T. Filippov, D. Gupta, "Progress in design of improved high dynamic range analog-to-digital converters," *IEEE Trans. Appl. Supercond.*, vol. 19, pp. 670-675, June 2009.
- [33] T. V. Filippov, A. Sahu, S. Sarwana, D. Gupta, and V. Semenov, "Serially biased components for Digital-RF receiver," *IEEE Trans. Appl. Supercond.*, vol. 19, pp. 580-584, June 2009.
- [34] C. Bennett, "Logical reversibility of computation," *IBM J. Res. Devel.*, vol. 17, pp. 525-532, 1973.
- [35] K. K. Likharev, "Dynamics of some single flux quantum devices," *IEEE Trans. Magn.*, vol. MAG-13, pp. 242-244, Jan. 1977.
- [36] W. Hioe and E. Goto, *Quantum Flux Parametron*. World Scientific, 1991.
- [37] K. K. Likharev, S. V. Rylov, V. K. Semenov, "Reversible conveyer computations in arrays of parametric quantrons," *IEEE Trans. Magn.*, vol. MAG-21, pp. 947-950, Mar. 1985.
- [38] V. Semenov, G. Danilov, D. Averin, "Negative-inductance SQUID as the basic element of reversible Josephson-junction circuits," *IEEE Trans. Appl. Supercond.*, vol. 13, pp. 938-943, June 2003.
- [39] S. V. Rylov, V. K. Semenov, K. K. Likharev, "DC powered parametric quantron," in *Proc. Int. Supercond. Electr. Conf. (ISEC)*, Tokyo, Aug. 1987, pp.135-138.
- [40] V. Semenov, J. Ren, Yu. Polyakov, D. Averin, J.-S. Tsai, "Reversible computing with nSQUID arrays," in *Proc. Int. Supercond. Electr. Conf.* (*ISEC*), Fukuoka, June 2009, paper SP-P27.
- [41] J. Ren and V. K. Semenov, "Progress with physically and logically reversible superconducting digital circuits," *IEEE Trans. Appl. Supercond.*, submitted for publication.
- [42] G. I. Meijer, "Cooling energy-hungry data centers," *Science*, vol. 328, pp. 318-319, Apr. 2010.
- [43] A. M. Kadin, R. J. Webber, and D. Gupta, "Current leads and optimized thermal packaging for superconducting systems on multistage cryocoolers," *IEEE Trans. Appl. Supercond.*, vol. 17, pp. 975-978, June 2007.
- [44] R. J. Webber, J. Delmas, B. H. Moeckly, "Ultra-low heat leak YBCO superconducting leads for cryoelectronic applications," *IEEE Trans. Appl. Supercond.*, vol. 19, pp. 999-1002, June 2009.
- [45] O. K. Kwon, B. W. Langley, R. F. W. Pease, and M. R. Beasley, "Superconductors as very high-speed system-level interconnects," *IEEE Elec. Dev. Lett.*, vol. EDL-8, pp. 582-585, Dec. 1987.
- [46] A. Inamdar, S. Rylov, S. Sarwana, D. Gupta, "Superconducting switching amplifiers for high speed digital data links," *IEEE Trans. Appl. Supercond.*, vol. 19, pp. 999-1002, June 2009.
- [47] K. D. Choquette, K. L Lear, R. E. Leibenguth, and M. T. Asom, "Polarization modulation of cruciform vertical-cavity laser diodes," *Appl. Phys. Lett.*, vol. 64, pp. 2767-2769, 1994.
- [48] H. Wei, N. Patil, A. Lin, H.-S. P. Wong, S. Mitra, "Monolithic three-dimensional integrated circuits using carbon nanotube FETs and interconnects," in *Proc. IEEE Int. Electr. Dev. Meeting (IEDM)*, Baltimore, 2009, paper 23.5.
- [49] M. T. Niemier, X. S. Hu, M. Alam, G. Bernstein, W. Porod, M. Putney, J. DeAngelis, "Clocking structures and power analysis for nanomagnet-based logic devices," in *Proc. ISLPED '07*, Portland, 2007.