# Low Power Correlator Using Signal Range and Sub Word Based Clock Gating Scheme

\*A. Ranganayakulu<sup>1</sup> and K. Satyaprasad<sup>2</sup>

<sup>1</sup>ECE Department, Krishnachaitanya Institute of Technology and Sciences (KITS), Markapur, Prakasam Dt.

<sup>2</sup>ECE Department, JNTUK, Kakinada, East Godavari Dt., A.P., India

Email: ranga.nayakulu.kits@gmail.com

#### Abstract

The fast-growing market for mobile and battery-powered devices motivates VLSI designers to explore low power design opportunities at different levels of abstraction. Research over the past few decades has resulted in efficient electronic design automation (EDA) tools that apply several circuit and device level techniques to reduce power consumption. Research is also being conducted into new techniques that exploit application-specific signal characteristics to reduce power consumption. A few types of clock gating based power reduction techniques are established in present-day EDA tools. This research work presents a novel sub-word partitioned, signal range based clock gating technique, which can be very efficient in signal processing applications. A scalable VHDL model is developed for the Correlator architecture with the proposed clock gating scheme. Test data generated by a MATLAB script is used for functional verification. Xilinx FPGA synthesis and power analysis tools are employed to analyze the power optimization of the proposed architecture. The simulation results demonstrate power optimization without compromising performance, with power savings of up to 31% for narrow band input signals.

*Keywords:* Low power design, clock gating, Correlator, dynamic power, register transfer level (*RTL*), Zynq, Xilinx Power Estimator

## **1. Introduction**

The continuously growing market for mobile and battery-powered devices is motivating Very Large Scale Integration (VLSI) designers to explore opportunities for low power design at different levels of abstraction. Research over the past few decades has resulted in efficient Electronic Design Automation (EDA) tools that can apply several circuit and device level techniques to reduce power consumption. Researchers are exploring techniques that exploit application-specific signal characteristics to reduce power consumption. A few types of clock gating based power reduction techniques are established in present-day EDA tools, so the benefits of clock gating are already realized in present-day VLSI designs. However, researchers see further optimization possibilities in special clock gating based designs at the circuit, block and architecture levels.

This paper presents a novel sub-word partitioned, signal range based clock gating technique, which can be very efficient in signal processing applications. The present work continues the research reported in [1, 2], where the principle of sub-word based clock gating is presented. The work in [1] describes the principle of sub-word based clock gating at the Register Transfer Level (RTL) and presents analysis results for a power-optimized Multiply Accumulate (MAC) block. The article [2] then gives the transistor-level design of the proposed clock gating scheme and presents the power optimization achieved under different signal value conditions.

### A. Different Clock Gating Methods

The work of several researchers shows that clock gating is a very effective technique for reducing the dynamic power of idle clocked subsystems [3-29]. This section describes the different clock gating methods studied by other researchers.

The work in [3] presents encoder and decoder blocks of a communication system that use clock gating for power optimization without degrading system performance. Several researchers are also extending the basic clock gating principle with novel circuit-level blocks. A method based on an adaptive pulse-triggered flip-flop (PTFF) is presented in [4]; the PTFF combines dynamic power optimization with robust timing characteristics, improving the power-delay factor. The authors report a power reduction of 51% based on HSPICE simulations.

The clock gating schemes described above are circuit-level improvements. The present work, like the works discussed below, concentrates on RTL and subsystem-level features that use clock gating for overall power reduction. These techniques do not replace circuit-level clock gating techniques; instead, the two together give a very low power solution. In addition, RTL clock gating schemes integrate better with CAD-based high-level tool flows.

The work in [5] compares the power optimization achieved by different Xilinx FPGA families when implementing a latch-free clock gating scheme. It compares the power consumption of Spartan-3, Spartan-3E, Spartan-6, Virtex-4, Virtex-5, Virtex-6 and Artix-7 devices and concludes that power savings above 92% are possible with clock gating. Clock gating schemes based on instantaneous signal values are under continuous study by several researchers [3]. The paper [6] compares different clock gating techniques used in current designs and describes possible applications of each scheme.

The work in [7] discusses a 16x16 pipelined MAC architecture that uses the Baugh-Wooley multiplier algorithm with a high-performance multiplier tree and clock gates the idle pipeline stages to reduce power consumption. Simulation results show more than 30% power saving compared with other MAC architectures without clock gating.

The work in [8] discusses grouping flip-flops (FFs) to share a common clock enable signal in order to maximize power reduction. It explains data-driven clock gating, applied to FFs at the gate level, in which the clock signal driving an FF is gated in the next clock cycle when the FF's state is not expected to change. The work in [9] discusses the low power design of a fully non-binary LDPC decoder using fine-grained dynamic clock gating; non-binary LDPC codes offer better coding gain and a lower error floor than binary LDPC codes. The paper also reports the energy efficiency per iteration.

The work in [10] presents a methodology that aims to significantly reduce the power consumption of streaming application designs based on asynchronous queued blocks. The approach controls the top-level clock of the FPGA through its clock buffers, and the gating is coarse-grained because clocks are switched off for relatively large portions of the design. The work in [11] discusses the look-ahead clock gating technique, which combines three gating methods (synthesis-based gating, data-driven gating and auto-gated flip-flops) to reduce clock switching power.

The work in [12] discusses the power analysis of an ALU in network processors after applying clock gating, and the resulting dynamic power reduction evaluated at lower frequencies. The work in [13] discusses Sequential Equivalence Checking (SEC) for clock-gated circuits using several theorems, and introduces SEC methods that are effective in certain special cases, considerably reducing computational effort. The work in [14] discusses various clock gating techniques, such as AND gate, latch, flip-flop and mux based gating, along with their advantages and disadvantages. It also discusses six advanced gating schemes, classified into two groups on the basis of the enable signal. These schemes apply to designs with n registers.

The work in [15] discusses how to reduce dynamic power in synchronous circuits using clock gating. The work in [16] discusses dynamic power reduction by reducing circuit area: an AND gate based clock gating circuit is designed for a three-bit full adder, reducing clock power by more than 50%. The work in [17] discusses a low power design of shift registers using D flip-flops that integrates clock gating and power gating, which reduce dynamic power and active leakage power respectively. Two techniques are discussed for this purpose, Optimized Bus-Specific Clock Gating (OBSC) and Run Time Power Gating (RTPG), using Tanner tools.

The work in [18] discusses a design methodology for reducing ASIC power consumption through the RTL clock gating feature of Synopsys Power Compiler. RTL clock gating made it possible to keep the ease-of-use and portability benefits of a synchronous design style while maintaining the relatively low power consumption of the non-synchronous versions of the designs. The work in [19] reviews five existing clock gating techniques and suggests a new technique that provides more immunity than the existing techniques.

The work in [20] discusses clock gating for low power dissipation, applying merge and split clock gating concepts to reduce power dissipation; the method suits designs with millions of transistors. The work in [21] discusses a Reduced Instruction Set Architecture (RISA) that handles multiple interrupts using clock gating. This technique mainly concentrates on reducing power and establishing serial communication between two systems.

The work in [22] presents new clock gating techniques for two novel low power flip-flops, which reduce power dissipation. The work in [23] discusses adaptive clock gating (ACG) for low power IP-core design based on software control. ACG can automatically enable or disable the IP clock and, together with power gating, reduces not only dynamic power but also leakage power. The work in [24] discusses existing clock gating methods and the methodologies used to reduce power consumption compared with non-gated circuits.

### B. Correlator Applications

The research work presented here implements the Correlator with a novel architecture and sub-word optimization scheme to achieve a low power implementation. Because the Correlator is one of the most widely used and most resource- and power-consuming blocks in signal and image processing applications, it is selected to demonstrate the proposed sub-word based clock gating scheme.

Correlation is a mathematical operation very similar to convolution. The Correlator output for the i<sup>th</sup> index is generated by performing the multiply-accumulate operation on two signals that are relatively shifted by i samples. Correlating a signal with itself gives the autocorrelation, whereas correlating two different signals gives the cross-correlation.
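A minimal Python sketch of this multiply-accumulate definition follows. `correlate(x, y, i)` is a hypothetical helper (not part of the authors' VHDL model) that accumulates x[j]·y[i+j] over the overlapping samples and treats samples outside y as zero; passing the same signal twice gives the autocorrelation.

```python
def correlate(x, y, i):
    """corr(i): multiply-accumulate of x[j] * y[i + j] over the overlapping samples."""
    acc = 0
    for j, xj in enumerate(x):
        k = i + j  # y is shifted by i samples relative to x
        if 0 <= k < len(y):
            acc += xj * y[k]  # one MAC operation
    return acc


x = [1, 2, 3]
y = [0, 1, 2, 3]
print(correlate(x, y, 1))  # cross-correlation at shift 1 -> 14
print(correlate(x, x, 0))  # autocorrelation at zero shift -> 14
```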

Correlation can be used to measure the similarity of signals. Normalized cross-correlation is commonly used in template matching applications to compare images. The correlation function of a signal also gives information about the repetitiveness of patterns within the signal.
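As a one-dimensional sketch of normalized cross-correlation template matching (images apply the same formula over 2-D windows), the hypothetical helpers below remove the window means and normalize by the window energies, so a scaled copy of the template still scores 1.0. The signal and template values are illustrative only.

```python
import math


def ncc(a, b):
    """Normalized cross-correlation of two equal-length windows, in [-1, 1]."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    num = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    den = math.sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))
    return num / den if den else 0.0  # flat window: define the score as 0


def best_match(signal, template):
    """Slide the template over the signal; return (offset, score) of the best NCC."""
    L = len(template)
    scores = [ncc(signal[k:k + L], template) for k in range(len(signal) - L + 1)]
    k_best = max(range(len(scores)), key=scores.__getitem__)
    return k_best, scores[k_best]


# The template appears at offset 2, scaled by 2: NCC ignores the gain.
print(best_match([0, 0, 2, 6, 4, 0, 0], [1, 3, 2]))  # (2, 1.0)
```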

The correlation block is also key in communication systems. The heart of a spread spectrum system is the Correlator, which matches the local Pseudo Noise (PN) code with the incoming signal to find the code offset and then shifts the phase of the local code [25]. The correlation block is a crucial module in all data-aided symbol timing recovery techniques and frame synchronizers. The Correlator block is also widely used in complex telemetry applications for multi-channel data alignment [26]. The core of RADAR signal processing relies on correlating a local copy of the transmitted signal with the received echo signal.
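The PN code offset search can be illustrated with a short example built on assumed parameters: a length-7 maximal-length sequence (generated here from the primitive polynomial x^3 + x + 1, an illustrative choice) is correlated against every cyclic shift of the received chips. The hypothetical `find_code_offset` returns the shift with the largest correlation; for an m-sequence the peak equals the code length and every other shift gives -1.

```python
def pn_sequence():
    """Length-7 maximal-length sequence from s[k+3] = s[k+1] XOR s[k]
    (polynomial x^3 + x + 1), mapped to chips 0 -> +1, 1 -> -1."""
    s = [1, 0, 0]  # any non-zero seed
    while len(s) < 7:
        s.append(s[-2] ^ s[-3])
    return [1 - 2 * b for b in s]


def find_code_offset(received, local):
    """Correlate one period of received chips against every cyclic shift of the
    local PN code; return (offset, scores). The peak marks the code phase."""
    n = len(local)
    scores = [sum(received[j] * local[(j + d) % n] for j in range(n)) for d in range(n)]
    offset = max(range(n), key=scores.__getitem__)
    return offset, scores


local = pn_sequence()
received = [local[(j + 3) % 7] for j in range(7)]  # local code cyclically shifted by 3 chips
print(find_code_offset(received, local))  # (3, [-1, -1, -1, 7, -1, -1, -1])
```

Once the offset is known, the receiver would shift its local code phase by that amount to stay aligned with the incoming signal.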

Correlation is used in biomedical signal analysis for various pattern matching and image processing applications. Several of the applications discussed above must be designed for battery-powered and hand-held devices, where power saving is crucial. Timing and synchronization is another area where correlation is a key operation. In synchronization, the transmitter periodically sends a known synchronization signal that the receiver uses as a point of reference. The receiver correlates the incoming signal stream against the known sequence and, when the correlation peak exceeds a certain threshold, uses the peak to establish symbol timing and to make decisions on the symbols to extract a binary sequence.
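The threshold-based synchronization described above can be sketched as follows. `detect_sync` is a hypothetical helper that slides the known sequence over the received samples and accepts the largest correlation only if it exceeds the threshold; noise handling and repeated-frame logic found in real receivers are not modeled.

```python
def detect_sync(stream, sync, threshold):
    """Slide the known sync sequence over the stream; return (offset, peak) of the
    largest correlation if it exceeds the threshold, else None."""
    L = len(sync)
    best = None
    for k in range(len(stream) - L + 1):
        c = sum(s * r for s, r in zip(sync, stream[k:k + L]))  # correlation at offset k
        if best is None or c > best[1]:
            best = (k, c)
    if best is None or best[1] <= threshold:
        return None  # no reliable peak: keep searching
    return best  # offset k serves as the symbol timing reference


sync = [1, -1, 1, 1]
stream = [0, 0, 1, -1, 1, 1, 0, 0]
print(detect_sync(stream, sync, threshold=3))  # (2, 4)
```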

Several researchers [27-29] have studied speed- and power-efficient Correlator architectures. The Correlator is the main component in spread spectrum communications [27] because of the need for delay estimation over long PN sequences. The work in [27] presents a pipelined Correlator architecture in Complementary Metal Oxide Semiconductor (CMOS) VLSI technology that achieves a maximum clock speed of 87 MHz while consuming 750 mW. The work in [28] presents another Correlator architecture, optimized exclusively around PN sequence characteristics, that demonstrates power optimization. The need for large radio astronomy Correlators is presented in [29], which proposes a MAC block sharing architecture to achieve a low power implementation.

The literature survey found no prior work presenting a Correlator architecture with a sub-word based clock gating scheme, and no work exploring the power saving opportunity in signal processing applications under narrow band and low amplitude signals. The remainder of this paper explains the sub-word clock gating scheme, the optimized Correlator architecture, simulation results, the synthesis report and power analysis.

## 2. Proposed Subword based Clock Gating Method

The power dissipation of a VLSI circuit consists primarily of two components, as given in equation (1):

$$P_{\text{Total}} = P_{\text{Dynamic}} + P_{\text{static}} \tag{1}$$

Static power is the power dissipated when the circuit is powered up but its input and output signals are not changing. Dynamic power is the power consumed by changes in node voltages and the resulting charging and discharging cycles of the associated node capacitances. Clock gating methods save power by avoiding unnecessary transitions in the circuit: the sequential blocks are supplied with a gated clock, generated by ANDing the clock with the gating signal. At the Register Transfer Level (RTL) of abstraction, the design is partitioned into combinational logic blocks between register stages; the clock gating realization at this level is illustrated in Figure 1.



Figure 1. Clock Gating Scheme for Typical RTL Design
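As a standard first-order reference point (not stated in the original text), the dynamic term of equation (1) is commonly modeled with the switching activity factor α, the switched capacitance C_L, the supply voltage V_DD and the clock frequency f_clk:

$$P_{\text{Dynamic}} \approx \alpha \, C_L \, V_{DD}^{2} \, f_{\text{clk}}$$

Clock gating leaves C_L, V_DD and f_clk unchanged and acts on α: if a register bank and the logic it drives are clocked in only a fraction e of the cycles, their contribution to P_Dynamic scales roughly to e·α·C_L·V_DD²·f_clk, while P_static is unaffected.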

The criteria for generating the gating signal differ among clock gating methods. When the gating signal is '1', the clock is applied unchanged to the sequential block, the flip-flops (FFs) function normally, and the combinational logic generates outputs that feed the next sequential stage. When the gating signal is '0', the clock input to the sequential block is held at zero (disabled). With the clock disabled, the FF outputs do not change; since the inputs of the combinational logic are then constant, its outputs also remain the same. Disabling the clock therefore avoids unnecessary transitions and glitches in the flip-flops and in the following combinational logic stages, which reduces dynamic power dissipation. The power saving mechanism is similar across clock gating methods; they differ in the principle used to generate the gating signal.
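The gating behavior can be mimicked with a cycle-level Python model, assuming an ideal AND gate between clock and gating signal (practical designs usually use a latch-based integrated clock gating cell so that the gated clock stays glitch-free). The hypothetical `gated_register` counts the clock pulses that reach the flip-flops and the output bit toggles, which are the transitions clock gating removes.

```python
def gated_register(data, gate):
    """Cycle-level model of an RTL register whose clock is ANDed with a gating signal.

    Returns the register output per cycle, the number of clock pulses that reach
    the flip-flops, and the number of output bit toggles."""
    q = 0
    pulses = 0
    toggles = 0
    outputs = []
    for d, g in zip(data, gate):
        if g:  # gating signal '1': the clock edge reaches the flip-flops
            pulses += 1
            toggles += bin(q ^ d).count("1")  # bits that change on this edge
            q = d
        # gating signal '0': no clock edge, q holds, downstream logic sees no change
        outputs.append(q)
    return outputs, pulses, toggles


print(gated_register([5, 6, 7, 8], [1, 0, 1, 0]))  # ([5, 5, 7, 7], 2, 3)
```

With the gate held at '1' in every cycle, the same data produce 4 pulses and 9 toggles. The model drops any data presented while the gate is '0', so a gating signal must only be deasserted when the register contents need not change.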

The proposed clock gating scheme is based on detecting whether a particular sub-word of a given word carries information in its magnitude bits. Consider a 2's complement signal with a width of n+1 bits. Figure 2 illustrates the sub-word representation of this (n+1)-bit signal: the n-bit magnitude word is divided into p sub-words of m = n/p bits each, with each sub-word representing a particular part of the full dynamic range of the signal [1].



Figure 2. (n+1) bit Register Represented as Sub Words
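The partitioning can be sketched in Python, assuming an illustrative width of n = 15 magnitude bits and p = 3 sub-words (so m = 5); these values are not taken from the paper. The hypothetical `split_subwords` returns the sign bit b_n and the sub-words with index 0 holding the least significant bits, matching the ordering of equation (3); `join_subwords` reverses the split.

```python
def split_subwords(x, n, p):
    """Split an (n+1)-bit two's complement value into its sign bit b_n and p
    sub-words of m = n/p bits each (index 0 = least significant)."""
    if n % p:
        raise ValueError("n must be a multiple of p")
    m = n // p
    word = x & ((1 << (n + 1)) - 1)  # (n+1)-bit two's complement bit pattern
    sign = word >> n                 # b_n
    subs = [(word >> (i * m)) & ((1 << m) - 1) for i in range(p)]
    return sign, subs


def join_subwords(sign, subs, n, p):
    """Inverse of split_subwords: rebuild the signed integer."""
    m = n // p
    word = sign << n
    for i, s in enumerate(subs):
        word |= s << (i * m)
    return word - (1 << (n + 1)) if sign else word


print(split_subwords(-40, 15, 3))  # (1, [24, 30, 31])
```

For -40 the top sub-word is all ones (pure sign extension), while the lower two carry the value.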

The presence of information is checked in each sub-word to generate the enable for clock gating. The signal X represented with n+1 bits can be described as in equation (2). The sub-word based representation of the same signal is given in equation (3).

$$X = \{ b_{n}, b_{n-1}, b_{n-2}, \dots, b_{1}, b_{0} \} \tag{2}$$

$$X = \{ b_{n}, \{ b_{m-1}, \dots, b_0 \}_{p-1}, \dots, \{ b_{m-1}, \dots, b_0 \}_1, \{ b_{m-1}, \dots, b_0 \}_0 \} \tag{3}$$

The No Information (NOI) flag of the i<sup>th</sup> sub-word $(0 \le i \le p-1)$ is set to "1" when all m bits of the sub-word $\{b_{m-1}, \ldots, b_0\}$ are equal to the sign bit $b_n$ and all the more significant j<sup>th</sup> sub-words $(i+1 \le j \le p-1)$ also have their NOI flag set to "1". The NOI flag of the i<sup>th</sup> stage can be derived from an OR gate over the sub-word bits for positive signals and a NAND gate for negative numbers: in both cases the gate output is '0' exactly when the sub-word carries no information. When the signal changes from negative to positive or vice versa, the NOI flag must be set to zero.
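A behavioral sketch of the NOI rule, under illustrative assumptions (n = 15, p = 3): the hypothetical `noi_flags` scans from the most significant sub-word down, flags a sub-word only if all its bits equal the sign bit and every more significant sub-word is also flagged, and clears all flags when the sample changes sign relative to the previous one. It is a software model of the rule, not the authors' gate-level enable logic.

```python
def noi_flags(x, n, p, prev=None):
    """NOI flag per sub-word (index 0 = least significant) of an (n+1)-bit
    two's complement sample x; `prev` is the previous sample (optional)."""
    m = n // p
    full = (1 << m) - 1
    word = x & ((1 << (n + 1)) - 1)
    sign = word >> n
    if prev is not None and (prev < 0) != (x < 0):
        return [False] * p  # sign change: all NOI flags forced to 0
    flags = [False] * p
    higher_noi = True  # nothing above sub-word p-1
    for i in range(p - 1, -1, -1):
        sub = (word >> (i * m)) & full
        # OR of the bits is 0 (positive) / NAND of the bits is 0 (negative):
        # every bit equals the sign bit, so this sub-word carries no information
        no_info = sub == (full if sign else 0)
        higher_noi = higher_noi and no_info
        flags[i] = higher_noi
    return flags


# The clock enable of sub-word i is the complement of its NOI flag.
print(noi_flags(3, 15, 3))    # [False, True, True]
print(noi_flags(-40, 15, 3))  # [False, False, True]
```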

## 3. High Level Architecture of Correlator

This section presents the high-level architecture of the Correlator. The correlation operation can be implemented in several ways; the following variants are widely discussed for the VLSI implementation of regular DSP algorithms:

1. Full parallel implementation, where all the outputs of the Correlator are computed simultaneously [27].
2. Serial implementation, where the signals are first stored in memories and a single multiplier followed by an accumulator is used to compute the correlation [28].
3. Serial-parallel implementation, where some parallelism is employed to reduce the computation time compared with the serial architecture, while achieving a lower-area solution than parallel architectures [29].

The proposed architecture belongs to the serial-parallel category. DSPs, FPGAs and ASICs provide high-speed Multiply-Accumulate (MAC) blocks for high-speed implementation of digital filtering. To take advantage of these high-speed data paths, the proposed architecture uses a transposed FIR filter as its basic block. Equation (4) gives the correlation function computed by the Correlator. Figure 3 shows the high-level architecture of the Correlator, which is implemented around the transposed FIR filter.

$$corr(i) = \sum_{j=0}^{N-1} x[j]\, y[i+j], \qquad -M \le i \le M \tag{4}$$



Figure 3. High Level Architecture of Correlator

The transposed FIR architecture places pipeline registers between adder stages to support high-speed designs; as a result, its pipeline delay is N samples. The correlation start signal enables the algorithm to start computing the correlation function. On the rising edge of correlation start, the first samples of X and Y are applied. The pipeline delay block delays the correlation start signal to match the delay of the transposed FIR filter and produces the correlation out valid signal. Correlation out valid stays high for 2M+1 samples, during which the corresponding correlation vector address (index) in the range -M to +M and the correlation value are generated. All this control is achieved through a state machine, which is discussed below.
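A cycle-level Python sketch can show how a transposed FIR filter yields corr(i) = Σ x[j]·y[i+j] for -M ≤ i ≤ M. It rests on two assumptions not stated in the paper: the N samples of X are used as filter taps in reverse order, and Y is streamed one sample per clock starting at y[-M]. Under these assumptions the first valid output is the N-th filter output (index N-1), and exactly 2M+1 valid outputs follow, matching the valid window described above; `transposed_fir` and `correlator` are hypothetical names.

```python
def transposed_fir(h, samples):
    """Transposed-form FIR: each tap product is added into a pipeline register,
    so out[k] = sum_t h[t] * samples[k - t]."""
    N = len(h)
    regs = [0] * (N - 1)  # registers between adder stages
    out = []
    for u in samples:
        out.append(h[0] * u + (regs[0] if regs else 0))
        # each register takes its tap product plus the next register's old value
        regs = [h[t + 1] * u + (regs[t + 1] if t + 1 < N - 1 else 0) for t in range(N - 1)]
    return out


def correlator(x, y_stream, M):
    """corr(i) for -M..M, assuming y_stream[s] holds y[s - M] (len >= N + 2M)
    and the N samples of x serve as taps in reverse order."""
    N = len(x)
    z = transposed_fir(x[::-1], y_stream)
    # z[k] = sum_j x[j] * y_stream[k - N + 1 + j]  ->  corr(k - N + 1 - M)
    return {k - (N - 1) - M: z[k] for k in range(N - 1, N + 2 * M)}


x = [1, -1, 1, 1]
delay, M = 1, 2
# y[s] = x[s - delay] for s = -M .. N + M - 1, zero outside x
y_stream = [x[s - delay] if 0 <= s - delay < len(x) else 0 for s in range(-M, len(x) + M)]
result = correlator(x, y_stream, M)
print(max(result, key=result.get))  # 1, the applied delay
```

This mirrors, at small scale, the functional test in Section 4, where the correlation peak appears at the applied delay.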

### A. FSM (Finite State Machine) Controller

A state machine controller is added to the transposed FIR filter to realize the X-Y sample Correlator. Figure 4 shows the state diagram of the implemented Finite State Machine (FSM). Correlation begins when the top module issues start = '1'. On this pulse the state machine moves to the "INIT" state, and after one clock cycle it automatically moves to the "LOAD" state, where a modulo-(N+2M+1) counter is used.

Once in the LOAD state, the state machine starts the counter, which counts continuously for N+2M+1 clock cycles. When the counter reaches N clock cycles, the state machine moves to the "MAC" state. In the MAC [7] state, multiplications and accumulations are performed continuously to produce the correlation function. The correlation output consists of three signals. First, correlation enable, which is generated during the last 2M+1 clock cycles (for correlation indices -M to +M, including zero). Second, corr\_index (correlation index), which designates the index of the correlation function. Third, corr\_val (correlation value), which is the magnitude of the correlation function. After completing the MAC [7] operation, the state machine automatically moves to the "END" state and waits for the next start signal.



Figure 4. Finite State Machine
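The controller can be sketched as a Moore state machine in Python. The states, the one-cycle INIT, the N-cycle LOAD and the 2M+1-cycle MAC window follow the description above; treating END as the idle state that waits for start, and the exact cycle alignment of the counter, are assumptions of this sketch, and `fsm_next`, `fsm_outputs` and `simulate` are hypothetical names.

```python
def fsm_outputs(state, count, N, M):
    """Moore outputs: (correlation enable, corr_index)."""
    if state == "MAC":
        return True, count - N - M  # count runs N .. N+2M -> index -M .. +M
    return False, None


def fsm_next(state, count, start, N, M):
    """Next (state, counter) after one clock edge."""
    if state == "END":
        return ("INIT", 0) if start else ("END", 0)
    if state == "INIT":
        return "LOAD", 0  # one clock cycle in INIT
    if state == "LOAD":
        count += 1
        return ("MAC", count) if count == N else ("LOAD", count)
    # MAC: last counter value is N + 2M, i.e. N + 2M + 1 counts over LOAD and MAC
    return ("END", 0) if count == N + 2 * M else ("MAC", count + 1)


def simulate(N, M, start_cycle, cycles):
    state, count = "END", 0  # END doubles as the idle state waiting for start
    trace = []
    for c in range(cycles):
        trace.append((state,) + fsm_outputs(state, count, N, M))
        state, count = fsm_next(state, count, c == start_cycle, N, M)
    return trace


for row in simulate(N=4, M=2, start_cycle=0, cycles=12):
    print(row)
```

With N = 4 and M = 2, the trace shows one INIT cycle, four LOAD cycles and five MAC cycles with corr_index running from -2 to +2.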

## 4. Simulation and Power Analysis

### **A. Functional Verification**

The registers of the MAC [7] block in the Correlator are replaced with sub-word based clock gated registers. The functionality of this sub-word clock gated Correlator is verified by simulation in ModelSim. Figure 5 shows the simulation results for an applied delay value of 10; the peak of the correlation function (Y) is observed at index value 10.

International Journal of Hybrid Information Technology Vol.9, No.3 (2016)



Figure 5. Simulation Results for Subword Clock Gated Correlator for Delay Value 10

The functional validation of the Correlator shows that clock gating and sub-word partitioning of the registers do not affect the main functionality. In the current implementation, the least significant sub-word is always enabled, and the enable signals of the more significant sub-words are computed by the enable generation logic [2]. For low-frequency input components (long periods in samples), the most significant sub-words are switched off for a larger fraction of each sine wave period. Hence the proposed scheme saves more power for low-frequency signals with low peak amplitude.
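The effect of amplitude and frequency can be explored with a rough software experiment, assuming n = 15, p = 3 and a rounded sine input (all illustrative choices). The hypothetical `enable_duty` reports, for each sub-word, the fraction of cycles in one period during which its clock is enabled. Sub-word 0 is always enabled, as in the current implementation; the others are enabled when their NOI flag is '0'. The sketch counts clock enables only and is not a power estimate.

```python
import math


def subword_enables(x, prev, n=15, p=3):
    """Clock enables for the p sub-words of an (n+1)-bit sample (index 0 = LSB side).
    Sub-word 0 is always enabled; the others are the complement of their NOI flags."""
    m = n // p
    full = (1 << m) - 1
    word = x & ((1 << (n + 1)) - 1)
    sign = word >> n
    if prev is not None and (prev < 0) != (x < 0):
        return [True] * p  # sign change forces NOI = 0, so every sub-word is clocked
    enables = [True] * p
    higher_noi = True
    for i in range(p - 1, 0, -1):
        sub = (word >> (i * m)) & full
        higher_noi = higher_noi and sub == (full if sign else 0)
        enables[i] = not higher_noi
    return enables


def enable_duty(amplitude, period, n=15, p=3):
    """Fraction of cycles each sub-word is clocked over one period of a sine input."""
    counts = [0] * p
    prev = None
    for k in range(period):
        x = round(amplitude * math.sin(2 * math.pi * k / period))
        for i, en in enumerate(subword_enables(x, prev, n, p)):
            counts[i] += en
        prev = x
    return [c / period for c in counts]


print(enable_duty(amplitude=20, period=1000))     # low amplitude: upper sub-words mostly off
print(enable_duty(amplitude=20000, period=1000))  # near full scale: upper sub-words mostly on
```

Shortening the period of the same low-amplitude input (for example, 20 samples instead of 1000) raises the upper sub-word duty, because sign changes force every sub-word on and make up a larger share of the cycles in a short period.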

### B. Power Analysis

The Correlator based on the proposed clock gating architecture is analyzed with Xilinx's XPower tool. The following step-by-step procedure is used to perform the power analysis of the clock gated Correlator architecture:

1. In Xilinx ISE, invoke the option "Generate post place and route simulation model". This generates the NCD (Native Circuit Description) and PCF (Physical Constraints File) files and the post-P&R simulation model.
2. The file timesim.vhd is created in the project directory at \netgen\par\timesim.vhd.
3. In ModelSim 6.6b, compile timesim.vhd first and then the test bench.
4. Log the data of all signals (log command) and run the simulation.
5. Save the waveform information in a VCD file.
6. Load the three files (NCD, PCF and VCD) into the XPower tool and perform the power analysis.
7. Repeat the procedure for the design without clock gating.

The XPower results obtained with this procedure for the designs without and with clock gating are shown below, respectively.


XPower Analyzer (Table View) results for the design without clock gating (top_corr7.ncd):

| Device | Value |
|---|---|
| Family | Virtex6 |
| Part | xc6vcx130t |
| Temp Grade | Commercial |
| Process | Typical |
| Speed Grade | -2 |

| On-Chip | Power (W) |
|---|---|
| Clocks | 0.018 |
| Logic | 0.070 |
| IOs | 0.012 |
| Leakage | 1.813 |
| Total | 1.965 |

| Supply Source | Voltage (V) | Total Current (A) | Dynamic Current (A) | Quiescent Current (A) |
|---|---|---|---|---|
| Vccint | 1.000 | 1.204 | 0.140 | 0.075 |
| Vcco25 | 2.500 | 0.006 | 0.005 | 0.001 |
| MGTAVcc | 1.000 | 0.303 | 0.000 | 0.303 |
| MGTAVtt | 1.200 | 0.213 | 0.000 | 0.213 |

| Supply Power | Total | Dynamic | Quiescent |
|---|---|---|---|
| Power (W) | 1.965 | 0.152 | 1.813 |

Thermal properties: effective TJA 2.5 °C/W, maximum ambient 80.1 °C, junction temperature 54.9 °C (ambient temperature 50.0 °C, custom TJA not used, medium-profile heat sink, medium 10" x 10" board).
| Finished Running Vector-less Activity Propagation 4 secs                                                                                                   |                                                                       |                  |      |             |             |                |            |                 |                 |                |               |                  |             |   |          |
| Matchen ys (y1291/9042) of design nets<br>Matchen 75% (7257)/9642) of simulation nets                                                                      |                                                                       |                  |      |             |             |                |            |                 |                 |                |               |                  |             |   |          |
|                                                                                                                                                            |                                                                       |                  |      |             |             |                |            |                 |                 |                |               |                  |             |   |          |
| Design 'E:/ST_2014/services/cad_low_power_Aug_2015/without_clock_gating/top_corr7.ncd' and constraints 'E:/ST_2014/services/cad_low_power_Aug_2015/without |                                                                       |                  |      |             |             |                |            |                 |                 |                |               |                  | .5/withou   |   |          |
|                                                                                                                                                            |                                                                       |                  |      |             |             |                |            |                 |                 |                |               |                  |             |   |          |
|                                                                                                                                                            |                                                                       |                  |      |             |             |                |            |                 | (               |                |               |                  | 1           |   |          |
| Country D. 1. W. 1.                                                                                                                                        |                                                                       |                  |      |             |             |                |            |                 | A USB<br>One of | of the USB dev | ices attached | d to this comp   | uter has    |   |          |
| Console Report Warning                                                                                                                                     |                                                                       |                  |      |             |             |                |            |                 | malfu<br>For a  | nctioned, and  | Windows doe   | es not recogni   | ize it.     |   |          |
| Ready                                                                                                                                                      | 1                                                                     |                  |      | - L         | - 11        | - (            |            | ( ( m           | L ruia          |                | ang als pro   | oreany click the | s message.  |   |          |
| Arstart 🤔 📴 📑 🔲 📰 🦉 🔕 🚳 📣 🧇 😪 🎇 🖉 🕅 🔛 🍃 🕼 🔹 🗚 🚱 😵 🖉                                                                                                        |                                                                       |                  |      |             |             |                |            |                 |                 |                |               |                  |             |   |          |

Figure 6. XPower Analysis without Clock Gating

[Screenshot: Xilinx XPower Analyzer table view for top_corr7.ncd with clock gating. Device: Virtex6 family, part xc6vcx130t, speed grade -2, typical process; ambient temperature 50 °C.]

| On-chip component | Power (W) |
|-------------------|-----------|
| Clocks            | 0.018     |
| Logic             | 0.040     |
| Signals           | 0.034     |
| Leakage           | 1.811     |
| Total             | 1.915     |

Supply power: total 1.915 W, dynamic 0.104 W, quiescent 1.81 W.

Figure 7. XPower Analysis with Clock Gating

Based on the simulation results, the power saving can be computed using equation (5). Because clock gating affects only the dynamic component of power consumption, the analysis considers only the reported dynamic power.

$$\text{Power saving (\%)} = \frac{\text{Dynamic power without clock gating} - \text{Dynamic power with clock gating}}{\text{Dynamic power without clock gating}} \times 100 \tag{5}$$

Substituting the XPower dynamic power values of 152 mW without clock gating and 104 mW with clock gating into equation (5) gives a power saving of $(0.152 - 0.104) \times 100 / 0.152 = 31.58\%$.
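The arithmetic of equation (5) can be reproduced with a few lines of Python. This is only a sketch for checking the numbers: the function name `power_saving_percent` and the constant names are illustrative, and the inputs are the two dynamic power figures quoted above (in watts).

```python
def power_saving_percent(p_without_cg: float, p_with_cg: float) -> float:
    """Equation (5): relative reduction in dynamic power, in percent."""
    if p_without_cg <= 0:
        raise ValueError("baseline dynamic power must be positive")
    return (p_without_cg - p_with_cg) * 100.0 / p_without_cg


# Dynamic power (W) reported by XPower for the two Correlator builds.
P_DYN_WITHOUT_CG = 0.152
P_DYN_WITH_CG = 0.104

saving = power_saving_percent(P_DYN_WITHOUT_CG, P_DYN_WITH_CG)
print(f"Dynamic power saving: {saving:.2f} %")
```

Only the dynamic figures are used, in line with the reasoning above that clock gating leaves the quiescent (leakage) component essentially unchanged.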

## **5. Conclusion**

A scalable architecture for signal value based clock gating is presented in this research work. The signal width is divided into sub-words, and each sub-word is enabled by a separate clock gating signal. The relationship between the clock gating signals of adjacent sub-words is exploited to realize an area-efficient clock gating scheme. Because the area overhead of the clock gating logic is kept small, the resulting increase in static and leakage power is small compared with the dynamic power saved. The Correlator architecture is implemented with the proposed sub-word based clock gating scheme. Functional simulation confirms that the functionality is not affected by clock gating, since only sub-words that carry no information are disabled. The analysis shows a 31% saving in dynamic power compared with the design without clock gating. The work demonstrates the novel principle of sub-word based clock gating and its application to DSP architectures.
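The sub-word enable relationship summarized above can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not the authors' VHDL implementation: it assumes a 16-bit two's-complement sample split into four 4-bit sub-words, treats a sub-word as carrying no information when it and every higher sub-word are sign extensions of the bits below, and chains each enable from the sub-word above it (en[k] = en[k+1] OR sub-word k is not a sign extension). The function names `subword_enables` and `reconstruct` are hypothetical.

```python
from typing import List


def subword_enables(sample: int, width: int = 16, sub_width: int = 4) -> List[bool]:
    """Return one clock-enable flag per sub-word, LSB sub-word first.

    Assumption (hypothetical criterion): sub-word k may be clock gated when
    it, and every sub-word above it, is pure sign extension of bit k*sub_width-1.
    """
    if width % sub_width:
        raise ValueError("width must be a multiple of sub_width")
    n_sub = width // sub_width
    bits = sample & ((1 << width) - 1)      # two's-complement view of the sample
    mask = (1 << sub_width) - 1
    enables = [False] * n_sub
    enables[0] = True                        # lowest sub-word is always clocked
    higher_enabled = False
    # Walk from the MSB sub-word down: en[k] = en[k+1] OR (sub-word k is not sign extension).
    for k in range(n_sub - 1, 0, -1):
        sub = (bits >> (k * sub_width)) & mask
        sign_below = (bits >> (k * sub_width - 1)) & 1
        is_sign_ext = sub == (mask if sign_below else 0)
        higher_enabled = higher_enabled or not is_sign_ext
        enables[k] = higher_enabled
    return enables


def reconstruct(sample: int, enables: List[bool], sub_width: int = 4) -> int:
    """Rebuild the sample from the enabled sub-words only.

    Modelling assumption: gated (disabled) sub-words are replaced by sign
    extension of the highest enabled sub-word rather than read from a register.
    """
    active_bits = sub_width * sum(enables)   # chained enables form a thermometer code
    bits = sample & ((1 << active_bits) - 1)
    if bits >> (active_bits - 1):            # top active bit set: value is negative
        bits -= 1 << active_bits
    return bits


if __name__ == "__main__":
    # Narrow-range samples leave most upper sub-words gated.
    for x in (5, -3, 100, 128, -32768):
        en = subword_enables(x)
        print(x, en, reconstruct(x, en))
```

Because the enables are chained from the top sub-word downward, they are always contiguous from the LSB, so the count of enabled sub-words is enough to rebuild the sample. Looping `reconstruct(x, subword_enables(x)) == x` over every 16-bit value is a quick way to check that, in this model, disabling the flagged sub-words loses no information.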

## Acknowledgment

The authors thank the staff of the ECE Departments of their institutes for their support of this research work.

## References

- [1] A. Ranganayakulu and K. Satyaprasad, "Subword Partition based Data Driven Clock Gating Scheme for Low Power VLSI Design", International Journal of Computer Applications, vol. 108.
- [2] A. Ranganayakulu and K. Satya Prasad, "Sub word Partitioning and Signal Value based Clock gating Scheme for Low Power VLSI Applications", International Press Corporation, (2015) May.
- [3] K. Sahni, K. Rawat, K. Pandey and Z. Ahmad, "Power Optimization of Communication System Using Clock Gating Technique", Advanced Computing & Communication Technologies (ACCT), 2015 Fifth International Conference on, (2015) February 21-22, pp. 375-378.
- [4] K. Kavali, S. Rajendar and R. Naresh, "Design of Low Power Adaptive Pulse Triggered Flip-Flop Using Modified Clock Gating Scheme at 90nm Technology", 2nd International Conference on Nanomaterials and Technologies (CNT 2014), vol. 10, (2015), pp. 323-330.
- [5] N. Gupta, "Clock Power Analysis of Low Power Clock Gated Arithmetic Logic Unit on Different FPGA", Computational Intelligence and Communication Networks (CICN), 2014 International Conference on, (2014) November 14-16, pp. 913-916.
- [6] N. Anand, G. Joseph and S. S. Oommen, "Performance analysis and implementation of clock gating techniques for low power applications", Science Engineering and Management Research (ICSEMR), International Conference on, (2014) November 27-29, pp. 1-4.
- [7] R. Warrier, C. H. Vun and W. Zhang, "A low-power pipelined MAC architecture using Baugh-Wooley based multiplier", Consumer Electronics (GCCE), 2014 IEEE 3rd Global Conference on, (2014) October 7-10, pp. 505-506.
- [8] S. Wimer and I. Koren, "Design Flow for Flip-Flop Grouping in Data-Driven Clock Gating", Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, vol. 22, no. 4, (2014) April, pp. 771-778.
- [9] Y. S. Park, Y. Tao and Z. Zhang, "A Fully Parallel Nonbinary LDPC Decoder With Fine-Grained Dynamic Clock Gating", Solid-State Circuits, IEEE Journal of, vol. 50, no. 2, (2015) February, pp. 464-475.
- [10] E. Bezati, S. C. Brunet, M. Mattavelli and J. W. Janneck, "Coarse grain clock gating of streaming applications in programmable logic implementations", Electronic System Level Synthesis Conference (ESLsyn), Proceedings of the, (2014) May 31-June 1, pp. 1-6.
- [11] V. Murlidharan, M. Vignesh and M. Varatharaj, "Clock gating based on auto-gated flip-flops", ISSN: 2349-252X, (2014).

- [12] R. Kulkarni and S. Y Kulkarni, "Implementation of clock gating technique and performing power analysis for processor engine (ALU) in network processors", Electronics and Communication Systems (ICECS), 2014 International Conference on, (2014) February 13-14, pp. 1-5.
- [13] H. Savoj, D. Berthelot, A. Mishchenko and R. Brayton, "Combinational techniques for sequential equivalence checking", Formal Methods in Computer-Aided Design (FMCAD), (2010) October 20-23, pp. 145-149.
- [14] H. Chaudhary, N. Goyal and N. Sah, "Dynamic Power Reduction Using Clock Gating: A Review", Available at http://www.iject.org/vol61/1/4-Himanshu-Chaudhary.pdf.
- [15] R. Neelam and A. Prakash, "Clock Gating for Dynamic Power Reduction in Synchronous Circuits", International Journal of Engineering Trends and Technology (IJETT), vol. 4, Issue 5, (2013) May.
- [16] Padmini, G. Kaushik, Sanjay, M. Gulhane and A. R. Khan, "Dynamic Power Reduction of Digital Circuits by Clock Gating", International Journal of Advancements in Technology, ISSN 0976-4860, vol. 4, (2013) March.
- [17] D. K. Rao and T. R. Pale, "Low Power Register Design with Integration Clock Gating and Power Gating", International Journal of Application or Innovation in Engineering & Management (IJAIEM), vol. 3, Issue 10, (2014) October, ISSN 2319 – 4847.
- [18] F. Emnett and M. Biegel, "Power Reduction through RTL Clock Gating", Automotive Integrated Electronics Corporation, SNUG San Jose, (2000).
- [19] J. Kathuria, M. Ayoubkhan and A. Noor, "A review of clock gating Techniques", Available at http://www.mitpublications.org/yellow\_images/1315565167\_logo\_13.pdf.
- [20] K. Hariharan and C. J. Kumar, "Clock gating for low power circuit design by Merge and split methods", IOSR Journal of Engineering, vol. 2, no. 4, (2012) April, pp. 577-581.
- [21] M. Kamaraju and G. Chinavenkateswararao, "low power reduced instruction set architecture using clock gating technique", Available at http://airccse.org/journal/vlsi/papers/4513vlsi03.pdf.
- [22] A. G. M. Strollo, E. Napoli and D. De Caro, "New clock-gating techniques for low-power flip-flops", Low Power Electronics and Design, ISLPED Proceedings of the International Symposium on, (2000), pp. 114-119.
- [23] X. Chang, M. Zhang, G. Zhang, Z. Zhang and J. Wang, "Adaptive Clock Gating Technique for Low Power IP Core in SoC Design", IEEE International Symposium (ISCAS), (2007), pp. 2120-2123.
- [24] K. Saurabh and M. B. Mali, "A Review of Clock Gating Techniques in Low Power Applications", International Journal of Innovative Research in Science, Engineering and Technology, vol. 4, Issue 6, (2015) June, ISSN (Online): 2319-8753 ISSN (Print): 2347-6710.
- [25] J. Sun, Y. Ding, X. Wang and X. Wu, "Pilot design of data-aided carrier synchronization for short burst transmission", in Signal and Information Processing (China SIP), 2014 IEEE China Summit & International Conference on, (2014) July 9-13, pp. 641-645.
- [26] "Multi channel bit sync and best source selector", Unistring Tech Solutions Pvt. Ltd., available at http://www.unistring.com/prod\_telemetry\_bss.html.
- [27] S. Kulkarni, P. Mazumder and G. I. Haddad, "A high-speed 32-bit parallel correlator for spread spectrum communication", in VLSI Design, Proceedings of the Ninth International Conference on, (1996) January 3-6, pp. 313-315.
- [28] K. Ma, Y. Hu, W. Meng, C. Yun and X. Zeng, "Specialized convolver & correlator design for PN sequence in Chinese DTMB system", in Solid-State and Integrated Circuit Technology (ICSICT), 12th IEEE International Conference on, (2014) October 28-31, pp. 1-3.
- [29] D. Addario, "Low-power architectures for large radio astronomy correlators", in General Assembly and Scientific Symposium, XXX<sup>th</sup> URSI, (2011) August 13-20, pp.1-4.

## Author



**A. Ranganayakulu** obtained his bachelor's degree from Nagarjuna University and his master's degree from JNT University. His areas of interest are VLSI design, VHDL programming, digital system design and low power VLSI design. He has 22 years of teaching experience and is presently working as an Associate Professor in the ECE Department, KITS, Markapur, A.P., India.

International Journal of Hybrid Information Technology Vol.9, No.3 (2016)



**K. Satya Prasad** is working as a Professor in the ECE Department, JNTUK, Kakinada, India. He received his Ph.D. from IIT Madras. He has more than 34 years of teaching experience and 25 years of R&D experience. He is an expert in Digital Signal Processing. He has guided 14 Ph.D. scholars and is currently guiding 10 more. He has authored textbooks on Electronic Devices and Circuits, Network Analysis, and Signals & Systems. He has held various positions in his career, including Head of the Department, Vice Principal, Principal of JNTU Engineering College, Director of Evaluation, and Rector of JNTUK. He has published more than 100 technical papers in national and international journals and conferences.