Physical Design Flow III:Clock Tree Synthesis

I. NetlistIn & Floorplan
II. Placement

For synchronized designs, data transfer between functional elements are synchronized by clock signals. In a top level digital design, you will have one more more clock sources, like PLLs or oscillators within the chip. You may also have an external clock source connection through an IO. For a digital only block, you will have a clock pin that will be the clock source for the block in question. Clock balancing is important for meeting the design constraints and clock tree synthesis is done after placement to achieve the performance goals.

After placement you have positions of all the cells, including macros and standard cells. However, you still have an ideal clock. (For simplicity, we will assume that we are dealing with a single clock for the whole design). At this stage, buffer insertion and gate sizing and any other optimization technique is employed on the data paths, but no change is done to the clock net.

The same clock net connects all the synchronous elements in the design, irrespective of the number.

This is how your design’s clock network is at this point.

clock net before CTS
clock net before CTS

This is definitely not something we want. Think just about the load of one clock net. No driver can drive that many flops! But when it is a synchronising signal like clock, load or fanout is not the only thing we are worried about. We also want a “balanced” tree, that is the skew value for the clock tree should be zero. After clock tree synthesis, the clock net will be buffered as below.

clocktree
Clock Net After CTS.

The main concerns in CTS are:

  1. Skew – One of the major goals of CTS is to reduce clock skew.
    Let is see some definitions before we go into clock skew.

    • Clock Source

      Clock sources may be external or internal to your chip/block. But for CTS, what we are concerned about is the point from where the clock propagation starts for the digital circuitry. The can be a IO port, outputs or PLL,Oscillators, or even the outputs of a gate down the line. (e.g a mux output).A clock source for CTS may also be specified using ‘create_generated_clock’ command. This defines an internally generated clock for which you want to build a separate tree, with it’s own skew, timing and inter-clock relations.

      You specify the clock source(s), using the command create_clock.

    • Clock Sinks
      Sinks or clock stop points are nodes which receive the clock. Default sinks are the clock pins of your synchronous elements like Flipflops.

    Now let us define skew as the maximum difference among the delays from the clock source to clock sinks..

    clockskew

    In the picture above, the delay to clock sinks are given. The skew in this case is the difference between the maximum delay and minimum delay.
    Skew = 20ns-5ns = 15ns

    The goal of clock tree synthesis is to get the skew in the design to be close to zero. i.e. every clock sink should get the clock at the same time.

  2. Power – Clock is a major power consumer in your design. Clock power consumption depends on switching activity and wire length. Switching activity is high, since clock toggles constantly. Clock gating is a common technique for reducing clock power by shutting off the clock to unused sinks. Clock gating per se is not done in layout; it should be incorporated in the design. However,lock tree synthesis tools can recognise the clock gates, and also do a power aware CTS.
    clockgating

    In the picture above, FF1 gets the ungated clock CLK, and FF2 and any subsequent flop gets a gated clock. This clock is turned on only when the signal EN is present. (See ICG cells)

Make sure that you specify the clock as propagated at CTS stage. i.e. instead of ideal delay for clock, you are now calculating the actual delay value for the clock. This will in turn give you a more realistic report of the timing of the design. You can propagate the clock using the command set_propgated_clock [all_clocks]

IV. Routing
V.Physical Verification

Sini Mukundan

Sini Mukundan

Staff Engineer at Texas Instruments
Sini is an expert on physical design flow and related methodologies. Outside work, she is an avid reader and generally loves being lazy.
Sini Mukundan

Latest posts by Sini Mukundan (see all)

44 Comments

    1. I am not sure about your question, but some of the things I follow are:

      1. Your CTS tool probably has a clock browser or interactive clock tree browser. Use this to look at any unwanted buffer chains you have. This will help you pinpoint issues with your clock tree.

      An example..
      Two branches of tree, but one clock sink has a very high insertion delay before CTS.(Maybe due to gates, muxes etc). CTS will then be required to match this delay in all sinks(plus any extra delay due to synchornising.). This can then be addressed preferably by changing the design OR making the clock tree synthesis ignoring this path.

  1. farjana

    thanks sir….

    hi,

    i am doing my project in title ” NBTI induced clock skew reduction in clock trees” and i am referring IEEE paper ” Skew management of NBTI impacted gated clock trees”.  I need to know how to reduce the clock skew, by using NAND and NOR gates.  In this paper they mentioned NAND and NOR gate usage. Can u please give me a solution.

     

  2. srimanth

    Iam newbie to vlsi industry,Could you please address my doubts.
    1)What are the inputs for a tool to calculate power and how it calculates power.
    2)In STA for nanometer designs text book it is explained that in the name of power LUT we place internal Energy if so how to calculate internal Energy.
    3)How Does the Analog macros are interfaced with the Digital Counter parts in generating power and timing numbers.
    4)How the worst case slew and load are determined in generating the .lib file for standard cells.
    5)Roadmap for physical Design as a career choice in view of increased complexity in technology scaling.

    1. Sini Mukundan

      Clock buffers are designed such that the rise time and fall time are equal. This is important since clock needs to maintain its duty cycle. Schematically it is similar to a buffer, but the parameters and layout are such that these conditions are met. Consequently you may have a bigger(in terms of layout) buffer for the same drive strength.

  3. pramod

    In my design more timing violations are in In to reg only. And this is mainly due to clk latency. My vclk latency is more when compared to capture clk. My design clk has highest latency as 2.7 and this is taken as fixed latency for vclk everywhere. is it correct ? What should I do to meet my timing.

    1. Sini Mukundan

      The tool can but needn’t use all of them. It will use any combination of available cells to fix the skew and get as small an insertion delay. So some user control is required. e.g. X0 drive and very big cells can be avoided in your list.

  4. Naveen Reddy

    Hi sini,
    my design have some problem i am using icc tool,Floorplan i stated with 65% utilization.In placement i got 73.5 utilization but i run cts only skew balance but utilization i got 70.3 what are the reasons for getting low utilization from placement to cts?

    1. Sini Mukundan

      If you are using EDI, there are some post-cts and pre-cts -drv fix capabilities in optDesign. You can also pick the nets you want to fix maxCap on, and then fix them specifically by giving a filelist. Please refer cadence support for the tool specific commands. I suppose other tools will also have similar capabilities.

    1. Sini Mukundan

      1. report_timing
      If clock network delay has (Ideal ), it is not propagated. Else it will have (propagated delay).
      2. report_clocks (Or some such command for your tool) . It will list the clocks and specify whether propagated.

  5. Sarala

    hi,
    If suppose I have two scenarios, say in one case insertion delay is 0.5ns and skew is 0.1ns, in the other case insertion delay is 0.29ns and skew is 0.25ns?? which one should I select? and why?

  6. Bijesh

    I have read that clock inverters have lesser delay compared to clock buffers of same drive strength. Clock inverters have better driving capacity compared to clock buffers too. Clock inverters provide symmetrical rise and fall times too. Then why do we use clock buffers in CTS? Why does standard cell vendors provide clock buffers? Only inverters will be sufficient and better for CTS, right??

  7. Bijesh

    Suppose in my design there are multiple clocks. In such a case how the tool decides priority for each clock tree??
    If I am having clk1, clk2, clk3, clk4 and clk5, which one will be synthesized and routed first? While optimization, which one will be most optimized?

  8. sandeep

    Hi Sini,
    How to decide upon which decaps should be used for DFM. Since decaps add leakage and capacitance to the design, which decaps meaning (decap4 or decap64) is preferable?

  9. sandeep

    How does different flavours of Decap cells(i.e. decap64,decap4 etc) contribute to leakage power and capacitance in any design.
    which cell is preferred in the design since different flavours have different cap and leakage power?

    1. Sini Mukundan

      Higher area CAP will also mean higher leakage, so it’s a trade off between protection of IR drop voltage variation and leakage power in your layout. The tools can assess the characteristics and place them accordingly. You can manually change the cell after analysis as well.

  10. PRAMOD

    Hi Sini,
    Your blogs about physical design are very helpful since i am new to physical design. Please write some blogs about how to fix setup and hold violations under different scenarios, different types of clock trees, industry standard for cts and tips for a beginner in the pd engineering.

    Thanks in advance

  11. Guru Marreddy

    Hi Sini,
    Why exactly a PD engineer goes for a X-Tree implementation over h-tree implementation for CTS. Considering the headache of having the non-rectilinear trunks in x-tree.

  12. saivijayabhaskar

    madam ,
    i have a small doubt..
    If i add METALFILL M1 to M8 there after if i add VIAFILL{-mode connectbetweenfill}, is there any rule that
    1.) Every METAL layer should have a VIAFILL withrespect to next layer?
    2.) Does each METAL layers should get shorted through VIAFILL i.e M1
    to M7?

Leave a Reply

Your email address will not be published. Required fields are marked *