
I. NetlistIn & Floorplan
II. Placement
For synchronous designs, data transfer between functional elements is synchronized by clock signals. In a top-level digital design, you will have one or more clock sources, like PLLs or oscillators within the chip. You may also have an external clock source connected through an IO. For a digital-only block, you will have a clock pin that will be the clock source for the block in question. Clock balancing is important for meeting the design constraints, and clock tree synthesis is done after placement to achieve the performance goals.
After placement you have the positions of all the cells, including macros and standard cells. However, you still have an ideal clock. (For simplicity, we will assume that we are dealing with a single clock for the whole design.) At this stage, buffer insertion, gate sizing and other optimization techniques are applied to the data paths, but no change is made to the clock net.
The same clock net connects all the synchronous elements in the design, irrespective of the number.
This is how your design’s clock network is at this point.
This is definitely not something we want. Think just about the load of one clock net. No driver can drive that many flops! But when it is a synchronising signal like clock, load or fanout is not the only thing we are worried about. We also want a “balanced” tree, that is the skew value for the clock tree should be zero. After clock tree synthesis, the clock net will be buffered as below.
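To get a feel for why one net cannot drive every flop, here is a rough Python sketch that counts how many buffer levels a tree needs when each driver is limited to a fixed fanout. The function name and the fanout limit are assumptions for illustration only; a real CTS tool also works from capacitance, transition and skew targets, not just fanout.

```python
def min_buffer_levels(num_sinks: int, max_fanout: int) -> int:
    """Smallest number of buffer levels so that no driver exceeds max_fanout.

    Level 0 means the clock source drives the sinks directly.
    """
    if num_sinks < 1:
        raise ValueError("num_sinks must be at least 1")
    if max_fanout < 2:
        raise ValueError("max_fanout must be at least 2")

    levels = 0
    reachable = max_fanout  # sinks the source can drive with no buffers
    while reachable < num_sinks:
        reachable *= max_fanout  # each extra level multiplies the reach
        levels += 1
    return levels


if __name__ == "__main__":
    # Hypothetical design sizes, fanout limit of 16 per driver.
    for sinks in (10, 256, 100_000):
        print(sinks, "sinks ->", min_buffer_levels(sinks, 16), "buffer levels")
```

Even with this simple model, a design with 100,000 flops needs several levels of buffering, and every level adds delay that has to be balanced across branches.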
The main concerns in CTS are:
- Skew – One of the major goals of CTS is to reduce clock skew.
Let us see some definitions before we go into clock skew.
- Clock Source
Clock sources may be external or internal to your chip/block. But for CTS, what we are concerned about is the point from where the clock propagation starts for the digital circuitry. This can be an IO port, the output of a PLL or oscillator, or even the output of a gate down the line (e.g. a mux output). A clock source for CTS may also be specified using the `create_generated_clock` command. This defines an internally generated clock for which you want to build a separate tree, with its own skew, timing and inter-clock relations.
You specify the clock source(s), using the command create_clock.

    create_clock -name XTALCLK -period 100 -waveform { 0 50 } [get_pins {xtal_inst/OUT}]
    create_clock -name clk -period 100 -waveform { 0 50 } [get_ports {clk}]
    create_generated_clock -name div_clk1 \
        -source [get_pins {block1/clk_out}] -divide_by 2 \
        -master_clock [get_clocks {clk}]

- Clock Sinks
Sinks or clock stop points are nodes which receive the clock. Default sinks are the clock pins of your synchronous elements, like flip-flops.
Now let us define skew as the maximum difference among the delays from the clock source to the clock sinks.
In the picture above, the delays to the clock sinks are given. The skew in this case is the difference between the maximum delay and the minimum delay.
`Skew = 20ns - 5ns = 15ns`
The goal of clock tree synthesis is to get the skew in the design to be close to zero, i.e. every clock sink should get the clock at the same time.
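The skew definition can be written as a small Python sketch. The sink names and the two middle delay values are made up for illustration; only the 20 ns maximum and 5 ns minimum come from the example above.

```python
def clock_skew(sink_delays_ns):
    """Skew = largest source-to-sink delay minus the smallest one."""
    delays = list(sink_delays_ns.values())
    if not delays:
        raise ValueError("need at least one clock sink")
    return max(delays) - min(delays)


# Hypothetical source-to-sink delays in ns.
sink_delays_ns = {
    "FF1/CK": 5.0,
    "FF2/CK": 12.0,
    "FF3/CK": 20.0,
    "FF4/CK": 9.0,
}

skew_ns = clock_skew(sink_delays_ns)
print(f"skew = {skew_ns} ns")  # prints "skew = 15.0 ns"
```

A perfectly balanced tree would give every sink the same delay, and the function would return 0.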
- Power – Clock is a major power consumer in your design. Clock power consumption depends on switching activity and wire length. Switching activity is high, since the clock toggles constantly. Clock gating is a common technique for reducing clock power by shutting off the clock to unused sinks. Clock gating per se is not done in layout; it should be incorporated in the design. However, clock tree synthesis tools can recognise the clock gates, and also do a power-aware CTS.
In the picture above, FF1 gets the ungated clock CLK, while FF2 and any subsequent flops get a gated clock. The gated clock toggles only while the enable signal EN is asserted. (See ICG cells)
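As a rough illustration of why gating helps, the sketch below uses the standard dynamic power estimate for a clock net, P = C · V² · f, scaled by the fraction of time the gated branch is enabled. All capacitance, voltage and activity numbers are assumptions chosen for the example, and the power of the ICG cell itself is ignored.

```python
def clock_dynamic_power_w(cap_f, vdd_v, freq_hz, enable_fraction=1.0):
    """Dynamic power C * V^2 * f of a clock net, scaled by how often it toggles."""
    if not 0.0 <= enable_fraction <= 1.0:
        raise ValueError("enable_fraction must be between 0 and 1")
    return enable_fraction * cap_f * vdd_v ** 2 * freq_hz


# All values below are hypothetical.
VDD_V = 1.0
FREQ_HZ = 10e6           # 100 ns clock period
CAP_UNGATED_F = 2e-12    # wire + pin load that always toggles (FF1 side)
CAP_GATED_F = 8e-12      # load behind the clock gate (FF2 side)
EN_FRACTION = 0.25       # EN asserted 25% of the time

power_without_gating = clock_dynamic_power_w(
    CAP_UNGATED_F + CAP_GATED_F, VDD_V, FREQ_HZ
)
power_with_gating = (
    clock_dynamic_power_w(CAP_UNGATED_F, VDD_V, FREQ_HZ)
    + clock_dynamic_power_w(CAP_GATED_F, VDD_V, FREQ_HZ, EN_FRACTION)
)

print(f"without gating: {power_without_gating * 1e6:.1f} uW")
print(f"with gating:    {power_with_gating * 1e6:.1f} uW")
```

The saving scales with how much of the clock load sits behind the gate and how rarely it is enabled, which is why gates placed close to the clock root save more.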
Make sure that you specify the clock as propagated at the CTS stage, i.e. instead of an ideal delay for the clock, you are now calculating the actual delay value for the clock. This will in turn give you a more realistic report of the timing of the design. You can propagate the clock using the command `set_propagated_clock [all_clocks]`
Hi,
Thank you so much. I am new to physical design.
These are really helpful.
Please keep doing it.
Yan
how to reduce the number of clock levels after initial clock tree synthesis?
I am not sure about your question, but some of the things I follow are:
1. Your CTS tool probably has a clock browser or interactive clock tree browser. Use this to look at any unwanted buffer chains you have. This will help you pinpoint issues with your clock tree.
An example..
Two branches of a tree, but one clock sink has a very high insertion delay before CTS (maybe due to gates, muxes etc.). CTS will then be required to match this delay at all sinks (plus any extra delay due to synchronising). This can then be addressed, preferably by changing the design, or by making clock tree synthesis ignore this path.
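A small Python sketch of the example above: if CTS balances every sink to the slowest one, a single slow sink forces extra delay onto all the others. The sink names and latency values are hypothetical, and real tools balance within a skew target, not to exactly zero.

```python
def balancing_delay_ns(pre_cts_latency_ns):
    """Delay that must be added per sink to match the slowest sink."""
    target = max(pre_cts_latency_ns.values())
    return {sink: target - lat for sink, lat in pre_cts_latency_ns.items()}


# Hypothetical pre-CTS latencies in ns; FF_mux sits behind gates and muxes.
latencies = {"FF_a/CK": 0.2, "FF_b/CK": 0.3, "FF_mux/CK": 1.5}

# Balancing everything: FF_a and FF_b each need over 1 ns of padding.
print(balancing_delay_ns(latencies))

# Ignoring the slow path during balancing: almost no padding is needed.
without_slow_path = {s: d for s, d in latencies.items() if s != "FF_mux/CK"}
print(balancing_delay_ns(without_slow_path))
```

The padding shows up as buffer chains in the clock browser, which is where the extra clock levels come from.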
thanks sir….
hi,
i am doing my project in title ” NBTI induced clock skew reduction in clock trees” and i am referring IEEE paper ” Skew management of NBTI impacted gated clock trees”. I need to know how to reduce the clock skew, by using NAND and NOR gates. In this paper they mentioned NAND and NOR gate usage. Can u please give me a solution.
farjana,
Please see http://ask.vlsi.pro/clock-skew-with-nand-and-nor-gates/
Thanks for your kind effort.
under clock sinks: Is it modes or nodes.
Thanks for the catch. fixed.
Could you please include a blog-specific email address, so that if I have a list of doubts it will not become cumbersome in the comment section?
You can ask questions in ask.vlsi.pro.
I will move your questions there.
what is diff skew balanced techinic and skew group ?
Iam newbie to vlsi industry,Could you please address my doubts.
1)What are the inputs for a tool to calculate power and how it calculates power.
2)In STA for nanometer designs text book it is explained that in the name of power LUT we place internal Energy if so how to calculate internal Energy.
3)How Does the Analog macros are interfaced with the Digital Counter parts in generating power and timing numbers.
4)How the worst case slew and load are determined in generating the .lib file for standard cells.
5)Roadmap for physical Design as a career choice in view of increased complexity in technology scaling.
Why do we use clock buffers in CTS? Can't we use normal buffers? I came to know that for clock buffers the input and output transition times are the same. Could you explain why they are the same?
Clock buffers are designed such that the rise time and fall time are equal. This is important since clock needs to maintain its duty cycle. Schematically it is similar to a buffer, but the parameters and layout are such that these conditions are met. Consequently you may have a bigger(in terms of layout) buffer for the same drive strength.
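To illustrate the duty cycle point, here is a simplified Python model. It assumes every buffer in a chain delays the rising edge and the falling edge by fixed amounts, so any mismatch accumulates along the chain. The delay numbers are invented, and real cells depend on load and input slew.

```python
def duty_cycle_after_chain(period_ns, duty_in, n_buffers,
                           rise_delay_ns, fall_delay_ns):
    """Duty cycle after n non-inverting buffers with unequal edge delays."""
    high_in = duty_in * period_ns
    # The rising edge moves by n * rise_delay, the falling edge by n * fall_delay,
    # so the high pulse stretches (or shrinks) by the accumulated difference.
    high_out = high_in + n_buffers * (fall_delay_ns - rise_delay_ns)
    if not 0.0 < high_out < period_ns:
        raise ValueError("pulse collapses: mismatch is too large for this period")
    return high_out / period_ns


# Hypothetical 2 ns clock, 50% duty cycle, 10 buffers in the chain.
balanced = duty_cycle_after_chain(2.0, 0.5, 10, 0.10, 0.10)
unbalanced = duty_cycle_after_chain(2.0, 0.5, 10, 0.10, 0.15)
print(f"balanced buffers:   {balanced:.2f}")
print(f"unbalanced buffers: {unbalanced:.2f}")
```

In this model a 50 ps mismatch per buffer turns a 50% duty cycle into 75% after ten stages, while matched buffers leave it unchanged.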
In my design more timing violations are in In to reg only. And this is mainly due to clk latency. My vclk latency is more when compared to capture clk. My design clk has highest latency as 2.7 and this is taken as fixed latency for vclk everywhere. is it correct ? What should I do to meet my timing.
I have one question: does the tool use all the BUF/INV cells specified in the clock tree reference list? If not, why?
The tool can, but need not, use all of them. It will use any combination of the available cells to fix the skew and get as small an insertion delay as possible. So some user control is required, e.g. X0 drive strength and very big cells can be left out of your list.
What is clock divergent? I have heard people talking about it during timing violation during clock skewing.
Hi,
I am new to PD can you guide me how to gain knowledge in PD
why the clock routings are done before the signal routings ?
You are prioritizing clock routing to make sure it has the optimum utilization of the routing resources. Also, clock nets may have non-default routing rules, and you want to make sure the skew etc. are not affected.
http://vlsi.pro/physical-design-flow-iv-routing/
Hi sini,
My design has a problem. I am using the ICC tool. I started the floorplan with 65% utilization. After placement I got 73.5% utilization, but after running CTS with skew balancing only, I got 70.3% utilization. What are the reasons for the utilization dropping from placement to CTS?
Hi Sini,
Could you please suggest me some VLSI backend topics for my Short seminar in a curriculum.
Ahmed
Post graduate?
1. Routing algorithms – Any improvements you can suggest?
2. Low power designs
This list looks interesting.
http://www.engpaper.com/vlsi.htm
Hi sini,
I have a doubt about how can we reduce Max Cap violations in cts stage.
Thanks in advance
If you are using EDI, there are some post-CTS and pre-CTS -drv fix capabilities in optDesign. You can also pick the nets you want to fix maxCap on, and then fix them specifically by giving a file list. Please refer to Cadence support for the tool-specific commands. I suppose other tools will also have similar capabilities.
Hi,
How i can check if all the clocks in my design have propagated or not ?
1. report_timing
If the clock network delay line shows (ideal), the clock is not propagated. Otherwise it will show (propagated).
2. report_clocks (Or some such command for your tool) . It will list the clocks and specify whether propagated.
THanks
Is there any gui/command to view the clock network after the CTS?
displayClockTree in cadence soc encounter
hi,
If suppose I have two scenarios, say in one case insertion delay is 0.5ns and skew is 0.1ns, in the other case insertion delay is 0.29ns and skew is 0.25ns?? which one should I select? and why?
we prefer insertion delay is 0.29ns and skew is 0.25ns because of hold purpose.
Himam,
i have 5ps hold violation in one path how to fix it?
Thank you so much. Your blogs are relly very lucid and helpful.
I have read that clock inverters have lesser delay compared to clock buffers of same drive strength. Clock inverters have better driving capacity compared to clock buffers too. Clock inverters provide symmetrical rise and fall times too. Then why do we use clock buffers in CTS? Why does standard cell vendors provide clock buffers? Only inverters will be sufficient and better for CTS, right??
Suppose in my design there are multiple clocks. In such a case how the tool decides priority for each clock tree??
If I am having clk1, clk2, clk3, clk4 and clk5, which one will be synthesized and routed first? While optimization, which one will be most optimized?
Hi Sini,
How to decide upon which decaps should be used for DFM. Since decaps add leakage and capacitance to the design, which decaps meaning (decap4 or decap64) is preferable?
How does different flavours of Decap cells(i.e. decap64,decap4 etc) contribute to leakage power and capacitance in any design.
which cell is preferred in the design since different flavours have different cap and leakage power?
A decap cell with a larger area will also have higher leakage, so it's a trade-off between protection against IR-drop voltage variation and leakage power in your layout. The tools can assess the characteristics and place them accordingly. You can manually change the cell after analysis as well.
Hi
How to use macro models delay numbers at macro pins in cts stage? I tried to set it to latency value but it dint work for me
Hi Sini,
Your blogs about physical design are very helpful since i am new to physical design. Please write some blogs about how to fix setup and hold violations under different scenarios, different types of clock trees, industry standard for cts and tips for a beginner in the pd engineering.
Thanks in advance
Hi Sini,
Why exactly a PD engineer goes for a X-Tree implementation over h-tree implementation for CTS. Considering the headache of having the non-rectilinear trunks in x-tree.
hi ,
how skew balancing and clock grouping is done?
Madam,
In Clock Tree synthesis, can you explain about the importance of buffer,insertion delay,clock skew,slew rate and minimization of power.
madam ,
i have a small doubt..
If i add METALFILL M1 to M8 there after if i add VIAFILL{-mode connectbetweenfill}, is there any rule that
1.) Every METAL layer should have a VIAFILL withrespect to next layer?
2.) Does each METAL layers should get shorted through VIAFILL i.e M1
to M7?
Thanks for giving this valuable informtion
hi
if i have changed cts specification file …but i want output must be what iam changing in the specification file …how iam doing please help me
Hi Sini,
Which one is more important Global Skew or Local Skew? Why?
Hi Sini,
Have been following up on your articles and they are really helpful for freshers.
I had a query, can you please help me understand how delay numbers are calculated while setting the balance points , as in set_clock_balance_points -delay $delay -consider_for_balancing true, if i am balancing certain non end point explicity with few 100 sinks?
How to clone a cell?
what is command for the same?
do we need to check LEC after cloning a cell?
Thanks, Madam.
I am self-learner, helpful to me to know about my passion and dream field and help to get the designer place in the company, I hope so.
HI ,
Can we manually do cts? like for small partitions
hi,sini mam,
i am asking one qns how to see the clock building and reports in innvoues tool .but report_clocks_timings -type skew jitter ,latncy but value are zero. so how can see tha reports and any options ?
set_propagated_clock [all_clocks]
good evening mam,
that one ok but report_clocks_timing -type summary like skew lantcy jitter reports values showing zero’s
good evening mam,
how to solved DRC violations by using candence tool ?
like space,metal density…….!
hi mam
i have a doubt please help me with this…
why CTS is not performed in synthesis stage?
Thanks in advance
sharief