Designcon 2011 – Day 4

Here are my notes and observations from the last day of Designcon 2011.

1-TH1 – Hardware Based Floating Point Design Flow

This was presented by Michael Parker from Altera.  The paper went into some of the limitations of the IEEE-754 floating point standard that make it difficult to implement in FPGAs:

  • normalization after every operation must be implemented with barrel shifters, which are slow in FPGAs
  • doesn’t use a 2’s complement representation of numbers
  • no direct synthesizable support in HDL languages
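As a side note for readers unfamiliar with the format, here is a quick Python sketch (my own illustration, not from the paper) that unpacks the fields of an IEEE-754 double.  Every operation must renormalize the result so the mantissa keeps its implicit leading 1, and that shift is exactly what the slow barrel shifters implement in an FPGA.

```python
import struct

def decompose(x: float):
    """Unpack an IEEE-754 double into (sign, biased exponent, mantissa bits)."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF
    mantissa = bits & ((1 << 52) - 1)
    return sign, exponent, mantissa

# 1.0 is stored normalized as 1.0 x 2^0: implicit leading 1, zero mantissa
s, e, m = decompose(1.0)
# the double-precision exponent bias is 1023, so the exponent field holds 1023
```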

The presentation discussed a methodology for implementing floating point in FPGAs called Fused Data Path, which maps operations onto the LUT, memory, and multiplier elements of the FPGA.

Design entry is done through Matlab Simulink and synthesis is done directly from the Simulink schematic (no intermediate HDL step).

Fused Data Path supports all of the following:

  • complex numbers
  • vectors
  • math.h functions (i.e., sin, cos, tan, log, sqrt, etc.)
    • These can be implemented much more efficiently in HW using parallel logic paths

Target applications are FFTs and linear algebra.  The Fused Data Path tool is part of the Altera DSP Builder software.

I read the paper as well.  The Fused Data Path floating point is actually different from IEEE-754, and the paper shows a comparison of results for some operations between the two, making the point that Fused Data Path is more accurate.  There are also some performance benchmarks in the paper.  The question that comes to mind is how the Fused Data Path implemented in an FPGA would compare, performance-wise, against a high-end floating point processor or DSP.

The amount of FPGA resources used in the examples shown in the paper seems to be quite large, but if there is a significant performance advantage it might be a compelling solution.

8-TH2 – Comparison of Optical and Electrical Links for Highly Interconnected Systems

This paper described an engineering research project performed by the Mayo Clinic to compare optical and electrical communication channels for high performance computing platforms.  The paper described some metrics of comparison between the two:

  • energy per bit
  • cost
  • packaging density

The conclusion seemed to be that optical was the way to go for next generation high performance computing platforms.

I’m wondering why the Mayo Clinic is doing this type of research.  Do they have a design department that is creating their own high performance computer systems?

11-TH3 – Design and Optimization of Power Delivery Network for 3D Stacked Die Designs

This presentation had some interesting background on different options for stacking die inside of packages.  The four options mentioned were:

  • Die stacked on top of each other, power distributed to top die directly with through silicon via
  • Die stacked on top of each other, power distributed to regulator on first die, and then to top die
  • Die mounted next to each other on an interposer with connections to power through the interposer (very similar to a PCB only on a much smaller scale)
  • Multiple stacked die with through silicon vias connecting power directly to each die

Beyond seeing some diagrams of different options for stacking die, this presentation was mostly over my head.


Designcon 2011 – Day 3

Here are my notes and observations from day 3 of Designcon.

10-WA2 Forward Error Correction (FEC) for High-Speed SerDes Link System of 25-28 Gbps

This paper contained some research into codes that would work in multi-gigabit serial systems.

One thing mentioned is that DFE equalizers tend to multiply single bit errors into burst errors.  The effect of this error propagation on BER is limited; however, it has a big effect on the mean time to false packet acceptance (MTTFPA) for packets protected by CRC32.  The general standard is that the MTTFPA should be longer than the age of the universe (on the order of 1e10 years), and burst error propagation can reduce MTTFPA to below this, even though it is still quite high.
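To get a feel for the burst mechanism, here is a toy Python model (my own, not from the paper): once a decision error occurs, the corrupted DFE feedback makes the next decision wrong with some probability, and the burst continues until a correct decision is made.  The propagation probability here is invented for illustration.

```python
import random

def burst_lengths(n_errors: int, p_propagate: float, seed: int = 0):
    """Toy DFE error propagation model: after an initial decision error, each
    subsequent bit is also wrong with probability p_propagate (the corrupted
    feedback tap degrades the next decision).  Returns simulated burst lengths."""
    rng = random.Random(seed)
    lengths = []
    for _ in range(n_errors):
        length = 1  # the initial single-bit error
        while rng.random() < p_propagate:
            length += 1
        lengths.append(length)
    return lengths

bursts = burst_lengths(10000, 0.3)
# mean burst length approaches 1 / (1 - p), about 1.43 for p = 0.3
```

The point of the model is that a DFE turns isolated errors into geometrically distributed bursts, which is exactly the error pattern that defeats a CRC more often than independent errors would.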

Many codes that have been successful in radio applications are not attractive for serial copper because they are too complex, require too much processing overhead, and have too much latency.  Codes that were mentioned in the presentation were Reed Solomon and cyclic codes (RS(264,260,2) in particular).

In their analysis they assumed perfect equalization of the channel, so Gaussian noise channel assumptions were used for all coding gain calculations.

One thing I learned is a method that has been used in some Ethernet standards to add the coding bits without adding any additional overhead.  The 64b/66b code has been used, which uses a 2-bit header to identify each block as data or control.  This header can be compressed to a single bit, so the extra bit can be used as a code bit.  Multiple 66-bit blocks can be concatenated together to form entire code words with all of the necessary code bits.
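For concreteness, here is a back-of-envelope check of that arithmetic in Python.  The 40-block grouping and the 10-bit RS symbol width are my own assumptions chosen to make the numbers line up; the paper’s actual construction may differ.

```python
# 64b/66b line code: 64 payload bits plus a 2-bit sync header per block
BLOCKS = 40        # 66-bit blocks concatenated per codeword (my assumption)
SYMBOL_BITS = 10   # RS symbol width (my assumption)
N, K = 264, 260    # RS(264,260,2) from the presentation

# compressing the 2-bit data/control header to 1 bit frees 1 bit per block
data_bits = BLOCKS * (64 + 1)         # payload + compressed header = 2600
parity_bits = (N - K) * SYMBOL_BITS   # 4 parity symbols = 40 bits
freed_bits = BLOCKS * 1               # one reclaimed bit per 66-bit block

assert data_bits == K * SYMBOL_BITS     # exactly 260 symbols of data
assert parity_bits == freed_bits        # parity fits in the reclaimed bits
assert BLOCKS * 66 == N * SYMBOL_BITS   # line rate unchanged: 2640 bits
```

Under these assumptions the codeword occupies exactly 40 transmitted 66-bit blocks, so the FEC truly costs zero extra line rate.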

The use of interleaving was mentioned to increase burst error correcting capability.

Synchronization was discussed, where block synchronization would be done by checking the parity bits of received data and then bit slipping if the data is not error free.
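A minimal sketch of that bit-slip idea in Python (my own illustration of the concept, using a toy even-parity check):

```python
def find_alignment(bits, codeword_len, parity_ok):
    """Block synchronization by bit slipping: try an alignment, check the
    parity of every received codeword, and slip one bit position on failure
    until a clean alignment is found."""
    for offset in range(codeword_len):
        words = [bits[i:i + codeword_len]
                 for i in range(offset, len(bits) - codeword_len + 1, codeword_len)]
        if words and all(parity_ok(w) for w in words):
            return offset
    return None

# three even-parity codewords preceded by one stray bit: lock lands at offset 1
even_parity = lambda w: sum(w) % 2 == 0
stream = [1] + [1,1,0,0,0,0,0,0] + [0,0,1,1,0,0,0,0] + [0,0,0,0,1,1,0,0]
offset = find_alignment(stream, 8, even_parity)
```

In practice you would demand many consecutive clean codewords before declaring lock, since a short parity check alone can easily lock onto a false alignment.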

This is a paper I would like to read in more detail.  

3-WA3 Case Study for PID Control in an FPGA

This was my paper presentation (co-presented with Paul Schad).  There were approximately 33 people present for the presentation.  Here are the questions that we were asked afterwards:

  1. How did you handle independent input variables of voltage and current for control?  Were they controlled independently?
  2. Would a HW floating point unit in the FPGA be attractive for this application?
  3. Why did the loop rate increase from 2 kHz to 250 kHz?
  4. What was the end application?
  5. Could an FPGA based PID controller be used in switch mode power supply design?
  6. What language did you implement the design and simulation in?
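Our paper’s implementation isn’t reproduced here, but for readers unfamiliar with the algorithm, here is a minimal discrete PID update in Python.  The gains, the plant, and its time constant are invented for illustration; they are not the values from our paper.

```python
class PID:
    """Minimal discrete PID controller in textbook form."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# drive a simple first-order plant (1 ms time constant, assumed) toward a
# setpoint of 1.0 at the 250 kHz loop rate mentioned in the questions
dt = 1 / 250e3
pid = PID(kp=2.0, ki=5000.0, kd=0.0, dt=dt)
y = 0.0
for _ in range(2000):          # 8 ms of simulated time
    u = pid.update(1.0, y)
    y += (u - y) * dt / 1e-3   # Euler step of the first-order plant
```

An FPGA implementation would typically do this same update in fixed point, which is much of what our paper was about.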


3-WA4 – Vectorless Estimation of Power Consumption Variation in an FPGA

It is possible to use statistics to predict the number of switching events at the outputs based on statistics of switching events at the inputs.  This only gives the average power noise, but an estimate of the peak is really needed to design the power distribution system.

To determine the peak, you need to know the statistical distribution of power per clock cycle to find the mean and standard deviation of the power.  Then a reasonable estimate can be made for the peak.
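In other words, once the per-cycle distribution is known, the peak estimate is just a statistical bound on it.  A trivial Python sketch (the three-sigma choice here is my illustration, not the paper’s):

```python
import statistics

def peak_power_estimate(per_cycle_power, k=3.0):
    """Estimate peak power-per-cycle as mean + k standard deviations of the
    per-cycle power distribution (k = 3 is an arbitrary illustrative choice)."""
    mu = statistics.mean(per_cycle_power)
    sigma = statistics.stdev(per_cycle_power)
    return mu + k * sigma

# synthetic per-cycle power samples (watts), invented for illustration
peak = peak_power_estimate([1.0, 1.2, 0.8, 1.1, 0.9])
```

The hard part, and the focus of the paper, is obtaining that distribution cheaply rather than by exhaustive simulation.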

The focus of this paper was to develop a computationally fast method of finding this distribution.

Keynote – Ivo Bolsens (Xilinx)

He talked about the new crossover devices that bridge the gap between the ASSP and the FPGA.  These are programmable system on chip devices that include processors, memories, and programmable logic.  There are many advantages to this type of device, including tight integration between CPU and HW or between multiple CPUs, and fast, flexible time to market.  The drivers of this new crossover technology are both monolithic chips and 3D system in package.  Overall I found his presentation compelling, but the fundamental flaw of programmable logic, high unit cost, was not really addressed.  As long as the unit cost is multiples higher than ASSP and SOC processor devices, it is hard to see a dominant place for programmable logic except in high performance applications.

10-WP5 A Study of 25 Gb/s Backplanes Using Equalization, Modulation, and Improved Channels

I only stayed for the very first part of this.  I was really curious about different signal modulations being used.  The author revealed early that he only considered NRZ and PAM4.  I was hoping for a wider consideration of this and there was another interesting presentation I wanted to see.

1-WP5 – An On-Die 2D Eye Scope on 28 nm CMOS 12.5 Gbps Transceiver

This was presented by someone from Altera.  They first attempted to integrate a measurement circuit with their multi-gigabit transceivers in Stratix 4 at 40 nm and presented it as a paper last year [1].  That first generation only measured the horizontal eye opening, and I don’t think it was actually made commercially available in those devices.  This year they are presenting a full 2D on-die scope for Stratix 5 in 28 nm.  It works using the BERT scan method (like a BERTScope or JBERT), sweeping the clock sample position to scan in the horizontal direction and using high-speed comparators to get amplitude in the vertical dimension.  It uses an embedded BER pattern checker to evaluate whether the bit is correct at each decision threshold.  Here are some other interesting points.

  • 32 horizontal steps, 64 vertical steps
  • Works with a single fixed PRBS7 pattern
  • Good for up to 12.5 Gbps
  • Stratix 5 – will be available on every transceiver in every device
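The scan procedure itself is simple to sketch in Python (my own illustration; `bit_error_rate` stands in for the embedded pattern checker, and the synthetic BER function is invented):

```python
def eye_scan(bit_error_rate, phase_steps=32, volt_steps=64):
    """Sketch of the BERT-scan method: sweep the sampler's phase (horizontal)
    and decision threshold (vertical), recording the measured BER at each
    point of the grid."""
    return [[bit_error_rate(p / phase_steps, v / volt_steps)
             for p in range(phase_steps)]
            for v in range(volt_steps)]

# synthetic BER function: the eye is most open at the center of the grid
ber = lambda phase, volts: abs(phase - 0.5) + abs(volts - 0.5)
grid = eye_scan(ber)
```

Plotting such a grid as a heat map gives the familiar 2D eye/BER contour picture that instruments like the BERTScope produce.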

This technology sounds similar to the Vitesse patented V-Scope that I wrote about in a past article.  I wasn’t sure how the Vitesse solution worked, so I stopped by their booth, and they explained that it was a very similar BERTScan technique.

6-WP6 Fast, Accurate, ‘Simulation Lite’ Approach for Multi-GHz Board Design

SI simulation was originally focused on verifying setup and hold timing as part of a worst-case design flow.  The SNR was so high in these systems that the BERs were so small they weren’t even worth worrying about.  As data rates have increased, SI simulation has become increasingly complicated in a world where nearly everything matters.  Simulations must be run across manufacturing tolerance variation of dozens of parameters, and it is impractical to spend this much time in simulation.  These authors from Intel have developed a methodology where they convert transmission channel S-parameters to an equivalent Gaussian noise SNR value, also taking into account things such as transmitter signal strength, jitter, and noise.  The concept is to estimate the SNR.  In pre-layout analysis this method can be part of a budgeting process.  There are post-layout tools that can analyze every net and create insertion loss and crosstalk S-parameters.  Even if the tools aren’t perfect, the process still gives a relative quality comparison between nets.  A PCB designer can intuitively learn how to fix outlying nets, and the process requires limited SI simulation.  They reported that they have successfully used the methodology on real designs with DDR-style memory busses.

This method relies on imperfect simulation tools, but instead of striving to build perfect simulations, it uses the tools at their current capabilities to get enough information for the board designer to make design trade-offs.  This methodology could easily be built into EDA layout tools.  It sounds like a very promising option for bringing a reasonable signal integrity methodology for multi-gigabit design back down to an intermediate skill level.
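The core idea reduces to a very small calculation.  Here is a rough Python sketch of that kind of budgeting (my own simplification, not Intel’s actual method; all parameter values are illustrative):

```python
import math

def channel_snr_db(tx_amplitude_v, insertion_loss_db, noise_rms_v,
                   jitter_penalty_db=0.0):
    """Convert channel insertion loss (plus an optional jitter penalty, both
    in dB) into a received amplitude, then form an SNR against an equivalent
    Gaussian noise level."""
    total_loss_db = insertion_loss_db + jitter_penalty_db
    rx_amplitude = tx_amplitude_v * 10 ** (-total_loss_db / 20)
    return 20 * math.log10(rx_amplitude / noise_rms_v)

# comparing two hypothetical nets: the lossier net scores a lower SNR
snr_short = channel_snr_db(1.0, insertion_loss_db=10, noise_rms_v=0.01)
snr_long = channel_snr_db(1.0, insertion_loss_db=20, noise_rms_v=0.01)
```

Even if the absolute numbers are crude, ranking every net on a board by such a figure of merit is enough to flag the outliers that need real simulation.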

TP-W1 – Who is the Designer of the Future?

This panel was really focused on who the IC designer of the future is, not the system designer role that I am involved in.  The main point seemed to be that the RTL chip design methodology is running out of steam and that design must occur at a higher level of abstraction, by engineers having a cross functional mix of application expertise, software expertise, and hardware expertise.  I didn’t stay for the whole thing.

CH-9 – Reacting to the Age of the Domain Expert: Industry Collaboration to Make Prototyping Easier

This was a panel with representatives from a semiconductor manufacturer (NXP), a distributor (Digi-Key), a PCB fab vendor (Sunstone Circuits), an assembler (Screaming Circuits), and an EDA vendor (National Instruments).  They discussed a partnership that they have formed to make the process of designing, fabricating, and assembling a board much more seamless for the design community.  Some examples:

  • Schematic symbols and footprints available for all tools so companies don’t need to have their own library groups
  • Pricing information matches data book part numbers
  • Simulation models for all components
  • One stop shopping for PCB fabrication and assembly

Probably one of the more interesting things I learned from this was just the existence of Screaming Circuits.  They provide very fast turn assembly of printed circuit boards.

[1] Ding, Weichi, Mingde Pan, Tina Tran, Wilson Wong, Sergey Shumarayev, and Mike Peng Li. January 2010. “An On-Die Scope Based on a 40-nm Process FPGA Transceiver.” Designcon 2010 Conference Proceedings.

Designcon 2011 – Day 2

Here are my notes and observations from the second day of Designcon 2011.

4-TA1 – A Way to Meet Bandwidth and Capacity Needs of Next-Generation Main Memory System

Current trends in memory are toward higher bandwidth, but this is driving toward single-DIMM systems, which contradicts the other trend of increased capacity.  This paper explored some possible optimizations in the next generation signaling architecture to achieve the goals of both increased bandwidth and capacity, while not increasing power.  Here are the things explored in the paper:

  • Per bit transmit and receive timing calibration
    • all timing adjustment done on the controller side
    • periodic recalibration during refresh to account for voltage and temperature variation
    • also allows an easy way to perform margin testing
  • Dynamic point to point
    • 2 DIMM slots exist; they can be populated with two separate DIMMs, each attaching to half of the data bits
    • Or with 1 DIMM plus a continuity module that passes its half of the data bits through, so the single DIMM connects to all the data bits
    • The first configuration supports full 2-DIMM memory capacity and achieves full bandwidth by having the controller access both DIMMs simultaneously – it seems like this would add quite a bit of complexity to the controller design to group reads and writes efficiently
  • Flex clocking
    • This eliminates the DLL on the memory chips used to create the DQS signal.  I’m not exactly clear on how it works, but it will save power.
  • VDDIO lowered to 0.5 V – saves power

These options were explored in simulation only and appear to allow support of up to 3200 MT/s memory interfaces.  There is no real hardware designed or evaluated yet.

8-TA2 – High-Speed Channel Characterization Using Design of Experiments

This paper described a design of experiments (DOE) methodology for signal integrity analysis.  Basically, a DOE was created, identifying a set of parametric simulations to perform that would represent the entire parametric space.  The data was then analyzed using a statistical SW package called JMP to create a simpler statistics-based model of system performance.  Some iteration might be required, adding more parametric simulations to improve the fit of the DOE model.  The DOE model was then used to run Monte Carlo simulations, varying parameters within manufacturing tolerances, to identify the most critical and sensitive parameters.  The results were filtered to remove any that would fall outside of an impedance tolerance (just like the PCB vendor would screen them).  The results showed that for a given stackup there could be up to 1.8 dB of difference in loss at 12 GHz between different board vendors, due to differences in manufacturing tolerances and processes.  The DOE also revealed which parameters the loss is most sensitive to.  The top two were loss tangent (not surprising) and dielectric constant (a little surprising).
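The flow is easy to sketch: fit a cheap surrogate model from the DOE runs, then Monte Carlo it across the tolerances.  A minimal Python illustration (the surrogate model coefficients and tolerance values here are invented, not from the paper):

```python
import random

def monte_carlo_loss(model, tolerances, n=10000, seed=1):
    """Monte Carlo over manufacturing tolerances using a cheap DOE surrogate
    model in place of full-wave simulation.  Returns the min and max of the
    predicted loss across the sampled population."""
    rng = random.Random(seed)
    results = []
    for _ in range(n):
        params = {name: rng.gauss(nom, sigma)
                  for name, (nom, sigma) in tolerances.items()}
        results.append(model(params))
    return min(results), max(results)

# surrogate fitted from DOE runs: loss at 12 GHz as a linear function of
# loss tangent (df) and dielectric constant (dk) -- invented coefficients
model = lambda p: 0.9 + 40.0 * p["df"] + 0.1 * p["dk"]   # dB
tolerances = {"df": (0.02, 0.002), "dk": (3.8, 0.15)}    # (nominal, sigma)
lo, hi = monte_carlo_loss(model, tolerances)
```

Because the surrogate evaluates in microseconds, the Monte Carlo over thousands of virtual boards costs essentially nothing compared to one full simulation.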

4-TA3 – Design Optimization of a DDR4 Memory Channel

This was another presentation discussing the optimizations needed for next generation memory system physical layer design.  The DDR4 memory system is targeting the following changes from DDR3.

DDR4 vs. DDR3:

  • Transfer Rate: 3200 MT/s vs. 2133 MT/s
  • VDDIO: 1.2 V vs. 1.35 V
  • DQ Termination: VDDQ pull up vs. Center tap
  • Self generated Vref vs. external Vref

One of the more eyebrow-raising changes was the switch from center tap termination to VDDQ pull-up termination.  This is cheaper because it does not require the creation of an external push-pull supply for the center tap voltage.  However, it can also be used to save power if you always transmit more 1’s than 0’s.  The way to do this is to use bit inversion.  The memory controller tells the memory chip ahead of the transfer whether the data is inverted, and then sends the data in either normal or inverted format, depending on which has the most 1’s.  The memory chip then inverts the bits if necessary before writing them to memory.  Yet more complexity in the interface!
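The bit inversion scheme (DDR4 calls the feature data bus inversion, or DBI) is simple to sketch in Python for a single 8-bit lane:

```python
def encode_dbi(byte: int):
    """Data bus inversion for VDDQ pull-up termination: if a byte has more 0s
    than 1s, transmit it inverted and assert the DBI flag so the receiver can
    undo it.  More 1s on the wire means fewer bits pulling current through
    the termination, which saves power."""
    ones = bin(byte).count("1")
    if ones < 4:                   # more 0s than 1s in the 8-bit lane
        return byte ^ 0xFF, True   # send inverted, flag asserted
    return byte, False

def decode_dbi(byte: int, inverted: bool):
    """Receiver side: undo the inversion when the flag is set."""
    return byte ^ 0xFF if inverted else byte
```

After encoding, every byte on the wire carries at least four 1’s, at the cost of one extra DBI signal per lane.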

9-TA4 – Phase Noise and Jitter Translations for Signal Integrity

The main high level point that I understood from this presentation is that the definition of phase noise commonly used in RF measurement is not applicable to SI for jitter measurement.  For SI, the more formal, comprehensive definition in IEEE std. 1139 is needed.  Beyond this, I really didn’t understand much of this presentation.

Keynote by Jay Alexander (Agilent)

The presenter talked about the maturing of the electronics industry into a cyclical market.  He explained how Agilent was preparing for the cyclical nature of the market by shifting as many fixed costs to variable costs as possible.

He presented 3 views of the electronics market:

  1. Electronics continues to be the primary driver of new product innovations – focus on performance and features.
  2. Electronics will be balanced with other drivers in creating value in new products.
  3. Electronics is fully mature and the focus is on cost.

There really isn’t one answer, and it depends on what market you are serving.

CH-1 – Teardown Smackdown – Dell Streak vs. Samsung Galaxy

This was a very interesting presentation in which the Dell Streak and Samsung Galaxy were taken apart and compared for mechanical and electrical architecture.  Details of this teardown can be found at

CH-2 – Xbox 360 Kinect Teardown and Giveaway

This was presented by Kyle Wiens from iFixit.  The Xbox 360 Kinect is a sensor device that uses multiple cameras to construct 3D images that can be processed to determine movement for use as a video game controller.  The key technology in this device has been licensed from an Israeli startup company called PrimeSense.  One of the most exciting parts of this teardown was learning about iFixit, which is a company that provides instructions, parts, and tools for repairing many electronic devices.  The CEO really showed a strong passion for what he is doing to help change the tide of our throwaway mentality toward consumer electronics.

TP-T2 – FPGA Caveman meets FPGA Chiphead: FPGA Design Tools and Methodologies – Can they keep pace?

The panelists seemed to be in agreement that the main issue was that FPGA design entry methods need to move up to a higher level of abstraction.  There was some emphasis on the need for FPGA design tools that make the verification process more efficient, especially by the GateRocket representative.  Overall there was very little disagreement on this panel.  The panelists did not have good answers as to how traditional FPGA vendors with limited tool budgets will be able to offer ASIC-like tools.  It seems to me that the tool model for FPGAs must move more toward that of software, especially as FPGAs and processors converge in programmable SOCs.  For example, in software you typically buy your development tools, emulators, and dev boards for several thousand dollars on every project, rather than buying tools for tens of thousands of dollars per year as in ASIC design.

Designcon 2011 – Day 1

I am at Designcon 2011 this week to present a paper that I co-wrote with Paul Schad, titled Case Study for PID Control in an FPGA.  Here are my notes regarding the sessions and presentations that I attended today.

TT-MA5 – Intelligent Hardware / Software Interface Design

This was presented by Gary Stringham based on his book, Hardware / Firmware Interface Design.
Here is his web site:

Principles vs. practices – principles are broad guiding concepts, while practices are individual approaches to tasks and activities that help support the broad principles.

He has discovered 7 principles that lead to successful HW / SW integration.  They are:

  1. Collaboration
    Early collaboration is very important
    Documentation is a key means of collaboration
  2. Set and adhere to standards
    Some are internal, but you should adhere to industry standards whenever possible
  3. Balance the load
    Allocate functionality appropriately to HW and SW based on resources
  4. Design for compatibility
    Any HW works with any SW.  Try not to break firmware with new HW updates
  5. Anticipate the impacts
    How HW design affects SW – some features or lack of might add a lot of complexity later on
  6. Design for contingencies
    Test and debug access
  7. Plan ahead
    “There’s never enough time to do it right, but there’s always enough time to do it again.”

Collaboration Practices

  • Ambassadors between HW and SW teams
  • SW team is part of the HW architecture phase
  • Direct contact – initial on site kickoff meetings

Planning Practices

  • Use existing industry standards where possible
    • don’t tweak them, implement them exactly as specified
    • i.e., CAN with a custom application protocol on top – don’t make changes to CAN; make your customizations at the higher layers
  • Use the latest version of every block – another project may have used the block and discovered and fixed issues, and you want those fixes included in your design too
  • Document all defects and errata in all blocks, even if they seem insignificant
  • Keep HW/SW interactions in the design as simple as possible
  • Post mortem
    • have them on every project
    • More importantly, review the previous projects’ post mortems for new projects based on previously used blocks

Documentation Practices

  • Document templates
  • Write documentation at the beginning of the design
  • Register design tools – define your registers in a tool, and it creates HDL, source code, and documentation automatically
  • Document the interactions between any registers
    • Including the order in which the HW expects them to be set
  • Use horizontal tables for bit fields
  • Document all conditions that could cause every error message

Super Block

  • Design blocks to include all possible functionality, even if only using a subset
    • Don’t add unnecessary functionality, instead just don’t remove existing functionality that you aren’t using from blocks that are being reused
  • Common firmware interface regardless of feature set of derivative products
  • Design blocks with enable line inputs that can be hard tied for functionality that is not used, that way the synthesis tool will remove that functionality without having to use a different design file


  • Make internal status information available in registers for diagnostics when errors occur (state machine bits, internal counters)
  • Multiplex the inputs to the chip to the different blocks that use them.  Don’t allow shared inputs to be connected to blocks when the block is not supposed to be watching the signal.
  • Always have event indicator signals of all events from the HW to the SW.  Try to avoid using blind delays in the software to wait for actions to be completed by the HW.


  • Avoid write-only bits
  • Do not mix different writable bit types in the same register
  • Block level ID and version registers for all major functional blocks in the design
  • Atomic access to registers that more than 1 device driver or thread will access
    • This must be accomplished with write-only registers, and this is the one exception where those should be used

Errors and Aborts

  • Need to provide abort for all tasks that the HW is performing
  • Need to think about error handling and abort functionality up front

Overall the presenter had a lot of good ideas for hardware to software interfacing that are applicable to ASIC designs, but also to FPGA designs like I have been involved with.

Keynote Address: Harold Hughes, CEO of Rambus

There wasn’t much memorable said here.  The presenter read a speech, made little eye contact, and was only speaking for approximately 15 minutes of the allotted 30.  In past Designcon keynote addresses I remember dynamic presenters giving a vision for the past, present, and future state of the electronics industry.  This was a bit of a disappointment.

TT-MP1 – Rethinking How Signals Interact with Interconnects

This was presented by Eric Bogatin, Jeff Loyer (Intel), Olufemi Oluwafemi (Intel), and Stephen Hall (Intel).

This presentation was about two different approaches to describing signal integrity:  the current/voltage/circuits view vs. the electromagnetic fields and waveguides view.

The lumped element RLGC circuit model of a transmission line is very powerful for understanding and characterizing return currents, impedance, ground bounce, reflections, terminations, attenuation, and how to engineer to minimize loss.  However, there are several situations where circuit models are not useful and may even give the wrong intuition.  The presenters chose 4 areas to focus on, as follows.

Copper Roughness

A summary is that surface roughness increases the effective surface area and causes the concentration of the E and H fields to increase, because they are no longer simply orthogonal to the dielectric material.  This means that more energy is required to drive the same response as a flat-surface conductor, or alternatively, less signal gets through, resulting in more loss.  The increased loss is NOT due to the current flowing up and down the ridges and thus taking a longer path, as you might predict using a circuit model approach.

Modal Decomposition

One of the presenters made the claim that far end crosstalk exists only when the even and odd modes propagate differently, as in microstrip, but does not exist in striplines.  I didn’t really understand this explanation.

Waveguide Via

The presenter showed an example of a waveguide via (a signal via surrounded by ground vias) and a simple signal via with a ground via very close by.  In the example simulations, the second case appeared to be better because it had less impedance discontinuity.  However, in reality the waveguide via kept the fields contained better, with less radiation (or crosstalk in the near field), so overall it performed much better.  This was simply meant as an example to show that the fields are important to consider, not just the impedance.

Causality of Transmission Line Models

Traditional transmission line simulators assume that only R and G vary with frequency and that L and C are constant.  This leads to models that are not causal, which starts to matter at higher frequencies.  The loss tangent increases with frequency; however, because more energy is lost as frequency increases, correspondingly less energy is stored in the C of the model.  Therefore the dielectric constant must be decreasing while the loss tangent is increasing.  The same is true for L with respect to R: R increases with frequency due to the skin effect, but as the current crowds to the outside of the trace, the inductance must be decreasing because there are fewer magnetic field lines surrounding the current.  The presenter contended that these effects matter for simulations above 2 Gbps.  All of my experience in simulating at these data rates has been with W-element models or equivalent that did not account for this.  Some tools are starting to account for it, such as Simbeor, Mentor HyperLynx, and Ansoft (maybe others).

One of the presenters has a new book:
Advanced Signal Integrity for High-Speed Digital Design by Stephen H. Hall and Howard Heck.

TP-M2 – Closed Eye: Determining Proper Measurement Approaches in a 3rd Gen Serial World

This was a panel discussion on test and measurement of multi-gigabit serial signals (I have attended this same panel in previous years, when discussion focused on jitter measurement).  The panelists were:

  • Ransom Stephens – Ransom’s Notes
  • Mark Marlett – Xilinx
  • Mike Peng Li – Altera
  • Eric Kvamme – LSI
  • Greg Le Cheminant – Agilent
  • Tom Waschura – Tektronix (formerly Synthesys Research)
  • Marty Miller – LeCroy

Apparently Tektronix acquired Synthesys Research last year, which I was not aware of (see this article).  The panel had some arguments about random jitter, and concluded that de-embedding techniques are pretty immature and not to expect agreement between different vendors’ tools.  Overall, I came away from this panel still thinking that the industry does not know the correct way to measure multi-gigabit signals and assess them for pass/fail.  The only true measure of quality is performance (BER), which is time consuming to measure.  All other tools are diagnostic at best.

Spectrum of Digital NRZ Signals

Another article that I would like to keep a reference to from today’s office clean up was in EDN Magazine several years ago [1].  It gave a nice introductory explanation of the spectral content of NRZ-encoded digital signals.  The article also discusses some of the practical implications of the spectrum of a digital signal, such as EMI.
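For my own reference, the classic result is that a random NRZ stream has a sinc-squared power spectrum with nulls at integer multiples of the bit rate.  A quick Python sketch of the relative PSD shape:

```python
import math

def nrz_psd(f_hz, bit_rate):
    """Relative power spectral density of a random NRZ stream: a sinc-squared
    shape with nulls at integer multiples of the bit rate."""
    if f_hz == 0:
        return 1.0
    x = math.pi * f_hz / bit_rate
    return (math.sin(x) / x) ** 2

# for a 1 Gbps stream, the first null falls exactly at 1 GHz
```

This is why most of the energy of an NRZ signal sits below the bit rate, and why EMI concerns concentrate in that region.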

[1] Redd, Justin and Craig Lyon. September 2, 2004. “Spectral Content of NRZ Test Patterns.” EDN Magazine.

Noise Margin in Digital Circuits

Today was office clean up day, and I have decided this year to really clean out my cubicle after 13.5 years of accumulating stuff.  One of the tasks was to go through a couple of file folders filled with old papers, articles, and application notes collected over the years.  They seemed interesting at the time, but I don’t think I’ve referred to any of them even once since putting them in the folder.  I threw most of them out, but a few are worthy of documenting the source for future reference.  One of those was a paper in IEEE Transactions on Education related to digital logic circuit noise margin [1].  The paper defines noise margin as:

NMh = Voh – Vih
NMl = Vil – Vol

The tricky part is how Voh, Vol, Vih, and Vil are defined, and that is what the paper describes in great detail.  Valid definitions of these are shown and analyzed in the context of how they relate to noise margin.
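With the familiar worst-case TTL levels plugged in, both margins come out to 0.4 V.  A one-liner in Python:

```python
def noise_margins(voh, vih, vil, vol):
    """High and low noise margins from the worst-case logic levels:
    NMh = Voh - Vih and NMl = Vil - Vol."""
    return voh - vih, vil - vol

# classic TTL worst-case levels as a familiar example
nmh, nml = noise_margins(voh=2.4, vih=2.0, vil=0.8, vol=0.4)
```

The subtleties the paper wrestles with are all in how those four voltages are defined in the first place, not in this arithmetic.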

[1] Hauser, John R. November 1993. “Noise Margin Criteria for Digital Logic Circuits.” IEEE Transactions on Education. Vol. 36, No. 4.

Multigigabit Serial Communication System Cable Design

There was an article in EDN magazine recently about designing equalization for cables in multi-gigabit serial digital communication systems [1].  The article was written by engineers from Redmere, a cable manufacturer that makes high speed actively equalized data cables.  The article really describes how high speed cables can be constructed for consumer application space where low cost is a primary constraint.  It describes some of the process variations that affect signal quality.  The article appears to be based on a white paper that is available from Redmere’s web site.  This information is very interesting and could be very useful for someone starting a multi-gigabit design.

[1] Hearne, Kay and John Horan, PhD. December 15, 2009. “Precision equalization and test bring high-performance, low-cost cabling.” EDN Magazine.