


November 11

Old and new guiding principles from the Berkeley tech report

  1. Old CW: Power is free, but transistors are expensive.
  • New CW is the “Power wall”: Power is expensive, but transistors are “free.” That is, we can put more transistors on a chip than we have the power to turn on.
  2. Old CW: If you worry about power, the only concern is dynamic power.
  • New CW: For desktops and servers, static power due to leakage can be 40% of total power. (See Section 4.1.)
  3. Old CW: Monolithic uniprocessors in silicon are reliable internally, with errors occurring only at the pins.
  • New CW: As chips drop below 65 nm feature sizes, they will have high soft and hard error rates. [Borkar 2005][Mukherjee et al. 2005]
  4. Old CW: By building upon prior successes, we can continue to raise the level of abstraction and hence the size of hardware designs.
  • New CW: Wire delay, noise, cross coupling (capacitive and inductive), manufacturing variability, reliability (see above), clock jitter, design validation, and so on conspire to stretch the development time and cost of large designs at 65 nm or smaller feature sizes. (See Section 4.1.)
  5. Old CW: Researchers demonstrate new architecture ideas by building chips.
  • New CW: The cost of masks at 65 nm feature size, the cost of ECAD to design such chips, and the cost of design for GHz clock rates mean researchers can no longer build believable prototypes. Thus, an alternative approach to evaluating architectures must be developed. (See Section 6.3.)
  6. Old CW: Performance improves equally in latency and bandwidth.
  • New CW: Bandwidth improves by at least the square of the improvement in latency across many technologies. [Patterson 2004]
  7. Old CW: Multiplies are slow, but loads and stores are fast.
  • New CW is the “Memory wall”: Loads and stores are slow, but multiplies are fast. Modern microprocessors can take 200 clocks to access DRAM memory, but even floating-point multiplies may take only 4 clock cycles. [Wulf and McKee 1995]
  8. Old CW: We can reveal more instruction-level parallelism (ILP) via compilers and architecture innovation. Examples from the past include branch prediction, out-of-order execution, speculation, and VLIW systems.
  • New CW is the “ILP wall”: There are diminishing returns on finding more ILP. [Hennessy and Patterson 2006]
  9. Old CW: Uniprocessor performance doubles every 18 months.
  • New CW is Power Wall + Memory Wall + ILP Wall = Brick Wall. Figure 2 plots processor performance for almost 30 years. In 2006, performance is a factor of three below the traditional doubling every 18 months that we enjoyed between 1986 and 2002. The doubling of uniprocessor performance may now take 5 years.
  10. Old CW: Don’t bother parallelizing your application, as you can just wait a little while and run it on a much faster sequential computer.
  • New CW: It will be a very long wait for a faster sequential computer (see above).
  11. Old CW: Increasing clock frequency is the primary method of improving processor performance.
  • New CW: Increasing parallelism and decreasing clock frequency is the primary method of improving processor performance. (See Section 4.1.)
  12. Old CW: Less than linear scaling for a multiprocessor application is failure.
  • New CW: Given the switch to parallel computing, any speedup via parallelism is a success.
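
To get a feel for the numbers in the list above, here is a small back-of-the-envelope sketch in Python. It only restates figures quoted in the list (doubling every 18 months versus every 5 years, 200 clocks per DRAM access versus 4 clocks per floating-point multiply). The function names and the four-year window from 2002 to 2006 are my own assumptions for illustration and are not taken from the report.

```python
def annual_growth(doubling_time_years: float) -> float:
    """Yearly performance multiplier implied by a given doubling time."""
    return 2.0 ** (1.0 / doubling_time_years)


def growth_over(years: float, doubling_time_years: float) -> float:
    """Total performance multiplier after `years` at a given doubling time."""
    return 2.0 ** (years / doubling_time_years)


# Old CW: doubling every 18 months (1.5 years), about 1.59x per year.
old_rate = annual_growth(1.5)

# New CW: doubling may take 5 years, about 1.15x per year.
new_rate = annual_growth(5.0)

# Assumption: the slower rate applied for the whole 2002-2006 window.
# The ratio is how far behind the old trend line we would end up.
gap_2006 = growth_over(4, 1.5) / growth_over(4, 5.0)

# Memory wall: floating-point multiplies that fit in the time of one
# DRAM access, using the cycle counts quoted in the list.
multiplies_per_dram_access = 200 / 4

print(f"old trend: {old_rate:.2f}x per year")
print(f"new trend: {new_rate:.2f}x per year")
print(f"gap after 4 years: {gap_2006:.2f}x")
print(f"multiplies per DRAM access: {multiplies_per_dram_access:.0f}")
```

Under these assumptions the gap after four years comes out a little above 3.6x. That is in the same ballpark as the factor of three quoted in the uniprocessor performance item, but it is not a derivation of it: the report's figure comes from measured data, not from this simple model.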