Nehalem.

Nuh - hay - lem

At least that's how Intel PR pronounces it.

I've been racking my brain for the past month on how best to review this thing, what angle to take, it's tough. You see, with Conroe the approach was simple: the Pentium 4 was terrible, AMD proudly wore its crown and Intel came in and turned everyone's world upside down. With Nehalem, the world is fine, it doesn't need fixing. AMD's pricing is quite competitive, Intel's performance is solid, power consumption isn't getting out of control...things are nice.

 

But we've got that pesky tick-tock cadence and things have to change for the sake of change (or more accurately, technological advancement, I swear I'm not getting cynical in my old age):


2008, that's us, that's Nehalem.

Could Nehalem ever be good enough? It's the first tock after Conroe, that's like going on stage after the late Richard Pryor, it's not an enviable position to be in. Inevitably Nehalem won't have the same impact that Conroe did, but what could Intel possibly bring to the table that it hasn't already?

Let's go ahead and get started, this is going to be interesting...

Nehalem's Architecture - A Recap

I spent 15 pages and thousands of words explaining Intel's Nehalem architecture in detail already, but what I'm going to try and do now is summarize that in a page. If you want greater detail please consult the original article, but here are the cliff's notes.


Nehalem

Nehalem, as I've mentioned countless times before, is a "tock" processor in Intel's tick-tock cadence. That means it's a new microarchitecture but based on an existing manufacturing process, in this case 45nm.

A quad-core Nehalem is made up of 731M transistors, down from 820M in Yorkfield, the current quad-core Core 2s based on the Penryn microarchitecture. The die size has gone up however, from 214 mm^2 to 263 mm^2. That's fewer transistors but less densely packed ones, part of this is due to a reduction in cache size and part of it is due to a fundamental rearchitecting of the microprocessor.

Nehalem is Intel's first "native" quad-core design, meaning that all four cores are a part of one large, monolithic die. Each core has its own L1 and L2 caches, and all four sit behind a large 8MB L3 cache. The L1 cache remains unchanged from Penryn (the current 45nm Core 2 architecture), although it is slower at 4 cycles vs. 3. The L2 cache gets a little faster but also gets a lot smaller at 256KB per core, whereas the lowest end Penryns split 3MB of L2 among two cores. The L3 cache is a new addition and serves as a common pool that all four cores can access, which will really help in cache intensive multithreaded applications (such as those you'd encounter in a server). Nehalem also gets a three-channel, on-die DDR3 memory controller, if you haven't heard by now.

At the core level, everything gets deeper in Nehalem. The CPU is just as wide as before and the pipeline stages haven't changed, but the reservation station, load and store buffers and OoO scheduling window all got bigger. Peak execution power hasn't gone up, but Nehalem should be much more efficient at using its resources than any Core microarchitecture before it.

Once again to address the server space Nehalem increases the size of its TLBs and adds a new 2nd level unified TLB. Branch prediction is also improved, but primarily for database applications.

Hyper Threading is back in its typical 2-way fashion, so a single quad-core Nehalem can work on 8 threads at once. Here we have yet another example of Nehalem making more efficient use of the execution resources rather than simply throwing more transistors at the problem. With Penryn Intel hit nearly 1 billion transistors for a desktop quad-core chip, clearly Nehalem was an attempt to both address the server market and make more efficient use of those transistors before the next big jump and crossing the billion transistor mark.

Multiple Clock Domains and My Concern
Comments Locked

73 Comments

View All Comments

  • Clauzii - Thursday, November 6, 2008 - link

    I still use PS/2. None of the USB keyboards I've borrowed or tried out would work in 'boot'. Also I think a PS/2 keyboard/mouse don't lag so much, maybe because it has it's own non-shared interrupt line.

    But I can see a problem with PS/2 in the future, with keyboards like the Art Lebedev ones. When that technology gets more pocket friendly I'd gladly like to see upgraded but still dedicated keyboard/mouse connectors.
  • The0ne - Monday, November 3, 2008 - link

    Yes. I have the PS2 keyboard on-hand in case my USB keyboard can't get in :)
  • Strid - Monday, November 3, 2008 - link

    Ahh, makes sense. Thanks for clarifying!
  • Genx87 - Monday, November 3, 2008 - link

    After living through the hell that were ATI drivers back in 2003-2004 on a 9600 Pro AIW. I didnt learn and I plopped money down on a 4850 and have had terrible driver quality since. More BSOD from the ati driver than I have had in windows in the past 5 years combined from anything. Back to Nvidia for me when I get a chance.

    That said this review is pretty much what I expected after reading the preview article in August. They are really trying to recapture market in the 4 socket space. A place where AMD has been able to do well. This chip is designed for server work. Ill pick one up after my E8400 runs out of steam.
  • Griswold - Tuesday, November 4, 2008 - link

    You're just not clever enough to setup your system properly. I have two indentical systems sitting here side by side with the only difference being the video card (HD3870 in one and a 8800GT in the other) and the box with the nvidia cards gives me order of magnitude more headaches due to crashing driver. While that also happens on the 3870 machine now and then, its nowehere nearly as often. But the best part: none of the produces a BSOD. That is why I know you're most likely the culprit (the alternative is faulty hardware or a pathetic overclock).
  • Lord 666 - Monday, November 3, 2008 - link

    The stock speed of a Q9550 is 2.83ghz, not 2.66qhz.

    Why the handicap?
  • Anand Lal Shimpi - Monday, November 3, 2008 - link

    My mistake, it was a Q9450 that was used. The Q9550 label was from an earlier version of the spreadsheet that got canned due to time constraints. I wanted a clock-for-clock comparison with the i7-920 which runs at 2.66GHz.

    Take care,
    Anand
  • faxon - Monday, November 3, 2008 - link

    toms hardware published an article detailing that there would be a cap on how high you are allowed to clock your part before it would downclock it back to stock. since this is an integrated par of the core, you can only turn it off/up/down if they unlock it. the limit was supposedly a 130watt thermal dissipation mark. what effect did this have in your tests on overclocking the 920?
  • Gary Key - Monday, November 3, 2008 - link

    We have not had any problems clocking our 920 to the 3.6GHz~3.8GHz level with proper cooling. The 920, 940, and 965 will all clock down as core temps increase above the 80C level. We noticed half step decreases above 80C or so and watched our core multipliers throttle down to as low as 5.5 when core temps exceeded 90C and then increase back to normal as temperatures were lowered.

    This occurred with stock voltages or with the VCore set to 1.5V, it was dependent on thermals, not voltages or clock speeds in our tests. That said, I am still running a battery of tests on the 920 right now, but I have not seen an artificial cap yet. That does not mean it might not exist, just that we have not triggered it yet.

    I will try the 920 on the Intel board that Toms used this morning to see if it operates any differently than the ASUS and MSI boards.
  • Th3Eagle - Monday, November 3, 2008 - link

    I wonder how close you came to those temperatures while overclocking these processors.

    The 920 to 3.6/3.8 is a nice overclock but I wonder what you mean by proper cooling and how close you came to crossing the 80C "boundary"?

Log in

Don't have an account? Sign up now