transcript.md at master from jwise/28c3-doctorow - GitHub
Cory Doctorow discusses the legal threat to general-purpose computers. Lots of resonance for smarter planet arguments.
[[1350.3]] Canada’s Parliament didn’t vote on its copyright bills because, of all the things that Canada needs to do, fixing copyright ranks well below health emergencies on First Nations reservations, exploiting the oil patch in Alberta, interceding in sectarian resentments among French- and English-speakers, solving resources crises in the nation’s fisheries, and a thousand other issues! The triviality of copyright tells you that when other sectors of the economy start to evince concerns about the Internet and the PC, that copyright will be revealed for a minor skirmish, and not a war. Why would other sectors nurse grudges against computers? Well, because the world we live in today is /made/ of computers. We don’t have cars anymore, we have computers we ride in; we don’t have airplanes anymore, we have flying Solaris boxes with a big bucketful of SCADA controllers [laughter]; a 3D printer is not a device, it’s a peripheral, and it only works connected to a computer; a radio is no longer a crystal, it’s a general-purpose computer with a fast ADC and a fast DAC and some software.
[[1418.9]] The grievances that arose from unauthorized copying are trivial, when compared to the calls for action that our new computer-embroidered reality will create. Think of radio for a minute. The entire basis for radio regulation up until today was based on the idea that the properties of a radio are fixed at the time of manufacture, and can’t be easily altered. You can’t just flip a switch on your baby monitor, and turn it into something that interferes with air traffic control signals. But powerful software-defined radios can change from baby monitor to emergency services dispatcher to air traffic controller just by loading and executing different software, which is why the first time the American telecoms regulator (the FCC) considered what would happen when we put SDRs in the field, they asked for comment on whether it should mandate that all software-defined radios should be embedded in trusted computing machines. Ultimately, whether every PC should be locked, so that the programs they run are strictly regulated by central authorities.
[[1477.9]] And even this is a shadow of what is to come. After all, this was the year in which we saw the debut of open sourced shape files for converting AR-15s to full automatic. This was the year of crowd-funded open-sourced hardware for gene sequencing. And while 3D printing will give rise to plenty of trivial complaints, there will be judges in the American South and Mullahs in Iran who will lose their minds over people in their jurisdiction printing out sex toys. [guffaw from audience] The trajectory of 3D printing will most certainly raise real grievances, from solid state meth labs, to ceramic knives.
[[1516.0]] And it doesn’t take a science fiction writer to understand why regulators might be nervous about the user-modifiable firmware on self-driving cars, or limiting interoperability for aviation controllers, or the kind of thing you could do with bio-scale assemblers and sequencers. Imagine what will happen the day that Monsanto determines that it’s really… really… important to make sure that computers can’t execute programs that cause specialized peripherals to output organisms that eat their lunch… literally. Regardless of whether you think these are real problems or merely hysterical fears, they are nevertheless the province of lobbies and interest groups that are far more influential than Hollywood and big content are on their best days, and every one of them will arrive at the same place — “can’t you just make us a general purpose computer that runs all the programs, except the ones that scare and anger us? Can’t you just make us an Internet that transmits any message over any protocol between any two points, unless it upsets us?”
[[1576.3]] And personally, I can see that there will be programs that run on general purpose computers and peripherals that will even freak me out. So I can believe that people who advocate for limiting general purpose computers will find a receptive audience for their positions. But just as we saw with the copyright wars, banning certain instructions, or protocols, or messages, will be wholly ineffective as a means of prevention and remedy; and as we saw in the copyright wars, all attempts at controlling PCs will converge on rootkits; all attempts at controlling the Internet will converge on surveillance and censorship, which is why all this stuff matters. Because we’ve spent the last 10+ years as a body sending our best players out to fight what we thought was the final boss at the end of the game, but it turns out it’s just been the mini-boss at the end of the level, and the stakes are only going to get higher.
[[1627.8]] As a member of the Walkman generation, I have made peace with the fact that I will require a hearing aid long before I die, and of course, it won’t be a hearing aid, it will be a computer I put in my body. So when I get into a car — a computer I put my body into — with my hearing aid — a computer I put inside my body — I want to know that these technologies are not designed to keep secrets from me, and to prevent me from terminating processes on them that work against my interests. [vigorous applause from audience] Thank you.
[[1669.4]] Thank you. So, last year, the Lower Merion School District, in a middle-class, affluent suburb of Philadelphia found itself in a great deal of trouble, because it was caught distributing PCs to its students, equipped with rootkits that allowed for remote covert surveillance through the computer’s camera and network connection. It transpired that they had been photographing students thousands of times, at home and at school, awake and asleep, dressed and naked. Meanwhile, the latest generation of lawful intercept technology can covertly operate cameras, mics, and GPSes on PCs, tablets, and mobile devices.
[[1705.0]] Freedom in the future will require us to have the capacity to monitor our devices and set meaningful policy on them, to examine and terminate the processes that run on them, to maintain them as honest servants to our will, and not as traitors and spies working for criminals, thugs, and control freaks. And we haven’t lost yet, but we have to win the copyright wars to keep the Internet and the PC free and open. Because these are the materiel in the wars that are to come, we won’t be able to fight on without them. And I know this sounds like a counsel of despair, but as I said, these are early days. We have been fighting the mini-boss, and that means that great challenges are yet to come, but like all good level designers, fate has sent us a soft target to train ourselves on — we have organizations that fight for them — EFF, Bits of Freedom, EDRi, CCC, Netzpolitik, La Quadrature du Net, and all the others, who are thankfully, too numerous to name here — we may yet win the battle, and secure the ammunition we’ll need for the war.
Misconceptions in AI: Or why Watson can’t talk to Siri — Tech News and Analysis
Siri and Watson will never date… alas alack.
Last night I was schooled at playing Jeopardy by Watson in an exhibition match at the Computer History Museum, and discovered that despite our fear of the robot overlords, humans are much smarter than we think. Case in point: Watson could never use Apple’s personal assistant Siri.
While both services seemingly understand what we’re saying to them and can respond with amazingly functional or accurate answers, the truth is they are both still programmed for specific tasks and could never actually converse with one another or a human outside of a narrow context. So Watson can’t take dictation, and Siri can’t play Jeopardy. Understanding why shows how far we have to go when it comes to true artificial intelligence and those fears of the robots taking over.
As David Ferrucci, the guy at IBM behind Watson’s creation, explained during a conversation before the match, as intuitive as the interactions with Siri or Watson appear to us, they are fundamentally task oriented. So the questions that Watson gets are in effect “translated” not just into the zeros and ones of digital signals but also to a series of words that are then broken down into related concepts.
After that point, Watson tries to ascribe “meaning” to those contexts based on searches of unstructured data to derive an answer. It then determines which answer is most likely to be correct, and how confident it is in that top answer, because in Jeopardy if someone guesses the wrong answer they are penalized. Thus Watson’s tasks are figuring out the context associated with a question, figuring out which answer is the likeliest based on that context and if it is confident enough in its probabilities to bother to answer.
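The answer-or-stay-silent decision described above can be sketched in a few lines. This is only an illustration of confidence thresholding, not IBM's implementation: the `Candidate` type, the `choose_response` function, the 0.5 threshold, and the sample answers and scores are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    answer: str
    confidence: float  # estimated probability that this answer is correct


def choose_response(candidates, threshold=0.5):
    """Return the most likely answer, or None when it is too risky to answer.

    The threshold is an assumed stand-in for the penalty on wrong answers:
    below it, staying silent is treated as the better choice.
    """
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c.confidence)
    return best.answer if best.confidence >= threshold else None


# Hypothetical candidate lists for two different clues.
unsure = [Candidate("Toronto", 0.14), Candidate("Chicago", 0.11)]
confident = [Candidate("Chicago", 0.92), Candidate("Toronto", 0.05)]

print(choose_response(unsure))     # None: top answer is below the threshold
print(choose_response(confident))  # Chicago
```

A real system would presumably tune the threshold to the game state, but the shape of the decision is the same: rank the candidates, then compare the top confidence against the cost of being wrong.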
Siri, on the other hand, does two important things: it recognizes speech (Watson actually doesn’t understand speech, but is fed a text version of the question), and it can figure out what steps to take in a limited number of applications once it understands the words, using a natural language process related to the one Watson uses. The sense from IBMers (unsurprisingly) is that Siri doesn’t have the natural language depth that Watson does. Siri certainly doesn’t have the computing horsepower that Watson has behind it (2,880 processor cores and 15 terabytes of RAM), or the 100 GB of text data that Watson uses to figure out how different words relate to each other.
The net result of their differences is that not only could Siri and Watson not communicate because each relies on different input methods, but even if they could, their tasks are fundamentally far apart. Both have an ability to do natural language processing, but one uses that skill to find related information and figure out what information is most correct, while the other uses it to open applications and perform a set number of tasks.
So while Alan Turing proposed that the best test of artificial intelligence is that a human can’t tell if it is a computer he or she is interacting with, it may be more accurate to say the best test will be creating a machine that can not only understand natural language like Siri and Watson can, but also has Watson’s ability to determine the best course of action and Siri’s ability to take that action.
Oracle’s High-End Path to Public Cloud (IDEAS Insights)
Ideas International looks at Oracle’s high-end cloud play and at the overarching market for premium, workload-optimized systems.
When Oracle released Exalogic a few years ago, it was billed as a private cloud-in-a-box. At OpenWorld, Oracle doubled down on the plug-and-play features of Exalogic and Exadata by having these systems serve as the backbone for its new public cloud offering. Recently, IDEAS published a blog post offering a broader definition of mainframes as an integrated solution stack. By this definition, Oracle is eschewing commodity solutions, the architecture on which many existing cloud services are based, in favor of a high-end solution as the basis for its public cloud (see graphic below). Computing as a service has been successfully monetized (recently confirmed), but by taking the high-end approach, Oracle is trading larger potential sales volumes for a niche customer base. The strategy may prove to be as profitable, but it is not without risk.
The above taxonomy defines three types of server platform solutions, but in truth server architecture today is a spectrum along which we highlight three key points. On the one end is the commodity solution. This is the setup that is typical in most data centers for cloud-based workloads. Somewhere in the middle, further integration of hardware and software can be applied. Such converged server platforms, which include HP Matrix and Cisco UCS, have been a hot topic in recent years.
On the opposite end, a solution integrates software and hardware into a single smoothly functioning entity. Such a high-end solution is what Oracle plans to sell as its public cloud, incorporating Oracle’s Exadata and Exalogic platforms with Fusion software and other application layers. At first glance, this may seem unexpected, but as long ago as 1999 commentators were noting that Larry Ellison’s preference for a high-end solution stack was pretty clear:
“Oracle chairman CEO Larry Ellison is taking aim at Microsoft’s core enterprise strategy, mounting an attack on client/server computing, which he describes as an evolutionary dead end, and more specifically taking a pop at Microsoft’s ‘servers everywhere’ distributed computing model. …x86 server isn’t yet sufficiently scalable to rival centralised computing models based on Unix, mid-range and mainframe models… and that’s why Ellison is finally attacking the right target…distributed computing tends towards the chaotic and expensive.”
For Oracle, the verdict on how to best package hardware was in long ago. In the meantime, of course, Microsoft’s strategy did not fail. But mainframes did not fail either. Cassandra predictions by others about the dim future of the mainframe have been overzealous. IBM still sells plenty of big iron, and the mainframe ecosystem remains a solid, albeit cyclical, business.
A true metric to determine the winner of the commodity vs. high-end solution contest would normalize the amount of compute cycles (or I/O) that are now being delivered by high-end solutions vs. total compute cycles (or I/O). By that measure, high-end solutions have been losing ground quickly over the years to commodity solutions.
High-end systems with tightly integrated software have traditionally been applied for very specific tasks, such as:
- High-performance computing (HPC), which employed specialized hardware designs in the past, but which is now increasingly based on commodity hardware.
- Business-critical systems (in many cases, but not all).
- Code that needs to have an edge to beat competitors. An example of this would be hedge funds that pay big money to experts in multithreaded and parallel programming.
Android apps are an example of a commodity application layer at the low end, in consumer technology. Android applications have to run on many different chips, OS flavors, and hardware. Therefore, investments in optimizing an Android app may be a losing proposition for developers, because that app could be buggy or useless on nonoptimized devices.
On the high end, investment bank hedge funds may hire programmers to make sure that each instruction is matched to a thread of a specific Intel processor (Intel even has a special suite of tools to help enable this degree of optimization). Granted, the hedge fund programmer is working on an x86 platform. However, not everyone can afford a team of high-priced experts. The next best thing may be the high-end solution, in which software has been preintegrated and optimized for a specific hardware platform.
Most public cloud vendors build their services on commodity hardware, because part of their added value is the development of a software stack that provides meaningful differentiation from competitors. This differentiation is essential for avoiding true commoditization, in which the cloud service competes on nothing other than price. There may be a few exceptions where a vendor has some more exotic hardware offerings in the mix, but the majority are built on x86 servers, together with a hodgepodge of applications. This environment lends itself well to being partitioned into compute instances that are sold cheaply.
However, in finally embracing public cloud computing, Oracle faces a dilemma. On the one hand, it cannot afford to cannibalize its own margins on the Exadata and Exalogic hardware, Fusion software, enterprise database, and so forth that it sells to customers for on-premise deployment. Therefore, its public cloud offering cannot be too cheap in absolute terms. On the other hand, its cloud offering needs to promise customers genuine benefits in terms of price-performance and other metrics. These two forces are driving much of the risk in Oracle’s business maneuver – will end users choose to deploy Oracle high-end server platform solutions on premises, or purchase access to these systems from Oracle’s public cloud on a utility basis?
Oracle’s databases lead in market share, and Oracle’s middleware sales are second only to IBM’s. Big corporations, especially financial institutions, have traditionally been loyal customers of premium architectures. By accepting that not every startup is going to want to use an Exa-based cloud with an expensive relational database management system, Oracle is forsaking low margins and high volumes in favor of pursuing opportunities with high-end enterprise customers. Oracle’s approach is also a novel way to sell the Exadata and Exalogic to new clients, who run plenty of high-end solutions, but not necessarily on those two systems.
Oracle’s high-end cloud solutions may appeal to fewer customers than other public cloud platforms, and the analyst community may not fully understand the move. However, there is little doubt what the company is trying to achieve: as Ellison seemed to be predicting in 1999, Oracle will try to capture the enterprise with its own centralized, integrated hardware architecture and applications stack – deployed either on premises, or off.
ARM vet: The CPU's future is threatened • The Register
One of the original innovators behind the ARM platform discusses troubled waters ahead for the microprocessor industry - and it’s not simply a matter of the approaching limits of Moore’s law. The growing complexity of computing is making today’s “hard” problems far harder than the “hard” problems of the past. At the same time, growing specialization in the market is actually shrinking the number of players who can tackle these sorts of problems and pushing out the incentive to do so.
ARM’s employee number 16 has witnessed a steady stream of technological advances since he joined that chip-design company in 1991, but he now sees major turbulence on the horizon.
“I don’t think the future is going to be quite like the past,” Simon Segars, EVP and head of ARM’s Physical IP Division, told his keynote audience on Thursday at the Hot Chips conference at Stanford University, just north of Silicon Valley.
“There may be trouble ahead.”
The microprocessor industry has enjoyed an almost unbroken streak of improvements, Segars said, citing advances in silicon manufacturing techniques, power reduction, and gadget-size and gadget-cost shrinkage – he brought along a 1983, $3,995 Motorola DynaTAC as a prop.
But the landscape is changing. The low-hanging fruit has been picked, and a new way of thinking will be needed to provide the world with the squillions of low-cost, low-power microprocessors that the increasingly mobile computing ecosystem requires – not to mention the everything-connected world described by the current buzz-phrase: “The internet of things”.
Harkening back to when he joined ARM, Segars said: “2G, back in the early 90s, was a hard problem. It was solved with a general-purpose processor, DSP, and a bit of control logic, but essentially it was a programmable thing. It was hard then – but by today’s standards that was a complete walk in the park.”
He wasn’t merely indulging in “Hey you kids, get off my lawn!” old-guy nostalgia. He had a point to make about increasing silicon complexity – and he had figures to back it up: “A 4G modem,” he said, “which is going to deliver about 100X the bandwidth … is going to be about 500 times more complex than a 2G solution.”
The way that the 4G-modem problem will be solved, Segars said, will be by throwing a ton of dedicated DSP processing engines at it – which will, of course, require a lot of silicon real estate.
“But that’s not so bad,” he said, “because silicon’s being scaled the whole time. But it’s going to eat a lot of power, and power is the real problem.”
ARM is a mobile-processor company, and mobile processors run on batteries – and Segars said that the power required to juice increasingly complex silicon is a system-level challenge. “The reason for that,” he said, “is because batteries are pretty rubbish, really.”
As silicon technologies have improved in comparative leaps and bounds, batteries haven’t. “Historically,” Segars said, “battery power has grown about 10 or 11 per cent per year, which unfortunately is not very well-matched with Moore’s law.”
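To see how badly the two curves diverge, the compounding can be worked through directly. The sketch below takes the 10 per cent annual battery figure from the quote and assumes, as one common reading of Moore's law, that transistor counts double every two years. The function and constant names are illustrative only.

```python
def growth_factor(annual_rate, years):
    """Total multiplier after compounding `annual_rate` for `years` years."""
    return (1 + annual_rate) ** years


BATTERY_RATE = 0.10            # low end of the "10 or 11 per cent" quoted above
MOORE_RATE = 2 ** (1 / 2) - 1  # assumed: doubling every two years, about 41% a year

for years in (2, 10, 20):
    battery = growth_factor(BATTERY_RATE, years)
    transistors = growth_factor(MOORE_RATE, years)
    print(f"{years:2d} years: battery x{battery:.2f}, transistors x{transistors:.0f}")
```

Under these assumptions a decade multiplies battery capacity by roughly 2.6 while the transistor budget grows 32-fold, which is the mismatch Segars is pointing at.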
Moore’s law on trial
But as big a problem as batteries are in the mobile market, there are much more fundamental challenges that will make the future of the microprocessor market different from its past.
For one, the complexity of chip design and the intricacies of the physics involved is increasing, making design a much riskier and more demanding process. “And you really, really need to worry about that,” Segars said.
The reason for that worry is risk and cost. “The cost of your tape-out is going to be astronomical,” he said. “When you’ve written that check for a million or two million dollars for your [chip-making lithography] masks, you want to hope that chip works. So the effort going into validating and verifying a design has gone up by orders of magnitude.”
Another challenge that Segars sees is caused by the increasing stratification of the semiconductor industry – a development that has brought many benefits, but which also has its downside.
“When people first started building semiconductor devices,” he said, “they did everything themselves in fully vertically integrated companies. People had fabs, product design, manufacturing, case design – they did the whole thing themselves.”
That has changed over the years – mostly for the better, from Segars’ point of view – and now the industry is filled with companies specializing in various areas such as design, IP, electronic design automation (EDA), packaging, chip-baking, and so on.
This specialization has been great for spreading the costs and risks around, and for taking advantage of the economies of scale – TSMC and Global Foundries, for example, produce silicon for many different fabless chip designers.
Dying arts and fading fabs
But with more companies focusing on design rather than manufacture, Segars sees a danger. “The skills you need to close out the timing at the transistor level are becoming a dying art,” he warned.
“As we go forward, and start worrying about very exotic processes that we’re going to have to deal with in the future, those transistor skills are going to need to become very, very important once again,” he cautioned. “And as a designer, you’re going to have to worry about everything – from architecture down to transistors.”
But no matter how “disaggregated” the semiconductor industry becomes, eventually somebody has to actually manufacture the chips themselves. Segars reminded his audience of the famous quote from T.J. Rodgers of Cypress Semiconductor that “Real men have fabs,” but then pointed out the obvious fact that there are far fewer fab companies around than there used to be – and that such shrinkage in the manufacturing base poses its own problems.
As chip-baking process sizes shrink, Segars said, “We’ve seen the cost of developing processes go up and up and up, and now it costs you billions of dollars – as I’m sure everybody knows – to develop a new process. It costs you billions of dollars to buy all the equipment for it, and so fewer people are doing it.”
From a customer’s point of view, a small number of strong, efficient, advanced fabs is not a problem – in fact, the same economies of scale that make industry disaggregation a good thing make fewer, busier fabs a good thing.
Four customers does not a market make
But there is one potentially troubling problem, Segars said – and it’s not for the fabs or their customers, it’s for equipment suppliers. “The physics problems that you have to solve when you go to smaller and smaller geometries are getting harder and harder to solve, and so the cost of the equipment that you need for the next generation process goes up and up.”
That price inflation is not in and of itself the real market problem, though. “The problem is that the supply chain that builds that equipment for the foundries has a bit of trouble dealing with its return on investment.”
Simply put, the foundry-equipment market is shrinking. “When the world moved from 200-millimeter wafers to 300-millimeter wafers, if you were [fab-equipment manufacturer] ASML or somebody like that, you had a whole lot of customers that you could go and sell that equipment to.”
Now, however, the move to 450-millimeter wafers is the new hotness – which is great for economies of scale at the foundry level, but not so great for foundry-equipment makers. “There’s only going to be about four guys who are going to build those size wafers,” Segars said, “so if you’re doing all the R&D and your customer size is four, that is a bit of a problem.”
Moore’s law repealed
Finally, there’s the looming problem of the future of silicon process-size shrinkage: it can’t go on forever. One obvious limit, as Segars pointed out, is the 0.27-nanometer diameter of the silicon atom itself – as process sizes shrink down to, say, 14nm and below, you’re only talking about dozens of atoms per transistor gate.
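The "dozens of atoms" figure follows from simple division. The sketch below treats atoms as if they were laid side by side at the quoted 0.27nm diameter, which ignores real crystal lattice spacing, so it is only a back-of-the-envelope estimate. The list of feature sizes is chosen for illustration.

```python
SILICON_ATOM_DIAMETER_NM = 0.27  # figure quoted in the article


def atoms_across(feature_nm):
    """Rough count of silicon atom diameters spanning a feature of this width."""
    return feature_nm / SILICON_ATOM_DIAMETER_NM


for feature_nm in (28, 20, 14, 5):
    print(f"{feature_nm:2d} nm is about {atoms_across(feature_nm):.0f} atom diameters wide")
```

At 14nm this gives roughly 52 atom diameters, consistent with the article's description.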
But there are plenty of other challenges to be met before you’ll count the number of silicon atoms in a gate on your fingers and toes – namely, what lithography techniques can take you well below 20nm?
At the 20nm level, Segars said, “the problem is that you need to introduce double patterning.” Using two sets of masks to accomplish what one set could do at larger process sizes not only increases mask costs, but also slows down the throughput of the manufacturing.
If you want to keep the throughput at the same rate, you have to buy more equipment – which might make the ASMLs of the world happier, but driving up chip costs is not a good thing in a world that includes developing nations whose citizens are hankering to join the mobile world.
There’s one long-sought technology about which Segars remains a bit sceptical. “At 14 [nanometers] and below, what you really want is EUV,” he said, referring to extreme ultraviolet lithography, which has long been seen as a possible solution to the process-shrinking problem. EUV’s promise comes from the fact that it’s based on 13.5nm wavelength light – one hell of a lot more precise than the 193nm light that Segars said is used in today’s visible-light lithography.
“The problem is,” he said, “that [EUV] is really, really hard to make. You’ve got to make a plasma out of tin atoms, and then shoot it with a laser, and some light comes out – but the light’s really weak, and it gets absorbed by everything. So generating enough of it to economically build chips is very, very hard.”
After the endgame: core teamwork
But that’s not to say that there’s a dead-end on the road we’re travelling. Segars’ vision of the future jibes with the one described by fellow ARMian Jem Davies, the company’s vice president of technology, when speaking at AMD’s Fusion Summit this June – namely, that heterogeneous computing systems are the Next Big Thing.
Simply put, heterogeneous computing systems distribute a workload to various and sundry specialized compute engines – CPU, GPU, video, encryption, baseband, whatever – so that individual sub-tasks are completed efficiently by dedicated hardware best suited to them.
“I think the future of processing is heterogeneous multiprocessing,” Segars said, “… dedicated engines arranged in various clusters with a software layer that can understand the underlying hardware, and make sure that if it’s not needed, it’s shut off, it’s not leaking, to preserve that battery.”
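As a toy model of that idea, the sketch below routes each task to a dedicated engine when one exists, falls back to a general-purpose CPU otherwise, and powers down whatever is not in use. It is a minimal illustration under assumed names (`Engine`, `Scheduler`, the task kinds), not a description of ARM's software stack.

```python
class Engine:
    """A compute engine that handles certain kinds of task."""

    def __init__(self, name, kinds):
        self.name = name
        self.kinds = set(kinds)
        self.powered = False

    def run(self, task_kind):
        self.powered = True  # wake the engine only when it has work
        return f"{self.name} handled {task_kind}"


class Scheduler:
    """Software layer that knows the hardware and gates power to idle engines."""

    def __init__(self, engines, fallback):
        self.engines = list(engines)
        self.fallback = fallback  # general-purpose CPU for everything else

    def dispatch(self, task_kind):
        target = next(
            (e for e in self.engines if task_kind in e.kinds), self.fallback
        )
        # Shut off every engine that is not needed for this task.
        for engine in self.engines + [self.fallback]:
            if engine is not target:
                engine.powered = False
        return target.run(task_kind)


cpu = Engine("cpu", [])
gpu = Engine("gpu", ["graphics"])
video = Engine("video-engine", ["video"])
scheduler = Scheduler([gpu, video], fallback=cpu)

print(scheduler.dispatch("video"))        # video-engine handled video
print(scheduler.dispatch("spreadsheet"))  # cpu handled spreadsheet
```

A real scheduler would run several engines concurrently and weigh wake-up latency against leakage. The point here is only the division of labour between specialised hardware and the software that decides what stays powered.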
There are a host of challenges to achieving the holy heterogeneous grail, of course – not the least of which being keeping all the various cores in close communication, and optimally data-coherent.
To that end, ARM’s upcoming Cortex-A15 compute core – which will likely appear in early 2013 – will introduce a cache coherent interconnect that will enable full coherency among multiple CPU clusters. Segars also projects that by 2015, coherency in ARM-based SoCs and systems will not be limited to CPUs, but will also allow full “where’s that data?” transparency among CPUs, GPUs, and specialized engines.
Full coherence, however, brings with it its own set of challenges, such as unwanted latency when far-flung cores and engines need to share the same data, but ARM, AMD, and Intel are all looking into how different approaches to coherency can help – or hinder – heterogeneity.
A lot has changed in the microprocessor world since the Intel 4004 appeared 40 years ago this November. By and large, the arc of improvement has been relatively straightforward, with improvements in process size, processing power, and miniaturization being fairly regular – achieved through one hell of a lot of work, to be sure, but regular nonetheless.
There’s been a lot of talk recently about the “post PC era”. From Segars’ point of view, however, we may also soon be talking about the “post–Moore’s Law” era – a time when computing advances are no longer measured in transistor counts per square millimeter, but rather in how quickly, intelligently, and cooperatively different cores and engines can communicate.
Box o' Flashy virtual machines offered • The Register
An interesting example of workload-optimization for virtualized environments using flash storage.
Startup Astute Networks stores VMware virtual machines on its dedicated ViSX G3 flash SAN – leaving bulk data to be stored on standard drive arrays.
Astute claims that “virtualised storage is too slow to support SAP, Oracle, and SQL databases; virtual machine performance is inadequate to meet Microsoft Exchange and SharePoint service levels and user mailbox loads; and backing up or recovering virtualised data stores is unacceptably slow.” Its ViSX G3 is its answer to these problems.
The ViSX G3 is a 3U box connecting to servers with iSCSI over 1 or 10 Gigabit Ethernet. It comprises flash memory and an Astute network processor, and provides 80,000 IOPS for up to 64 virtual machines (VMs), each of which could host a virtual desktop. These could be on one or more physical servers. The company says that VMs which require capacity-optimised data stores remain on disk-based storage.
Astute says the product has a DataPump Engine which includes the network processor, custom hardware and software. It accelerates TCP/IP, virtualises iSCSI data store traffic, and provides RAID facilities to speed performance and protect flash contents…
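For a sense of scale, the quoted figures can be turned into a per-VM budget. The sketch below assumes the 80,000 IOPS is shared evenly among the VMs, which the article does not state; real workloads are bursty and uneven, so treat the result as an average rather than a guarantee.

```python
TOTAL_IOPS = 80_000  # figure quoted for the ViSX G3
MAX_VMS = 64         # maximum number of VMs quoted


def iops_per_vm(total_iops, vm_count):
    """Average IOPS available to each VM under an assumed even split."""
    if vm_count <= 0:
        raise ValueError("vm_count must be positive")
    return total_iops / vm_count


print(iops_per_vm(TOTAL_IOPS, MAX_VMS))  # 1250.0
print(iops_per_vm(TOTAL_IOPS, 16))       # 5000.0
```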
The VMware-certified ViSX G3 boxes are managed, Astute says, as virtualised storage via vCenter or VMware-certified 3rd party software. Their use removes VM I/O from drive arrays, freeing up bandwidth for data access.
This is a novel approach to speeding up drive array I/O, and focusses exclusively on VMs that are I/O-bound. It is not a cache, unlike Avere and Alacritech filer accelerators, and neither is it a general, primary data, all-flash array like the Nimbus Data product. It is a highly specific, flash-based SAN for I/O-bound VMware virtual machines only.
The ViSX G3 1200 is priced at $29,000; the ViSX G3 2400 is at $59,000; and the ViSX G3 4800 costs $94,000. Options include extended warranty, expert support and on-site service, and are available now.
Some Scientists Fear Computer Chips Will Soon Hit a Wall - NYTimes.com
Speculation on the end of Moore’s law in the New York Times. There is an interesting quote from Intel that seems to lend credence to workload-optimized approaches to processing.
For decades, the power of computers has grown at a staggering rate as designers have managed to squeeze ever more and ever tinier transistors onto a silicon chip — doubling the number every two years, on average, and leading the way to increasingly powerful and inexpensive personal computers, laptops and smartphones.
Now, however, researchers fear that this extraordinary acceleration is about to meet its limits. The problem is not that they cannot squeeze more transistors onto the chips — they surely can — but instead, like a city that cannot provide electricity for its entire streetlight system, that all those transistors could require too much power to run economically. They could overheat, too.
The upshot could be that the gadget-crazy populace, accustomed to a retail drumbeat of breathtaking new products, may have to accept next-generation electronics that are only modestly better than their predecessors, rather than exponentially faster, cheaper and more wondrous.
Simply put, the Next Big Thing may take longer to arrive.
“It is true that simply taking old processor architectures and scaling them won’t work anymore,” said William J. Dally, chief scientist at Nvidia, a maker of graphics processors, and a professor of computer science at Stanford University. “Real innovation is required to make progress today.”
A paper presented in June at the International Symposium on Computer Architecture summed up the problem: even today, the most advanced microprocessor chips have so many transistors that it is impractical to supply power to all of them at the same time. So some of the transistors are left unpowered — or dark, in industry parlance — while the others are working. The phenomenon is known as dark silicon.
As early as next year, these advanced chips will need 21 percent of their transistors to go dark at any one time, according to the researchers who wrote the paper. And in just three more chip generations — a little more than a half-decade — the constraints will become even more severe. While there will be vastly more transistors on each chip, as many as half of them will have to be turned off to avoid overheating.
“I don’t think the chip would literally melt and run off of your circuit board as a liquid, though that would be dramatic,” Doug Burger, an author of the paper and a computer scientist at Microsoft Research, wrote in an e-mail. “But you’d start getting incorrect results and eventually components of the circuitry would fuse, rendering the chip inoperable.”
The problem has the potential to counteract an important principle in computing that has held true for decades: Moore’s Law. It was Gordon Moore, a founder of Intel, who first predicted that the number of transistors that could be nestled comfortably and inexpensively on an integrated circuit chip would double roughly every two years, bringing exponential improvements in consumer electronics.
If that rate of improvement lags, much of the innovation that people have come to take for granted will not happen, or will happen at a much slower pace. There will not be new PCs, new smartphones, new LCD TVs, new MP3 players or whatever might become the new gadget that creates an overnight multibillion-dollar industry and tens of thousands of jobs.
In their paper, Dr. Burger and fellow researchers simulated the electricity used by more than 150 popular microprocessors and estimated that by 2024 computing speed would increase only 7.9 times, on average. By contrast, if there were no limits on the capabilities of the transistors, the maximum potential speedup would be nearly 47 times, the researchers said.
Some scientists disagree, if only because new ideas and designs have repeatedly come along to preserve the computer industry’s rapid pace of improvement. Dr. Dally of Nvidia, for instance, is sanguine about the future of chip design.
“The good news is that the old designs are really inefficient, leaving lots of room for innovation,” he said.
But other experts, not connected with Dr. Burger’s research, acknowledged that he and the paper’s other authors — from the University of Texas, the University of Washington and the University of Wisconsin — had put their finger on a real problem.
Shekhar Y. Borkar, a fellow at Intel Labs, called Dr. Burger’s analysis “right on the dot,” but added: “His conclusions are a little different than what my conclusions would have been. The future is not as golden as it used to be, but it’s not bleak either.”
Dr. Borkar cited a variety of new design ideas that he said would help ease the limits identified in the paper. Intel recently developed a way to vary the power consumed by different parts of a processor, making it possible to have both slower, lower-power transistors as well as faster-switching ones that consume more power.
Increasingly, today’s processor chips contain two or more cores, or central processing units, that make it possible to use multiple programs simultaneously. In the future, Intel computers will have different kinds of cores optimized for different kinds of problems, only some of which require high power.
And while Intel announced in May that it had found a way to use 3-D design to crowd more transistors onto a single chip, that technology does not solve the energy problem described in the paper about dark silicon. The authors of the paper said they had tried to account for some of the promised innovation, and they argued that the question was how far innovators could go in overcoming the power limits.
“Where things fall between the two is a matter of opinion,” Dr. Burger said.
Chip designers have been struggling with power limits for some time. A decade ago it was widely assumed that it would be straightforward to increase chips’ clock speed, or the rate at which they make calculations. Then the industry hit a wall at around three gigahertz, when the chips got so hot that they began to melt. That set off a frantic scramble for new designs.
Today, some of the pioneering designers believe there is still plenty of room for innovation. One of them, David A. Patterson, a computer scientist at the University of California, Berkeley, called dark silicon a “real phenomenon” but said he was skeptical of the authors’ pessimistic conclusions.
“It’s one of those ‘If we don’t innovate, we’re all going to die’ papers,” Dr. Patterson said in an e-mail. “I’m pretty sure it means we need to innovate, since we don’t want to die!”
IBM tunes Xeon E7 appliances for VMware hypervisor • The Register
Wow! Actual workload-optimized machines from IBM. The technical details are pretty hairy, but it’s great to see actual, physical evidence of the vision.
The mantra at IBM these days is workload-optimized systems, and the company is trying to make the sales pitch easier for itself and its reseller partners by tuning up server configurations to support specific workloads right out of the cardboard box.
IBM has taken the new “Westmere-EX” Xeon E7 processors announced by Intel back in April and dropped them into configurations of its BladeCenter HX5 blade and System x3950 X5 and x3690 X5 rack machines preconfigured with VMware’s ESXi hypervisor for server virtualization.
To boost the memory capacity of the two-socket HX5 blade, which has 16 DDR3 main memory slots, so it can better support server virtualization, IBM is adding the MAX5 memory expansion blade, which snaps into the HX5 blade using IBM’s own eX5 chipset and which provides another 24 memory slots. The HX5 blade is configured with two Xeon E7-2830 processors, which have eight cores running at 2.13GHz and 24MB of L3 cache on chip. IBM is putting 40 of its 8GB memory sticks in the blade and memory expansion unit, filling up all of the available slots, for a total of 320GB of main memory across the 16 cores in the blade.
That works out to an average of 20GB per core, and that is a bit fat, and the wonder is why IBM is not using 4GB sticks. IBM is now supporting 16GB memory sticks on the HX5 blades, but because these are more expensive, IBM is using skinnier memory and the MAX5 expansion unit instead. IBM is also tossing in its Virtual Fabric Adapter for virtualizing I/O on the HX5 blade. VMware’s freebie ESXi 4.1 hypervisor is installed on USB sticks on the blade, which has room for two 50GB solid state disks or a variety of 2.5-inch hard disk drives…
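The memory figures in the excerpt above are easy to check with a few lines of arithmetic. This is only a sketch: the variable names are mine, and the 4GB comparison assumes the same 40 slots would simply be filled with smaller sticks.

```python
# Slot, stick and core counts taken from the excerpt above.
hx5_slots = 16         # DDR3 slots on the HX5 blade
max5_slots = 24        # extra slots on the MAX5 expansion blade
stick_gb = 8           # stick size IBM is shipping
sockets = 2
cores_per_socket = 8   # Xeon E7-2830

total_slots = hx5_slots + max5_slots
total_memory_gb = total_slots * stick_gb
total_cores = sockets * cores_per_socket
gb_per_core = total_memory_gb / total_cores

# Assumption for comparison only: the same slots filled with 4GB sticks.
gb_per_core_with_4gb_sticks = total_slots * 4 / total_cores

print(f"{total_slots} slots, {total_memory_gb}GB total, "
      f"{gb_per_core:.0f}GB per core "
      f"({gb_per_core_with_4gb_sticks:.0f}GB per core with 4GB sticks)")
```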
IBM has also jigged a System x server tuned to run SAP’s High-Performance Analytic Appliance (HANA) software, a kind of bolt-on, in-memory data mart to speed up queries on SAP applications without whacking ERP and CRM application software running in the back-end.
In this case, IBM chose the two-socket System x3690 X5 server, which has 32 memory slots. IBM is configuring the HANA appliance with two Xeon E7-2870 processors (2.4GHz with 30MB of L3 cache on chip) and eight 16GB memory sticks for a total of 128GB. IBM is also adding in eight 50GB solid state disks and eight 300GB 10K RPM SAS disks to the server to hold the data that queries run against. IBM presumably started with 16GB memory sticks because customers will want to substantially beef up the memory capacity on the HANA appliance to keep more data in memory and speed up access, and they would get pretty grumpy if they bought 8GB sticks to only find they need to toss them out to get fatter memory. The SSDs are mirrored in groups of four drives and the disks are striped with RAID 5 protection.
The HANA appliance server is configured with a special edition of SUSE Linux Enterprise Server 11 that is tweaked to run SAP applications and IBM’s General Parallel File System. The HANA server includes the in-memory appliance, which consists of SAP’s Sybase Application Server 15 database, SAP Host Agent 7.2, Apache Tomcat 5.5, and Perl 5.8, and this stack is preconfigured on the machine. Customers have to buy a license from SAP to activate this software.
Under the Hood at Google and Facebook - IEEE Spectrum
The entire article is worth reading for its discussion of Google’s and Facebook’s differing philosophies on datacenter construction and policy, but the part that stood out for me was the anti-workload-optimized approach to servers that both competitors follow.
Giant data centers—even energy-efficient ones—are, of course, nothing without the proper servers. Facebook will be populating its Oregon and North Carolina locations with custom-designed servers, just as Google has long done.
Facebook’s Amir Michael, manager of hardware design, explains that when the company decided to build its own facilities, “we had a clean slate,” which allowed him and his colleagues to optimize the designs of their centers and servers in tandem for maximum energy efficiency. The result was a server that “looks very bare bones. I call it a ‘vanity-free’ design just because I don’t like people to call it ugly,” says Michael. “It has no front bezels. It has no paint. It has no logos or stickers on it. It really has only what is required.”
Google also keeps server frills to a minimum. Like Facebook, it buys commodity-level computing hardware and just fixes the many pieces that break, instead of purchasing high-end systems that are less prone to failure but also much more expensive. Economics, if nothing else, drove engineers at both companies to similar conclusions here. Fit and finish might count if you’re buying one server or even a hundred, but not when you’re shopping for tens of thousands at a time. And striving for high reliability is a little pointless at this scale, where failure is not only an option, it’s a daily fact of life.
Facebook’s Michael explains that he helped design three basic types of servers for running the Facebook application. The top layer of hardware, connected most directly with Facebook’s many users, consists of outward-facing Web servers. They don’t require much disk space—just enough for the operating system (Linux), the basic Web-server software (which until recently was Apache), the code needed to assemble Facebook pages (written in PHP, a scripting language), some log files, and a few other bits and pieces. Those machines are connected to a deeper layer of servers stuffed with hard disks and flash-based solid-state drives, which provide persistent storage for the giant MySQL databases that hold Facebook users’ photos, videos, comments, and friend lists, among other things. In between are RAM-heavy servers that run a memcached system to provide fast access to the most frequently used content.
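The three tiers described above (web servers, a memcached layer, database servers) suggest a read path along the following lines. The article does not say how Facebook's web tier actually consults memcached, so this sketch assumes the common cache-aside pattern. The `Database` and `Cache` classes and the `fetch` function are hypothetical in-memory stand-ins, not real MySQL or memcached clients.

```python
from typing import Dict, Optional


class Database:
    """Hypothetical stand-in for the persistent storage tier."""

    def __init__(self, rows: Dict[str, str]) -> None:
        self._rows = dict(rows)
        self.reads = 0  # counts how often the slow tier is hit

    def get(self, key: str) -> Optional[str]:
        self.reads += 1
        return self._rows.get(key)


class Cache:
    """Hypothetical stand-in for the RAM-heavy caching tier."""

    def __init__(self) -> None:
        self._items: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._items.get(key)

    def set(self, key: str, value: str) -> None:
        self._items[key] = value


def fetch(key: str, cache: Cache, db: Database) -> Optional[str]:
    """Cache-aside read: try the cache, fall back to the database on a miss."""
    value = cache.get(key)
    if value is not None:
        return value
    value = db.get(key)
    if value is not None:
        cache.set(key, value)  # populate the cache for later requests
    return value


db = Database({"user:42:name": "Alice"})
cache = Cache()
first = fetch("user:42:name", cache, db)   # miss: read from the database
second = fetch("user:42:name", cache, db)  # hit: served from the cache
print(first, second, db.reads)
```

Only the first request reaches the database; repeat requests for the same key stay in RAM, which is the point of putting RAM-heavy machines between the other two tiers.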
Alpha geeks will recognize that these pieces of software—Linux, Apache, PHP, MySQL, memcached—all hail from the open-source community. Facebook’s programmers have modified these and other open-source packages to suit their needs, but at the most basic level, they are doing exactly what countless Web developers have done: building their site on an open-source foundation.
Not so at Google. Programmers there have written most of their company’s impressive software from scratch—with the exception of the Linux running on its servers. Most prominent are the Google File System (or GFS, a large-scale distributed file system), Bigtable (a low-overhead database), and MapReduce (which provides a mechanism for carrying out various kinds of computations in a massively parallel fashion). What’s more, Google’s programmers have rewritten the company’s main search code more than once.
Speaking two years ago at the Second ACM International Conference on Web Search and Data Mining, Jeff Dean, a Google Fellow working in the company’s system infrastructure group, said that over the years his company has made seven significant revisions to the way it implements Web search. However, outsiders don’t realize that, because, as Dean explained, “you can replace the entire back end without anyone really noticing.”
How are we to interpret the difference between Google’s and Facebook’s engineering cultures with respect to the use of open-source code? Part of the answer may just be that Google, having started earlier, had no choice but to develop its own software, because open-source alternatives weren’t yet available. But Steve Lacy, who worked as a software engineer for Google from 2005 to 2010, thinks otherwise. In a recent blog post, he argues that Google just suffers from a bad case of not-invented-here syndrome. Many open-source packages “put Google infrastructure to shame when it comes to ease of use and product focus,” writes Lacy. “[Nevertheless, Google] engineers are discouraged from using these systems, to the point where they’re chastised for even thinking of using anything other than Bigtable/Spanner and GFS/Colossus for their products.”
Reprogrammable Chips Could Enable Instant Gadget Upgrades - Technology Review
IBM products like Netezza use a type of reprogrammable chip - a field programmable gate array (FPGA) - to drive huge performance gains in common analytic operations. A new company called Tabula offers an alternative to FPGAs that could make reprogrammable chips more common - even in consumer technologies.
If programmable chips were more powerful and less costly they could be used in more devices, in more creative ways, says Steve Teig, founder and chief technology officer of Tabula. His company’s reprogrammable design is considerably smaller than that of an FPGA. “FPGAs are very expensive because they are large pieces of silicon,” says Teig, “and silicon [wafer] costs roughly $1 billion an acre.” The time it takes for signals to traverse the relatively large surface of an FPGA also limits its performance, he says.
“It’s like being inside a very large, one story building—the miles of corridors slow you down,” he says. As with a building, stacking layers of circuit on top of each other helps, by providing a shortcut between floors, says Teig. But unfortunately, the technology needed to build stacked, 3-D chips is still restricted to research labs. Instead Teig found a way to make a chip with just one level behave as if it were eight different ones stacked up.
“Imagine you walked into the elevator in a building and then walked back out, and that I rearranged the furniture quickly while you were in there,” says Teig. “You would have no way to tell you weren’t on a different floor.” Tabula’s chips perform the same trick on the data they process, cycling between up to eight different layouts at up to 1.6 billion times per second (1.6 Gigahertz). Signals on the chip encounter those different designs in turn, as if they were hopping up a level onto a different chip entirely. “From its behavior, our [design] is indistinguishable from a stack of chips,” says Teig, who calls the virtual chip layers “folds.”
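Teig's elevator analogy is time-multiplexing: one physical layer is reconfigured on every fast clock tick, and a signal passes through the configurations in turn. The toy model below is my own illustration, not Tabula's design. The functions standing in for folds are arbitrary, and the "full pass" figures simply divide the quoted 1.6 GHz rate by the quoted maximum of eight folds.

```python
from typing import Callable, List

FOLD_RATE_HZ = 1.6e9  # reconfiguration rate quoted in the excerpt
MAX_FOLDS = 8         # maximum number of layouts quoted in the excerpt


def run_folds(folds: List[Callable[[int], int]], value: int) -> int:
    """Push a value through each fold in turn, reusing one physical layer."""
    if len(folds) > MAX_FOLDS:
        raise ValueError("more folds than the chip can cycle through")
    for fold in folds:
        physical_layer = fold            # "rearrange the furniture"
        value = physical_layer(value)    # signal sees the new layout
    return value


# Three arbitrary example folds: add one, double, subtract three.
example_folds = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
result = run_folds(example_folds, 5)

# Time for a signal to visit all eight folds once, and the matching rate.
full_pass_seconds = MAX_FOLDS / FOLD_RATE_HZ
full_pass_rate_hz = FOLD_RATE_HZ / MAX_FOLDS

print(result, full_pass_seconds, full_pass_rate_hz)
```

Under these assumptions a design that uses all eight folds completes a full pass 200 million times per second, so the fast fold clock is what pays for the area savings.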
That brings speed advantages, because signals don’t have to travel a long way across the surface of a chip to reach a new part of the circuit, as they do on an FPGA. When the chip loads a new fold, new circuitry appears in place of the old. Teig estimates that the footprint of a Tabula chip is less than a third of an equivalent FPGA, making it five times cheaper to make, while providing more than double the density of logic and roughly four times the performance.
As with FPGAs, Tabula’s chips contain arrays of many identical basic building blocks that can be programmed to implement any logic function. A memory store on the chip manages the different configurations that the chip cycles through…
Making the reconfigurable approach cheaper could enable even consumer electronics to ship with programmable chips, making it possible for them to be upgraded with new design tweaks. That approach is currently used only in some expensive equipment such as cell-phone base stations. “Sony could say, ‘look at what our competitor Toshiba did’, and upgrade the chips inside their TVs to provide new features,” says Teig. “Getting to digital cameras or TVs is definitely within reach.”
However, Rich Wawrzyniak, who tracks FPGAs and related technology for analyst firm Semico Research, points out that there are limitations to this approach. “The power consumption of these devices is relatively high, and likely too much for a device like a phone,” he says.
But ultimately, says DeHon, reconfigurable chips should morph their design even more often, shifting their workings to match the task in hand in a blend of software and hardware. “These things are really platforms that can run any computation. The grand vision is that we come up with a way for a program’s code to be mapped to the chip when it runs.”
App-Specific Processors to Fight Dark Silicon - Technology Review
Other people are working on workload-optimized processing techniques. Here’s an interesting example in the mobile space - and a discussion of “dark silicon.”
A processor etched with circuits tailored to the most widely used apps on Android phones could help extend the devices’ battery life. Researchers at the University of California, San Diego have created software that scans the operating system and a collection of the most popular apps and then generates a processor design tailored to their demands. The result can be 11 times more efficient than today’s typical general-purpose smart-phone chip, says Michael Taylor, who leads the GreenDroid project with colleague Steven Swanson.
“Chip design for mobile phones needs rethinking for two reasons,” says Taylor. “One is to improve their use of the limited energy available to a phone, and the other is to attack a problem called dark silicon, which is set to make conventional chip designs even less efficient.”
“Dark silicon” is a portion of a microchip that is left unused. Although uncommon today, dark silicon is expected to become necessary in two or three years, because engineers will be unable to reduce chips’ operating voltages any further to offset increases in power consumption and waste heat produced by smaller, faster chips…
Taylor and Swanson’s GreenDroid design sidesteps this by surrounding a processor’s main core—the part of a chip that executes instructions—with 120 smaller ones that each take care of one piece of code frequently needed by the apps used most on a phone. Each core’s circuits closely mimic the structure of the code on which they are based, making them up to 10,000 times more efficient than a general-purpose processor core performing the same task. “If you fill the chip with highly specialized cores, then the fraction of the chip that is lit up at one time can be the most energy efficient for that particular task,” Taylor says.
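The excerpt quotes "up to 10,000 times" for individual cores but 11 times for the chip as a whole. One way to see how both figures can hold is an Amdahl-style energy model, in which only a fraction of the work runs on specialized cores. The model and the coverage fractions below are my own illustration and do not come from the GreenDroid researchers.

```python
def overall_efficiency_gain(covered_fraction: float, core_gain: float) -> float:
    """Amdahl-style estimate of whole-chip energy efficiency gain.

    covered_fraction: share of baseline energy spent in code that the
        specialized cores can take over (hypothetical input).
    core_gain: efficiency gain of a specialized core on that code.
    """
    if not 0.0 <= covered_fraction <= 1.0:
        raise ValueError("covered_fraction must be between 0 and 1")
    if core_gain <= 0:
        raise ValueError("core_gain must be positive")
    remaining = (1.0 - covered_fraction) + covered_fraction / core_gain
    return 1.0 / remaining


CORE_GAIN = 10_000.0  # upper bound quoted in the excerpt

# Hypothetical coverage fractions, chosen only to show the trend.
for fraction in (0.5, 0.7, 0.9):
    gain = overall_efficiency_gain(fraction, CORE_GAIN)
    print(f"coverage {fraction:.0%}: overall gain about {gain:.2f}x")
```

However efficient the specialized cores are, the overall gain in this model cannot exceed 1 / (1 - covered_fraction), so the work left on the general-purpose core dominates the result.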
Rather than manually translating source code into processor cores, the UCSD team has developed software to do it. They record the computational demands of the Android OS when running popular apps for e-mail, maps, video, and the Web radio service Pandora, among others, and from that information, the software generates the GreenDroid chip design.
Because around 70 percent of that code is shared between multiple apps or parts of the OS, a GreenDroid’s specialized cores can handle much of a phone’s most energy-sapping work. Detailed simulations of a complete GreenDroid processor prove its superior efficiency, says Taylor. “We’re sending the first design off to be fabricated in June and have designed a board so we can plug it in, install Android and apps, and then benchmark against conventional designs,” he says.
Having a custom processor fabricated is extremely expensive and rare in academia. The chip will use transistors smaller than those currently on the market, with feature sizes as small as 28 nanometers. Processors with 32-nanometer features have only recently reached the market, and it is in the next generation, at 22 nanometers, that dark silicon is expected to become a serious challenge.
Kevin Skadron, a professor at the University of Virginia, says the UCSD strategy is a good fit with smart phones, because apps are tightly integrated with a smart phone’s OS. “They are wise to target Android,” he says, “because on a phone the OS is responsible for a huge amount of the work done by the processor. That means every user of every phone will benefit from their specialized cores.” Phones with GreenDroid-style processors can be expected to last longer than conventional phones with the same battery, or to have the same lifetime with a sleeker design, he says.
However, the specialized hardware of this approach has drawbacks that make it less useful for nonmobile devices, says Skadron. “It’s more challenging with a PC or server, because the operating system has less effect on what the processor does. The applications on top of that are most important, and they vary a lot more between users.”