PeteSearch: Why Facebook's data will change our world
What makes Facebook data so interesting? Let me count the ways!
When I told a friend about my work at Jetpac, he nodded sagely and said “You just can’t resist Facebook data, can you? Like a dog returning to its own vomit”. He’s right, I’m completely entranced by the information we’re pouring into the service. All my privacy investigations were by-products of my obsessive quest for data. So with Facebook’s IPO looming, why do I think research using its data will be so world-changing?
Population
Everyone is on Facebook. I know, you’re not, but most organizations can treat you the way they treated someone without a phone or TV twenty years ago. The medium is so prevalent that if you’re not on it, it’s commercially viable to ignore you. This broad coverage also makes it possible to use the data to answer questions that other sources can't answer.
It’s intriguing to know which phrases are trending on Twitter, but with only a small proportion of the population on the service, it’s hard to know how much that reflects the country as a whole. The small and biased sample immediately makes every conclusion you draw suspect. There are plenty of other ways to mess up your study, of course, but if your data covers two-thirds of a population of three hundred million, a lot of hard problems become solvable.
Coverage
Love, friendship, family, cooking, travel, play, partying, sickness, entertainment, study, work: We leave traces of almost everything we care about on Facebook. We’ve never had records like this, outside of personal diaries. Blogs, government records, school transcripts, nothing captures such a rich slice of our lives.
The range of activities on Facebook not only lets us investigate poorly-understood areas of our behavior, it allows us to tie together many more factors than are available from any other source. How does travel affect our chances of getting sick? Are people who are close to their family different in how they date from those who are more distant?
Frequency
The majority of my friends on Facebook update at least once a day, and quite a few post several times a day. We’ve found the average Jetpac user has had over 200,000 photos shared with them by their friends! This continuous and sustained instrumentation of our lives is unlike anything we’ve ever seen before; we generate dozens or hundreds of nuggets of information about what we’re doing every week. This frequency means it’s possible to follow changes over time in a way that few other sources can match.
Accessibility
It’s at least theoretically possible for researchers to get their hands on Facebook’s data in bulk. A large and increasing amount of activity on the site happens in communal spaces where people know casual friends will see it. Expectations of privacy are a fiercely fought-over issue, but the service is fundamentally about sharing in a much wider way than emails or phone calls allow.
This background means that it’s technically feasible to access large amounts of data in a way that’s not true for the fragmented and siloed world of email stores, and definitely isn’t true for the old-school storage of phone records. The different privacy expectations also allow researchers to at least make a case for analyses like the Politico Facebook project. It’s incredibly controversial, for good reason, but I expect to see some rough consensus emerge about how much we trade off privacy for the fruits of research.
Connections
I left this until last because I think it’s the least distinctive part of Facebook’s data. It’s nice to have the explicit friendships, but every communication network can derive much better information on relationships from the implicit signals of who talks to whom. There are some advantages to recording the weak ties that most Facebook friendships represent, and it saves an extra analysis step. Even so, most social networks rely internally on implicit signals for recommendations and other applications that depend on identifying real relationships.
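Here is one way to make the contrast between implicit and explicit signals concrete. The minimal Python sketch below ranks a person's relationships by how often they actually interact, ignoring the friend list entirely. The event format (a list of sender/recipient pairs) and the scoring rule (total messages in both directions, doubled when both people have written) are illustrative assumptions. This is not how Facebook or any other network is known to compute tie strength.

```python
from collections import Counter


def tie_strengths(messages):
    """Score each unordered pair by how often its members message each other.

    `messages` is a hypothetical list of (sender, recipient) events.
    A pair's score is the number of messages in both directions, doubled
    when the tie is reciprocated (each person has written to the other).
    """
    directed = Counter(messages)
    scores = {}
    for (a, b), count in directed.items():
        pair = tuple(sorted((a, b)))
        scores[pair] = scores.get(pair, 0) + count
    for pair in scores:
        a, b = pair
        # Counter returns 0 for missing keys, so one-way ties are not doubled.
        if directed[(a, b)] and directed[(b, a)]:
            scores[pair] *= 2
    return scores


def ranked_contacts(person, messages):
    """Return (contact, score) pairs for `person`, strongest tie first."""
    ranked = []
    for (a, b), score in tie_strengths(messages).items():
        if person == a:
            ranked.append((b, score))
        elif person == b:
            ranked.append((a, score))
    # Highest score first; ties broken alphabetically for stable output.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked
```

Suppose `alice` has three explicit friends: `bob`, `carol`, and `dave`. Her messages are three to `bob`, two back from `bob`, and one to `carol`. Then `ranked_contacts("alice", messages)` ranks `bob` well above `carol`, and `dave` does not appear at all. The explicit friend list alone cannot make that distinction.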
The Future
This is the first time in history that most people are creating a detailed record of their lives in a shared space. We’ve always relied on one-time, narrow surveys of a small number of people to understand ourselves. Facebook’s data is so different from anything we’ve been able to gather before that it lets us answer questions we’ve never been able to answer.
We can already see glimmers of this as hackers machete their way through a jungle of technical and privacy problems, but once the working conditions improve we’ll see a flood of established researchers enter the field. They’ve honed their skills on meagre traditional information sources, and I’ll be excited when I see their results on far broader collections of data. The insights into ourselves that their research gives us will change our world radically.
Rip Rowan - Google - Stevey's Google Platforms Rant I was at Amazon for about…
This “rant” from a Google employee has been getting a lot of attention on the interwebs over the past couple of days, and for good reason. From the perspective of an IT manager or developer, it’s a very good view of how to architect services in huge, web-centric organizations like Google and Amazon. Very interesting.
So one day Jeff Bezos issued a mandate. He’s doing that all the time, of course, and people scramble like ants being pounded with a rubber mallet whenever it happens. But on one occasion — back around 2002 I think, plus or minus a year — he issued a mandate that was so out there, so huge and eye-bulgingly ponderous, that it made all of his other mandates look like unsolicited peer bonuses.
His Big Mandate went something along these lines:
1) All teams will henceforth expose their data and functionality through service interfaces.
2) Teams must communicate with each other through these interfaces.
3) There will be no other form of interprocess communication allowed: no direct linking, no direct reads of another team’s data store, no shared-memory model, no back-doors whatsoever. The only communication allowed is via service interface calls over the network.
4) It doesn’t matter what technology they use. HTTP, Corba, Pubsub, custom protocols — doesn’t matter. Bezos doesn’t care.
5) All service interfaces, without exception, must be designed from the ground up to be externalizable. That is to say, the team must plan and design to be able to expose the interface to developers in the outside world. No exceptions.
6) Anyone who doesn’t do this will be fired.
7) Thank you; have a nice day!
Ha, ha! You 150-odd ex-Amazon folks here will of course realize immediately that #7 was a little joke I threw in, because Bezos most definitely does not give a shit about your day.
#6, however, was quite real, so people went to work. Bezos assigned a couple of Chief Bulldogs to oversee the effort and ensure forward progress, headed up by Uber-Chief Bear Bulldog Rick Dalzell. Rick is an ex-Army Ranger, West Point Academy graduate, ex-boxer, ex-Chief Torturer slash CIO at Wal*Mart, and is a big genial scary man who used the phrase “hardened interface” a lot. Rick was a walking, talking hardened interface himself, so needless to say, everyone made LOTS of forward progress and made sure Rick knew about it.
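Rules 1 through 3 of the mandate boil down to one architectural constraint. Each team's data sits behind a network interface, and a peer team can reach that data only by making a call to it. The minimal Python sketch below illustrates the idea using only the standard library. The inventory data, the `/inventory/<item>` path, and the JSON shape are hypothetical, not anything Amazon actually ran. `http.server` stands in for whatever technology a team might pick, which rule 4 leaves open.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, build_opener

# Hypothetical data owned by one team. Under rule 3 no other team may
# read this dict directly; the HTTP interface is the only way in.
_INVENTORY = {"book-123": 7, "book-456": 0}


class InventoryHandler(BaseHTTPRequestHandler):
    """Serves GET /inventory/<item> as JSON."""

    def do_GET(self):
        prefix = "/inventory/"
        item = self.path[len(prefix):] if self.path.startswith(prefix) else None
        if item not in _INVENTORY:
            self.send_error(404, "unknown item")
            return
        body = json.dumps({"item": item, "in_stock": _INVENTORY[item]}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def start_inventory_service():
    """Start the service on a free localhost port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), InventoryHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


# An empty ProxyHandler keeps any system proxy out of localhost calls.
_OPENER = build_opener(ProxyHandler({}))


def get_stock(base_url, item):
    """What a peer team calls: a network request, never a direct data read."""
    with _OPENER.open(f"{base_url}/inventory/{item}") as resp:
        return json.load(resp)["in_stock"]
```

With this design, the owning team can change its storage, or expose the same endpoint to outside developers (rule 5), without breaking any caller.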
Over the next couple of years, Amazon transformed internally into a service-oriented architecture. They learned a tremendous amount while effecting this transformation. There was lots of existing documentation and lore about SOAs, but at Amazon’s vast scale it was about as useful as telling Indiana Jones to look both ways before crossing the street. Amazon’s dev staff made a lot of discoveries along the way. A teeny tiny sampling of these discoveries included:
- pager escalation gets way harder, because a ticket might bounce through 20 service calls before the real owner is identified. If each bounce goes through a team with a 15-minute response time, it can be hours before the right team finally finds out, unless you build a lot of scaffolding and metrics and reporting.
- every single one of your peer teams suddenly becomes a potential DoS attacker. Nobody can make any real forward progress until very serious quotas and throttling are put in place in every single service.
- monitoring and QA are the same thing. You’d never think so until you try doing a big SOA. But when your service says “oh yes, I’m fine”, it may well be the case that the only thing still functioning in the server is the little component that knows how to say “I’m fine, roger roger, over and out” in a cheery droid voice. In order to tell whether the service is actually responding, you have to make individual calls. The problem continues recursively until your monitoring is doing comprehensive semantics checking of your entire range of services and data, at which point it’s indistinguishable from automated QA. So they’re a continuum.
- if you have hundreds of services, and your code MUST communicate with other groups’ code via these services, then you won’t be able to find any of them without a service-discovery mechanism. And you can’t have that without a service registration mechanism, which itself is another service. So Amazon has a universal service registry where you can find out reflectively (programmatically) about every service, what its APIs are, and also whether it is currently up, and where.
- debugging problems with someone else’s code gets a LOT harder, and is basically impossible unless there is a universal standard way to run every service in a debuggable sandbox.
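Two of these learnings fit together. Discovery needs a registry, and the registry's idea of whether a service is "up" is only worth something if it comes from a probe that exercises real behavior. That probe requirement is where monitoring starts to look like QA. The in-memory Python sketch below is a minimal illustration of this idea. The class names, addresses, and probe functions are all hypothetical. A production registry would itself be a networked, replicated service, as the text notes.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ServiceRecord:
    name: str
    address: str
    apis: List[str]
    # Semantic probe: runs a real request and returns True only if the
    # answer is correct, instead of trusting a self-reported "I'm fine".
    probe: Callable[[], bool]
    up: bool = False
    last_checked: Optional[float] = None


class ServiceRegistry:
    """Hypothetical registry: register services, probe them, discover live ones."""

    def __init__(self):
        self._services: Dict[str, ServiceRecord] = {}

    def register(self, name, address, apis, probe):
        # New services count as down until a probe has actually passed.
        self._services[name] = ServiceRecord(name, address, list(apis), probe)

    def check_all(self):
        for record in self._services.values():
            try:
                record.up = bool(record.probe())
            except Exception:
                # A probe that crashes or times out means the service is down.
                record.up = False
            record.last_checked = time.time()

    def discover(self, name):
        """Return (address, apis) for a live service, or None if down or unknown."""
        record = self._services.get(name)
        if record is None or not record.up:
            return None
        return record.address, record.apis
```

The design choice that matters here is that `up` comes from `probe()` rather than from the service's own status report. A richer probe therefore doubles as an automated test.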
That’s just a very small sample. There are dozens, maybe hundreds of individual learnings like these that Amazon had to discover organically. There were a lot of wacky ones around externalizing services, but not as many as you might think. Organizing into services taught teams not to trust each other in most of the same ways they’re not supposed to trust external developers.
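Not trusting peer teams, and treating each of them as a potential DoS attacker, maps onto a standard technique: per-caller quotas enforced with a token bucket. The Python sketch below is one minimal version. The caller names and rates are assumptions for illustration, as is setting burst capacity to one second's worth of quota. None of this describes Amazon's actual throttling scheme.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Token bucket: refills at `rate` tokens per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PerCallerThrottle:
    """One bucket per calling team, so one noisy peer cannot starve the rest."""

    def __init__(self, quotas: Dict[str, float], default_rate: float = 1.0,
                 clock: Callable[[], float] = time.monotonic):
        self.quotas = quotas
        self.default_rate = default_rate
        self.clock = clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, caller: str) -> bool:
        if caller not in self.buckets:
            rate = self.quotas.get(caller, self.default_rate)
            # Assumed burst size: one second of quota, but at least one call.
            self.buckets[caller] = TokenBucket(rate, max(1.0, rate), self.clock)
        return self.buckets[caller].allow()
```

The service would call `throttle.allow(caller_id)` before doing any work and reject the request when it returns `False`. Because each caller has its own bucket, one team exhausting its quota does not affect any other team.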
This effort was still underway when I left to join Google in mid-2005, but it was pretty far advanced. From the time Bezos issued his edict through the time I left, Amazon had transformed culturally into a company that thinks about everything in a services-first fashion. It is now fundamental to how they approach all designs, including internal designs for stuff that might never see the light of day externally.
At this point they don’t even do it out of fear of being fired. I mean, they’re still afraid of that; it’s pretty much part of daily life there, working for the Dread Pirate Bezos and all. But they do services because they’ve come to understand that it’s the Right Thing. There are without question pros and cons to the SOA approach, and some of the cons are pretty long. But overall it’s the right thing because SOA-driven design enables Platforms.
That’s what Bezos was up to with his edict, of course. He didn’t (and doesn’t) care even a tiny bit about the well-being of the teams, nor about what technologies they use, nor in fact any detail whatsoever about how they go about their business unless they happen to be screwing up. But Bezos realized long before the vast majority of Amazonians that Amazon needs to be a platform.
You wouldn’t really think that an online bookstore needs to be an extensible, programmable platform. Would you?
Well, the first big thing Bezos realized is that the infrastructure they’d built for selling and shipping books and sundry could be transformed into an excellent repurposable computing platform. So now they have the Amazon Elastic Compute Cloud, and the Amazon Elastic MapReduce, and the Amazon Relational Database Service, and a whole passel’ o’ other services browsable at aws.amazon.com. These services host the backends for some pretty successful companies, reddit being my personal favorite of the bunch.
The other big realization he had was that he can’t always build the right thing. I think Larry Tesler might have struck some kind of chord in Bezos when he said his mom couldn’t use the goddamn website. It’s not even super clear whose mom he was talking about, and doesn’t really matter, because nobody’s mom can use the goddamn website. In fact I myself find the website disturbingly daunting, and I worked there for over half a decade. I’ve just learned to kinda defocus my eyes and concentrate on the million or so pixels near the center of the page above the fold.
I’m not really sure how Bezos came to this realization — the insight that he can’t build one product and have it be right for everyone. But it doesn’t matter, because he gets it. There’s actually a formal name for this phenomenon. It’s called Accessibility, and it’s the most important thing in the computing world.
The. Most. Important. Thing.
If you’re sorta thinking, “huh? You mean like, blind and deaf people Accessibility?” then you’re not alone, because I’ve come to understand that there are lots and LOTS of people just like you: people for whom this idea does not have the right Accessibility, so it hasn’t been able to get through to you yet. It’s not your fault for not understanding, any more than it would be your fault for being blind or deaf or motion-restricted or living with any other disability. When software — or idea-ware for that matter — fails to be accessible to anyone for any reason, it is the fault of the software or of the messaging of the idea. It is an Accessibility failure.
Like anything else big and important in life, Accessibility has an evil twin who, jilted by the unbalanced affection displayed by their parents in their youth, has grown into an equally powerful Arch-Nemesis (yes, there’s more than one nemesis to accessibility) named Security. And boy howdy are the two ever at odds.
But I’ll argue that Accessibility is actually more important than Security because dialing Accessibility to zero means you have no product at all, whereas dialing Security to zero can still get you a reasonably successful product such as the Playstation Network.
So yeah. In case you hadn’t noticed, I could actually write a book on this topic. A fat one, filled with amusing anecdotes about ants and rubber mallets at companies I’ve worked at. But I will never get this little rant published, and you’ll never get it read, unless I start to wrap up.
That one last thing that Google doesn’t do well is Platforms. We don’t understand platforms. We don’t “get” platforms. Some of you do, but you are the minority. This has become painfully clear to me over the past six years. I was kind of hoping that competitive pressure from Microsoft and Amazon and more recently Facebook would make us wake up collectively and start doing universal services. Not in some sort of ad-hoc, half-assed way, but in more or less the same way Amazon did it: all at once, for real, no cheating, and treating it as our top priority from now on.
But no. No, it’s like our tenth or eleventh priority. Or fifteenth, I don’t know. It’s pretty low. There are a few teams who treat the idea very seriously, but most teams either don’t think about it at all, ever, or only a small percentage of them think about it in a very small way.
It’s a big stretch even to get most teams to offer a stubby service to get programmatic access to their data and computations. Most of them think they’re building products. And a stubby service is a pretty pathetic service. Go back and look at that partial list of learnings from Amazon, and tell me which ones Stubby gives you out of the box. As far as I’m concerned, it’s none of them. Stubby’s great, but it’s like parts when you need a car.
A product is useless without a platform, or more precisely and accurately, a platform-less product will always be replaced by an equivalent platform-ized product.
Google+ is a prime example of our complete failure to understand platforms from the very highest levels of executive leadership (hi Larry, Sergey, Eric, Vic, howdy howdy) down to the very lowest leaf workers (hey yo). We all don’t get it. The Golden Rule of platforms is that you Eat Your Own Dogfood. The Google+ platform is a pathetic afterthought. We had no API at all at launch, and last I checked, we had one measly API call. One of the team members marched in and told me about it when they launched, and I asked: “So is it the Stalker API?” She got all glum and said “Yeah.” I mean, I was joking, but no… the only API call we offer is to get someone’s stream. So I guess the joke was on me.
Microsoft has known about the Dogfood rule for at least twenty years. It’s been part of their culture for a whole generation now. You don’t eat People Food and give your developers Dog Food. Doing that is simply robbing your long-term platform value for short-term successes. Platforms are all about long-term thinking.
Google+ is a knee-jerk reaction, a study in short-term thinking, predicated on the incorrect notion that Facebook is successful because they built a great product. But that’s not why they are successful. Facebook is successful because they built an entire constellation of products by allowing other people to do the work. So Facebook is different for everyone. Some people spend all their time on Mafia Wars. Some spend all their time on Farmville. There are hundreds or maybe thousands of different high-quality time sinks available, so there’s something there for everyone.
Our Google+ team took a look at the aftermarket and said: “Gosh, it looks like we need some games. Let’s go contract someone to, um, write some games for us.” Do you begin to see how incredibly wrong that thinking is now? The problem is that we are trying to predict what people want and deliver it for them.
You can’t do that. Not really. Not reliably. There have been precious few people in the world, over the entire history of computing, who have been able to do it reliably. Steve Jobs was one of them. We don’t have a Steve Jobs here. I’m sorry, but we don’t.
Larry Tesler may have convinced Bezos that he was no Steve Jobs, but Bezos realized that he didn’t need to be a Steve Jobs in order to provide everyone with the right products: interfaces and workflows that they liked and felt at ease with. He just needed to enable third-party developers to do it, and it would happen automatically.
I apologize to those (many) of you for whom all this stuff I’m saying is incredibly obvious, because yeah. It’s incredibly frigging obvious. Except we’re not doing it. We don’t get Platforms, and we don’t get Accessibility. The two are basically the same thing, because platforms solve accessibility. A platform is accessibility.
So yeah, Microsoft gets it. And you know as well as I do how surprising that is, because they don’t “get” much of anything, really. But they understand platforms as a purely accidental outgrowth of having started life in the business of providing platforms. So they have thirty-plus years of learning in this space. And if you go to msdn.com, and spend some time browsing, and you’ve never seen it before, prepare to be amazed. Because it’s staggeringly huge. They have thousands, and thousands, and THOUSANDS of API calls. They have a HUGE platform. Too big in fact, because they can’t design for squat, but at least they’re doing it.
Amazon gets it. Amazon’s AWS (aws.amazon.com) is incredible. Just go look at it. Click around. It’s embarrassing. We don’t have any of that stuff.
Apple gets it, obviously. They’ve made some fundamentally non-open choices, particularly around their mobile platform. But they understand accessibility and they understand the power of third-party development and they eat their dogfood. And you know what? They make pretty good dogfood. Their APIs are a hell of a lot cleaner than Microsoft’s, and have been since time immemorial.
Facebook gets it. That’s what really worries me. That’s what got me off my lazy butt to write this thing. I hate blogging. I hate… plussing, or whatever it’s called when you do a massive rant in Google+ even though it’s a terrible venue for it but you do it anyway because in the end you really do want Google to be successful. And I do! I mean, Facebook wants me there, and it’d be pretty easy to just go. But Google is home, so I’m insisting that we have this little family intervention, uncomfortable as it might be.
After you’ve marveled at the platform offerings of Microsoft and Amazon, and Facebook I guess (I didn’t look because I didn’t want to get too depressed), head over to developers.google.com and browse a little. Pretty big difference, eh? It’s like what your fifth-grade nephew might mock up if he were doing an assignment to demonstrate what a big powerful platform company might be building if all they had, resource-wise, was one fifth grader.
Please don’t get me wrong here — I know for a fact that the dev-rel team has had to FIGHT to get even this much available externally. They’re kicking ass as far as I’m concerned, because they DO get platforms, and they are struggling heroically to try to create one in an environment that is at best platform-apathetic, and at worst often openly hostile to the idea.
I’m just frankly describing what developers.google.com looks like to an outsider. It looks childish. Where are the Maps APIs in there, for Christ’s sake? Some of the things in there are labs projects. And the APIs for everything I clicked were… they were paltry. They were obviously dog food. Not even good organic stuff. Compared to our internal APIs it’s all snouts and horse hooves.
And also don’t get me wrong about Google+. They’re far from the only offenders. This is a cultural thing. What we have going on internally is basically a war, with the underdog minority Platformers fighting a more or less losing battle against the Mighty Funded Confident Producters.
Any teams that have successfully internalized the notion that they should be externally programmable platforms from the ground up are underdogs — Maps and Docs come to mind, and I know GMail is making overtures in that direction. But it’s hard for them to get funding for it because it’s not part of our culture. Maestro’s funding is a feeble thing compared to the gargantuan Microsoft Office programming platform: it’s a fluffy rabbit versus a T-Rex. The Docs team knows they’ll never be competitive with Office until they can match its scripting facilities, but they’re not getting any resource love. I mean, I assume they’re not, given that Apps Script only works in Spreadsheet right now, and it doesn’t even have keyboard shortcuts as part of its API. That team looks pretty unloved to me.
Ironically enough, Wave was a great platform, may they rest in peace. But making something a platform is not going to make you an instant success. A platform needs a killer app. Facebook — that is, the stock service they offer with walls and friends and such — is the killer app for the Facebook Platform. And it is a very serious mistake to conclude that the Facebook App could have been anywhere near as successful without the Facebook Platform.
You know how people are always saying Google is arrogant? I’m a Googler, so I get as irritated as you do when people say that. We’re not arrogant, by and large. We’re, like, 99% Arrogance-Free. I did start this post — if you’ll reach back into distant memory — by describing Google as “doing everything right”. We do mean well, and for the most part when people say we’re arrogant it’s because we didn’t hire them, or they’re unhappy with our policies, or something along those lines. They’re inferring arrogance because it makes them feel better.
But when we take the stance that we know how to design the perfect product for everyone, and believe you me, I hear that a lot, then we’re being fools. You can attribute it to arrogance, or naivete, or whatever — it doesn’t matter in the end, because it’s foolishness. There IS no perfect product for everyone.
And so we wind up with a browser that doesn’t let you set the default font size. Talk about an affront to Accessibility. I mean, as I get older I’m actually going blind. For real. I’ve been nearsighted all my life, and once you hit 40 years old you stop being able to see things up close. So font selection becomes this life-or-death thing: it can lock you out of the product completely. But the Chrome team is flat-out arrogant here: they want to build a zero-configuration product, and they’re quite brazen about it, and Fuck You if you’re blind or deaf or whatever. Hit Ctrl-+ on every single page visit for the rest of your life.
It’s not just them. It’s everyone. The problem is that we’re a Product Company through and through. We built a successful product with broad appeal — our search, that is — and that wild success has biased us.
Amazon was a product company too, so it took an out-of-band force to make Bezos understand the need for a platform. That force was their evaporating margins; he was cornered and had to think of a way out. But all he had was a bunch of engineers and all these computers… if only they could be monetized somehow… you can see how he arrived at AWS, in hindsight.
Microsoft started out as a platform, so they’ve just had lots of practice at it.
Facebook, though: they worry me. I’m no expert, but I’m pretty sure they started off as a Product and they rode that success pretty far. So I’m not sure exactly how they made the transition to a platform. It was a relatively long time ago, since they had to be a platform before (now very old) things like Mafia Wars could come along.
Maybe they just looked at us and asked: “How can we beat Google? What are they missing?”
The problem we face is pretty huge, because it will take a dramatic cultural change in order for us to start catching up. We don’t do internal service-oriented platforms, and we don’t do external ones either. This means that the “not getting it” is endemic across the company: the PMs don’t get it, the engineers don’t get it, the product teams don’t get it, nobody gets it. Even if individuals do, even if YOU do, it doesn’t matter one bit unless we’re treating it as an all-hands-on-deck emergency. We can’t keep launching products and pretending we’ll turn them into magical beautiful extensible platforms later. We’ve tried that and it’s not working.
The Golden Rule of Platforms, “Eat Your Own Dogfood”, can be rephrased as “Start with a Platform, and Then Use it for Everything.” You can’t just bolt it on later. Certainly not easily at any rate — ask anyone who worked on platformizing MS Office. Or anyone who worked on platformizing Amazon. If you delay it, it’ll be ten times as much work as just doing it correctly up front. You can’t cheat. You can’t have secret back doors for internal apps to get special priority access, not for ANY reason. You need to solve the hard problems up front.
I’m not saying it’s too late for us, but the longer we wait, the closer we get to being Too Late.
I honestly don’t know how to wrap this up. I’ve said pretty much everything I came here to say today. This post has been six years in the making. I’m sorry if I wasn’t gentle enough, or if I misrepresented some product or team or person, or if we’re actually doing LOTS of platform stuff and it just so happens that I and everyone I ever talk to has just never heard about it. I’m sorry.
But we’ve gotta start doing this right.
Facebook's Open Compute friends ODCA IT union • The Register
The Open Data Center Alliance, started by Intel last October, and the Open Compute Project, founded by Facebook, are coming together. The initial results look interesting.
What do you get when you cross a consortium of big data center customers and IT suppliers (the Open Data Center Alliance started by Intel last October) with an open source server and data center design project started by a hyperscale Web company (the Open Compute Project founded by Facebook)?
We don’t know, but it looks like someone is going to lose some margins.
At the Intel Developer Forum in San Francisco on Wednesday, the ODCA said that the collective was building momentum – now with over 300 members, up from 70 at its founding nearly a year ago – with a collective IT spend of more than $100bn, according to Marvin Wheeler, who chairs the ODCA board and was president and COO of hosting company Terremark before telecom giant and cloud-wannabe Verizon scarfed it for $1.4bn in January.
At the IDF event, the ODCA was showing off six “usage models”, something akin to a reference architecture for handling specific cloud workloads. One was cloud on-boarding (moving VMs from one hypervisor in one data center to another brand of hypervisor in another), demonstrated by Citrix Systems. EMC and Intel teamed up to show a secure VM on-boarding scenario using Intel’s TXT (Trusted Execution Technology) extensions for Core and Xeon chips and VPLEX Metro data replication.
Cloud interoperability based on CloudForms was another proof-of-concept demonstrated at IDF, and Dell and JouleX, a maker of power management software, put together a POC that showed how the JouleX tools could be used to track and reduce energy consumption on a rack of PowerEdge C dense servers – the kind that Dell wants to sell to corporate customers, particularly now that Facebook has unfriended Dell and is building its own servers – or, rather, having Taiwanese IT manufacturing giant Quanta Computer do it and Synnex do the rack integration for Facebook’s shiny new Prineville, Oregon data center.
Speaking of Facebook, the ODCA has teamed up with the Open Compute Project, which is open sourcing both the design of the Prineville data center and the motherboards and server nodes used to run Facebook’s applications as well as the related power and cooling systems used to feed and pamper those servers.
Wheeler tells El Reg that the ODCA and the Open Compute Project will work together, initially by having requirements as expressed by ODCA members fed into the Open Compute Project. Following this, Open Compute contributors are expected to come up with designs that meet those needs (when they feel like contributing, of course). These hardware designs can then become part of ODCA usage models that other people can use as they deploy particular kinds of infrastructure to support specific workloads.
“This is end users telling vendors what they want the cloud to be,” says Wheeler. “This is the end user community pushing back.”
Well, yes and no. The ODCA has a lot of IT vendors that are helping steer things, and you’ll notice who is putting together the usage models, right?
No more tier ones?
The most interesting thing about the cross-coupling of the ODCA and the Open Compute project will be the establishment of detailed reference architectures for specific workloads that involve systems that are not built by tier-one server makers – at least not yet.
“The HPs and the Dells of the world can innovate on top of this,” explains Frank Frankovsky, a founding member of the Open Compute Project and director of technical operations at Facebook. “They may not be innovating at the box level any more, but I don’t think it stifles their innovation.”
At Facebook, the Open Compute servers are based on motherboards specced by Facebook that are manufactured by Quanta, which in turn ships the completed boxes back to California to Synnex, which tests the machines and plunks them into racks. These completed racks are shipped off to Prineville and rolled into the data center as needed.
Frankovsky is not at liberty to say how many of the Open Compute boxes have been built to date, but does say that there are tens of thousands of these machines installed in Prineville. And presumably the next generation of Open Compute servers will go into the new Facebook data center being built in North Carolina, which should come online in the next few months, according to Frankovsky. These Open Compute 2 platforms will be based on half-width motherboards using Opteron 6200 processors from Advanced Micro Devices and Xeon E5 processors from Intel.
Over the long haul, Frankovsky expects that anywhere from three to five distributors will step up to manufacture Open Compute server designs – Foxconn and Delta Electronics are obvious possibilities.
He also said that, just as software projects sometimes branch, Open Compute designs will likely branch (but not fork) for specific use cases, and that tier-one vendors such as HP and Dell might play in this way.
The other thing that Wheeler expects to happen is that countries with import duties to protect their IT industries will encourage their manufacturers to pick up the Open Compute designs and make the machines for local customers. They would not make the motherboards in, say, Brazil, but import them from Taiwan and then bend the metal around them and integrate the components indigenously.
Don’t be surprised if Intel makes Open Compute servers at some point, too. It already makes a large number of servers for cloud customers in China, according to rumors going around IDF this week.
Under the Hood at Google and Facebook - IEEE Spectrum
The entire article is worth reading for its discussion of Google’s and Facebook’s differing philosophies on datacenter construction and policy, but the part that stood out for me was the anti-workload-optimized approach to servers that both competitors follow.
Giant data centers—even energy-efficient ones—are, of course, nothing without the proper servers. Facebook will be populating its Oregon and North Carolina locations with custom-designed servers, just as Google has long done.
Facebook’s Amir Michael, manager of hardware design, explains that when the company decided to build its own facilities, “we had a clean slate,” which allowed him and his colleagues to optimize the designs of their centers and servers in tandem for maximum energy efficiency. The result was a server that “looks very bare bones. I call it a ‘vanity-free’ design just because I don’t like people to call it ugly,” says Michael. “It has no front bezels. It has no paint. It has no logos or stickers on it. It really has only what is required.”
Google also keeps server frills to a minimum. Like Facebook, it buys commodity-level computing hardware and just fixes the many pieces that break, instead of purchasing high-end systems that are less prone to failure but also much more expensive. Economics, if nothing else, drove engineers at both companies to similar conclusions here. Fit and finish might count if you’re buying one server or even a hundred, but not when you’re shopping for tens of thousands at a time. And striving for high reliability is a little pointless at this scale, where failure is not only an option, it’s a daily fact of life.
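A back-of-the-envelope calculation shows why failure becomes routine at this scale. The numbers here are illustrative assumptions, not figures from Google or Facebook: a fleet of N servers, each with an independent annual failure probability p, with failures spread evenly over the year.

$$
\mathbb{E}[\text{failures per day}] = \frac{N p}{365}, \qquad N = 10{,}000,\; p = 0.05 \;\Rightarrow\; \frac{10{,}000 \times 0.05}{365} = \frac{500}{365} \approx 1.4
$$

Even with a fairly reliable part, a fleet of that size would lose a machine or two every day, so designing for cheap replacement rather than high per-box reliability is the economical choice.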
Facebook’s Michael explains that he helped design three basic types of servers for running the Facebook application. The top layer of hardware, connected most directly with Facebook’s many users, consists of outward-facing Web servers. They don’t require much disk space—just enough for the operating system (Linux), the basic Web-server software (which until recently was Apache), the code needed to assemble Facebook pages (written in PHP, a scripting language), some log files, and a few other bits and pieces. Those machines are connected to a deeper layer of servers stuffed with hard disks and flash-based solid-state drives, which provide persistent storage for the giant MySQL databases that hold Facebook users’ photos, videos, comments, and friend lists, among other things. In between are RAM-heavy servers that run a memcached system to provide fast access to the most frequently used content.
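The middle memcached layer is usually described as a cache in front of the database: the Web tier asks the cache first and only falls through to MySQL on a miss. A minimal sketch of that read path follows, assuming a plain Python dict stands in for memcached and an arbitrary lookup function stands in for a MySQL query. The class and method names are hypothetical, not Facebook’s code, and a real memcached client would also deal with byte serialization and expiry times.

```python
from typing import Callable, Dict, Optional


class CacheAsideStore:
    """Hypothetical cache-aside read path: RAM cache first, database on a miss."""

    def __init__(self, db_lookup: Callable[[str], Optional[str]]) -> None:
        self._cache: Dict[str, str] = {}  # stand-in for the memcached tier
        self._db_lookup = db_lookup       # stand-in for a MySQL query
        self.db_hits = 0                  # requests that fell through to the database

    def get(self, key: str) -> Optional[str]:
        if key in self._cache:            # fast path: served straight from RAM
            return self._cache[key]
        self.db_hits += 1
        value = self._db_lookup(key)      # slow path: persistent storage
        if value is not None:
            self._cache[key] = value      # populate the cache for the next reader
        return value

    def invalidate(self, key: str) -> None:
        # Call after writing new data to the database so readers don't see stale values.
        self._cache.pop(key, None)


if __name__ == "__main__":
    database = {"user:42:status": "Out for tea"}
    store = CacheAsideStore(database.get)
    store.get("user:42:status")
    store.get("user:42:status")
    print(store.db_hits)  # only the first read reached the "database"
```

The design choice this illustrates is that hot content is read far more often than it is written, so a RAM-heavy tier can absorb most reads and leave the disk-heavy database servers to handle misses and writes.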
Alpha geeks will recognize that these pieces of software—Linux, Apache, PHP, MySQL, memcached—all hail from the open-source community. Facebook’s programmers have modified these and other open-source packages to suit their needs, but at the most basic level, they are doing exactly what countless Web developers have done: building their site on an open-source foundation.
Not so at Google. Programmers there have written most of their company’s impressive software from scratch—with the exception of the Linux running on its servers. Most prominent are the Google File System (or GFS, a large-scale distributed file system), Bigtable (a low-overhead database), and MapReduce (which provides a mechanism for carrying out various kinds of computations in a massively parallel fashion). What’s more, Google’s programmers have rewritten the company’s main search code more than once.
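MapReduce splits a job into a map step that runs independently on each chunk of input, a shuffle that groups intermediate results by key, and a reduce step that combines each group. The sketch below is a single-process word count that shows only the shape of that model; it is not Google’s implementation, and the function names are hypothetical. In a real cluster the `map_words` calls would run in parallel on many machines, which is possible precisely because each call touches only its own input.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, List, Tuple


def map_words(document: str) -> List[Tuple[str, int]]:
    # Map: emit (word, 1) for every word in one input split.
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group every intermediate value under its key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: combine all values that share a key.
    return key, sum(values)


def map_reduce(documents: List[str]) -> Dict[str, int]:
    mapped = chain.from_iterable(map_words(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(map_reduce(["the cat sat", "The dog sat"]))
```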
Speaking two years ago at the Second ACM International Conference on Web Search and Data Mining, Jeff Dean, a Google Fellow working in the company’s system infrastructure group, said that over the years his company has made seven significant revisions to the way it implements Web search. However, outsiders don’t realize that, because, as Dean explained, “you can replace the entire back end without anyone really noticing.”
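Dean’s point that the back end can be replaced “without anyone really noticing” depends on keeping the interface stable while the implementation changes. Here is a toy illustration, assuming a hypothetical `SearchBackend` interface with two interchangeable implementations (a linear scan and an inverted index). Neither reflects how Google’s search actually works; they only show that the calling code stays the same when the back end is swapped.

```python
from typing import Dict, List, Protocol


class SearchBackend(Protocol):
    def search(self, query: str) -> List[str]: ...


class LinearScanV1:
    """Hypothetical first-generation back end: scan every document per query."""

    def __init__(self, docs: List[str]) -> None:
        self._docs = docs

    def search(self, query: str) -> List[str]:
        q = query.lower()
        return [d for d in self._docs if q in d.lower().split()]


class InvertedIndexV2:
    """Hypothetical replacement: precompute word -> documents once."""

    def __init__(self, docs: List[str]) -> None:
        self._index: Dict[str, List[str]] = {}
        for d in docs:
            for word in set(d.lower().split()):
                self._index.setdefault(word, []).append(d)

    def search(self, query: str) -> List[str]:
        return list(self._index.get(query.lower(), []))


def handle_request(backend: SearchBackend, query: str) -> List[str]:
    # Front-end code: unchanged no matter which back end is plugged in.
    return backend.search(query)


if __name__ == "__main__":
    docs = ["Green tea sellers", "Black tea", "Coffee"]
    print(handle_request(LinearScanV1(docs), "tea"))
    print(handle_request(InvertedIndexV2(docs), "tea"))
```

Both versions return the same results in the same order, so a user of `handle_request` cannot tell which one is running; only the cost per query changes.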
How are we to interpret the difference between Google’s and Facebook’s engineering cultures with respect to the use of open-source code? Part of the answer may just be that Google, having started earlier, had no choice but to develop its own software, because open-source alternatives weren’t yet available. But Steve Lacy, who worked as a software engineer for Google from 2005 to 2010, thinks otherwise. In a recent blog post, he argues that Google just suffers from a bad case of not-invented-here syndrome. Many open-source packages “put Google infrastructure to shame when it comes to ease of use and product focus,” writes Lacy. “[Nevertheless, Google] engineers are discouraged from using these systems, to the point where they’re chastised for even thinking of using anything other than Bigtable/Spanner and GFS/Colossus for their products.”
The Social Era of the Web Starts Now - IEEE Spectrum
IEEE has just published a special report on social networking that seems to be as much about the nascent conflict between Google and Facebook as it is about the broader social trend.
In the beginning was the personal computer.
Not long after, people started connecting them together on networks, culminating in the World Wide Web and the Web browser, which launched the first great era of the Web. Then came the search engine, which launched the second great era of the Web, the era of Google.
Now comes the third: the era of social networks. Facebook has jumped out to a commanding lead, but Google hasn’t really started fighting yet. So the stage is set for a battle of, well, biblical proportions. The wizards at the Googleplex in Mountain View, Calif., are working furiously on a social network to rival Facebook’s. Just a few miles away, in Palo Alto, Facebook is preparing for an initial public offering to give it the money it needs to take on Google’s Goliath. And last month, the clash got a bit ugly when it was revealed that Facebook had hired a public relations agency to slime Google’s social networking tactics.
We are about to witness the next great conflict of the information age, a rich and complicated match on the scale of mainframes vs. micros, RISC vs. CISC, Windows vs. Unix. Like those battles, Google-Facebook will shape the industry’s landscape for years to come.
The Web has had a social dimension almost from the start. It just took a while for the right software to come along and make it compelling. “We’re now seeing a Web built around people, where their profiles and content are moving with them as they visit different websites,” notes Paul Adams, who made his mark as a user-experience researcher at Google before jumping recently to Facebook. Socializing is something that people used to do on the Web; gradually it is becoming the Web…
That’s important, because ads are what makes this cockeyed caravan go. Google and Facebook have the same basic model: Offer the services free and charge for advertising. And, as any adman will tell you, the more popular your service, the more money you can get for ad space. That’s why Google and Facebook are vying to be the de facto home for Web users.
Nearly all of Google’s and Facebook’s revenues come from advertising. Google posted US $29.3 billion in revenue in 2010. A recent report said that privately held Facebook generated about $2 billion in revenue last year, which places its size on a par with that of Google when Google was also just six years old.
What Google and Facebook have that old media don’t is information about you—data that they collect and process with a barrage of advanced technologies, software, and math to wring money out of you with far greater efficiency. They do that by using the information to target you with ads that can be so specific and relentless that they seem a little creepy at times. Use Google’s Chrome browser to search for a fruit-flavored green tea and you will probably find yourself hounded for days or weeks by ads from tea sellers that pop up to the side of other pages that Google points you to. Writing the code that does that is how some of the greatest mathematical minds of the current generation make their living these days.
That’s Google’s edge: It is in the enviable position of benefiting from having users online in almost every way (but it greatly prefers to keep them at sites available to its scrutiny through the Chrome browser and Android apps). Facebook, on the other hand, can learn about people and profit from them only when they’re on the site (a fact that helps explain Mark Zuckerberg’s fervent desire that we all just get over our archaic notions about privacy). So now Facebook’s triumph is emboldening the network to take on more and more services in the interest of keeping users within its walls.
Here, Facebook has two potent weapons: crowdsourcing and games. Google’s success at collecting information and driving commerce has created incentives for sites to manipulate the system with search-engine optimization tricks that artificially elevate their ranks in search results. Distorted search results are in turn prompting frustrated Web surfers to crowdsource their questions to trusted branches of their social networks. The idea is that your friends and relations can steer you toward a good answer much more reliably than Google’s immensely powerful but compromised data centers.
Quora, a question-answering social site, has sprung into being for precisely that purpose. Here, too, Google isn’t sitting still, but you already knew that. Improved search is part of the rationale for the +1 feature, which allows users to elevate their favorite search results in queries from their friends.
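The idea behind +1 (letting your friends’ endorsements lift a result in your own searches) can be illustrated with a simple re-ranking step. This is a hypothetical sketch, not Google’s ranking algorithm: it assumes each result already has a base relevance score, adds a fixed `boost` per endorsing friend, and ignores endorsements from people outside your network. The function name, parameters, and weighting are all invented for illustration.

```python
from typing import Dict, List, Set


def rerank_with_friend_endorsements(
    results: List[str],
    base_scores: Dict[str, float],
    plus_ones: Dict[str, Set[str]],
    friends: Set[str],
    boost: float = 0.5,
) -> List[str]:
    """Hypothetical re-ranking: add `boost` for each friend who endorsed a result."""

    def score(url: str) -> float:
        # Only endorsements from the searcher's own friends count.
        endorsers = plus_ones.get(url, set()) & friends
        return base_scores.get(url, 0.0) + boost * len(endorsers)

    return sorted(results, key=score, reverse=True)


if __name__ == "__main__":
    ranked = rerank_with_friend_endorsements(
        results=["a.example", "b.example", "c.example"],
        base_scores={"a.example": 1.0, "b.example": 0.9, "c.example": 0.8},
        plus_ones={"c.example": {"alice", "bob"}, "b.example": {"stranger"}},
        friends={"alice", "bob"},
    )
    print(ranked)
```

The stranger’s endorsement of `b.example` has no effect, while two friends’ endorsements lift `c.example` to the top, which is the “trusted branches of your social network” effect described above.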
Facebook has also made extremely successful use of online games to keep people within its domain. Some 250 million people play social-network games every month, according to the analysis firm Parks Associates. The ascendance of these games has been so swift that media analysts are blaming it for the death of the TV soap opera in the United States, after a reign that lasted for decades. There are scores of Facebook games, but just two—Zynga’s CityVille and FarmVille—account for 140 million of the 250 million people who play social games every month. Google’s plans for online games aren’t known, but nobody doubts that some of the fiercest battles for online revenues will occur in the arena of gaming. And as our contributor David Kushner notes, these new social diversions may be as different from action-heavy console games as bucolic FarmVille is from the bloody Call of Duty first-person shooters [see “Betting the Farm on Games”].
As improbable as it might seem now, Google and Facebook could yet lose their grip on the new social Web. They will thrive only as long as online ad revenue flows, and that flow can be maddeningly fickle and elusive [see “The Revolution Will Not Be Monetized”]. Their snooping may even backfire. Some users have already decided that they would rather not blindly trust their social networking and Web-search history to anyone. More customized service, too, is always a lure. So four young techies in San Francisco have found a niche and are trying to fill it with a different kind of social network, called Diaspora. We’ve got the inside story on the ups and downs of life at the tech start-up of the moment [see “The Anti-Facebook”].
Google and Facebook, meanwhile, are grappling with a rather different sort of engineering problem: how to build data centers that push the boundaries of what’s possible now, to keep up with epic demands for processing power, data storage, and more [see “Under the Hood at Google and Facebook”]. Even here, the two giants embrace fascinating philosophical differences, and this is just one tier of a fast-evolving technology ecosystem to which the two must be constantly adapting. Smartphones, tablets, and netbooks are now so popular that accessing the Web to settle a bet, tweet a short movie review, check in with work, or post a Facebook status update has become an almost inescapable adjunct to any other activity. No one with a smartphone is ever out of touch with his or her online social circle, for better or worse. Even shooting, editing, and posting video—something that professionals could do only in the studio a few years ago—can now be done on the fly.
Technology will also give us a whole new concept of mobility. Just as GPS units in phones make it possible for people to spontaneously advertise their coordinates at all times, new kinds of sensors linked to the Web and embedded in clothes, buildings, vehicles, and other common objects will be able to convey ongoing updates about your every action. People will need to do less and less in the future to loose a torrent of data about themselves and their ongoing activities onto the Web. Whether anyone will really want to be the recipient of these data deluges is a moot point. Much of this information might seem meaningless, but the availability of robust computing capabilities for processing constellations of such data drawn from many parts of our lives—and the lives of other demographically relevant people—could certainly make it vivid and might very well make it compelling.
There’s a downside, and it’s a doozy. A system for tracking everyone’s actions more closely than 1984’s telescreens sounds uncomfortably like the greatest threat to personal privacy that the world has ever seen. But in capitalist democracies, at least, the more immediate worries are that corporate marketing could gain major advantages from knowing so much about us, and that minor lapses or, as they say, youthful indiscretions could wind up wrecking some people’s lives and careers [see “Welcome to the Surveillance Society” and “Me, Myself, or I”].
Regaining traditional privacy may be impossible, but that doesn’t mean we should be entirely at the mercy of corporate whims in their stewardship of our data. And so in this issue Web pundit Jeff Jarvis proposes a framework for a “bill of rights” that might help to curb abuses by Google, Facebook, and the other giants weaving this new social web and struggling for advantage in it [see “Privacy, Publicness, and the Web: A Manifesto”].
Google and Facebook are both counting on human creativity to drive their future success. So they are fostering lavish workplace cultures—with beautiful campuses, gourmet food, and at Google, conveniently located dog-poop disposal stations. (Really.) You may be surprised (we sure were) by what it takes to lure, pamper, and retain a top techie these days [see “Campus Life” and “Food Fight”].
The social networks that will come out of these brainy hothouses will undoubtedly have surprising cultural consequences. Life support excepted, the most fundamental human need is companionship. And so humankind has turned its newest technologies—computers, networks, mobile gizmos, and software—to one of its oldest and most basic requirements. A new and interesting chapter has begun. You’ll like it, although there are bound to be a few scary parts. We can’t tell you how the chapter will turn out. But when you’ve read our report, at least you’ll know what to fear and what to hope for.
Facebook’s new energy efficient data center (by building43)