IT Departments Have Become Completely Useless - Business Insider
Another article on Chief Digital Officers —
The role of the CIO, the Chief Information Officer, has been debated for about as long as the term itself has existed. Rarely has a job title been so misleading: in many companies, the person holding the position of CIO was seldom seen as the chief ‘Information’ Officer. People mostly perceived him as “Top Dog of the Nerd Herd and Boss of all things Bits and Bytes”.
I’m paraphrasing here, but that seems to be the general gist of the sentiment I’m getting from the business community. I teach a course in the Senior Executive Program at London Business School. Top executives from all over the world spend three weeks in London to learn about the latest findings in the exciting fields of Strategy, Finance, Marketing, Innovation and Leadership. Oh, and technology. That’s where I come in.
When I enter the classroom on the first day of my ‘Information Technology’ course, more often than not, I am greeted in an understated, maybe even proactively bored, manner. Remember, these are top execs leading some of the largest companies in the world. The first thing on my teaching menu is to have them do a simple word association on their feelings about ‘IT’. The responses tend to be crude: ‘boring’, ‘complex’, ‘costly’, ‘always too late’, ‘annoying’ are some of the kinder ones. And when I mention ‘IT departments’ I get wonderfully colorful comments such as ‘arrogant’, ‘out of touch with reality’, ‘language of their own’ and, increasingly often, ‘hopelessly out of date’.
CIOs have lost the edge
The tension between ‘business’ and IT has been around forever, but instead of getting better, it has gotten worse in the last couple of years. The reason is that digital has become ‘normality’, and almost everyone now feels at ease with digital technology. In other words, the natural knowledge advantage of the IT department has eroded. To put it bluntly, since everyone and their dog started carrying around iPads, the IT department has lost its advantage on the ‘frontier of technology’.
I think the CIO is largely to blame for this. In many companies, CIOs never fulfilled that role and instead persisted in the more comfortable job of Chief Technology Officer (CTO). The reason is that most of these people had been running IT before it was even called IT. The origins of IT date back to the ‘Electronic Data Processing’ age, when companies had units processing vast amounts of (financial) data. Many IT departments seem stuck in their role as ‘suppliers of technology’. Very few have stepped up to what the CIO job is really all about: having a solid impact on how companies deal with information.
Many so-called CIOs limited themselves to supplying their colleagues with copies of Microsoft Office on laptops, and ever so kindly offered them SharePoint servers to store their documents. But very few had a solid influence on how companies deal with content, build up knowledge, and innovate with information. Many CIOs provided their employees with cell phones, then BlackBerrys, and, when it became impossible to postpone the inevitable, with iPhones and other smartphones. They gave the gift of nomad hardware and software, secured it, and that was basically it. Making sure that their organizations could benefit from the endless possibilities of the mobile revolution seemed one bridge too far for most.
That’s why the CIO rarely had a real seat at the executive management table, and why most still reported to other, more powerful executives, often the CFO. Chris Anderson, the editor-in-chief of Wired, once said that ‘CIOs have become the dead weight in an organization that keeps the real (technology) innovators from taking matters into their own hands’. Ouch.
The rise of the Chief Digital Officer
But recently, we’ve seen the rise of a new breed of CXO: the ‘CDO’ or Chief Digital Officer. Who is that dashing corporate person that deals with all things digital and social? Why, it’s the CDO! Who is that person that tackles the strategic questions on Big Data and Analytics Innovation? Why, it’s the CDO! This next-gen IT hero is the one who really understands digital as a means of innovating the company. His daring mission is to transform the business model of the company. The CDO does not implement technology; no, he implements technology-enabled innovation.
This brings a fundamental question to mind: are the CIO, the CTO and the CDO the same person, or are they profoundly different? Are we talking Clark Kent/Superman here?
Over the last five years, I’ve collaborated with hundreds of CIOs all around the world, learning where they are heading and what they are focusing on. The general sentiment seems to be that many CIOs today absolutely want to take up that new corporate superstar function of Chief Digital Officer. Unfortunately, most face two humongous obstacles.
The challenges for the CIO
The first is that many CIOs today lack the right talent in their IT departments to boost their relevance in the digital space. Sure, their divisions are swarming with people who understand infrastructure, servers, and neat systems such as Exchange or SAP. But rarely do they have the digital skills on board that matter today: social networking skills, Big Data analytics experience, digital communications knowledge, conversation management savvy, etc.
The second issue for CIOs wanting to become CDOs is that they are rarely perceived by their business counterparts as ‘credible’. Many digital opportunities require deep insight into business challenges, and the people around the CIO often doubt whether he or she has the goods to back this up. That’s partly because CIOs are still struggling with past ‘criminal records’ starring complex and very painful ‘IT projects from hell’. It is safe to say that their reputations are often dented, and reliability and respect are not the first things that come to mind when people think about them.
Therefore, we see that many Chief Digital Officers in companies do NOT have an IT background. They come from such well-reputed corporate regions as marketing, business development, or sales. From anywhere but IT, actually.
Batman or Robin?
And there you have it: many organizations now have an IT department filled with ‘digital skills from the olden days’, run by the CIO, as well as a ‘digital’ division with ‘new digital’ talent led by the CDO, who is often more than two decades younger than the CIO.
I honestly believe that clever organizations will find a way to reconcile the two, because what happens in the ‘new’ digital field in terms of customer innovation will, at some point, have to be connected to and integrated with the ‘old’ digital back office of the company.
However, the CIO will need to undergo a huge shift in responsibility to assume that role. Time for the CIO to decide if they want to be Batman or Robin. This is the time to separate the men from the boys. I also believe it will require a complete makeover of the IT department to make this work. And it will finally mean that the CIO will have to step up to the plate and at long last truly assume the role of Chief ‘Information’ Officer.
A CMO, a CIO, and a chief digital officer walk into a bar... - Chief Marketing Technologist
The birth of a new role — very interesting.
There’s been a resurgence of popularity for the role of a Chief Digital Officer (CDO) lately. Last fall, Gartner predicted that 25% of organizations would have a CDO by 2015. And that’s shaking up the corporate technology power structure in uncomfortable ways for many CIOs, and I suspect for some CMOs too.
“The Chief Digital Officer will prove to be the most exciting strategic role in the decade ahead,” predicts Gartner VP David Willis. “The Chief Digital Officer plays in the place where the enterprise meets the customer, where the revenue is generated, and the mission accomplished. They’re in charge of digital business strategy.”
In some firms, the CDO is essentially in charge of the online business unit, the e-commerce portion of the business. Russell Reynolds Associates, in their epic article on The Rise of the Chief Digital Officer, notes that in retail and leisure sectors, such digital businesses are the fastest growing revenue stream. At media companies, struggling to survive in a world that has redefined media, CDOs are the star-crossed warriors charged with building the digital properties and supporting business models on which their future depends.
These scenarios make a lot of sense as business units.
But the role and reach of the CDO seems to be evolving as rapidly as everything else related to the digital sphere — it’s actually quite hard to find something that isn’t related to digital in some way these days. CDOs are appearing in companies, not as explicit business unit owners, but as hybrid marketing-technology change agents at the right hand of the CEO.
For instance, that Russell Reynolds article actually begins by stating (emphasis added is mine): “The challenges and opportunities for businesses in this digital age are enormous. Companies need to be fleet-footed to keep pace with changing technology and consumer behavior. Business strategies now must be seamlessly interwoven with ever-expanding digital strategies that address not only the web but also mobile, social, local and whatever innovation there may be around the corner.”
That kind of sounds like, well, everything. Except maybe janitorial services?
The CIO is feeling the heat
Because all things digital are powered under the hood by technology, the executive who has perhaps felt the most immediate pressure from the rise of the CDO is the CIO.
Peter Hinssen captured the situation quite viscerally in his provocative — provocative in the way one might provoke a tiger with a sharp stick — article on Business Insider earlier this month, IT Departments Have Become Completely Useless. (Don’t pull any punches, Peter.)
“Who is that dashing corporate person that deals with all things digital and social?” he writes breathlessly. “Why, it’s the CDO! Who is that person who tackles the strategic questions on big data and analytics innovation? Why, it’s the CDO! This next-gen IT hero is the one who really understands digital as a means of innovating the company.”
Some believe that the CIO will morph into the CDO. But according to Peter Hinssen, “Many Chief Digital Officers in companies do NOT have an IT background. They come from such well-reputed corporate regions as marketing, business development, or sales. From anywhere but IT, actually.”
The barriers are two-fold.
First, the kinds of technology that CIOs have had the most expertise with are generally back-office in nature. In many companies, they’ve been ill-positioned to champion more front-office technical innovations, either due to cost or security concerns or because it fell outside their comfort zone or the capabilities of their staff. As I wrote in an article some time ago — why marketing and IT are diametrically opposed — a significant part of this barrier is the structural incentives around which IT was intentionally chartered.
Second, a number of CIOs seem to be pigeonholed by the C-suite more as technical leaders than general business leaders. And now that CEOs are seeking a change agent to transform their enterprises into the digital age — in many circumstances, a “turnaround” kind of mission — the CIO may not be viewed as the right kind of leader for that job.
CIO.com recently published an article, Do Chief Digital Officers Spell Trouble for CIOs? Their short answer was yes. In many cases, “the CDO is an executive from outside the company — and outside IT — who parachutes in at the behest of a CEO who is adamant about corporate transformation. Usually reporting to the CEO, the CDO gets the authority to rearrange staff and request funding to launch big plans.”
Dave Aron of Gartner calls that a “vote of no confidence” in the CIO.
To be sure, many CIOs would like the CDO job. Gartner estimates that about 20% of CIOs have already taken on those responsibilities. And they sound quite optimistic about the opportunities for CIOs who step up to the challenge. But CIOs are Gartner’s bread-and-butter customers. Peter’s candid editorial suggests that may be a hard step to climb.
But what about the CMO?
Yet surprisingly, I haven’t heard as many concerns raised by the CMO community: how does a CDO who reports to the CEO, and not the CMO, affect their position in the enterprise leadership pantheon?
Sure, the CDO is independently wielding technology to accomplish his or her mission, which is what makes the CIO nervous. But that’s merely a means to an end. The mission of the CDO is to understand and connect with the organization’s modern customer and to take charge of crafting the experience those customers receive. This is especially true in organizations where the CDO role is broader than a specific business unit.
Where does that leave the CMO? Overseeing the sundowning of traditional marketing channels to a winnowing number of non-digital customers? Doing “branding” — not the modern kind of brand-as-experience, but old school logos and taglines validated by focus groups? Handling “PR” — not the modern kind of everything-social-is-PR (and pretty much everything is social), but old school news releases and press conferences?
I’m exaggerating to make a point, but in a C-suite that has a strong CDO and a digitally inexperienced CMO, that may not be too far off the mark. The CMO might start to feel like that poor wretch in Office Space who keeps having his desk moved to smaller cubicles in darker corners of the building. Next to go is his stapler.
What is marketing’s purpose if not to understand and connect with the customer?
I fully appreciate that understanding and connecting with modern customers is more complex than ever and requires enormous changes to what we’ve called “marketing” in the past. I sympathize with more traditional marketing leaders who have had a veritable tidal wave of changes crash upon them with a velocity that is nothing short of dizzying. This is an epic challenge.
But that doesn’t change marketing’s responsibility. If you carve out all things digital from marketing — in a world that is asymptotically approaching all things being digital — and give it to a CDO who’s independent of the CMO, then the CMO has lost that responsibility. And with great responsibility goes great power.
If I were a CMO, I would be as nervous about a direct-to-the-CEO CDO as the CIO. (On the bright side, previously distant CMOs and CIOs might finally have something in common — and more reason than ever to go get a beer together.)
Is there a better solution?
Yes! Marketing should own digital, because digital is inherent in understanding and connecting with the modern customer. And regardless of which C-level role wins this responsibility, understanding and connecting with the modern customer is marketing.
The ideal scenario, I believe, is for the CMO to preempt a digital coup and hire a CDO as his or her right hand with the urgent mission of collaboratively absorbing all things digital into the very definition of a unified modern marketing department. This role may be more like a chief marketing technologist — or the CDO may have a chief marketing technologist as their right hand for the more technical facets of that transformation.
But the technology is simply a means to an end. The vision is that understanding and connecting with customers is fully unified under marketing’s umbrella.
After all, there still are non-digital aspects of understanding and connecting with customers in most businesses. But to customers, the lines between digital and non-digital are nearly invisible. They simply expect continuity in their experience with you.
Only a truly unified marketing department can deliver that experience.
Change agents, team players, and the post-digital era
Of course, ideal scenarios and reality don’t always match.
It may be that in entrenched, large-scale organizations, the overhaul of marketing from within is too hard to execute in the timeframe in which the market is demanding digital mastery. It may be that the CMO, as brilliant as he or she is in their own ways, is just not up for the challenge of leading that transformation and of hiring and deftly managing a powerhouse CDO (or multiple digital and marketing technology leaders).
In which case, if I were the CEO, I’d hire a CDO and make them my agent of change.
Now, in practice, I don’t believe that digital has to be a one-chief-to-rule-them-all battle for corporate dominance. In fact, cooperation between all the heads of the business is more necessary than ever in a world where, literally, everything is connected. If there’s a CMO, a CIO, and a CDO, they should — they must — find ways to work together in the best interest of customers and the business. Such collaborations can be not merely congenial, but inspiring and immensely effective.
But in the fluid battlefield of modern markets, there is also great value to having individual leaders with undisputed authority to make swift decisions. If too much of tactical execution requires consensus from a committee, it’s going to be a drag on the firm’s competitiveness. Nimbler competitors will outmaneuver it.
When it comes to understanding and connecting with the modern customer, that leader can be the CMO or the CDO — but it’s harder for it to be both.
Eventually, either the CDO reports to the CMO (probably, I’m afraid, a new CMO by that time) or the CDO becomes the CMO. At that point, the firm will have successfully completed its transformation and entered what David Cooperstein of Forrester calls the post-digital era: digital, business, and marketing are all one and the same.
In fact, if a company doesn’t recombine them, something is seriously wrong.
“I see nothing wrong with hiring a chief digital officer to accelerate growth in the digital space,” said Tarik Sedky, CDO of agency Young & Rubicam from 2007 to 2010, in a 2011 Adweek article, Chief Digital Officer Title Won’t Die. “I think the real problem is having a chief digital officer for more than three years.”
Where will CDOs go after their positions are assimilated into fully transformed firms?
Some may indeed become the new CMO. For others, the Russell Reynolds article concludes, “CDOs who demonstrate their ability to manage change and transform their businesses almost certainly will lead the way in the rise of the Digital CEO.”
How I Explained REST to My Wife
A really good primer on what RESTful means —
Ryan: Machines don’t have a universal noun - that’s why they suck. Every programming language, database, or other kind of system has a different way of talking about nouns. That’s why the URL is so important. It lets all of these systems tell each other about each other’s nouns.
…
Wife: What about verbs and pronouns and adjectives?
Ryan: Funny you asked because that’s another big aspect of REST. Well, verbs are anyway.
Wife: I was just joking.
Ryan: It was a funny joke but it’s actually not a joke at all. Verbs are important. There’s a powerful concept in programming and CS theory called polymorphism. That’s a geeky way of saying that different nouns can have the same verb applied to them.
Wife: I don’t get it.
Ryan: Well... Look at the coffee table. What are the nouns? Cup, tray, newspaper, remote. Now, what are some things you can do to all of these things?
Wife: I don’t get it…
Ryan: You can get them, right? You can pick them up. You can knock them over. You can burn them. You can apply those same exact verbs to any of the objects sitting there.
Wife: Okay… so?
Ryan: Well, that’s important. What if instead of me being able to say to you, “get the cup,” and “get the newspaper,” and “get the remote”; what if instead we needed to come up with different verbs for each of the nouns? I couldn’t use the word “get” universally, but instead had to think up a new word for each verb/noun combination.
Wife: Wow! That’s weird.
Ryan: Yes, it is. Our brains are somehow smart enough to know that the same verbs can be applied to many different nouns. Some verbs are more specific than others and apply only to a small set of nouns. For instance, I can’t drive a cup and I can’t drink a car. But some verbs are almost universal, like GET, PUT, and DELETE.
Wife: You can’t DELETE a cup.
Ryan: Well, okay, but you can throw it away. That was another joke, right?
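To make Ryan's point concrete, here is a minimal Python sketch of a uniform interface. It is not from the original piece: the ResourceStore class and the /coffee-table/... URLs are hypothetical, and an in-memory dict stands in for a real HTTP server. Every noun is named by a URL, and the same three verbs (GET, PUT, DELETE) apply to all of them, so there is no getCup or fetchNewspaper special case.

```python
from typing import Any, Dict, Optional


class ResourceStore:
    """Hypothetical in-memory store: each 'noun' is a resource named by a URL path,
    and the same three verbs work on every one of them."""

    def __init__(self) -> None:
        self._resources: Dict[str, Any] = {}

    def get(self, url: str) -> Optional[Any]:
        # GET: read the current representation; None stands in for 404 Not Found
        return self._resources.get(url)

    def put(self, url: str, representation: Any) -> bool:
        # PUT: create or replace the resource at this URL; True if newly created
        created = url not in self._resources
        self._resources[url] = representation
        return created

    def delete(self, url: str) -> bool:
        # DELETE: remove the resource; False if nothing was there
        if url in self._resources:
            del self._resources[url]
            return True
        return False


if __name__ == "__main__":
    store = ResourceStore()
    # Different nouns, identical verbs.
    for noun in ("cup", "tray", "newspaper", "remote"):
        store.put(f"/coffee-table/{noun}", {"name": noun})
    print(store.get("/coffee-table/cup"))  # {'name': 'cup'}
    store.delete("/coffee-table/cup")      # "throw it away"
    print(store.get("/coffee-table/cup"))  # None
```

In real HTTP, PUT and DELETE are also idempotent: repeating them leaves the resource in the same state. The sketch reflects this because a second delete simply reports that nothing was there.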
On Developers and Technology Adoption – tecosystems
RedMonk argues that R is rapidly becoming a go-to analytics tool thanks to bottom-up advocacy by statisticians who used it extensively in school -
The ongoing promotion of developers from serfs to kingmakers has many implications, but perhaps none so important as technology adoption. For decades, we’ve been seeing the manifestation of this trend, as the growth in market share of technologies from Chrome to Linux to Mac to MySQL has been driven at least in part by developer populations that preferred them. This was in evidence yet again last week in a Google Hangout, organized by IBM, in which James and I participated.
In discussing the market for analytics, one of the topics of discussion was people, or more specifically the lack thereof. One of the least controversial statements one can make in the technology industry today is that demand for talent is outstripping the supply, at least in most markets. Analytics are no exception. As a result of this shortage, many organizations are the proverbial beggars now unable to be choosers. Where they may previously have hired for analytical roles only those trained on sanctioned analytical tools, businesses are now compelled to hire outside of these comfort zones. And in the analytical world, this most often benefits the statistical language R.
When developers – or in this case, statisticians – are permitted to choose their own tools, an increasing number turn to R if for no other reason than the fact that it was the basis for their academic training. So prevalent is R in academic environments, in fact, that the datasets associated with the text taught in my second semester of statistics were available as a downloadable package via R’s repository, CRAN.
What this means, then, is that like Linux, MySQL, PHP, AWS and other technologies before it, R is being anointed in bottom-up fashion as a fundamentally important technology moving forward, whether enterprises approve of that or not; it’s not often we get to watch this in real time. Even those conservative enterprises who wish to dictate technology choices to their rank and file will find that increasingly difficult in a tight labor market. Fragmentation is the new reality, and as developers make more choices, if only by default, the number of technologies employed within enterprises will inevitably rise. Those businesses that understand this at a fundamental level and do not seek to oppose it will have significant advantages in both hiring and productivity. Those vendors, meanwhile, that appreciate that heterogeneity is the new norm and optimize for interoperability will, likewise, benefit.
As for the developers? With their newfound influence comes greater responsibility; just as enterprises need to be more flexible in technology adoption, so too should developers be more careful in the selection process. Heterogeneity is fine, even beneficial. Chaos, less so.
Executives, not Employees, are Driving the Consumer Tech in the Enterprise
A recent Forrester report shows some interesting insights into the perception of the IT department by the business. In short, it doesn’t look good.
Forrester conducted a similar poll among 1,047 North American and European IT managers with responsibility for their companies’ budgets. Guess what? In every category of IT operations performance, the IT workers rated their own efficiency more highly than did the executives. And that difference in perception, Forrester concludes, is driving business departments to take more responsibility for IT purchasing and deployment decisions.
“Senior management is frustrated with IT’s ability to deliver,” Forrester’s John C. McCarthy writes.
Only 15.7% of executive respondents said they’re increasing their departments’ involvement in IT, in an effort to decentralize procurement and deployment away from their IT departments. Fifty-nine percent say they get IT support from a centralized IT resource, while 20% say it comes from a dedicated, division-specific IT resource. (That last figure is up 9% over last year, by the way.) Among those who are increasing their divisions’ involvement this year, 75% agreed with the statement, “Technology is too important for the business not to be involved,” and 54% agreed with, “IT does not understand the business issues and priorities to do it by itself.”
If we were to stop there, we might conclude that corporate departments want to move decision-making closer to the executive suite. However, when asked what they intend to do when they do get control of the IT process, a tremendous number of executives said they were planning to outsource it.
A full 36% of executive respondents said they plan to outsource IT services - a 19% jump over last year. And 25% said they plan to hire systems integration consultants this year, a 7% rise over 2010.
What do businesses expect these consultants to produce first? A total of 46% of executives responding said they either have already tasked these consultants, or are making plans within the next 12 months, to have them… build a website for them. Not a Facebook page, not a social gathering point, but a website.
Safety Still Leads to Hesitancy for the Cloud
You might think that cloud technology would play a more prominent role in these executives’ planning decisions. As it turns out, only 34% of executives responding say they use a SaaS application for customer relationship management or human resources - the two top categories of SaaS. Another 6% are requesting to use such a service, and 19% are thinking about it, but even combined (59%) they don’t reach the two-thirds mark. And every other category of SaaS ranked lower on Forrester’s executive survey.
What’s keeping businesses from finally making the leap? The result was an absolutely clear signal, from both the executives and the IT managers polled: Some 38% of executives and 46% of IT managers polled agreed with the statement, “We cannot manage security to our strict standards” - far higher than any other statement in the list.
To summarize: Businesses absolutely know they need to cut costs in this economy, and they know both mobile devices and cloud applications are means to that end. But they don’t know what Step 1 should be. Executives know the IT department isn’t taking Step 1, so they and their division leaders are taking the reins for themselves. But once they have the control, they don’t know how to begin either, so they hire consultants - even creating new external IT departments to take over from the internal ones. What’s keeping them from going forward with what should otherwise be a simple implementation plan are fears about security, compliance and identity management. Simple solutions are being obstructed by complex problems - far more complex than the consumerization of IT.
High Scalability - Tumblr Architecture - 15 Billion Page Views a Month and Harder to Scale than Twitter
A technical look at the infrastructure that supports Tumblr and how it is designed to scale - very interesting as a real world case study.
With over 15 billion page views a month Tumblr has become an insanely popular blogging platform. Users may like Tumblr for its simplicity, its beauty, its strong focus on user experience, or its friendly and engaged community, but like it they do.
Growing at over 30% a month has not been without challenges, some reliability problems among them. It helps to realize that Tumblr operates at surprisingly huge scales: 500 million page views a day, a peak rate of ~40k requests per second, ~3TB of new data to store a day, all running on 1000+ servers.
One of the common patterns across successful startups is the perilous chasm crossing from startup to wildly successful startup. Finding people, evolving infrastructures, servicing old infrastructures, while handling huge month over month increases in traffic, all with only four engineers, means you have to make difficult choices about what to work on. This was Tumblr’s situation. Now with twenty engineers there’s enough energy to work on issues and develop some very interesting solutions.
Tumblr started as a fairly typical large LAMP application. The direction they are moving in now is towards a distributed services model built around Scala, HBase, Redis, Kafka, Finagle, and an intriguing cell based architecture for powering their Dashboard. Effort is now going into fixing short term problems in their PHP application, pulling things out, and doing it right using services.
The theme at Tumblr is transition at massive scale. Transition from a LAMP stack to a somewhat bleeding edge stack. Transition from a small startup team to a fully armed and ready development team churning out new features and infrastructure. To help us understand how Tumblr is living this theme is startup veteran Blake Matheny, Distributed Systems Engineer at Tumblr. Here’s what Blake has to say about the House of Tumblr:
Site: http://www.tumblr.com/
Stats
- 500 million page views a day
- 15B+ page views a month
- ~20 engineers
- Peak rate of ~40k requests per second
- 1+ TB/day into Hadoop cluster
- Many TB/day into MySQL/HBase/Redis/Memcache
- Growing at 30% a month
- ~1000 hardware nodes in production
- Billions of page visits per month per engineer
- Posts are about 50GB a day. Follower list updates are about 2.7TB a day.
- Dashboard runs at a million writes a second, 50K reads a second, and it is growing.
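A quick back-of-envelope check on the stats above can help put them in context. This Python sketch uses only figures from the list (500M page views a day, a ~40k requests/s peak, 30% monthly growth). The stats don't say how many HTTP requests a single page view generates, so the peak-to-average ratio mixes requests and page views and is only indicative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate_per_second(events_per_day: float) -> float:
    """Spread a daily total evenly across the day (real traffic is bursty)."""
    return events_per_day / SECONDS_PER_DAY


def compound_growth(monthly_rate: float, months: int) -> float:
    """Traffic multiplier after `months` of steady compound growth."""
    return (1.0 + monthly_rate) ** months


page_views_per_day = 500e6
peak_requests_per_sec = 40_000
avg_pv_per_sec = average_rate_per_second(page_views_per_day)

print(f"average page views/s: {avg_pv_per_sec:,.0f}")                          # 5,787
print(f"page views in a 30-day month: {page_views_per_day * 30 / 1e9:.0f}B")   # 15B
print(f"peak requests/s vs. average page views/s: "
      f"{peak_requests_per_sec / avg_pv_per_sec:.1f}x")                        # 6.9x
print(f"multiplier after 12 months at 30%/month: "
      f"{compound_growth(0.30, 12):.1f}x")                                     # 23.3x
```

The last line shows why "Growing at 30% a month" matters so much. If that rate held for a year, traffic would multiply by roughly 23, which is a far harsher planning target than the monthly figure suggests.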
Software
- OS X for development, Linux (CentOS, Scientific) in production
- Apache
- PHP, Scala, Ruby
- Redis, HBase, MySQL
- Varnish, HA-Proxy, nginx
- Memcache, Gearman, Kafka, Kestrel, Finagle
- Thrift, HTTP
- Func - a secure, scriptable remote control framework and API
- Git, Capistrano, Puppet, Jenkins
Hardware
- 500 web servers
- 200 database servers (many of these are part of a spare pool we pulled from for failures)
- 47 pools
- 30 shards
- 30 memcache servers
- 22 redis servers
- 15 varnish servers
- 25 haproxy nodes
- 8 nginx
- 14 job queue servers (kestrel + gearman)
Architecture
- Tumblr has a different usage pattern than other social networks.
- With 50+ million posts a day, an average post goes to many hundreds of people. It’s not just one or two users that have millions of followers; typical Tumblr users have hundreds of followers. This is different from any other social network and is what makes Tumblr so challenging to scale.
- #2 social network in terms of time spent by users. The content is engaging. It’s images and videos. The posts aren’t byte sized. They aren’t all long form, but they have the ability. People write in-depth content that’s worth reading so people stay for hours.
- Users form a connection with other users so they will go hundreds of pages back into the dashboard to read content. Other social networks are just a stream that you sample.
- The implication is that given the number of users, the average reach of the users, and the high posting activity of the users, there is a huge amount of updates to handle.
- Tumblr runs in one colocation site. Designs are keeping geographical distribution in mind for the future.
- Two components to Tumblr as a platform: public Tumblelogs and Dashboard
- Public Tumblelog is what the public deals with in terms of a blog. Easy to cache as it’s not that dynamic.
- Dashboard is similar to the Twitter timeline. Users follow real-time updates from all the users they follow.
- Very different scaling characteristics than the blogs. Caching isn’t as useful because every request is different, especially with active followers.
- Needs to be real-time and consistent. Should not show stale data. And it’s a lot of data to deal with. Posts are only about 50GB a day. Follower list updates are 2.7TB a day. Media is all stored on S3.
- Most users use Tumblr as a tool for consuming content. Of the 500+ million page views a day, 70% are for the Dashboard.
- Dashboard availability has been quite good. Tumblelog hasn’t been as good because they have a legacy infrastructure that has been hard to migrate away from. With a small team they had to pick and choose what they addressed for scaling issues.
Old Tumblr
- When the company started on Rackspace it gave each custom domain blog an A record. When they outgrew Rackspace there were too many users to migrate. This was in 2007. They still have custom domains on Rackspace, which they route back to their colo space using HAProxy and Varnish. Lots of legacy issues like this.
- A traditional LAMP progression.
- Historically developed with PHP. Nearly every engineer programs in PHP.
- Started with a web server, database server and a PHP application and started growing from there.
- To scale they started using memcache, then put in front-end caching, then HAProxy in front of the caches, then MySQL sharding. MySQL sharding has been hugely helpful.
- They use a squeeze-everything-out-of-a-single-server approach. In the past year they've developed a couple of backend services in C: an ID generator and Staircar, which uses Redis to power Dashboard notifications.
- The Dashboard uses a scatter-gather approach. Events are displayed when a user accesses their Dashboard: events for the users you follow are pulled and displayed. This will scale for another 6 months. Since the data is time-ordered, sharding schemes don't work particularly well.
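The scatter-gather read path can be sketched in a few lines. This is a minimal illustration, not Tumblr's code: it assumes each followed user's posts are available as a newest-first list of `(timestamp, post_id)` pairs (in production these would come from separate MySQL shards), and the function name `scatter_gather_dashboard` is hypothetical. What it shows is that every Dashboard read touches one list per followed user, so read cost grows with the number of people you follow.

```python
import heapq
from itertools import islice
from typing import Dict, List, Tuple

# A post is (timestamp, post_id). The dict below is a hypothetical in-memory
# stand-in for per-user post lists that would really live on MySQL shards.
Post = Tuple[int, str]


def scatter_gather_dashboard(
    following: List[str],
    posts_by_user: Dict[str, List[Post]],
    limit: int = 10,
) -> List[Post]:
    """Build a dashboard at read time by querying every followed user.

    Each user's list is assumed to be sorted newest-first. The "scatter"
    step fetches one list per followed user; the "gather" step k-way merges
    them by timestamp and keeps only the newest `limit` posts.
    """
    # Scatter: one lookup per followed user (one shard query in reality).
    per_user = [posts_by_user.get(user, []) for user in following]
    # Gather: lazily merge the newest-first lists, newest timestamp first.
    merged = heapq.merge(*per_user, key=lambda post: post[0], reverse=True)
    return list(islice(merged, limit))
```

Because nothing is precomputed, writes stay cheap but every Dashboard view pays the full fan-in cost, which is why this model has limited runway as follow counts and posting rates grow.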
New Tumblr
- Changed to a JVM centric approach for hiring and speed of development reasons.
- Goal is to move everything out of the PHP app into services and make the app a thin layer over services that does request authentication, presentation, etc.
- Scala and Finagle Selection
- Internally they had a lot of people with Ruby and PHP experience, so Scala was appealing.
- Finagle was a compelling factor in choosing Scala. It is a library from Twitter. It handles most of the distributed issues like distributed tracing, service discovery, and service registration. You don’t have to implement all this stuff. It just comes for free.
- Once on the JVM Finagle provided all the primitives they needed (Thrift, ZooKeeper, etc).
- Finagle is being used by Foursquare and Twitter. Scala is also being used by Meetup.
- They like the Thrift application interface; it has really good performance.
- Liked Netty, but wanted out of Java, so Scala was a good choice.
- Picked Finagle because it was cool, knew some of the guys, it worked without a lot of networking code and did all the work needed in a distributed system.
- Node.js wasn't selected because it is easier to scale the team with a JVM base. Node.js isn't developed enough to have standards, best practices, and a large volume of well-tested code, and there isn't much knowledge of how to use it in a scalable way. With Scala you can use all the Java code. They target 5ms response times, four 9s (99.99%) availability, 40K requests per second, and some services at 400K requests per second. There's a lot in the Java ecosystem they can leverage.
- Internal services are being shifted from being C/libevent based to being Scala/Finagle based.
- Newer, non-relational data stores like HBase and Redis are being used, but the bulk of their data is currently stored in a heavily partitioned MySQL architecture. Not replacing MySQL with HBase.
- HBase backs their URL shortener with billions of URLs and all the historical data and analytics. It has been rock solid. HBase is used in situations with high write requirements, like a million writes a second for the Dashboard replacement. HBase wasn't deployed instead of MySQL because they couldn't bet the business on HBase with the people that they had, so they started using it on smaller, less critical-path projects to gain experience.
- Problem with MySQL and sharding for time series data is one shard is always really hot. Also ran into read replication lag due to insert concurrency on the slaves.
- Created a common services framework.
- Spent a lot of time upfront solving the operations problem of how to manage a distributed system.
- Built a kind of Rails scaffolding, but for services. A template is used to bootstrap services internally.
- All services look identical from an operations perspective. Checking statistics, monitoring, starting and stopping all work the same way for all services.
- Tooling is put around the build process in SBT (a Scala build tool) using plugins and helpers to take care of common activities like tagging things in git, publishing to the repository, etc. Most developers don’t have to get in the guts of the build system.
- Front-end layer uses HAProxy. Varnish might be hit for public blogs. 40 machines.
- 500 web servers running Apache and their PHP application.
- 200 database servers. Many database servers are used for high availability reasons. Commodity hardware is used and the MTBF is surprisingly low. Much more hardware than expected is lost, so there are many spares in case of failure.
- 6 backend services to support the PHP application. A team is dedicated to develop the backend services. A new service is rolled out every 2-3 weeks. Includes dashboard notifications, dashboard secondary index, URL shortener, and a memcache proxy to handle transparent sharding.
- Put a lot of time and effort and tooling into MySQL sharding. MongoDB is not used even though it is popular in NY (their location). MySQL can scale just fine.
- Gearman, a job queue system, is used for long running fire and forget type work.
- Availability is measured in terms of reach. Can a user reach custom domains or the dashboard? Also in terms of error rate.
- Historically, the highest-priority item was fixed first. Now failure modes are analyzed and addressed systematically. The intention is to measure success from both a user perspective and an application perspective: if part of a request can't be fulfilled, that is accounted for.
- Initially an Actor model was used with Finagle, but that was dropped. For fire and forget work a job queue is used. In addition, Twitter’s utility library contains a Futures implementation and services are implemented in terms of futures. In the situations when a thread pool is needed futures are passed into a future pool. Everything is submitted to the future pool for asynchronous execution.
- Scala encourages no shared state. Finagle is assumed correct because it's tested by Twitter in production. Mutable state is avoided using constructs in Scala or Finagle. No long-running state machines are used. State is pulled from the database, used, and written back to the database. The advantage is that developers don't need to worry about threads or locks.
- 22 Redis servers. Each server has 8 - 32 instances so 100s of Redis instances are used in production.
- Used for backend storage for dashboard notifications.
- A notification is something like a user liked your post. Notifications show up in a user’s dashboard to indicate actions other users have taken on their content.
- High write ratio made MySQL a poor fit.
- Notifications are ephemeral so it wouldn’t be horrible if they were dropped, so Redis was an acceptable choice for this function.
- Gave them a chance to learn about Redis and get familiar with how it works.
- Redis has been completely problem free and the community is great.
- A Scala futures based interface for Redis was created. This functionality is now moving into their Cell Architecture.
- URL shortener uses Redis as the first level cache and HBase as permanent storage.
- Dashboard’s secondary index is built around Redis.
- Redis is used as Gearman’s persistence layer using a memcache proxy built using Finagle.
- Slowly moving from memcache to Redis. Would like to eventually settle on just one caching service. Performance is on par with memcache.
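The URL shortener's split between Redis as a first-level cache and HBase as permanent storage is a classic read-through arrangement. The sketch below is illustrative only: plain Python dicts stand in for both tiers, the class name `TwoTierUrlStore` is hypothetical, and the least-recently-used eviction is an assumption standing in for whatever eviction policy the real Redis tier uses.

```python
from collections import OrderedDict
from typing import Dict, Optional


class TwoTierUrlStore:
    """Read-through lookup: a small cache in front of a durable store.

    Hypothetical stand-ins: `cache` plays the role of Redis (bounded, may
    evict entries), `durable` plays the role of HBase (the permanent copy).
    """

    def __init__(self, cache_capacity: int = 2) -> None:
        self.cache: "OrderedDict[str, str]" = OrderedDict()
        self.cache_capacity = cache_capacity
        self.durable: Dict[str, str] = {}
        self.durable_reads = 0  # number of cache misses served by the durable tier

    def _cache_put(self, code: str, url: str) -> None:
        self.cache[code] = url
        self.cache.move_to_end(code)
        # Assumed LRU policy: evict the least recently used entry when full.
        if len(self.cache) > self.cache_capacity:
            self.cache.popitem(last=False)

    def put(self, code: str, url: str) -> None:
        # Write the permanent copy first, so losing a cache entry never loses data.
        self.durable[code] = url
        self._cache_put(code, url)

    def get(self, code: str) -> Optional[str]:
        if code in self.cache:
            self.cache.move_to_end(code)  # refresh recency on a hit
            return self.cache[code]
        self.durable_reads += 1
        url = self.durable.get(code)
        if url is not None:
            self._cache_put(code, url)  # populate the cache for the next lookup
        return url
```

The design choice this illustrates is that the cache may drop entries freely: a miss costs one extra read from the permanent store, never a lost URL.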
Internal Firehose
- Internally, applications need access to the activity stream. An activity stream is information about users creating/deleting posts, liking/unliking posts, etc. A challenge is to distribute so much data in real time. They wanted something that would scale internally and that an application ecosystem could reliably grow around. A central point of distribution was needed.
- Previously this information was distributed using Scribe/Hadoop. Services would log into Scribe and begin tailing and then pipe that data into an app. This model stopped scaling almost immediately, especially at peak where people are creating 1000s of posts a second. Didn’t want people tailing files and piping to grep.
- An internal firehose was created as a message bus. Services and applications talk to the firehose via Thrift.
- LinkedIn’s Kafka is used to store messages. Internally consumers use an HTTP stream to read from the firehose. MySQL wasn’t used because the sharding implementation is changing frequently so hitting it with a huge data stream is not a good idea.
- The firehose model is very flexible, not like Twitter’s firehose in which data is assumed to be lost.
- The firehose stream can be rewound in time. It retains a week of data. On connection it’s possible to specify the point in time to start reading.
- Multiple clients can connect and each client won't see duplicate data. Each client has a client ID. Kafka supports the idea of a consumer group: each consumer in a consumer group gets its own messages and won't see duplicates. Multiple clients can be created using the same consumer group ID and they won't see duplicate data. This allows data to be processed independently and in parallel. Kafka uses ZooKeeper to periodically checkpoint how far a consumer has read.
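The firehose properties above (consumer groups that split the stream without duplicates, independent groups, and rewinding to a point in time) can be modeled with a toy partitioned log. This is a hedged sketch of the semantics only, not the Kafka client API or Tumblr's firehose: the class `Firehose`, its methods, and the `partition % group_size` assignment rule are all hypothetical simplifications, and offsets are "committed" in memory rather than checkpointed to ZooKeeper.

```python
import zlib
from typing import Dict, List, Tuple

Message = Tuple[int, str]  # (timestamp, payload)


class Firehose:
    """Toy partitioned log with consumer-group semantics (not the Kafka API)."""

    def __init__(self, num_partitions: int = 4) -> None:
        self.partitions: List[List[Message]] = [[] for _ in range(num_partitions)]
        # Committed offsets: group id -> next offset to read in each partition.
        self.offsets: Dict[str, List[int]] = {}

    def publish(self, key: str, timestamp: int, payload: str) -> None:
        # Stable hash: the same key always lands in the same partition.
        p = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[p].append((timestamp, payload))

    def _group(self, group: str) -> List[int]:
        return self.offsets.setdefault(group, [0] * len(self.partitions))

    def poll(self, group: str, member: int, group_size: int) -> List[Message]:
        """Return unread messages from the partitions owned by `member`.

        Partition p is owned by member p % group_size, so members of one group
        never see the same message, while each group keeps its own offsets.
        """
        offsets = self._group(group)
        out: List[Message] = []
        for p, log in enumerate(self.partitions):
            if p % group_size != member:
                continue
            out.extend(log[offsets[p]:])
            offsets[p] = len(log)  # record how far this group has read
        return out

    def seek_to_time(self, group: str, timestamp: int) -> None:
        """Move a group to the first message at or after `timestamp`.

        Assumes timestamps are non-decreasing within each partition.
        """
        offsets = self._group(group)
        for p, log in enumerate(self.partitions):
            offsets[p] = next(
                (i for i, (ts, _) in enumerate(log) if ts >= timestamp), len(log)
            )
```

For example, two members of a `"search"` group polling with `group_size=2` together receive every message exactly once, an `"analytics"` group still sees the full stream, and `seek_to_time("search", t)` replays everything from time `t` onward.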
Cell Design for Dashboard Inbox
- The current scatter-gather model for providing Dashboard functionality has very limited runway. It won’t last much longer.
- The solution is to move to an inbox model implemented using a Cell Based Architecture that is similar to Facebook Messages.
- An inbox is the opposite of scatter-gather. A user’s dashboard, which is made up posts from followed users and actions taken by other users, is logically stored together in time order.
- Solves the scatter-gather problem because it's an inbox. You just ask what is in the inbox, so it's less expensive than going to each user a user follows. This will scale for a very long time.
- Rewriting the Dashboard is difficult. The data has a distributed nature, but it has a transactional quality: it's not OK for users to get partial updates.
- The amount of data is incredible. Messages must be delivered to hundreds of different users on average, which is a very different problem than Facebook faces. Large data + high distribution rate + multiple datacenters.
- Spec'ed at a million writes a second and 50K reads a second. The data set grows by 2.7TB a day with no replication or compression turned on. The million writes a second come from the 24-byte row key that indicates what content is in the inbox.
- Doing this on an already popular application that has to be kept running.
- Cells
- A cell is a self-contained installation that has all the data for a range of users. All the data necessary to render a user’s Dashboard is in the cell.
- Users are mapped into cells. Many cells exist per data center.
- Each cell has an HBase cluster, service cluster, and Redis caching cluster.
- Users are homed to a cell and all cells consume all posts via firehose updates.
- Each cell is Finagle based and populates HBase via the firehose and service requests over Thrift.
- When a user comes to the Dashboard, they are homed to a particular cell; a service node reads their dashboard from HBase and passes the data back.
- Background tasks consume from the firehose to populate tables and process requests.
- A Redis caching layer is used for posts inside a cell.
- Request flow: a user publishes a post, the post is written to the firehose, all of the cells consume the posts and write that post content to post database, the cells lookup to see if any of the followers of the post creator are in the cell, if so the follower inboxes are updated with the post ID.
- Advantages of cell design:
- Massive scale requires parallelization and parallelization requires components be isolated from each other so there is no interaction. Cells provide a unit of parallelization that can be adjusted to any size as the user base grows.
- Cells isolate failures. One cell failure does not impact other cells.
- Cells enable nice things like the ability to test upgrades, implement rolling upgrades, and test different versions of software.
- The key idea that is easy to miss is: all posts are replicated to all cells.
- Each cell stores a single copy of all posts. Each cell can completely satisfy a Dashboard rendering request. Applications don’t ask for all the post IDs and then ask for the posts for those IDs. It can return the dashboard content for the user. Every cell has all the data needed to fulfill a Dashboard request without doing any cross cell communication.
- Two HBase tables are used: one that stores a copy of each post. That data is small compared to the other table which stores every post ID for every user within that cell. The second table tells what the user’s dashboard looks like which means they don’t have to go fetch all the users a user is following. It also means across clients they’ll know if you read a post and viewing a post on a different device won’t mean you read the same content twice. With the inbox model state can be kept on what you’ve read.
- Posts are not put directly in the inbox because the size is too great. So the ID is put in the inbox and the post content is put in the cell just once. This model greatly reduces the storage needed while making it simple to return a time-ordered view of a user's inbox. The downside is that each cell contains a complete copy of all posts. Surprisingly, posts are smaller than the inbox mappings: post growth per day is 50GB per cell, while the inbox grows at 2.7TB a day. Users consume more than they produce.
- A user’s dashboard doesn’t contain the text of a post, just post IDs, and the majority of the growth is in the IDs.
- As followers change, the design is safe because all posts are already in the cell. If only follower posts were stored in a cell, the cell would be out of date as the followers changed and some sort of backfill process would be needed.
- An alternative design is to use a separate post cluster to store post text. The downside of this design is that if the cluster goes down it impacts the entire site. Using the cell design and post replication to all cells creates a very robust architecture.
- A user having millions of followers who are really active is handled by selectively materializing user feeds by their access model (see Feeding Frenzy).
- Different users have different access models and distribution models that are appropriate. Two different distribution modes: one for popular users and one for everyone else.
- Data is handled differently depending on the user type. Posts from active users wouldn't actually be published; posts would be selectively materialized.
- Users who follow millions of users are treated similarly to users who have millions of followers.
- Cell size is hard to determine. The size of a cell sets the impact of a failure: the number of users homed to a cell is the number of users affected. There's a tradeoff between what they are willing to accept for the user experience and how much it will cost.
- Reading from the firehose is the biggest network issue. Within a cell the network traffic is manageable.
- As more cells are added cells can be placed into a cell group that reads from the firehose and then replicates to all cells within the group. A hierarchical replication scheme. This will also aid in moving to multiple datacenters.
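The cell request flow can be condensed into a small simulation: every cell stores one copy of every post, but only updates the inboxes of followers homed in that cell, and a Dashboard read is served entirely from the reader's home cell. This is a minimal sketch under stated assumptions, not Tumblr's implementation: dicts stand in for the two HBase tables, `CellCluster` and its methods are hypothetical names, homing by a stable hash of the user ID is an assumed placement rule, and selective materialization for very popular users is not modeled.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Set


class Cell:
    def __init__(self) -> None:
        # Stand-in for the post table: post_id -> content, for ALL posts.
        self.posts: Dict[str, str] = {}
        # Stand-in for the inbox table: user -> post_ids, oldest first.
        self.inboxes: Dict[str, List[str]] = defaultdict(list)


class CellCluster:
    def __init__(self, num_cells: int, followers: Dict[str, Set[str]]) -> None:
        self.cells = [Cell() for _ in range(num_cells)]
        self.followers = followers  # author -> set of follower user ids

    def home(self, user: str) -> int:
        # Hypothetical placement rule: a stable hash of the user id.
        return zlib.crc32(user.encode()) % len(self.cells)

    def publish(self, author: str, post_id: str, content: str) -> None:
        # Every cell consumes every post (as if from the firehose).
        for index, cell in enumerate(self.cells):
            cell.posts[post_id] = content  # one copy of the post per cell
            for follower in self.followers.get(author, set()):
                if self.home(follower) == index:  # only followers homed here
                    cell.inboxes[follower].append(post_id)

    def dashboard(self, user: str, limit: int = 10) -> List[str]:
        # A read touches only the user's home cell: no cross-cell calls.
        cell = self.cells[self.home(user)]
        ids = cell.inboxes.get(user, [])[-limit:][::-1]  # newest first
        return [cell.posts[post_id] for post_id in ids]
```

Note that the inbox holds only IDs while content is resolved from the same cell's post table, which mirrors the storage tradeoff described above: small per-post copies, large per-user ID mappings.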
…
Software Deployment
- Started with a set of rsync scripts that distributed the PHP application everywhere. Once the number of machines reached 200 the system started having problems, deploys took a long time to finish and machines would be in various states of the deploy process.
- The next phase built the deploy process (development, staging, production) into their service stack using Capistrano. Worked for services on dozens of machines, but by connecting via SSH it started failing again when deploying to hundreds of machines.
- Now a piece of coordination software runs on all machines. It is based around Func from Red Hat, a lightweight API for issuing commands to hosts. Scaling is built into Func.
- Build deployment is over Func by saying do X on a set of hosts, which avoids SSH. Say you want to deploy software on group A. The master reaches out to a set of nodes and runs the deploy command.
- The deploy command is implemented via Capistrano. It can do a git checkout or pull from the repository. Easy to scale because they are talking HTTP. They like Capistrano because it supports simple directory based versioning that works well with their PHP app. Moving towards versioned updates, where each directory contains a SHA so it’s easy to check if a version is correct.
- The Func API is used to report back status, to say these machines have these software versions.
- Safe to restart any of their services because they’ll drain off connections and then restart.
- All features run in dark mode before activation.
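Directory-based versioning, where each release lives in a directory named by its git SHA, makes "which version is this host running?" a cheap question. The sketch below assumes a Capistrano-style layout (`releases/<sha>/` plus a `current` symlink); the source does not describe Tumblr's exact layout, and `deploy_release` and `deployed_sha` are hypothetical helpers. It switches versions by renaming a freshly made symlink over `current`, which is atomic on POSIX filesystems.

```python
import os
import tempfile
from pathlib import Path
from typing import Dict


def deploy_release(app_root: Path, sha: str, files: Dict[str, str]) -> Path:
    """Stage a release in releases/<sha>/ and atomically point `current` at it."""
    release = app_root / "releases" / sha
    release.mkdir(parents=True, exist_ok=True)
    for name, body in files.items():
        (release / name).write_text(body)
    # Create the new link under a temporary name, then rename it over
    # `current`. On POSIX the rename is atomic, so readers never see a
    # missing or half-updated link.
    tmp_link = app_root / "current.tmp"
    if tmp_link.is_symlink():
        tmp_link.unlink()
    tmp_link.symlink_to(release, target_is_directory=True)
    os.replace(tmp_link, app_root / "current")
    return release


def deployed_sha(app_root: Path) -> str:
    """Report the running version: the SHA directory `current` points to."""
    return Path(os.readlink(app_root / "current")).name


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        deploy_release(root, "abc123", {"index.php": "<?php echo 'v1';"})
        deploy_release(root, "def456", {"index.php": "<?php echo 'v2';"})
        print(deployed_sha(root))  # def456
```

Old release directories stay on disk, so rolling back is just pointing `current` at a previous SHA, and a status report across hosts reduces to collecting `deployed_sha` from each one.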
Development
- Started with the philosophy that anyone could use any tool that they wanted, but as the team grew that didn’t work. Onboarding new employees was very difficult, so they’ve standardized on a stack so they can get good with those, grow the team quickly, address production issues more quickly, and build up operations around them.
- Process is roughly Scrum like. Lightweight.
- Every developer has a preconfigured development machine. It gets updates via Puppet.
- Dev machines can roll changes, test, then roll out to staging, and then roll out to production.
- Developers use vim and Textmate.
- Testing is via code reviews for the PHP application.
- On the service side they’ve implemented a testing infrastructure with commit hooks, Jenkins, and continuous integration and build notifications.
…
Lessons learned
- Automation everywhere.
- MySQL (plus sharding) scales, apps don’t.
- Redis is amazing.
- Scala apps perform fantastically.
- Scrap projects when you aren’t sure if they will work.
- Don’t hire people based on their survival through a useless technological gauntlet. Hire them because they fit your team and can do the job.
- Select a stack that will help you hire the people you need.
- Build around the skills of your team.
- Read papers and blog posts. Key design ideas like the cell architecture and selective materialization were taken from elsewhere.
- Ask your peers. They talked to engineers from Facebook, Twitter, LinkedIn about their experiences and learned from them. You may not have access to this level, but reach out to somebody somewhere.
- Wade, don’t jump into technologies. They took pains to learn HBase and Redis before putting them into production by using them in pilot projects or in roles where the damage would be limited.
Digital IQ: 7 Digital Strategies of Top-Performing Companies — CIO Dashboard
Exactly what the title says — but it also offers some insight into how the CIO role is transforming in the wake of today’s technologies.
Growing a multimillion dollar corporation during a recession is no small feat and it’s no accident. Companies that are swimming against the economic tide are doing things differently than those that are treading water, or worse, drowning. So, what exactly separates the best from the rest?
In a world of tech-empowered consumers and employees, companies that are bucking the economic trend exhibit key behaviors that allow them to exploit technology and weave it into their businesses. We call this a company’s Digital IQ.
We surveyed 489 companies across industries with annual revenues of more than $500 million to find out what successful companies are doing that the others aren’t. Following is how the “top performers”—companies that grew by 5% or more last year—are making the grade.
1) Integrate Technology into Strategic Planning
Naturally, creating a strategic plan is the first step in implementing any sort of large-scale corporate effort. However, the effectiveness of the strategy is what counts. 89% of top performers feel confident about their strategy versus 63% of the pack. Top-performing companies are also more likely to integrate IT in their strategic planning process. Of the companies that are excelling, 86% said that their CEO is an active champion in the use of information technology to achieve corporate strategy. That number drops to 56% for the remaining respondents. Moreover, the cream of the crop is more likely to have a CIO who not only reports directly to the CEO, but also has strong relationships with other C-suite executives. In winning companies, the CIO is seen as a business champion.
2) Set and Share a Single, Multi-year Roadmap for the Overall Business Strategy
In the mobilization stage, executives create a blueprint for breathing life into the strategy. Too many companies bypass this critical step. Not the best companies. 77% of top performers have a single, multi-year roadmap. That number sinks to 54% among average performers. Furthermore, sharing that strategy throughout the company is also critical to success. 76% of top performers say that their strategy is well communicated throughout the company. Only 44% of the rest make that claim. Additionally, 78% of top performing companies say that their business and IT leaders share an understanding of the strategy. Only 49% of the other survey respondents feel that everyone is on the same page.
3) Look Beyond Delivering IT Projects on Time and on Budget
Setting a strategy leads to IT projects that are delivered on time and on budget, so it's no surprise that superior corporations are meeting their goals more often than the others. 67% of top performers say that their initiatives were delivered on time. In stark contrast, only 38% of the rest of the respondents hit the mark. Coming in at or below budget proves to be more difficult for both groups, but nonetheless, 54% of top performers did so compared to 35% of the larger group. However, many organizations downplay the trade-offs in cutting scope, and therefore potential business value, in favor of bringing a project in under cost and time targets. Almost 100% of top performers say that they frequently or always deliver their planned scope versus only 35% for all surveyed.
4) Invest in Mobile Workforces
Top performers spend more on empowering their workforces with mobile devices than the rest. 44% of the high performers will invest between $250,000 and $1 million and 33% will invest more than $1 million. For the remainder of respondents, those numbers are 37% and 27%, respectively.
5) Interact with Customers Using Mobile Technology
Only 44% of the pack interacts with customers via mobile devices “quite or very significantly.” That number jumps to 66% among the top performers. And top performers are putting their money where their mouths are: 50% of them plan to invest more than $1 million in mobile solutions for customers in 2012. That number plummets to 29% for the rest of the group.
6) Reap the Rewards of Social Media
Both the best and the rest are investing in social media at almost the same rates, but the top performers claim to see more of a benefit from their efforts than their lower-performing counterparts. 41% of top performers say they are benefiting from their investments in social media compared to 24% of the others. To gain a greater edge, 40% of the top performers expect to increase their use of social media. Only 26% of the rest of respondents plan to spend more in this area. In fact, 36% of the top of the heap plan to invest $1 million in social media for internal communications in 2012.
7) Invest in Cloud Computing
Top performers are investing more aggressively in cloud computing than the pack. 52% of top performers will spend more than $1 million on public cloud applications. Only 34% of the remaining survey respondents plan to spend that much. For private cloud investments, the gap is even wider: 58% of top performers will invest more than $1 million, compared with 39% for the rest.
Overall, top-performers link strategy to specific programs and actions while mobilizing the organization around that strategy. Everyone knows what role they should be playing and what success looks like. Top performers also don’t shy away from opportunities to innovate. The role that the CIO plays is to drive superior execution and innovation.
In this post, we didn’t address the roadblocks that stop companies from becoming top performers. What is preventing the rest from being the best?
Drive to Public Clouds Powered by Both Consumers and IT Managers (IDEAS Insights)
IDEAS International shares some thoughts on cloud's evolution. Nothing particularly new or groundbreaking, but a good summary of recent developments.
The farthest-reaching changes in the IT industry often occur when a single new development simultaneously responds to the needs of both consumers and business users. Perhaps the best-known example of this kind of wave was the original PC: office workers used PCs during the day for their business tasks and then used the machine when they got home to play games (and often vice versa). The Web was also adopted in equal measure by consumers and business users when it first broke into the mainstream, which was a key factor in its incredibly rapid rise. The same will happen with cloud computing, once the public cloud providers close the loop between business and consumer services.
In 2012, both consumers and large companies will step up their adoption of public clouds. From the standpoint of end users, public cloud computing will be seen as enabling greater mobility, gradually leading to “ubiquitous” computing in which they no longer have to be concerned about where their data is actually located. At the same time, the economics of public cloud computing will become sufficiently attractive to IT managers in enterprise environments that they will no longer be able to avoid considering it, at least for certain workloads. This kind of lockstep between consumers and business users will cause big changes across the IT industry.
Users who put their data in the cloud expect that they will be able to access the data on any device, from anywhere in the world. Because there is only one copy of the data (and hopefully a backup copy somewhere), users hope that they will no longer need to synchronize laptops with other devices like iPads and smartphones. In 2011, many consumers were subtly introduced to the convenience of cloud storage when Apple introduced automatic synching of data between iPhones, iPods, and other devices with its iCloud service (the capability was introduced transparently with an update of Apple’s iOS operating system). Since the iPad dominates the tablet market, and the iPhone is one of the most popular smartphone models, other tablet and smartphone providers will soon need to include similar capabilities to remain competitive. As a result, the huge base of consumers storing their music and photos in multitenant clouds will help move cloud storage from a theoretical capability to a real and useful service. The rise of cloud computing will eventually speed the convergence of “mobile” and “social” trends, in which data sharing between trusted parties will become the normal approach for exchanging information.
In datacenters, the economics of public cloud computing will become increasingly attractive to IT managers. Continuing concerns about potential security risks will prevent organizations from entrusting their most sensitive workloads to public clouds, but for many other workloads, the flexibility and potential cost benefits of cloud deployment will outweigh its risks. In 2012, the use of public clouds will go beyond early adopters and enter the mainstream for certain applications. As public clouds become part of standard IT operating procedures, some business issues with service providers will rise to the forefront. Customers will increasingly focus on issues such as service level agreements (SLAs) and portability between cloud services. Companies planning a cloud deployment will narrow their focus to providers who have the technical ability to deliver on SLAs and can provide security in the cloud. Vendor lock-in with cloud service providers will become a greater concern as customers grapple with the decision of whether to embrace proprietary solutions that deliver unique benefits, or more open solutions that may have limitations. Some cloud vendors will tout their relative openness and present vendor lock-in as a major reason customers should not buy from their competitors.
Throughout 2012, cloud services will become an increasingly big business as companies complete their trials and begin to roll out full-scale enterprise applications to the cloud. Amazon AWS will become the first billion-dollar cloud venture. Towards the end of 2012, the cloud business will begin to see a shakeout as larger, better-financed companies cherry-pick the best companies and push out the weaker start-ups. It will become much clearer by the end of the year which service providers can deliver for the long haul, and which can’t. On the user side, most deployments will be uneventful and successful, but some high-profile events will occur that highlight the problems that happen when cloud is not deployed correctly. To fully reap the benefits of cloud computing, IT workers will need to reassess their skills and go for training in new areas. In the meantime, companies that want to deploy a private cloud may have a difficult time finding IT workers with the right mix of skills to design, deploy, and manage a cloud.
A CTO’s take on cloud — Cloud Computing News
Great article on how the CTO of Capgemini is looking at recent developments in enterprise adoption of the cloud. Good stuff.
As Capgemini’s CTO for North America, Joe Coyle hears an awful lot about cloud computing. He hears it from customers that want to evaluate cloud solutions and from vendors that want to win that business. Capgemini, a $12 billion global systems integrator, has relationships with all the major vendors and many enterprise customers, so it’s interesting to hear what Coyle has to say about the current state of the market.
Here are my main takeaways from a recent conversation with him.
1: IBM is cloudier than you think.
Big Blue has a pretty potent set of cloud options, but it’s going about its business very cleverly. Given its big-iron heritage, IBM rarely talks about the hardware component of its cloud portfolio, Coyle said.
“They’re attacking this from a software perspective. They’ve taken Tivoli and are building this software umbrella so that you can take whatever you’re running in your data center now and put all or part of it in a public or private cloud,” he noted. IBM’s 2010 acquisition of Cast Iron also gives it a slick appliance that lets customers integrate in-house apps with SaaS applications running outside.
He doesn’t see IBM cloud penetrating a ton of new smaller businesses, but for many existing IBM shops (and there are a ton of them), IBM cloud is a no-brainer.
2: Microsoft Azure has a tough row to hoe
Coyle is of two minds on Windows Azure, the platform-as-a-service (PaaS) underlying Microsoft’s cloud strategy.
“Azure’s been a bit of a disappointment,” he said. “When Microsoft briefed us on it years ago, all the national [systems integrators] were chomping at the bit. But then it stumbled.”
“Then the message was the software would only run on Azure. That’s fine, but by that point, the world had moved on, companies were already using Amazon,” he said. The usual argument that Azure is a PaaS while Amazon Web Services (AWS) is Infrastructure-as-a-Service (IaaS) simply doesn’t matter to most customers. The big draw of AWS is that customers know they can deploy their applications on AWS now and move them to another hosted or in-house data center later.
On the plus side, the Azure technology is solid and, unlike previous Microsoft development technologies, forces developers to follow the rules — they can’t design software services that misbehave. “Azure is extremely powerful and if [Microsoft] can get its act together people will try it,” Coyle said.
But overshadowing all that technical mastery is the perception of Azure as a closed platform — despite its multi-language support. Microsoft’s single biggest problem is customer suspicion that it will use Azure to lock them into the next wave of Microsoft technologies, essentially replacing the Windows/Office upgrade cycle.
“I’m not saying it’s true, but it’s what people think,” Coyle said.
3: Amazon is Amazon
Amazon Web Services are what they are: extremely flexible and leading the league in public cloud. AWS suffered a couple of black eyes in 2011, with an embarrassing four-day outage in April and then a widespread reboot glitch later in the year.
Coyle is pretty forgiving of these miscues. The April outage, he said, was largely due to customers deploying their workloads incorrectly, something that AWS tried to fix manually. Customers can now take steps in AWS to prevent such failures by building in more reliability and redundancy, although they will have to pay for it, he said.
The bottom line? Glitches and all, Amazon is the incumbent public cloud power and will stay that way, he said.
4: OpenStack as big-time cloud disruptor
Coyle is also bullish on the OpenStack movement, which is building a standard cloud foundation out of open-source tools. Initiated by Rackspace and NASA, it’s achieved critical mass with nearly every IT provider — from Dell, to HP, to Cisco, to Citrix — aboard and Rackspace offloading management to a more neutral OpenStack Foundation.
“OpenStack will change the world of cloud computing. As a lot of smaller companies look to build their own clouds, this will be a natural choice,” Coyle said.
Who stands to lose if that’s the case? Ironically, the Dells and HPs of the world, many of which were building their own, more vendor-specific clouds. “Why do you think they joined?” Coyle asked. His feeling is that these hardware companies are hedging their bets.
Will OpenStack affect Amazon? “No. Amazon is Amazon,” he said.
5: CIOs are getting over cloud phobia
It’s taken time, but the economics of cloud computing are too good for CIOs to ignore, Coyle said. Any doubts they had about moving at least some corporate data to an outside cloud storage provider, for instance, have evaporated in recent months.
And they’re getting emboldened to do more than storage. The advent of Hadoop and NoSQL technologies means that companies could actually get some use out of all that old data sitting on tape or on disk platters, he said. Uploading that information and massaging it with the latest analytics means that historical data can be used to test assumptions and new models, for example, to see what a price change does to sales over time.
Wringing real value out of old data is a pretty good proposition for most CIOs.
Big Data in Context • The Register
The Register publishes the results of a recent reader poll concerning “big data.” Not sure how scientific this is, but it’s a good directional indication of how IT professionals view so-called “big data” solutions. Not surprisingly, there is some cynicism about the marketing of these ideas, but these types of solutions are gaining a foothold in the enterprise.
Big Data is an ‘umbrella’ term that is commonly used to refer to a number of advanced data storage, access and analytics technologies aimed at handling high-volume and/or fast-moving data in a variety of scenarios. These typically involve low signal-to-noise ratios, such as social media sentiment monitoring or log file analysis, to mention just a couple. If you listen to the PR folks, it’s the next ‘big thing’ in IT, but does it really warrant the relatively high level of media attention and coverage it is currently receiving?
We wanted to gain practical insights into the context for some of the ideas and solutions often associated with the ‘Big Data’ phrase, and to test the extent to which they are actually reflected in current operational practices and future plans. And where better to go for a down-to-earth view than the Reg readership?
To this end, we ran a survey on The Register during November 2011, in which 122 respondents gave us their feedback in this area. Thanks to all those who participated for their input and insight.
Given the number of vendors jumping onto the bandwagon, it’s hard to pin down where the Big Data discussion starts and stops, so the scope of the survey included both long-established data management approaches and some of the emergent technologies that are often (but not exclusively) associated with the Big Data label. These include scale-out storage architectures, distributed indexing and search tools, distributed analytics solutions, and stream-based processing technologies.
Looking at the results, the first and most obvious observation is that Relational Database Management Systems (RDBMSs) continue to rule the roost when it comes to data storage, management and analytics technologies (Figure 1).
Figure 1
By comparison, the aforementioned ‘Big Data’-related solutions currently have a very small footprint, especially when we consider the ‘self-selecting’ nature of the sample, meaning early adopters are likely to be over-represented. The limited penetration we see is not surprising given that the surge in promotional activity is a relatively recent phenomenon, even if some of the technologies now labelled ‘Big Data’ have existed in niches for many years.
However, the increasing role of Big Data solutions in the mainstream over time does come through when we look at how the use of data management technologies might change over the next three years (Figure 2).
Figure 2
With the exception of legacy databases and file systems, which are clearly anticipated to be in decline, the overriding message from this chart is that all forms of modern data management and analytics solutions will be in greater demand in the short to medium term. This reflects the fact that Big Data technologies often allow new types of problem to be tackled - e.g. large, dirty and/or noisy data sets - but this doesn’t mean that more familiar problems go away. Indeed, while we haven’t shown it here, this latest survey confirmed what most of us know already, i.e. that data volumes are increasing across all record types while the thirst for actionable information and insights among business users continues to escalate.
Turning to practical implementation matters, the potential for adopting any new technology is generally constrained by the available levels of awareness, understanding and expertise. Data management professionals will therefore need the right skills and knowledge to exploit the full range of technology options effectively. With this in mind, the survey sought to gauge current levels of familiarity with a variety of solutions (Figure 3).
Figure 3
Comparing future intent (as previously seen in Figure 2) with current levels of technology familiarity (Figure 3) indicates the need to improve awareness, understanding and skills, particularly in the areas of scale-out storage architectures, distributed indexing and search, and distributed analytics engines. Knowledge levels can be expected to climb as organisations investigate and pilot the Big Data-flavoured solutions that pretty much all of the large established IT vendors, as well as a plethora of newcomers, are now starting to make available in some form or other.
However, as Big Data action develops, one message from the survey is very clear: the emergence of advanced storage, access and analytics solutions does not mean the end of the traditional RDBMS (Figure 4).
Figure 4
We can also see that, as vendors fall over themselves to reposition anything and everything to do with data management as ‘Big Data’, the hype monster is again rearing its ugly head. Given the confusion this creates, it is not surprising that around 40% of respondents have no clear understanding of what the term ‘Big Data’ means.
Once you strip away the hype, though, it’s clear that a majority of respondents believe the technologies we asked about can bring benefits, both in tackling existing problems and in offering new approaches to meeting current as well as emerging business requirements (Figure 5).
Figure 5
In a nutshell: ‘Big Data’ technologies have a lot to offer, but they’re not going to replace existing, modern database and analytics infrastructures. However, everybody would be better served if vendors toned down the hype and focused more on communicating what types of ‘Big Data’ solutions they have available, and which use cases those solutions can address. Otherwise, there’s a risk that the marketing fog ends up obscuring the real message, which in turn will inhibit adoption and delay the moment when companies can reap the business benefits of ‘Big Data’.
