IBM's potential x86 server sale to Lenovo highlights oncoming train | ZDNet
Wow! Interesting analysis - and crazy implications.
IBM is reportedly in talks to sell its x86-based server business to Lenovo, and the move would make a lot of sense.
If the talks, flagged in the Wall Street Journal and CRN, sound familiar, that’s because Big Blue famously unloaded its PC business to Lenovo in a win-win deal. Lenovo went on to become one of the premier PC makers, and IBM focused on software and services and got ahead of trends such as analytics.
To say that IBM’s PC situation then and its server situation today rhyme would be an understatement. You could argue the situations are the same thing. When IBM offloaded its PC unit, no one saw tablets coming. All IBM knew was that the margins stunk and it wanted higher-value wares. The post-PC era was years away.
Fast forward to the server market, which is ripe for disruption. Server sales are doing OK. Companies will have to buy servers, right? Of course they will, for about another three to five years. The reality is that servers are going in the following directions:
- Specialization by workload. Think IBM’s PureSystems and Oracle’s Exadata efforts.
- Commodity-ville on the x86 front. You can’t ignore that companies like Google and Facebook go right to white box makers for servers. That reality isn’t so hot for HP, Dell and IBM.
- You need to own the silicon and intellectual property to really work the server business. IBM’s Power systems won’t go anywhere. Oracle has SPARC. Hewlett-Packard is going processor agnostic with Moonshot, a server line that appears to be innovative.
- Fewer server buyers. As companies move to the cloud, demand for compute will only increase. The problem: server makers will be selling in bulk to fewer customers and cloud computing farms. There will only be so many cloud providers, and enterprises large enough to roll their own data centers will be few and far between.
Now let’s talk timing here. The server market won’t unravel tomorrow. It won’t unravel in a few years. But Armageddon will occur and the clock starts ticking right about now.
Why? An enterprise that buys a server right now starts a tax depreciation clock that runs about three years. Once those three years are up and the assets have depreciated, the CXO in charge will weigh the costs and benefits of the cloud against running a data center, server cluster or whatever. I’ll bet that in three years the cloud will win by a wide margin. Let’s face it: the cloud is already starting to win, and all you have to do is show up at one of Amazon Web Services’ customer powwows to know the writing is on the server rack.
On Thursday, I caught up with Cycle Computing CEO Jason Stowe. There’s a lot to like about Cycle Computing. First, the company is bootstrapped, so there’s instant respect. Second, Cycle Computing is at the forefront of making high performance computing clusters available to the masses. And third, Cycle Computing has top insurance and pharmaceutical companies as customers; it had massive customers from day one. In other words, Cycle Computing is the real deal: it is hooked up with Amazon Web Services and will enable a lot of science to happen simply by democratizing HPC for smaller companies.
Stowe noted that Cycle Computing is now starting to land manufacturing and engineering customers for its HPC management software and cloud connections. In other words, this HPC for the masses is catching on. If you play this out, there will be fewer servers sold because folks will be using Rackspace, AWS or some other former hardware-focused vendor.
Today, it’s big data and research compute driving Cycle Computing demand. Tomorrow every company will have the mathematical models and horsepower to simulate just about anything. You won’t buy your own servers for that computing power.
Stowe said servers will become like wheat fields, not things you name. “Today servers are hugged, named and managers know their quirks. There’s an attachment. In the future server clusters will be more like wheat fields. You grow the wheat, reap and sow, eat and replant the seeds. There’s no attachment to the wheat,” said Stowe.
In other words, Stowe’s excellent analogy on servers and meeting compute demand translates into cloud farms and fields. Most companies are going to hit the brakes on new server buying as soon as the depreciation ends and new compute demand has to be met. Play this out and the profit margins on servers aren’t going to look so hot.
IBM sees all of the servergeddon scenarios developing and that’s why it’s ditching its commodity server business now. Let Lenovo, which has the scale and ambition to do the commodity server game, carry the ball from here and duke it out with HP and Dell.
Big Blue to handle SKA's Big Data about Big Bang • The Register
IBM embarks on extremely ambitious big data project with the Netherlands Institute for Radio Astronomy (ASTRON):
Big Blue has been given the ultimate big data gig - collecting and analysing data all the way back to the universe’s early history, thanks to a brief from the Netherlands Institute for Radio Astronomy (ASTRON) and a €32.9m cheque from the Dutch government. ASTRON and IBM will collaborate on a computer capable of ingesting the expected exabyte a day that will be generated by the Square Kilometer Array (SKA).
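To put “an exabyte a day” in perspective, it helps to convert it into the sustained rate a system would have to ingest. This back-of-the-envelope sketch assumes decimal units (1 EB = 10^18 bytes) and a perfectly even flow over 24 hours; real telescope output will be burstier than that.

```python
# Back-of-the-envelope: convert a daily data volume into a sustained ingest rate.
# Assumes decimal (SI) units: 1 EB = 10**18 bytes, 1 TB = 10**12 bytes,
# and a perfectly even flow across the day.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def sustained_rate_bytes_per_sec(bytes_per_day: float) -> float:
    """Average rate needed to ingest `bytes_per_day` evenly over one day."""
    return bytes_per_day / SECONDS_PER_DAY


EXABYTE = 10**18
rate = sustained_rate_bytes_per_sec(EXABYTE)
print(f"{rate / 10**12:.2f} TB/s")  # 11.57 TB/s
```

In other words, roughly 11.6 terabytes would have to arrive, and be handled, every second of every day.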
In an announcement issued today, the two organisations say “ASTRON and IBM scientists in the Netherlands and Switzerland have launched an initial five-year collaboration called DOME, named for the protective cover on telescopes and the famous Swiss mountain.”
“DOME will investigate emerging technologies for large-scale and efficient exascale computing, data transport and storage processes, and streaming analytics that will be required to read, store and analyze all the raw data that will be collected daily.”
Ton Engbersen of IBM Research’s Zurich facility says, in a statement, that “If you take the current global daily Internet traffic and multiply it by two, you are in the range of the data set that the Square Kilometre Array radio telescope will be collecting every day. This is Big Data Analytics to the extreme. With DOME we will embark on one of the most data-intensive science projects ever planned, which will eventually have much broader applications beyond radio astronomy research.”
The DOME boffins will research “advanced accelerators and 3D stacked chips for more energy-efficient computing” and hope they will do the trick, along with “novel optical interconnect technologies and nanophotonics to optimize large data transfers, as well as high-performance storage systems based on next-generation tape systems and novel phase-change memory technologies.”
Work already undertaken by IBM for ASTRON’s Low-Frequency Array (LOFAR) will be an important stepping stone towards the DOME project.
LOFAR uses an IBM Blue Gene/P system, so it seems likely IBM’s HPC crew will get another turn on this project. LOFAR also uses unnamed storage systems. We’d be a little surprised if IBM’s SONAS scale-out NAS product isn’t present too, but we cannot guess just what kind of rig will be needed to handle the next-generation tape systems mentioned above.
The DOME project will proceed regardless of which of the two competing SKA hosts – Australia and South Africa – wins the right to host the telescope. Recent scuttlebutt suggested South Africa had won the competition, but the committee charged with making the decision recently issued a statement saying no definitive judgement has been made.
Exascale by 2018: Crazy ...or possible? • The Register
A short history of supercomputing and the race to exascale computing.
I recently saw some estimates that show we should hit exascale supercomputer performance by around 2018. That seems a bit ambitious – if not stunningly optimistic – and the search to get some perspective led me on an hours-long meander through supercomputing history, plus what I like to call “Fun With Spreadsheets.”
Right now the fastest super is Fujitsu’s K system, which pegs the Flop-O-Meter at a whopping 10.51 petaflops. Looking at my watch, I notice that we’re barely into 2012; this gives the industry another six years or so to attain 990 more petaflops worth of performance and bring us to the exascale promised land.
This implies an increase in performance of around 115% per year over the next six years. Is this possible? Let’s take a trip in the way-back machine…
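The 115 per cent figure falls out of the compound growth formula: to grow from a starting performance P0 to a target P1 in n years, the required annual rate is (P1/P0)^(1/n) - 1. A minimal sketch, assuming the K computer’s 10.51 petaflops as the starting point, 1,000 petaflops as the exascale target, and a six-year window:

```python
def required_cagr(start: float, target: float, years: float) -> float:
    """Compound annual growth rate needed to go from `start` to `target` in `years`."""
    return (target / start) ** (1 / years) - 1


# Performance in petaflops: K computer (start) vs. one exaflop (target), six years out.
rate = required_cagr(start=10.51, target=1000.0, years=6)
print(f"{rate:.1%}")  # roughly 114%, in line with "around 115% per year"
```

The rate is stated as growth on top of the previous year, so 114 per cent means performance slightly more than doubles every year.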
Just getting to megaflop performance took from the beginning of recorded history until 1964. If we start the clock with the Xia Dynasty at 2,000 BC, this means it took us 3,964 years to get from nothing to megaflops. This is a pretty meager rate of increase, probably somewhere around 0.17 per cent a year, but you have to factor in that everyone was busy fighting, exploring, coming up with new kinds of hats, and inventing the Morris Dance.
The first megaflop system, the Seymour Cray-designed Control Data CDC 6600, was delivered in 1964. It was a breakthrough in a number of ways: the first system to use newly invented silicon-based processors, the first RISC-based CPU, and the first to use additional (but simpler) assist processors, called ‘peripheral processors,’ to handle I/O and feed tasks to the CPU. This was game-changing technology.
The transition from megaflop to gigaflop performance took only another 21 years, with the introduction of the Cray-2, which hit the market in 1985. Seymour Cray had broken away from Control Data in 1972 to start his own shop, Cray Research Inc. The Cray-2 delivered 1.9 gflops peak performance by extensively using integrated circuits (early use of modular building blocks), multiple processors (four units), and innovative full-immersion liquid cooling to handle the massive heat load. In its time, it was also game-changing technology. The Cray-2 was also highly stylish, with a futuristic design complemented by blue, red, or yellow panels. Here’s a PDF of a brochure covering the Cray-2.
Fast-forward another 11 years and we see the first system to sustain teraflop performance, the Intel-based ASCI Red system, which was also a big break from past supercomputer designs. Installed at Sandia National Lab in 1996, it’s an example of what we’ve come to expect from modern supercomputers with 9,298 Intel Pentium processors, a terabyte of RAM, and air cooling.
The compound annual performance growth rate (CAGR) for this move from gflop to tflop (another thousand-fold increase) is roughly 87.5 per cent per year, which won’t get us to exascale until midway through 2019 (just in time for the June Top500 list, I’d expect). Not too far off the 2018 prediction, however.
Twelve years later, in 2008, the first petaflop system, IBM’s Roadrunner, debuted. Achieving another 1000-fold performance increase in 12 years is equivalent to a 78 per cent compound annual growth rate. This is way faster than Moore’s Law, which has an implied CAGR of around 60%, but a little slower than the previous move from giga to teraflops. At this growth rate, we’ll reach exascale in 2020, probably late in the year, but it might make the November 2020 Top500 list.
A mere three years after that, the K computer hit 10.51 pflops performance. The performance growth rate from Roadrunner to K? 116 per cent CAGR, which is almost exactly the growth rate necessary to deliver exascale by 2018.
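The same formula can be checked against the historical transitions and then inverted to ask how long each rate would take to cover the remaining climb from K’s 10.51 petaflops to 1,000 petaflops: years = ln(target/start) / ln(1 + rate). This sketch treats the giga-to-tera and tera-to-peta moves as exact 1,000x jumps over 11 and 12 years, and simply plugs in the article’s quoted rates (87.5, 78 and 116 per cent) rather than recomputing the Roadrunner-to-K figure.

```python
import math


def cagr(growth_factor: float, years: float) -> float:
    """Annual rate that multiplies performance by `growth_factor` over `years`."""
    return growth_factor ** (1 / years) - 1


def years_to_reach(start: float, target: float, rate: float) -> float:
    """Years of compounding at `rate` needed to grow from `start` to `target`."""
    return math.log(target / start) / math.log(1 + rate)


# Historical 1,000x transitions cited in the article.
print(f"giga -> tera (11 yrs): {cagr(1000, 11):.1%}")  # about 87%
print(f"tera -> peta (12 yrs): {cagr(1000, 12):.1%}")  # about 78%

# Remaining climb from K (10.51 PF) to one exaflop (1,000 PF) at each quoted rate.
K_PF, EXA_PF = 10.51, 1000.0
for rate in (0.875, 0.78, 1.16):
    print(f"at {rate:.1%} per year: ~{years_to_reach(K_PF, EXA_PF, rate):.1f} years after K")
```

At 116 per cent a year the remaining ~95x takes about six years from K’s late-2011 result; at the slower historical rates it takes roughly seven to eight.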
Does this mean that we’ll see exascale systems in 2018 or even 2020? No, it doesn’t; it’s merely another data point in handicapping the race. This analysis simply looks at timelines; it ignores the problems inherent in housing, powering, and cooling a system that’s 1,000x faster than the current top performer, which sports more than 80,000 compute nodes, 700,000 processing cores, and uses enough power to run 12,000 households before they all get electric cars.
The technology challenges are mind-boggling, and it’s clear that simply applying ‘smaller but faster’ versions of today’s technology won’t get us over the exascale hump. It’s going to take some technology breakthroughs and new approaches. Even with these hurdles, I’m betting that we’ll see exascale performance before the end of 2020, putting us right in line with previous transitions.
But all bets are off if the Mayan prediction of global destruction in December of 2012 turns out to be true. In that case, I reserve the right to change my bet to the year 5976 – which is 2012 AD plus the 3,964 years it took us to get to megaflops. Seems like a safe enough hedge to me.
IBM's Anjul Bhambhri on What Is a Data Scientist? - Forbes
This article is much broader than the title would suggest. It’s a really good explanation of IBM’s unique point of view on how to create a holistic value chain around big data.
In order to solve contemporary business problems, a big data strategy is needed much more than any one product. As I explained in my prior article, “Curing the Big Data Storage Fetish,” there is a growing understanding among enterprises that solving the big data conundrum can’t just be about acquiring more data warehousing technology. To fully exploit the opportunity presented by big data, a value chain must be created that helps address the challenges of acquiring data, evaluating its value, distilling it, building models both manually and automatically, analyzing the data, creating applications, and changing business processes based on what is discovered. Organizations have to figure out a way to increase analytical capacity, not just raw storage capacity.
“The enterprises that will achieve a competitive edge and win will have a blend of a healthy data-science culture, enterprising data scientists who can bend the ear of C-level decision makers, and the right combination of technology that will surface the data that make sense in the context of the business,” says Anjul Bhambhri, vice president of development for big data projects at IBM. To continue my series of articles on “What is a Data Scientist?” I interviewed Bhambhri about her vision for creating business value from big data. (For more on expanding big data capabilities and a list of all the stories in the “What is a Data Scientist?” series, see “Growing your Own Data Scientists” on CITOResearch.com.)
As the leader of the Big Data development initiative at IBM, Anjul Bhambhri defines the overall product strategy and leads the engineering team that delivers Big Data products. These products include both IBM InfoSphere BigInsights and IBM InfoSphere Streams, which perform historical and real-time data analysis, respectively. In addition, Bhambhri leads specialized customer-focused teams, who work with partners and customers in several vertical industries including finance, retail, energy, utilities, healthcare and telco. These teams help customers get started in understanding and unleashing the power of the Big Data products by defining proofs-of-concept, architecture and solutions.
No Silos or Ivory Towers
One of the biggest obstacles to analytical productivity at an organization is the fact that data are often locked in different lines of business as “silos,” and are not analyzed effectively—or analyzed at all. So, parking big data in a new repository to remove these silos is a good thing. However, in doing so, there is a risk of introducing a new “Big Data silo” if the new repository is not effectively connected to the rest of the business intelligence (BI) infrastructure of an organization.
Many organizations today solve the “data silos” problem by storing large volumes of decision support data in warehouses. To leverage Big Data analytics, organizations are being challenged to glean information from new data sources that are difficult to incorporate into an existing warehouse, says Bhambhri. “If you deal with this data, much of which is ‘noisy’ and unstructured, then you’re not really adding more value or more context to the information that you’re already storing,” she says. “The key to the Big Data approach is to be able to analyze all of this data, without moving it around, to gain better insights, and to be able to do it in near-real-time when necessary. The results of the analysis can enable a new class of applications for the enterprise, or can be used to enrich existing data warehouse or master-data implementations.”
Data Scientists should always have this in mind, in order to avoid creating new “silos” in the enterprise. “Ideally, every analyst would have all data in the company available to them, so they could analyze it and determine what would be of use in solving the problem at hand,” Bhambhri says. “For example, in the telco industry, we have seen the need to process and analyze billions of call data records a day for mediation, customer relationship management, etc. It would be cost-prohibitive to store the historical data in a traditional warehouse for trend analysis and deep data mining. In addition, many applications, such as fraud detection and billing reconciliation, will require real-time analysis at the point of arrival.”
IBM has been working with telco customers to expand their analytical capabilities in two ways. On the one hand, they’ve been building big-data systems that deploy connectors between traditional data warehouses and feeds for real-time transactional data. This can accommodate more real-time decision-making, while still correlating real-time data with historical data in traditional warehouses. On the other, they’ve been expanding the range of users to include not just data scientists in an “ivory tower,” but also data enthusiasts: business users who can get their hands dirty in the data using relatively familiar interfaces.
“We provide the data scientists the right set of tools so that they can explore the sources that they want to analyze, and ask questions they want to ask, so that they can focus on their core competency,” Bhambhri says. “They ask the questions, and the answers are given to them in user interfaces that they are familiar with, such as a spreadsheet. They can then interpret the results of those questions, and ask for more questions in an iterative way. So from our standpoint, we provide the platform and the tools to increase the analytic capabilities and the capacity for the business users to make use of all this data. We hope to increase the number of data scientists over time, through iterative cognition, and through rising up the ladder from data to information, without needing to understand every nuance of every analytical operation.”
Making Analytics Consumable
Part of the challenge of big data is making sure that data scientists and enthusiasts alike don’t have to spend hours creating new analytical models, or scouring huge datasets that are literally expanding by the second. The journey of making analytics consumable has begun, but is not complete, Bhambhri says. IBM is at work developing algorithms so that patterns in data can be detected, and analysts can be alerted if certain patterns occur. This serves a similar role to the time-tested scientific practice of sampling data, in that it makes it more digestible. But the key difference is, the algorithms poll all the data that has been collected, and make decisions about which patterns are meaningful, which can then be analyzed by humans.
At the University of Ontario, sensors monitoring the health of newborn babies return almost 1,000 readings per second. These readings are typically compressed into a single composite reading every 30 to 60 minutes, so each composite summarizes 1.8 million to 3.6 million individual readings. If the composite appears normal, it is discarded after being stored for 72 hours. Under this approach, any telltale pattern in the second-by-second data might be lost. With the new technology IBM developed, however, those patterns can be discovered, first by applying machine learning techniques to historical data, and then detected in real time. When this happens, an alert is sent to the analyst. The end result is that the babies’ likelihood of developing infections was greatly reduced, without requiring analysts to individually scan millions of records looking for patterns, or risking missing patterns by viewing consolidated readings.
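The difference between a composite reading and real-time pattern detection can be sketched in a few lines. This is a toy illustration, not IBM’s method: it assumes a single heart-rate-like signal sampled once per second, uses a hypothetical “several consecutive readings below a threshold” rule as a stand-in for the patterns learned from historical data, and the signal values are made up.

```python
from statistics import mean


def composite(readings):
    """Collapse a window of readings into one averaged composite value."""
    return mean(readings)


def detect_sustained_dip(stream, threshold, run_length):
    """Yield the index at which `run_length` consecutive readings have fallen below `threshold`.

    A hypothetical stand-in for a learned pattern, checked on every reading as it
    arrives. It alerts once per episode; the run resets when a reading recovers.
    """
    run = 0
    for i, value in enumerate(stream):
        run = run + 1 if value < threshold else 0
        if run == run_length:
            yield i


# Hypothetical signal: 600 normal samples at 140 with a 10-sample dip to 90.
signal = [140.0] * 600
signal[300:310] = [90.0] * 10

print(round(composite(signal), 1))  # 139.2: the dip barely moves the average
print(list(detect_sustained_dip(signal, threshold=100.0, run_length=5)))  # [304]
```

The averaged composite looks normal and would be discarded, while the streaming check raises an alert five seconds into the dip.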
Another example of machine learning on Big Data was demonstrated in Big Blue’s Watson computer, which has 4 TB of structured and unstructured data, including the entire content of Wikipedia, at its disposal. Big Blue was able to improve Watson’s Jeopardy! score from losing to a 12-year-old to beating the two reigning human champions, Bhambhri says. In case you’re concerned that there is no place for humans in data analysis any longer, fret not – at the University of Ontario, the machine learning was accomplished through several trial runs, in which humans pointed out the spots the algorithm missed. Once the gaps were identified, the tool could tirelessly focus on patterns that would be useful to the human scientists.
Data Scientist as Change Agent
Another way to keep data science out of the silo and at the forefront of the enterprise’s mind is to make sure that the organization is set up so that data scientists can truly bend the ear of C-level executives, Bhambhri says. We’ve explored the skill set of data scientists before in this series. Add “change agent” to the list of resume must-haves.
“You need some change agent in the company who can really show the business decision-makers that if they are not transforming themselves to become data-driven decision makers, then they will lose out to their competition,” Bhambhri says. “If the business side is not convinced, then it’s difficult to just get the IT arm going. But if you get the business side convinced, and this data scientist or the change agent is really tied to a C-level executive in the company, I think that could really help them get started.”
Bhambhri has seen customer exoduses from companies that don’t take data seriously. The annual IBM Cyber Monday benchmark survey of 500 retailers revealed that the number of people who use a mobile device to visit a retail Web site jumped 11 percent between 2010 and 2011. The retailer who doesn’t provide a promotions campaign for mobile phones, or optimize its Web site so that it can be easily accessed from a mobile device, is missing out on that growth. The data scientist can (and should) play a key role in advocating for a dynamic, information-focused view on business growth, Bhambhri says.
Enterprises will need to cast a wide net for these individuals, and once they get them, they will need to be empowered.
“Organizations have to identify people from within who have a track record of breaking the status quo, and who are open to exploring new sources of information on a regular basis,” Bhambhri says. “And if they don’t have them within the organization, they need to bring them from outside the organization.”
Once found, these data scientists/change agents will need to be empowered to uncover value throughout the organization, and strongly encouraged to communicate their findings, or their missions will fail, Bhambhri warns.
The level of advocacy and business focus is one of the characteristics that separate the old idea of “statistician” or “analyst” from the emerging role of “data scientist,” Bhambhri says.
“It’s not so much that they are designing new systems, but are really championing these new sources of data,” she explains. “Of course, IT still has to build the system, but the new data scientists are the change agents who really help departments collaborate throughout the organization to create value.”
Balancing Spending Across the Big Data Value Chain
If buying ever-larger data warehouses only provides a partial solution, the question remains, what is the right way to invest in building big data capabilities? Bhambhri has several recommendations.
One approach is to work with an established player that offers a mix of integrated capabilities and business partner solutions, so that the enterprise doesn’t waste resources stringing together multiple solutions, effectively becoming a systems integrator in its own right. IBM has integrated partnerships across the value chain, such as InfoSphere BigInsights for organizing data on the Hadoop file system, Datameer for visualization, and Karmasphere for application development.
“To get started, customers should not jump into an enterprise-wide big data deployment before they really know what data has useful information,” Bhambhri says. “It is part of a data scientist’s mission to understand the business needs and evaluate potential big-data solutions that can deliver return on investment to the business. So, through our customer engagements team, we work with customers to identify use cases and proofs of concept, where we identify the challenges to help them start on the journey. We take this journey with them so we can put capabilities in the product that will be useful to them. We’re not building the product in isolation, so that gives us ROI, and the customer is happy that they can see the value before they make the investment.” Even at the end of this proof period, the customer is under no obligation to make that investment with IBM, Bhambhri adds.
Through this approach, IBM helped a Danish wind power company build and use a Big Data solution to analyze weather data. The objective was to leverage data to deliver business value by identifying optimal locations to deploy wind turbines. To achieve this, the company needed to run complex data mining models on a large volume of data, which took at least three weeks of processing time, even with a subset of the 2.8-petabyte data set.
Organizations ignore data at their peril, and there will only be more of it in the future. The technology to understand it is now within reach, and it should be exploited, Bhambhri says. But ultimately, the competitive difference will be made by the level of organizational influence exerted by the data scientist to make the information from the data actionable.
IBM arms robo-sysadmin QRadar with virus know-how • The Register
Information on an interesting new security offering from IBM.
IBM is beefing up its enterprise security offerings by creating a security platform that is aware of real-time virus information, meaning that the system will be much quicker at recognising new threats.
Marketing its updated QRadar Security Intelligence Platform as a comprehensive security solution, IBM argues that the platform will protect companies much better than a bunch of piecemeal security patches. Systems patched that way have loopholes, warned Brendan Hannigan, general manager, IBM Security Systems.
“Trying to approach security with a piece-part approach simply doesn’t work,” Hannigan said. “By applying analytics and knowledge of the latest threats and helping integrate key security elements, IBM plans to deliver predictive insight and broader protection.”
The QRadar platform – designed by Q1 Labs and acquired by IBM last autumn – will have live information about viruses fed into it from 400 different sources. It will use that information to react more quickly and effectively to detect and quash bugs. The information feed is drawn from the IBM X-Force threat repository, which combs through over 13 billion security threats a day. According to Big Blue, it is the first time that X-Force’s threat intelligence has been incorporated into a security intelligence solution.
Another key feature of the platform is additional data-crunching capacity – which will allow the monitoring and corroborating of suspicious activity across multiple different areas.
For example, the software will track activity for unusual changes:
With security intelligence, security teams can quickly determine whether access patterns exhibited by a given user are consistent with the user’s role and permissions within the organization.
And then using information from other areas, the system will be able to combine reports of threats. The statement explains:
With IBM Guardium Database Security integrated with the security intelligence platform, users can better correlate unauthorized or suspicious activity at the database layer – such as a database administrator accessing credit card tables during off-hours – with anomalous activity detected at the network layer, such as credit card records being sent to unfamiliar servers on the public Internet.
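The cross-layer correlation described in that example can be sketched as a simple rule over two event streams. Everything here is hypothetical: the event fields, the 08:00 to 18:00 business-hours window, the sensitive-table and known-host sets, and the 30-minute correlation window are illustrative assumptions, not QRadar’s or Guardium’s actual data model or rules.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DbEvent:  # hypothetical database-layer event
    user: str
    table: str
    time: datetime


@dataclass
class NetEvent:  # hypothetical network-layer event
    dest_host: str
    payload_tag: str  # e.g. what a content inspector thinks the traffic contains
    time: datetime


SENSITIVE_TABLES = {"credit_cards"}
KNOWN_HOSTS = {"payments.internal"}
WINDOW = timedelta(minutes=30)


def off_hours(t: datetime) -> bool:
    """Assume business hours run from 08:00 to 18:00."""
    return t.hour < 8 or t.hour >= 18


def correlate(db_events, net_events):
    """Pair off-hours sensitive-table access with card data sent soon after to unknown hosts."""
    alerts = []
    for db in db_events:
        if db.table not in SENSITIVE_TABLES or not off_hours(db.time):
            continue
        for net in net_events:
            if (net.payload_tag == "card_data"
                    and net.dest_host not in KNOWN_HOSTS
                    and timedelta(0) <= net.time - db.time <= WINDOW):
                alerts.append((db.user, db.table, net.dest_host))
    return alerts


db = [DbEvent("dba_bob", "credit_cards", datetime(2012, 2, 10, 2, 15))]
net = [NetEvent("203.0.113.7", "card_data", datetime(2012, 2, 10, 2, 31))]
print(correlate(db, net))  # [('dba_bob', 'credit_cards', '203.0.113.7')]
```

Neither event alone is conclusive; the alert fires only when the database-layer and network-layer signals line up in time.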
IBM’s QRadar Security Intelligence Platform will be available before the end of March 2012.
Where to Put Flash for Enterprise Performance? (IDEAS Insights)
IBM and EMC represent two different approaches to enterprise-class flash-based storage.
The approach to utilizing flash for high-performance, enterprise use cases is evolving. Compute-intensive servers have already been enlisted to enable high-performance applications like enterprise resource planning (ERP), customer relationship management (CRM), online transaction processing (OLTP) databases, and more recently virtual desktop infrastructure (VDI). But a high-performance storage architecture to match the high performance compute has not yet been decided on. Flash seems to be the core of the solution, but where to put this flash for optimal enterprise performance? Announcements this week from EMC and IBM suggest two approaches: one server-based, one array-based.
One of the original strengths of SAN-based storage arrays was that they could support high-performance transactional databases at higher capacities. To improve performance, faster and faster hard-disk drives (HDDs) were employed in the array: 7,200 rpm, 10,000 rpm, 15,000 rpm. Eventually, flash-based solid-state drives (SSDs) were introduced with the ability to provide up to a 300X increase in IOPS over HDDs. Although more expensive, the performance benefits of flash SSDs have become too compelling not to support. This week, IBM announced SSD support for its XIV enterprise storage array, joining its own DS8000 series and every major storage vendor in offering flash SSD support for their enterprise class arrays.
But using flash as a storage tier in an array does not fully utilize its potential for performance. PCIe-based flash deployed in the server can offer up to a 20X increase in IOPS over array-based flash. Fusion-io has capitalized on this market with its ioMemory platform of flash-based PCIe cards aimed at accelerating high-performance applications, databases, and VDI, independent from the storage array. And just this past week, EMC’s announcement of VFCache, its own server-based flash product (previously code-named Lightning), shows that EMC is also embracing this approach and wants a piece of this growing market.
There are some drawbacks to current server-based flash solutions. Their capacities are smaller, they do not scale well in the server, they are not cache coherent between servers, and they cannot provide the enterprise data integrity and data management capabilities that flash in a storage array can. So while some of the highest-performance use cases may be moving to flash on a server, these issues will prevent primary storage from leaving the SAN any time soon.
There are also several flash-based solutions between the primary disk array and the server that offer different tradeoffs. “All flash” arrays like those from Violin Memory and Texas Memory Systems (TMS) can provide higher performance than typical tiered flash in an array, addressing the capacity and data integrity gaps of server-based flash, but at a higher cost. “SAN proxy appliances” like GridIron’s TurboCharger offer a less-pricey, high performance, flash-based caching solution for the SAN that can easily be dropped into an existing architecture. “Server network flash appliances,” such as EMC’s code-name Thunder, promise to address the limited scalability and shareability of server-based flash while retaining its higher performance.
As flash prices decrease, capacities increase, and as data integrity, cache coherency, and scalability are addressed, flash will continue its slow march towards the application. But it is clear today that flash in the server is an optimal solution for smaller data sets and highest performance, while flash in the array should be leveraged for larger or more mission-critical data sets, and a number of solutions in between can improve performance while balancing enterprise priorities with cost.
Decoding Online Chatter: Using Twitter to Spill the Beans « A Smarter Planet Blog
Interesting article on how IBM and researchers at USC are developing ways to make sense of the twitterverse.
February seems to be a month of excitement for all movie, television and sports enthusiasts. It’s that time of year: Super Bowl madness and Oscar buzz, a frenzy so electric that it spills over into the social media world. Think about it: how long does it take for you to see a Tweet or Facebook post once you hear the winner for Best Motion Picture or after the first touchdown? Seconds?
Information flows so quickly that Twitter alone is handling approximately 35MB of data a second, every second. The majority of this social media data represents public ‘streams of consciousness’, data that approximates human thought and speech, what we in the business call unstructured data. But, as anyone who has filled in a tax form, booked a flight or applied for a loan knows, computers prefer data with structure: data fields that have entries in strictly controlled formats.
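To put the 35MB-per-second figure in perspective, here is a back-of-envelope conversion to daily volume. It assumes the rate is sustained around the clock and uses decimal units (1 TB = 1,000,000 MB); the helper name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def daily_volume_tb(mb_per_second: float) -> float:
    """Convert a sustained MB/s rate into decimal terabytes per day."""
    return mb_per_second * SECONDS_PER_DAY / 1_000_000


print(f"{daily_volume_tb(35):.2f} TB/day")  # prints "3.02 TB/day"
```

By this rough estimate, that is about 3 TB of mostly unstructured text every day.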
The good news is that change is coming. Computers are becoming smarter about unstructured data (unstructured data isn’t just natural language … it’s photos, videos, emails, tweets, audio, sensor data, mobile device data). For example, using advanced analytics technologies and natural language processing, we can now begin to understand the patterns behind human expression: not just ‘key words’ that have been identified and indexed, but all words, as we type them or say them. We may have spent most of the computing age training humans to communicate with computers, using methods optimized for the machines, but today the reverse is happening. We’re now training computers to communicate with us and understand us in our own language. It is not easy. It is, as the IBM Research team behind Watson declared, a Grand Challenge. But it’s a challenge that can lead to some very important and far-reaching results.
Watson represents a pinnacle achievement in DeepQA and natural language processing, but there are many routes to the top and plenty of room for additional exploration and discovery. The team of researchers, students and faculty at the University of Southern California (USC) Annenberg Innovation Lab (AIL) is taking a slightly different approach to the Grand Challenge. Rather than using the answer-and-question format of Jeopardy!, they are applying IBM analytics software, and some very smart coding and modeling, to train computers to understand and analyze Tweets. The project is part of an ongoing collaboration between the lab and IBM to explore how organizations, from news outlets and journalists to movie studios, broadcasters and retailers, can use technology to better understand, respond to, and predict public sentiment. To date, the model has been applied to film forecasting, the World Series and fashion retailing trends, in an effort to identify social media trends and better understand public opinion. For example, just last week IBM and USC analyzed millions of public tweets to determine fans’ sentimental favorite between the Super Bowl quarterbacks, Tom Brady and Eli Manning. Just as in the game itself, Eli Manning made a late, game-changing move and overtook Tom Brady as the Social Media MVP, with 66% positive sentiment vs. Brady’s 61%.
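Figures like “66% positive sentiment” are aggregates over many individually scored tweets. Below is a minimal sketch of one plausible way to compute such a share, assuming each tweet has already been tagged with a subject and a “pos”/“neg”/“neutral” label. The labels, the choice to ignore neutral tweets and the sample data are all hypothetical; the blog does not say how IBM and USC defined their percentages.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def positive_share(labeled: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Fraction of each subject's opinionated tweets labeled 'pos'.

    Only 'pos' and 'neg' labels count; anything else (e.g. 'neutral') is
    skipped. This definition is an assumption, not the one IBM/USC used.
    """
    positive: Counter = Counter()
    opinionated: Counter = Counter()
    for subject, label in labeled:
        if label not in ("pos", "neg"):
            continue
        opinionated[subject] += 1
        if label == "pos":
            positive[subject] += 1
    return {subject: positive[subject] / n for subject, n in opinionated.items()}


# Hypothetical, hand-labeled sample; not real Super Bowl data.
sample = [
    ("Manning", "pos"), ("Manning", "neg"), ("Manning", "pos"),
    ("Brady", "pos"), ("Brady", "neg"), ("Brady", "neutral"),
]
print(positive_share(sample))  # {'Manning': 0.6666666666666666, 'Brady': 0.5}
```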
But why stop at the World Series and Super Bowl? AIL and IBM are now collaborating with the Los Angeles Times to measure moviegoer sentiment toward the upcoming Academy Awards race. With the project, dubbed the ‘Senti Meter’, we’re analyzing Oscar-related positive and negative opinions shared via millions of tweets to determine who will win “The People’s Oscars”. The project has been profiled by the Los Angeles Times, and we can all follow the evolving sentiment for the Best Actor, Best Actress and Best Picture categories over the next two weeks by visiting http://graphics.latimes.com/senti-meter/.
This project is much more than just analyzing which Best Picture nominee or movie star fans are rooting for; it’s an example of how movie studios can better understand their audience’s preferences and use social media to improve their marketing programs and, in turn, their box office results. There is no doubt that the Twitterverse and other social media platforms are changing communication as we know it. Tweets, Facebook posts and blog posts are becoming a vital resource for many organizations, including the media industry, to identify trends, inform reporting, and understand as well as connect with their audience.
Think of how much change in the last year has been driven or expressed or reported in social media. Think how much social value could have been derived if we’d had the ability to understand and react to these social media conversations and sentiments – in context and in real time. We can now analyze the vast river of public data that streams from Twitter in its unstructured complexity, and apply a level of sentiment to the commentary. In other words the computer can now determine, with the certainty level of a non-native speaker, that the tweet it just analyzed expressed a positive or negative sentiment and how strongly that sentiment was stated – all in real-time. We can then apply this analysis to deliver business value – the effectiveness of marketing activities, customer responses to services, products and promotions, the impact of advertising, or the reaction to real world events… the list is limitless.
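The idea of attaching both a polarity and a strength to each tweet can be illustrated with the simplest possible approach, a hand-built word lexicon. This is a swapped-in toy technique, not the IBM/USC model: the lexicon weights, the intensifier and negation rules, and the `score_tweet` name are all hypothetical, and a system of the kind described here relies on far richer natural language processing.

```python
import re
from typing import Dict, Tuple

# Hypothetical toy lexicon: word -> signed weight (sign = polarity, size = strength).
LEXICON: Dict[str, float] = {
    "love": 2.0, "great": 1.5, "good": 1.0,
    "bad": -1.0, "awful": -2.0, "hate": -2.0,
}
INTENSIFIERS: Dict[str, float] = {"very": 1.5, "so": 1.3}
NEGATORS = {"not", "never", "no"}


def score_tweet(text: str) -> Tuple[str, float]:
    """Return (polarity, strength) for one tweet using a word-level lexicon."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0.0
    for i, tok in enumerate(tokens):
        weight = LEXICON.get(tok)
        if weight is None:
            continue  # word carries no sentiment in this toy lexicon
        prev = tokens[i - 1] if i > 0 else ""
        if prev in INTENSIFIERS:
            weight *= INTENSIFIERS[prev]  # "very good" is stronger than "good"
        # Flip polarity for simple negation: "not good" or "not very good".
        if prev in NEGATORS or (i > 1 and tokens[i - 2] in NEGATORS):
            weight = -weight
        total += weight
    polarity = "positive" if total > 0 else "negative" if total < 0 else "neutral"
    return polarity, abs(total)
```

For example, `score_tweet("not very good")` returns `("negative", 1.5)`: the intensifier scales the weight of “good” and the negation two words back flips its sign.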
This new capability will eventually deliver solutions founded on semantic analysis of Big Data that are only just now being imagined. And it will happen faster than we expect. Stay tuned, there is more on the way….
IBM Booklets.
Folders from IBM’s data processing division. Illustration by Clarence Lee. From Graphis Annual, 1963/64.
A CTO’s take on cloud — Cloud Computing News
Great article on how the CTO of Capgemini is looking at recent developments in enterprise adoption of the cloud. Good stuff.
As Capgemini’s CTO for North America, Joe Coyle hears an awful lot about cloud computing. He hears it from customers that want to evaluate cloud solutions and from vendors that want to win that business. Capgemini, a $12 billion global systems integrator, has relationships with all the major vendors and many enterprise customers, so it’s interesting to hear what Coyle has to say about the current state of the market.
Here are my main takeaways from a recent conversation with him.
1: IBM is cloudier than you think.
Big Blue has a pretty potent set of cloud options, but it’s going about its business very cleverly. Given its big-iron heritage, IBM rarely talks about the hardware component of its cloud portfolio, Coyle said.
“They’re attacking this from a software perspective. They’ve taken Tivoli and are building this software umbrella so that you can take whatever you’re running in your data center now and put all or part of it in a public or private cloud,” he noted. IBM’s 2010 acquisition of Cast Iron also gives it a slick appliance that lets customers integrate in-house apps with SaaS applications running outside.
He doesn’t see IBM cloud penetrating a ton of new smaller businesses, but for many existing IBM shops (and there are a ton of them), IBM cloud is a no-brainer.
2: Microsoft Azure has a tough row to hoe
Coyle is of two minds on Windows Azure, the platform-as-a-service (PaaS) underlying Microsoft’s cloud strategy.
“Azure’s been a bit of a disappointment,” he said. “When Microsoft briefed us on it years ago, all the national [systems integrators] were chomping at the bit. But then it stumbled.”
“Then the message was the software would only run on Azure. That’s fine, but by that point, the world had moved on, companies were already using Amazon,” he said. The usual argument that Azure is a PaaS while Amazon Web Services (AWS) is Infrastructure-as-a-Service (IaaS) simply doesn’t matter to most customers. The big AWS draw is that customers know they can deploy their applications on AWS now and move them to another hosted or in-house data center later.
On the plus side, the Azure technology is solid and, unlike previous Microsoft development technologies, forces developers to follow the rules: they can’t design software services that misbehave. “Azure is extremely powerful and if [Microsoft] can get its act together people will try it,” Coyle said.
But overshadowing all that technical mastery is the perception of Azure as a closed platform — despite its multi-language support. Microsoft’s single biggest problem is customer suspicion that it will use Azure to lock them into the next wave of Microsoft technologies, essentially replacing the Windows/Office upgrade cycle.
“I’m not saying it’s true, but it’s what people think,” Coyle said.
3: Amazon is Amazon
Amazon Web Services are what they are: extremely flexible and leading the league in public cloud. AWS suffered a couple of black eyes in 2011, with an embarrassing four-day outage in April and then a widespread reboot glitch later in the year.
Coyle is pretty forgiving of these miscues. The April outage, he said, was largely due to people implementing their work incorrectly, something that AWS tried to fix manually. There are things you can do now in AWS to prevent this stuff, to build in more reliability and redundancy, although users will have to pay for it, he said.
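The trade-off Coyle describes, paying more for redundancy to get more reliability, has a standard back-of-envelope form: with n independent replicas of availability a_1 … a_n, the chance that at least one is up is A = 1 - (1 - a_1)(1 - a_2)…(1 - a_n). Below is a minimal sketch, assuming failures are independent. Correlated failures, such as one fault that takes out every replica at once, break that assumption. The function name and the 99% figure are hypothetical.

```python
from math import prod
from typing import Sequence


def redundant_availability(replica_availabilities: Sequence[float]) -> float:
    """Probability that at least one replica is up, assuming independent failures."""
    p_all_down = prod(1.0 - a for a in replica_availabilities)
    return 1.0 - p_all_down


# Two hypothetical replicas, each up 99% of the time.
print(f"{redundant_availability([0.99, 0.99]):.4f}")  # prints 0.9999
```

Each extra replica adds cost linearly but cuts the all-down probability multiplicatively, which is why a second copy buys so much on paper.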
The bottom line? Glitches and all, Amazon is the incumbent public cloud power and will stay that way, he said.
4: OpenStack as big-time cloud disruptor
Coyle is also bullish on the OpenStack movement, which is building a standard cloud foundation out of open-source tools. Initiated by Rackspace and NASA, it has achieved critical mass: nearly every IT provider, from Dell to HP to Cisco to Citrix, is aboard, and Rackspace is offloading management to a more neutral OpenStack Foundation.
“OpenStack will change the world of cloud computing. As a lot of smaller companies look to build their own clouds, this will be a natural choice,” Coyle said.
Who stands to lose if that’s the case? Ironically, the Dells and HPs of the world, all of which are building their own clouds. “Why do you think they joined?” His feeling is that these hardware companies, many of which were building their own more vendor-specific clouds, are hedging their bets.
Will OpenStack affect Amazon? “No. Amazon is Amazon,” he said.
5: CIOs are getting over cloud phobia
It’s taken time, but the economics of cloud computing are too good for CIOs to ignore, Coyle said. Any doubts they had about moving at least some corporate data to an outside cloud storage provider, for instance, have evaporated in recent months.
And they’re getting emboldened to do more than storage. The advent of Hadoop and NoSQL technologies means that companies could actually get some use out of all that old stuff sitting on tape or on disk platters, he said. Uploading that information and massaging it with the latest analytics means that historical data can be used to test assumptions and new models: for example, seeing what a price change means to sales over time.
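One simple way to ask “what a price change means to sales” of historical records is to compute the price elasticity of demand: the percentage change in units sold divided by the percentage change in price. Below is a minimal sketch using the midpoint (arc) formula, assuming you have already pulled an average price and unit volume for a period before and after the change. The function name and figures are hypothetical, and a real analysis would also control for seasonality, promotions and other confounders.

```python
def arc_price_elasticity(p0: float, p1: float, q0: float, q1: float) -> float:
    """Midpoint (arc) elasticity: % change in quantity divided by % change in price.

    Percent changes are taken relative to the average of the two values,
    so the result is the same whichever period is treated as the start.
    """
    if p0 == p1:
        raise ValueError("price must change to compute elasticity")
    pct_q = (q1 - q0) / ((q0 + q1) / 2)
    pct_p = (p1 - p0) / ((p0 + p1) / 2)
    return pct_q / pct_p


# Hypothetical history: price raised from 10 to 12, weekly units fell from 100 to 90.
print(f"{arc_price_elasticity(10, 12, 100, 90):.2f}")  # prints -0.58
```

A magnitude below 1, as here, would suggest demand is relatively insensitive to price over that range.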
Wringing real value out of old data is a pretty good proposition for most CIOs.
Innovation in the Cloud: Today’s Idea, Tomorrow’s Industry « A Smarter Planet Blog
Nice thinking from IBM’s Lauren States on the link between cloud computing and IBM’s work with autonomic systems in the early oughts.
Increasingly, companies will access innovations over the Internet, from the cloud. The power of the cloud model comes from the fact that IT complexity is masked from the people who use it. Plug into the cloud and you can get the resources you need on demand. Sounds easy, right?
Not so fast. A recent GigaOm article starts with the premise “Cloud is complex – deal with it.” Author James Urquhart (@jamesurquhart) goes on to describe how, while cloud is a great way to worry less about your infrastructure, the problems you are eliminating may create new challenges (and opportunities). Urquhart explains the cloud as a complex system, one made of piece parts that are all interdependent and adaptive. Three lines in his explanation stood out for me: “Think biology. Think economics. Think ecosystems.”
At IBM, if there’s one thing we do, it is think deeply about complex systems. And we have thought for a long time about adaptive, almost biological IT systems interconnected with our business needs. And that’s how we think about the cloud.
In fact, IBM has been working on this since our Research division launched its own grand challenge to create “self-healing, self-protecting, self-optimizing, and self-configuring” Autonomic Computing systems in 2002.
At the time, there was a certain amount of skepticism (see this USA Today piece). But the goal then was the same as the goal of cloud computing today: take the complexity out of IT management so that you can become more innovative as you focus on your business goals.
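To make “self-healing” concrete, here is a minimal sketch of a control loop that watches components and repairs them without a human stepping in. It follows the monitor, analyze, plan, execute pattern commonly associated with IBM’s autonomic computing work. The class and method names are hypothetical, health is simulated with a boolean flag, and “restart” is only a stand-in for real remediation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Server:
    name: str
    healthy: bool = True
    restarts: int = 0


@dataclass
class AutonomicManager:
    """Toy monitor-analyze-plan-execute loop that restarts unhealthy servers."""

    servers: Dict[str, Server]
    log: List[str] = field(default_factory=list)

    def monitor(self) -> Dict[str, bool]:
        # Collect a health reading from every managed server.
        return {name: s.healthy for name, s in self.servers.items()}

    def analyze(self, status: Dict[str, bool]) -> List[str]:
        # Decide which servers are in trouble.
        return [name for name, ok in status.items() if not ok]

    def plan(self, failed: List[str]) -> List[str]:
        # Choose a corrective action for each failed server.
        return [f"restart:{name}" for name in failed]

    def execute(self, actions: List[str]) -> None:
        for action in actions:
            _, name = action.split(":", 1)
            server = self.servers[name]
            server.healthy = True  # simulated restart brings the server back
            server.restarts += 1
            self.log.append(action)

    def run_once(self) -> None:
        self.execute(self.plan(self.analyze(self.monitor())))


servers = {"web-1": Server("web-1"), "web-2": Server("web-2", healthy=False)}
manager = AutonomicManager(servers)
manager.run_once()
print(manager.log)  # ['restart:web-2']
```

In a production system the loop would run continuously, and the analyze and plan steps would draw on shared knowledge such as policies and past incidents rather than a single flag.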
IBM’s Autonomic Computing push started as an idea on an IBM researcher’s whiteboard. Through a process of applied science, testing and rigorous real-world proofing, that initial idea spawned much of today’s thinking about IT systems.
Much of what IBM created with autonomic computing was the precursor and is the foundation for today’s cloud systems. Everything from automatic provisioning of servers to sensors for security in the network were born from this effort.
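Automatic provisioning ultimately comes down to a sizing rule that is evaluated continuously. Below is a minimal sketch of one simple threshold-based rule, assuming demand and per-server capacity are measured in the same units (for example, requests per second). The 20% headroom, the pool bounds and the function name are hypothetical defaults, not anything IBM has described.

```python
import math


def servers_needed(demand: float, capacity_per_server: float,
                   min_servers: int = 1, max_servers: int = 100,
                   headroom: float = 0.2) -> int:
    """Size a pool so demand fits with spare headroom, clamped to [min, max]."""
    if capacity_per_server <= 0:
        raise ValueError("capacity_per_server must be positive")
    # Round up: a fractional server still means one more machine.
    target = math.ceil(demand * (1 + headroom) / capacity_per_server)
    return max(min_servers, min(max_servers, target))


print(servers_needed(950, 100))  # prints 12
```

The headroom absorbs short spikes while new capacity comes online, and the bounds stop a bad metric from scaling the pool to zero or without limit.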
This month, with the release of last year’s U.S. patent office data, IBM became the first company to be issued more than 6,000 patents in a single year. With a new single-year record of 6,180 patents, it marks the 19th consecutive year that IBM has earned more U.S. patents than any other company.
On its own, this is a remarkable testament to IBM’s continued dedication to advancing the state of the art in technology. But underneath the patent count itself is what intellectual property can achieve when it is put to work. IBM scientists are focused on applying these innovations to solve the unique problems our clients face in every industry, from designing a better healthcare system to alleviating traffic congestion in cities.
In reviewing this year’s patent list, I can’t help but remember the early days of autonomic computing. Just as IBM Research innovations were predictive of a fundamental change in computing, I see the next advances in cloud networking (Patent # 8060878), cloud security (Patent # 7941706) and sensors in the cloud (Patent # 8041772) as having the ability to usher in a new era of computing for businesses around the world.
So while our researchers continue to push the boundaries of IT and reinvent the systems of the future, we have an opportunity to collaborate with our clients in exciting new ways that affect all of us and help drive global competitiveness.
After all, today’s thought could be tomorrow’s cloud.