Drive to Public Clouds Powered by Both Consumers and IT Managers (IDEAS Insights)
IDEAS International shares some thoughts on cloud’s evolution. Nothing particularly new or groundbreaking, but a good summary of recent developments.
The farthest-reaching changes in the IT industry often occur when a single new development simultaneously responds to the needs of both consumers and business users. Perhaps the best-known example of this kind of wave was the original PC: office workers used PCs during the day for their business tasks and then used the machine when they got home to play games (and often vice versa). The Web was also adopted in equal measure by consumers and business users when it first broke into the mainstream, which was a key factor in its incredibly rapid rise. The same will happen with cloud computing, once the public cloud providers close the loop between business and consumer services.
In 2012, both consumers and large companies will step up their adoption of public clouds. From the standpoint of end users, public cloud computing will be seen as enabling greater mobility, gradually leading to “ubiquitous” computing in which they no longer have to be concerned about where their data is actually located. At the same time, the economics of public cloud computing will become sufficiently attractive to IT managers in enterprise environments that they will no longer be able to avoid considering it, at least for certain workloads. This kind of lockstep between consumers and business users will cause big changes across the IT industry.
Users who put their data in the cloud expect that they will be able to access the data on any device, from anyplace in the world. Because there is only one copy of the data (and hopefully a backup copy somewhere), users hope that they will no longer need to synchronize laptops with other devices like iPads and smartphones. In 2011, many consumers were subtly introduced to the convenience of cloud storage when Apple introduced automatic synching of data between iPhones, iPods, and other devices with its iCloud service (the capability was introduced transparently with an update of Apple’s iOS operating system). Since the iPad dominates the tablet market, and the iPhone is one of the most popular smartphone models, other tablet and smartphone providers will soon need to include similar capabilities to remain competitive. As a result, the huge base of consumers storing their music and photos in multitenant clouds will promote the acceptance of cloud storage from a theoretical capability to a real and useful service. The rise of cloud computing will eventually speed the convergence of “mobile” and “social” trends, in which data sharing between trusted parties will become the normal approach for exchanging information.
In datacenters, the economics of public cloud computing will become increasingly attractive to IT managers. Continuing concerns about potential security risks will prevent organizations from entrusting their most sensitive workloads to public clouds, but for many other workloads, the flexibility and potential cost benefits of cloud deployment will outweigh its risks. In 2012, the use of public clouds will go beyond early adopters and enter the mainstream for certain applications. As public clouds become part of standard IT operating procedures, some business issues with service providers will rise to the forefront. Customers will increasingly focus on issues such as service level agreements (SLAs) and portability between cloud services. Companies planning a cloud deployment will narrow their focus to providers who have the technical ability to deliver on SLAs and can provide security in the cloud. Vendor lock-in with cloud service providers will become a greater concern as customers grapple with the decision of whether to embrace proprietary solutions that deliver unique benefits, or more open solutions that may have limitations. Some cloud vendors will tout their relative openness and present vendor lock-in as a major reason customers should not buy from their competitors.
Throughout 2012, cloud services will become an increasingly big business as companies complete their trials and begin to roll out full-scale enterprise applications to the cloud. Amazon AWS will become the first billion-dollar cloud venture. Towards the end of 2012, the cloud business will begin to see a shakeout as larger, better-financed companies cherry-pick the best companies and push out the weaker start-ups. It will become much clearer by the end of the year which service providers can deliver for the long haul, and which can’t. On the user side, most deployments will be uneventful and successful, but some high-profile incidents will highlight the problems that arise when a cloud is not deployed correctly. To fully reap the benefits of cloud computing, IT workers will need to reassess their skills and go for training in new areas. In the meantime, companies that want to deploy a private cloud may have a difficult time finding IT workers with the right mix of skills to design, deploy, and manage a cloud.
Cloud is complex—deal with it — Cloud Computing News
Awesome article on the complexities of the cloud. I love the definition of the cloud as an application-centric operations model. Very smart.
If you are looking to cloud computing to simplify your IT environment, I’m afraid I have bad news for you.
Yeah, you might find yourself having to worry less about infrastructure, less about how storage systems work or what networking to use to connect a virtualized resource pool, or even what middleware settings are optimal for your applications. However, for every problem eliminated by choosing cloud, you’ll find it just creates more of the problems you remain accountable for—and may even create some new problems that you never had to face before.
Which is as it should be. Let me explain.
When I describe cloud computing as an application-centric operations model, one of the first questions that should come to mind is “operations of what, exactly?” Just because the cloud is focused on the application, it by no means implies that the application is all that is being operated. In fact, just as in any computing technology since the earliest electronic computers, the application can’t exist without myriad things supporting it.
And the world doesn’t consist of a single application, but, in fact, millions of applications. Most of these are interconnected in some way, and the matrix of code, data, infrastructure, people, policies, requirements and so on that makes up modern IT is ultimately a very interconnected, complex system. Cloud computing is just one (very effective) way of dealing with that complexity.
Cloud as a complex system
What’s interesting is that it turns out science has a whole body of work around complex systems. A complex system, according to Wikipedia, is “a system composed of interconnected parts that as a whole exhibit one or more properties (behavior among the possible properties) not obvious from the properties of the individual parts.”
That’s certainly true of the modern interconnected IT environment. Just look at automated trading systems and the famous “flash crash” for an example—systems designed for increasing market returns reacted to each other in a way that temporarily crashed that very market. Other examples abound, and I’m sure your own IT environment often behaves in ways that no single application or other element was designed to do explicitly.
What science teaches us about complex systems is that they are made up of many individual agents, each of which affects and is affected by the agents around it. The feedback loops of events created by agents affecting each other both directly and indirectly, combined with the mechanisms that choose behaviors in response to those events, create the systemic behavior that is so unpredictable.
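The feedback loop described above can be sketched in a few lines. This is a toy model under loudly stated assumptions, not a model of the actual flash crash: the `simulate` function, the thresholds and the `impact` parameter are all invented for illustration. Each agent follows one simple rule (sell when the price drops below its own threshold), yet the agents' reactions to one another turn a small dip into a sustained slide.

```python
def simulate(price, thresholds, steps=10, impact=1.0):
    """Toy feedback loop: an agent sells whenever the price is below its
    threshold, and every sale pushes the price down by `impact`."""
    history = [price]
    for _ in range(steps):
        # Count the agents whose rule fires at the current price.
        sellers = sum(1 for t in thresholds if price < t)
        # Their combined selling moves the price, which may trigger others.
        price = max(price - impact * sellers, 0.0)
        history.append(price)
    return history


# Hypothetical thresholds for three independent agents.
thresholds = [99.0, 98.0, 97.0]

calm = simulate(100.0, thresholds)     # no rule fires, nothing happens
shocked = simulate(98.5, thresholds)   # a 1.5-point dip trips the first agent
```

No single agent was designed to crash the price; the slide in `shocked` comes only from the agents reacting to the effects of each other's rules.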
Cloud as an adaptive system
The thing is, however, a certain class of complex systems, complex adaptive systems, have the additional trait that they can change their behavior in response to the success or failure of previous behaviors when a given event occurs—or when a certain series of events occurs. This ability to “learn” and adapt to the surrounding system environment creates amazing outcomes, including many of the most rich, enduring and powerful systems in our universe.
Think biology. Think economics. Think ecosystems.
IT is adaptive, in that winning functionality survives and thrives, while losing functionality dies out and disappears. Thus, those investing in building IT technologies are constantly seeking ways for their technology to survive in a changing, often hostile environment.
If an application, or function or even just a line of code fails to add value to the environment—or worse, negatively disrupts the value of the environment—it will be removed or changed, one way or another. Those that rely on IT are constantly seeking ways to optimize applications, data and technologies to take the most advantage of their systems environments.
The result is constant innovation, and constant adjustment to our needs as businesses and individuals. It ain’t always pretty, as they say, but so far it has been quite effective. (I should note that this even applies to infrequently modified “legacy” applications; there is an ongoing decision to not modify such an application, and thus it continues to survive.)
The developer as DNA
I want to leave you with one last thought, however. One of the things about complex adaptive systems is the learning or adapting traits of the agents in the system. In the world of evolution, the main agent of learning or change is DNA. In the world of IT, the agent of learning or change is the engineer or software developer.
If something goes wrong with an application, developers are on the hook to fix it, change it or kill it. If existing hardware fails to create new opportunities to innovate, engineers find new approaches to introduce into the ecosystem to shake things up.
However, developers and engineers can only make those changes one, or a few, components at a time. Nobody can configure the “system” to work an expected way. All you can do is constantly monitor the success and effectiveness of the technologies you deploy into the cloud, and constantly tweak them to make them as useful as they can be in that environment.
It’s up to people to make technologies that survive cloud as a complex system—one component at a time. That’s, well, how you deal with it.
InfoQ: Everything Is PaaSible
Fantastic article about PaaS - and more. Worth it alone for the opening in which Vambenepe discusses the headaches that enterprise architects face in building out applications and services. I love his description of middleware and the choices that enterprise architects need to make.
I’m not much of a cook. Of the many errors I commit in the kitchen, the most common is a failure to use the right tool for the job. Not because I don’t have it, or I don’t know how to use it, but because I over-optimize for the post-cooking cleanup at the expense of the cooking experience. The saucepan used to boil the potatoes isn’t the best tool for a sauté, but it’s already out and it will do. The wooden spoon used to stir-fry vegetables doesn’t make as good a serving instrument as the metal ladle, but it’s already dirty so let’s use it. Sometimes this make-do attitude just produces inconvenience, sometimes it leads to disaster (“I don’t need to dirty a colander; I can just hold the lid against the pan and slowly pour… Oops! OK kids, we’re eating pasta off the sink tonight.”)
What is true for cooking is true for software development. Using the right tool for the job means higher productivity and a more robust, efficient implementation. But, just like cooking, there are many tools in the drawer and each one that you take out comes with its own cost.
Enterprise applications are made of more than just code. The selection of the platforms they are built on is as important as their specific code and configuration. That’s why “middleware” is a much broader category than just “application servers”, and even “middleware” doesn’t capture all of the various runtimes that support enterprise applications. Just looking at the list of application-related “Magic Quadrants” offered by Gartner gives an idea of the diversity of these product categories and the difficulty of the task, for IT architects, of deciding which to use: just for the application runtime, there are Magic Quadrants for “Enterprise Application Servers”, “Horizontal Portals”, “Mobile Enterprise Application Platforms”, “Ajax Technologies and RIA Platforms”, “Application Delivery Controllers”, “Application Infrastructure for Systematic SOA-Style Application Projects”, etc. Add to this data-related platforms (relational databases, distributed databases of various forms, master data management), as well as messaging infrastructure, security, identity management infrastructure and application performance management tools, and soon enough one has to decide whether to optimize for employing the right tool for every task or optimize for assembling an easy-to-manage set of tools.
Each new tool, when selected, brings additional features that the application can take advantage of, but also a new set of administrative tasks, another platform the administrators need to be trained on, another set of operating system requirements, additional license and support costs, another support channel, etc. That’s when people start wondering whether they really need a portal or whether they can make do with the more basic UI reuse features of the application server. Integrated product suites from portfolio vendors alleviate many of these issues (tested integrations, centralized management, unified support…) but not all. Enterprise application platforms, like Java EE, provide a large set of features, but in many cases they provide a portable interface to infrastructure tools that are not themselves included in the platform runtime (e.g. a database). This significantly lowers the barrier to using these external tools, but from an operations perspective the cost of yet-another-tool-to-manage remains.
The next leap in making it practical to always use the right tool for the job will come with PaaS.
Today’s PaaS offerings are, with few exceptions, focused on the basic building blocks of application platforms: an application runtime (“app server”), a database, and an Identity Management (IDM) store. For now, the flexibility of the various offerings has been most often measured in terms of number of programming languages supported (Java, Ruby, Python, PHP, JavaScript…). But PaaS changes so quickly (compare today’s PaaS landscape to what it was a year ago) that the current landscape is almost irrelevant already.
Soon we will realize that supporting yet another language that offers the same interaction style is not a very important measure of diversity. Diversity will instead be measured in the number, richness and comprehensiveness of the value-added platform services offered, from map-reduce services to business process orchestration.
In the same way that many application runtime tools are under-utilized because of the operational cost of setting them up, configuring them and maintaining them, many application management tools are also under-utilized, for much of the same reasons. So system administrators and developers scroll through various log files in scenarios where transaction tracing tools might provide a much more direct answer. They add crude browser-side instrumentation in places where network-based traffic capture can provide a rich view of the user experience and permit replay of transactions. They manually generate test transactions in places where the right tools can capture actual customer traffic, scramble confidential information and deliver a usable and realistic set of test input and output payload.
All these “right tools” exist today and are sometimes used in traditional data centers, but not as often and as consistently as they should be. Because their acquisition and operation costs aren’t perceived (rightly or wrongly) to be worth the value they can deliver. Because trying them out (in a realistic way, applied to your application) is, in itself, a time-consuming effort.
As of today, in most cases, in a PaaS environment one cannot use these advanced tools, because the environment is too restrictive. Today, advanced runtime services and advanced management services are used occasionally in traditional environments (to some extent including IaaS) and never in PaaS environments.
As PaaS matures, this trend will reverse and PaaS environments will be those in which users (almost) HAVE to use the right tool; where they’ve lost all the excuses (both of the valid and the imagined kind) for not using (or at least trying out) the full richness of runtime and management tools. That’s because the underlying Cloud operating system is built from the ground up to host these various platform services and present them in a unified way not just to the application running on top but also to the administrators in charge of maintaining them.
This change is even more profound than it seems. Getting transaction tracing, user experience monitoring, transaction capture, auto-scaling and all kinds of advanced platform features by default (or by just checking a checkbox) is a big improvement. But the move to PaaS will hand developers and architects tools that aren’t even on the table today. Here are 3 examples:
Example 1: There is no reserved domain
In traditional settings, the application is confined to what happens on the computers on which it runs. There is a lot more infrastructure than just computers in a datacenter, but all that is the domain of IT administration and out of reach for programmers. The best they can do is document what network topology and load balancing configuration is desired. In PaaS, the entire infrastructure is accessible, and when that happens, you can count on developers using it in ways that will horrify traditional IT administrators. The CDN is not just an after-the-fact deployment optimization; it becomes a core part of the application logic. Even DNS, old, boring, sacrosanct, must-never-go-down DNS, becomes another programmable entity. And it is used in completely new ways as a result, as illustrated by people currently experimenting with giving almost everything a CNAME and gaining complete location transparency.
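To make the CNAME idea concrete, here is a minimal sketch of location transparency through aliases. It uses an in-memory dictionary rather than a real DNS server or DNS library, and every hostname, address and the `resolve` helper are hypothetical. The point is only that callers keep using one stable name while the application re-points the alias.

```python
def resolve(name, records, max_hops=8):
    """Follow CNAME aliases in `records` until an address record is found."""
    for _ in range(max_hops):
        record_type, value = records[name]
        if record_type == "A":
            return value
        name = value  # CNAME: keep following the alias
    raise RuntimeError("CNAME chain too long")


# Hypothetical zone data; the addresses come from documentation-only ranges.
records = {
    "orders.example.com": ("CNAME", "orders.eu-west.example.net"),
    "orders.eu-west.example.net": ("A", "192.0.2.10"),
    "orders.us-east.example.net": ("A", "198.51.100.20"),
}

before = resolve("orders.example.com", records)

# Relocating the service is a one-record change made by the application.
records["orders.example.com"] = ("CNAME", "orders.us-east.example.net")
after = resolve("orders.example.com", records)
```

Nothing that calls `orders.example.com` has to change when the service moves, which is the location transparency the experiments mentioned above are after.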
Example 2: Business is code
The mantra of “aligning IT and business” is frequently heard. Nothing is wrong with it. But PaaS gives you not only the tools to align them but also to merge them in many places. Fine-grained metering and billing, programmatically driven usage of pay-on-demand resources put a significant part of your operating costs under direct and explicit control of your code. If you want to lower your computing costs by 10% next month, you can make a configuration change that will have exactly that effect, in the same way that you can change any other application parameter. Of course, this stricter constraint might affect the performance of your application in some way, there’s no free lunch, but the point is that the monthly cost is not a guessing game; it’s much more precisely controlled… What’s true on the cost side may also be true on the revenue side. At the risk of stretching the definition of PaaS, an app store is a PaaS service, in the same way that an application hosting service is. And the line will blur. I would be very surprised if, as I type this, someone at Amazon is not working on better integrating their app store with their AWS platform, so that you deploy your app (its server side logic and its client side piece) to their infrastructure, the client side gets uploaded, via the app store, onto the user’s Android/Kindle device, the server side runs on AWS and the AWS bill is paid by the app sales (hopefully with some leftover for you). And the charging model goes from one based on just app purchase to one based on consumption, as measured by the platform which hosts the server side.
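As a rough illustration of cost as just another application parameter, the sketch below caps an auto-scaling decision by a monthly budget. The prices, the 730-hour month, the capacity figure and the `instances_to_run` function are all assumptions made up for this example; no real provider’s billing API is involved. Amounts are kept in integer cents to avoid rounding surprises.

```python
HOURS_PER_MONTH = 730  # assumed average month length in hours


def instances_to_run(load_rps, rps_per_instance, hourly_rate_cents,
                     monthly_budget_cents):
    """Return how many instances to run: enough for the load, but never
    more than the monthly budget can pay for."""
    wanted = max(1, -(-load_rps // rps_per_instance))  # ceiling division
    cost_per_instance = hourly_rate_cents * HOURS_PER_MONTH
    affordable = monthly_budget_cents // cost_per_instance
    return min(wanted, affordable)


# Hypothetical figures: 10 cents per instance-hour, 100 requests/s each.
peak_full_budget = instances_to_run(2000, 100, 10, 100_000)  # $1,000 budget
peak_cut_budget = instances_to_run(2000, 100, 10, 90_000)    # budget cut 10%
quiet_cut_budget = instances_to_run(500, 100, 10, 90_000)    # cap not reached
```

Lowering the budget parameter lowers the ceiling on spend with certainty. What it costs in performance at peak load is the “no free lunch” part: with these figures the tighter budget serves the same peak with one instance fewer.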
Example 3: Everything is a platform
It is my contention that the “Platform” part of PaaS refers to anything that can be programmed. An application server obviously meets this definition, but there are many other things that can be programmed. Not a day goes by without something getting an API that didn’t use to have one. This opens the door to PaaS including many different kinds of hardware. It’s not just traditional computers. You can get GPUs and supercomputers (both of which Amazon offers today). One day you’ll get to use observation or communication satellites by the hour via an API. Or maybe wireless spectrum. You can already access logistical services (a warehouse, transportation) that way. The line will blur. Ultimately, PaaS is not just raising the IaaS abstraction level by pushing the app server into the Cloud offering. It’s about making all the useful resources programmatically accessible.
For a long time, the basic building blocks of IT have been simple: computers, storage and networking. It was the age of computer-centric IT. Everything else was the responsibility of the system administrator and out of reach for the programmer. IaaS made these resources available in a more flexible manner, but didn’t change the nature of computing. In its first iteration, PaaS doesn’t change the nature either, it just looks at the way people typically use these infrastructure pieces (install a database on them, install an application platform on them, etc…) and offloads that responsibility from the application owner. That has the important benefit of lowering the barriers to using the right platform tool for whatever task the application is accomplishing. But that’s just the beginning of PaaS. PaaS is about to multiply the variety of IT building blocks, offering to the application owner access to resources that would be very hard, or impossible, for them to compose based on the traditional resources of networks, computers and storage.
In the early age of PaaS (e.g. when Google App Engine entered the scene), you wrote applications differently for PaaS because you had to. The idiosyncrasies of PaaS were mostly driven by the need to make its delivery easy and cheap for service providers. We’ll grow out of this. But we’ll do more than outgrow it by removing these constraints. We’ll transcend them. The ultimate goal of PaaS is not to get rid of the early PaaS limitations and to allow developers to do things in the way they are used to. It is to make developers write applications differently not because they *have* to but because they *want* to; because the platform services offered by PaaS are better than what developers had at their disposal before it. They’ll come with little incremental operational cost; they’ll provide access to a much wider range of services than those traditionally offered by an application server; they will provide direct control of cost and revenue parameters as part of the application logic.
The transition from machine-centric application design to PaaS is of the same magnitude as the transition from chemistry of simple molecules to one in which amino acids became available as building blocks. Applications are about to come to life.
IBM is Jumping Into Network Virtualization with VMready Switches (IDEAS Insights)
A good overview of how virtualization complicates traditional networking approaches - and how IBM is addressing the issue.
As virtualized servers become more prevalent in the data center, networking components such as switches must adapt to become more aware of virtualization. Most switches were originally designed for physical networks, in which LAN configurations were more or less static. When a new node was added or an existing node was moved to a new subnet, it often required a network administrator to make manual changes to the network configuration to ensure that requirements such as SLAs and security were maintained. As a result, changes to the network topology needed to be carefully planned in advance. With the rise of virtual infrastructure, in which virtual machines (VMs) migrate frequently from one host to another, it becomes nearly impossible for network administrators to keep up by making manual changes. IBM is now directly addressing this problem with the VMready technology that it acquired with Blade Network Technologies.
Server virtualization makes network management more complex, because VMs are hidden from the physical network switches by the hypervisor. VMs forward packets from the virtual network interface controllers (vNICs) in the hypervisor to a virtual switch (vSwitch), and then to a physical NIC that forwards the packets on to the physical switch, which can only see the physical NIC. This setup works adequately until a VM needs to be migrated across multiple subnets to a new host. Although such a move is possible, it is not without issues and a network administrator must carefully plan out the move in advance so that the network policies associated with that VM, such as ACLs, QoS, and VLANs, follow the VM to the new switch.
To solve this problem, developers of server and network technologies have come up with various solutions to help networks bridge the physical and virtual domains in a way that allows them to adapt quickly and easily to changes on either side. IBM is now starting to promote one such technology, called VMready, which it obtained with the acquisition of Blade Network Technologies in 2010. Switches that use VMready allow VMs to be managed at the level of virtual ports in addition to physical ports. VMready switches sniff the network traffic between the hypervisors and management tools in order to determine the location and identification of each VM (alternatively, the switch can send a query to a VM management console to determine the network attributes of VMs, such as UUIDs, MAC addresses, IP addresses, and VM names).
VMready management tools store the relevant data about all of the VMs they discover in a central policy database that contains information on every VM in the datacenter. For customers who need to set up the network in advance, a network administrator can populate the central VMready database manually. Once the database is populated, VMready allows VMs to be moved to any VMready switch on the network, and the defined policy profiles will follow the VM.
One of the nice features of VMready is the ability to create groups of VMs based on common Layer 2 or Layer 3 networking policies. For instance, VMs hosting databases can be assigned a higher Quality of Service (QoS) policy with Layer 2 redundancies, and web servers can be assigned a lower QoS without these redundancies. Once these groups are set up, detailed profiles can be assigned to each group, and these profiles will follow a VM wherever it may go on the network. When new VMs are created, they can be added to a group in which the relevant attributes will automatically be assigned to the VMs.
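The idea of group profiles that follow a VM can be sketched as a small data model. This is purely illustrative: the class names, fields, switch names and VM identifiers below are hypothetical and do not reflect IBM’s actual VMready schema or any real management API. The sketch only shows why a migration needs no policy changes when the profile is attached to the group rather than to a physical port.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyProfile:
    """Network settings shared by every VM in a group (hypothetical fields)."""
    vlan: int
    qos: str
    l2_redundancy: bool


class PolicyDatabase:
    """Toy stand-in for a central policy store; not IBM's schema."""

    def __init__(self):
        self.profiles = {}   # group name -> PolicyProfile
        self.groups = {}     # VM UUID -> group name
        self.locations = {}  # VM UUID -> switch currently serving the VM

    def define_group(self, group, profile):
        self.profiles[group] = profile

    def add_vm(self, vm_uuid, group, switch):
        # A new VM inherits its group's profile simply by joining the group.
        self.groups[vm_uuid] = group
        self.locations[vm_uuid] = switch

    def migrate(self, vm_uuid, new_switch):
        # Only the location changes; the profile is looked up by group.
        self.locations[vm_uuid] = new_switch

    def lookup(self, vm_uuid):
        return self.locations[vm_uuid], self.profiles[self.groups[vm_uuid]]


db = PolicyDatabase()
db.define_group("databases", PolicyProfile(vlan=10, qos="high", l2_redundancy=True))
db.define_group("web", PolicyProfile(vlan=20, qos="low", l2_redundancy=False))
db.add_vm("vm-db-01", "databases", switch="switch-a")
db.add_vm("vm-web-01", "web", switch="switch-a")

before = db.lookup("vm-db-01")
db.migrate("vm-db-01", "switch-b")
after = db.lookup("vm-db-01")
```

After the migration, `after` reports the new switch but the same profile as `before`, mirroring the database-versus-web-server QoS example in the text.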
Standards for VM-aware networking are now being introduced, including Edge Virtual Bridging (802.1Qbg), which IBM helped to develop and champion. IBM’s VMready 4.0 incorporates EVB, and, unlike some competing approaches, IBM’s VMready can be used with hypervisors from multiple vendors, and VMready switches do not require a separate management station or any special software to be installed on the hypervisor. Essentially, they function like normal switches, but at the virtual port level. IDEAS feels VMready will thus appeal to network administrators who have to manage increasingly virtualized datacenters, but also value compatibility with their existing physical infrastructure and heterogeneous virtualization software.
IBM rises to the optimisation challenge • The Register
Doug Brown from IBM talks to the Register about the tenets of Smarter Computing. Interesting to see this translated into an article.
In computing, it sometimes pays to specialise. Generic systems will handle most computational needs, but they may not excel at them.
For larger companies, honing systems to handle specialised tasks involving large amounts of data could help to make data centres more efficient. This is what workload optimisation is for.
Transaction processing is not the same as business analytics, for example. Transactions may be processed independently of each other and pushed through in a queue.
Conversely, some analytics work requires results from one set of analytics to be used by another. And when that starts happening in real time, the computational demands become very specific.
Superhuman powers
That is why Watson, the supercomputer that beat two human opponents in a game of Jeopardy earlier this year, was a workload optimised system, says Doug Brown, vice-president of global marketing at IBM.
“There may be analytics that require real-time calculations in the middle of a transaction flow, in which case having an aggregated database close to the processing, like IBM’s System z provides, might be best. So it varies even within analytics,” he says.
One of three pillars supporting IBM’s Smarter Computing strategy, workload optimisation involves tweaking hardware and software stacks to suit a particular task. IBM claims that operational costs can fall by as much as 55 per cent when using optimised computing solutions.
The second pillar of Smarter Computing focuses on using every aspect of increasingly large data volumes for analytic purposes. In a world where large amounts of data come from an increasingly wide array of sources, this becomes ever more important.
Finally, the strategy’s cloud component emphasises the role of cloud-based technologies in helping to optimise IT service delivery.
Stay flexible
The workload optimisation component of Smarter Computing focuses on three sub-elements: hardware, software and domain knowledge. In many cases, the hardware and software components can be largely generic.
It is the domain knowledge, for example, that gave Watson the smarts to process language in real time and question a huge database of information to come up with the answers that beat its human opponents.
Workload optimisation can be implemented in a variety of ways. Customers can buy and configure their own systems, which requires a significant amount of IT expertise, time and effort.
They can purchase an appliance, customised for a specific task. Or they can buy a system that has been pre-integrated by IBM to support specific computing tasks.
The third approach offers the best of both worlds. It provides customisation opportunities while also allowing customers design flexibility. The pre-integration involves collaboration between hardware and software teams.
For example, IBM’s DB2 team might optimise the database for use on a clustered set of Power7 servers. This might enable the customer to get more performance from WebSphere and DB2 using a specific hardware footprint.
Hardware plays an important role in workload optimisation, and Watson was based on a collection of Power7-based clusters. This eight-core, 32-thread processor is designed to be optimised for a variety of different workloads with the addition of specific software solutions.
“Watson used extreme analytics, and massive threading capabilities from the Power 750 clusters,” Brown says.
“That capability with the clusters created a lot of simultaneous calculations. We combined those hardware capabilities with the software and the domain knowledge.”
A potential downside of vendor-integrated analytics systems is the price premium. They tend to ship with rich sets of features, some of which customers may not use.
However, Brown argues that the cost savings in reduced complexity and configuration cut some of the operational expenses, offsetting part of the capital expenditure.
IBM also offers a service component to the sale, maintaining and tweaking the systems as customers’ needs evolve.
Made to measure
In any case, there are different systems to fit different needs. You won’t see a single page outlining IBM’s workload optimised offerings because workload optimisation permeates a lot of what it does.
For example, its Smart Analytics systems are optimised to provide businesses with analytics capabilities. Its CloudBurst series of cloud computing servers, available on both its System X and Power platforms, is another optimised offering.
The vendor also offers systems such as the Netezza analytics appliances, designed to help companies breach the entry point into business analytics systems.
As the world moves towards more efficient computing infrastructures, workload optimisation will continue to gain traction among larger companies. Smaller businesses with fewer performance requirements and space-to-power constraints are likely to want more generic systems.
Server virtualisation: How to pick the right model • The Register
A fantastic, in-depth discussion of different virtualization methodologies from The Register.
Virtualisation has become an over-used buzzword.
On mainframes, it has been around for ages. Its introduction to x86 took a concept formerly reserved for Big Tech and let it loose among the masses.
Once a straightforward technology with a limited number of implementation models, virtualisation has been bootstrapped and shoehorned into every crevice of IT imaginable. Even smartphones are getting the treatment.
New capabilities do nothing for refuseniks who eschew the use of virtualisation. Some feel the need to evangelise this choice, while others loudly proclaim the “one true way” to use the technologies involved.
Direct-attached virtualisation versus distributed models is a common ideological battleground.
Direct-attached virtualisation is simple. A server with local storage hosts several virtual machines. These use the virtual switch (vSwitch) provided by the hypervisor to communicate with each other without having to send packets across the network interface card (NIC) and thus out to the rest of the network.
Talk among yourselves
Typically, virtual machines hosted in a direct-attached scenario are capable of communication with servers and clients located outside the host system, but most of the communication occurs among virtual machines residing on that one system.
Distributed virtualisation is very different. The host server is treated as much as possible as an entirely disposable processing unit. Storage is centralised and delivered to multiple hosts over a storage area network (SAN) with communication between virtual machines offloaded to physical switches.
Each model has its quirks.
Direct-attached virtualisation is fast. The maximum theoretical rate at which a 10Gb NIC (a standard interface for modern SANs) can deliver data is 1280 megabytes per second (MBps). A fairly common PCIe 8x 2.0 RAID card can theoretically provide up to 4000MBps.
Real-world numbers are not so clean. I’ve only ever got 10Gb network-attached storage up to 900MBps, and the best I’ve wrung out of my RAID cards (SSD RAID 10) is 2200MBps. But 2200MBps beats the pants off 900MBps, and handily demonstrates the storage speed advantage that the direct-attached model can deliver.
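The theoretical figures above can be checked with a little arithmetic. What follows is a rough sketch, not a benchmark: the function names are made up for illustration, and it assumes decimal megabytes (1MB = 1,000,000 bytes) and the PCIe 2.0 signalling rate of 5GT/s per lane with 8b/10b encoding. Under those assumptions a 10Gb link works out to 1250MBps; the 1280MBps figure quoted above appears to come from counting 1,024 megabits to the gigabit.

```python
# Back-of-the-envelope throughput figures. Function names are hypothetical.

def link_mbps(gigabits_per_second: float) -> float:
    """Convert a link rate in Gb/s to MB/s, using decimal units throughout."""
    bits_per_second = gigabits_per_second * 1_000_000_000
    return bits_per_second / 8 / 1_000_000


def pcie2_mbps(lanes: int) -> float:
    """Usable PCIe 2.0 bandwidth in MB/s.

    Assumes 5,000 megatransfers/s per lane and 8b/10b encoding,
    so 8 of every 10 bits on the wire carry payload.
    """
    return lanes * 5_000 * 8 / 10 / 8


nic_decimal = link_mbps(10)        # 10Gb NIC, decimal units
nic_binary_style = 10 * 1024 / 8   # how the 1280MBps figure can be reached
raid_card = pcie2_mbps(8)          # PCIe 2.0 x8 RAID card

print(f"10Gb NIC (decimal):  {nic_decimal:.0f} MBps")
print(f"10Gb NIC (x1024):    {nic_binary_style:.0f} MBps")
print(f"PCIe 2.0 x8:         {raid_card:.0f} MBps")

# Measured figures quoted in the text, for comparison.
measured_nas, measured_raid = 900, 2200
print(f"Measured local/NAS ratio: {measured_raid / measured_nas:.2f}x")
```

Whichever rounding is used for the NIC, the theoretical gap between the two paths is roughly threefold, and the measured gap quoted above is about 2.4 times.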
Networking tells a similar tale. A hypervisor’s vSwitch provides each virtual machine with a virtual 10Gb NIC. This allows all the virtual machines located on a single host to chat among themselves at 10Gb, or faster if you feel like attaching multiple virtual NICs to a given virtual machine.
Tight squeeze
When heading off-host to the rest of the network, these virtual machines need to fight for the limited bandwidth provided by the hardware available. Having 30 virtual machines talking merrily away at 10Gb each is a completely different experience from asking those same 30 virtual machines to squeeze through a single 10Gb network card – and back again – to have networking processed by a physical switch.
Were we to consider only the numbers presented so far, distributed virtualisation would seem insane. But it has its advantages, and for many they are worth the cost.
What direct-attached virtualisation can’t do is rapidly move a virtual machine from one host to another. Virtual machines can be quite large, and moving the entire thing across a network can take a long time.
This is not an issue with distributed virtualisation’s centralised storage model. Distributed virtualisation also allows for live migration of running virtual machines between hosts.
High availability is another key selling point for distributed virtualisation.
Direct-attached virtualisation relies on robust, fault-tolerant virtual hosts for high availability. Distributed virtualisation senses when a host has failed and restarts all its virtual machines on other hosts in the cluster. The more hosts you have in play, the more the distributed model makes sense.
I can see you
Another benefit is that despite the speed bottlenecks, forcing all traffic through a physical switch gives network administrators visibility and manageability.
Enterprise-class networks run networking gear with tools providing end-to-end management straight down to the very last port. They can offer encryption between links, traffic isolation, monitoring, quality of service and a bingo card of other tick-box features.
All of that goes away the instant a vSwitch is brought into play. vSwitches don’t speak the same management language as the physical network providers. Instead of being able to control every packet to every system on the network, the closest you can get when using a vSwitch is control to and from the host servers.
Blurred outlines
Until recently, these two models were all we had. You picked the features that were more important to you and lived with your choices. This is unsatisfactory and, in the grand IT tradition of nothing ever remaining sacred for long, hybrid virtualisation models have started to appear.
A new generation of NICs is starting to blur the lines, employing leading-edge standards such as 802.1Qbg, also known as Edge Virtual Bridging or Virtual Ethernet Port Aggregation (VEPA).
VEPA NICs are switches in their own right. When in use, virtual machines on a host bypass the vSwitch and talk directly to the switch integrated into the NIC. The NIC can talk to the management software, and now we have all the advantages of distributed networking without the bottleneck caused by having to send all virtual machine traffic out to the physical switch.
The competing approach to VEPA is 802.1Qbh, also known as Bridge Port Extension or VN-Tag. It is backed almost exclusively by Cisco, and requires an extension to the Ethernet specification, thus lots of new hardware.
This is a stark contrast to VEPA, which doesn’t require you to rip up and replace your network estate, and yet provides a viable solution to end-port management issues in virtual environments.
Configurations making use of both direct-attached storage and distributed storage in a single host are also beginning to appear. I have recently finished a deployment in which all hosts have a large amount of local storage to facilitate backups.
Each host has a virtual backup appliance (VBA) that takes live image-based backups of the virtual machines assigned to that host and stores them on the local buffer drive. This makes for very fast backups.
A central VBA reads the backups from all hosts and writes them out to tape during the day. The tape drive is mapped directly through from the host to the VBA rather than being a network-attached device.
This hybrid approach was not found in a whitepaper but born out of the necessity to make the best use of existing equipment. It has worked so well that, with refinements, I will re-use it in future deployments.
Perpetual movement
The continual introduction of new technologies into the mix will ensure that no virtualisation model stays static for long. IOMMU is the latest greatest, and promises to allow individual virtual machines direct access to system devices such as graphics cards.
Virtual machines will have the ability to tap into the full power of GPGPU computing, and will need to be fed data far faster than distributed technologies such as fibre channel can provide.
Advances in fault-tolerant hardware promise to make the individual host more reliable while new networking technologies push to 40Gb, 100Gb and beyond.
We have come full circle. Virtualisation started on the mainframe, and virtualisation is driving x86 to adopt technologies that bring it closer to behaving like a mainframe.
Regardless of the similarities, there remains a fundamental difference between a mainframe and a cluster of x86 virtual hosts.
The mainframe is designed to be a single entity. Rack after rack, node after node, everything from the operating system to the interconnects binding individual nodes together, treats the mainframe as a single gigantic computer that is then sliced up for individual tasks.
An x86 virtual cluster is very different. Whether direct attached, distributed or hybrid, each processing node is very much a distinct unit. Each node matters: it must be configured, licensed and designed separately as well as with consideration to the whole.
A mainframe is an expensive computer that you custom-design software for: a high-performance system worth high-quality development. The x86 virtual cluster is a collection of cheap systems that you wrap around existing software.
A mainframe is what you build when you are running a financial system where milliseconds of latency can mean millions of dollars. It shines when you feed it applications that can break work down into small chunks and run lots of small tasks in parallel.
x86 virtualisation, on the other hand, is a kludge.
It is our way of compensating for the fact that we are dragging around decades’ worth of software that is remarkably single-threaded, not very environmentally aware, and which needs to be insulated from other programs running on the same system.
x86 virtualisation models will continue to evolve because of this need to accommodate the sheer diversity of x86 applications.
There are many options available today to accomplish large amounts of computing efficiently. You can buy a mainframe or maintain a fleet of x86 systems with applications installed on the bare metal.
You can venture into x86 virtualisation and explore all the myriad different possibilities it presents. You could even lash together a few thousand cell phones into an incredibly awkward Beowulf cluster if you so chose.
There is no “one true way” to get the job done. The needs of your software and the capabilities of the hardware available to you will determine the implementation paths you can choose.
Arrays take on servers in storage smackdown • The Register
A really good piece on how storage arrays work - and the battle that’s taking place to reshape how they work in fundamental ways.
There is a battle going on behind the scenes over the location of storage’s soul: the controller hardware and software. Oracle, Dell, EMC and VMware want it to be in the server, while NetApp and HDS want it to be in the array, an array operating with servers but distinct from them.
The picture is not as clear-cut as this on the surface – NetApp is working with Oracle for example – but this is my take on what is happening down in the development depths, among the strategists and engineers with multi-year product horizons.
The modern storage industry, the one shipping networked external storage arrays, has been built on two foundations. One is EMC’s establishment of a market for third-party external, block-addressed storage arrays distinct from the server suppliers of the time: HP, IBM, Digital Equipment, etc.
The other was the invention and establishment of file-addressed network-attached storage (NAS or filers). NetApp is the single most effective proponent of that, although EMC grew to ship more filers than NetApp. EMC and NetApp represent the twin peaks of the external storage array.
A storage array comes in two flavours. It is either monolithic, with multiple controllers or engines and some fancy interconnect hardware to link these to the storage shelves – think Symmetrix, latterly VMAX – or modular. Modular arrays have two controllers linked – by simpler Fibre Channel or latterly SAS – to the storage shelves. NetApp’s FAS arrays and EMC’s CLARiiON are classic embodiments of this idea.
Applications in servers sent SCSI block requests or file access requests to these arrays, which presented themselves, logically, as a single pool of storage, separated into dedicated logical disks (LUNs) for the server apps, or sharable filestores.
This long-lived storage concept is now being discarded, and the first nail in its coffin came from Sun and the inventive Mr Andy Bechtolsheim.
Honeycomb upsets the storage hive
Bechtolsheim’s idea was that co-locating servers and storage in the same overall enclosure would speed server apps dependent on lots of stored data. Thumper, a server-rich NAS device delivered as the X4500, was one result of this and Honeycomb another.
Neither set the world on fire but they did show the way to getting more data into servers faster. Then Oracle bought Sun in 2009 and suddenly Bechtolsheim’s idea got a rocket boost from the Exadata product, a set of server resources running Oracle software with their own storage resources. This is setting the Oracle World on fire, with much encouragement from Oracle marketing because its own bunch of modular arrays was pretty second-rate.
What Sun invented and Oracle extended is the NoSAN server. EMC has seen this idea and responded by devising an opposite of this, the No-Server SAN, a kind of reverse engineering in its way.
EMC brings the servers to the array
EMC is trying to have it both ways. VMAX, VNX and Isilon arrays are going to be able to run application software in server engines in the array controller complex. There is a natural fit with VMware’s ESX running the whole shebang and VMs being loaded to run storage controller software and applications that benefit from low-latency access to buckets of data. These array-located app servers use the array’s own internal network or fabric, VMAX’s Virtual Matrix for example, instead of the normal Ethernet or Fibre Channel fabric. This isn’t SAN access as we know it.
EMC also has its Project Lightning to have its arrays manage the loading and running of flash caches in servers, PCIe-connected flash. That’s a step down a road on which Dell appears to be further along. The Round Rock company is also going to build servers with flash, but this is a storage tier and not cache. This tier zero storage is logically part of the entire array controller-managed storage pool with automatic data movement.
Now EMC may well have this in mind as well, with FAST VP shipping data to and from the server flash which is then not really a cache but tier zero too. However Dell’s vision, as I understand it, is to move the storage into the servers and hook it up to the same PCIe gen 3 bus that the flash and server’s DRAM hook into. Once again this means that the servers will not use traditional external storage links to access data. This again is not a SAN. What do these NoSAN ideas mean for external array vendors?
Storage array vendor conundrum
The mainstream external storage array vendors are facing two, not one, strategic challenges to their long-term hegemony. The nearer term one is the introduction of networked flash-only arrays from companies like Huawei, Nimbus, Pure Storage and others, which threaten to cream off their profitable fast array business based on 15,000rpm Fibre Channel disk drives. All primary data storage could migrate to flash-only arrays, leaving the EMC, Dell, HDS, HP, IBM and NetApp-style vendors to store the bulk data crumbs, the low-access rate, lower profit, online data repositories.
The second threat though is the more important one because, after all, a networked external flash storage array is still a networked external storage array. Co-locating servers and storage with no traditional storage networking protocol connecting them is different. If servers absorb external storage, as is the Oracle Exadata plan, or external storage absorbs servers, as is the import of Dell and EMC’s plans, then networked external storage is no longer needed: we have the NoSAN server.
There’s a distinction to be made between converged systems with co-located servers and storage using storage networking protocols and converged systems that do not, the NoSAN servers.
A VCE Vblock, an HP VS3 and a NetApp Flexpod set-up all use networked external storage. An app-running VMAX needs no storage networking protocol to link servers (app-executing VMAX engines) and storage (storage controller app-executing VMAX engines) as they all hook into the VMAX internal fabric. This is fundamentally an entirely different beast. For a start the server-storage link is proprietary and not open. You wouldn’t be able to substitute a NetApp, HP or IBM array for the VMAX storage part of a VMAX app-executing system. It would be like trying to put a BMW engine in a Mercedes car. Good luck to you and don’t expect any sympathy for the grief heading your way.
Criticise Fibre Channel, FCoE and iSCSI all you like but they are open systems allowing you to switch suppliers of servers and storage either side of the storage networking link.
FCoE becomes irrelevant
Discussions about whether to move to Fibre Channel over Ethernet (FCoE) from Fibre Channel will be rendered irrelevant by co-located servers and storage communicating across some kind of I/O backplane or mesh network, or servers and storage talking across PCIe. FCoE is a transition from one instantiation of a legacy storage-server connection protocol to another.
The future may be not to have a storage networking protocol at all. Data will ship between server memory and flash to co-located array storage across PCIe bus links or an array’s internal fabric.
What this means for stand-alone storage array products is that a razor thin end of a wedge is appearing in the door through which they sell their arrays. This wedge can be expected to fatten. If it does, then the external array business will be impacted. It may be that data growth will be such that array sales increase while NoSAN server storage sales grow as well. But there is a long-term threat here, looking at a five-year or 10-year timescale, in which the external array business declines.
Its only hope of rescue will be through the provision of external array links that match the latency and bandwidth of third-gen PCIe, or of internal array fabrics such as the one found inside VMAX.
A strategic battle has been joined and the existence of the external storage array business is being questioned. Yes, it’s a razor thin wedge that has been slipped into the external array door, but wedges do what wedges do: they become thicker. If Dell and EMC pull their NoSAN server ideas off, then a world of hurt awaits external array vendors that have no effective response.
Storage is ending in tiers • The Register
Good explanation of storage tiering - why companies do it, how they do it, and the advantages it bestows.
One basic rule of storage is don’t keep items you need to access quickly in hard-to-get-at places: that’s like keeping instant coffee in a safe. You suit the storage type to the type of item you want to store.
One size does not fit all, and valuable data should not be stored on slow-speed disk drives along with ordinary data.
The “hot” data – information that is needed often and fast – should be stored on fast disk drives, not slow ones. But these drives should not be cluttered with old data that no one will access in a blue moon.
Disk drive array suppliers and storage administrators have understood this for years, but they still have to resolve the following: how is hot data to be identified, how is data to be moved between fast and slow storage, and how often should this be done?
Spin speeds
Disk drives come with various spin speeds and interfaces. Other things being equal, the speed of rotation is the most important factor in data access time. A read/write head has to move to the right track on a disk’s surface and then wait for the disk’s rotation to bring the target sector under the head.
Moving the slider to bring the read/write head to the target track takes the same time irrespective of the disk’s spin speed, leaving disk form factor (3.5in or 2.5in) out of it. A 15,000rpm disk will bring the target sector under the head almost three times faster than a disk spinning at 5,200rpm.
Also, a 15,000rpm drive has a lower capacity than a 7,200rpm drive, which in turn has a lower capacity than a 5,200rpm drive. The faster a drive spins, the lower its capacity and the higher its cost.
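The effect of spin speed on access time can be put into numbers. The sketch below assumes the usual rule of thumb that, on average, the head waits half a revolution for the target sector; the function name is made up for illustration, and seek time is deliberately left out because it does not depend on spin speed.

```python
def avg_rotational_latency_ms(rpm: int) -> float:
    """Average wait for the target sector: half of one revolution, in ms."""
    ms_per_revolution = 60_000 / rpm
    return ms_per_revolution / 2


for rpm in (15_000, 10_000, 7_200, 5_200):
    print(f"{rpm:>6} rpm: {avg_rotational_latency_ms(rpm):.2f} ms")

# Ratio between the slowest and fastest drives mentioned in the text.
ratio = avg_rotational_latency_ms(5_200) / avg_rotational_latency_ms(15_000)
print(f"5,200rpm waits {ratio:.2f}x as long as 15,000rpm")
```

The ratio is simply 15,000 divided by 5,200, or about 2.9, which is where the "almost three times faster" figure comes from.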
Make way for the new
Let’s imagine a company that has all of its data on a drive array with a single disk type. Some of the data is there because it has to be kept just in case, but is of low importance and is accessed infrequently – last year’s expense reports and manufacturing records, for example, or the data kept for regulatory compliance reasons.
Some of the data is accessed more frequently and is probably newer. This could be:
- Emails between one week and one month old
- PowerPoint decks created in the past three months
- Marketing collateral such as whitepapers
- HR staff records
Other data is accessed much more often, for example:
- Customer and accounting databases
- New hire records in HR
- The current manufacturing run
- Sales order processing
So we can allocate data to three overall categories based on access rate and newness. These could be called hot, cool and cold; or fast, medium and low-access. We could also say high access-rate data is high-value data.
Ideally we would put the relatively small amount of hot data on fast disks, which are expensive and low capacity. Medium access data could go on 10K drives; and low-access data, the bulk of our information, could go on high-capacity 7.2K Sata drives.
The difficult bit
That seems simple enough. What’s the problem?
The problem is threefold: data is not static; identifying its state is difficult; and moving it is tricky.
Data is created, used and then kept for reference. These three stages constitute a data lifecycle.
Newly created data could be stored on fast access disks, but as its access rate slows down it takes up space that is needed for newer data and should be moved to an intermediate tier of storage. On the intermediate tier, meanwhile, data is cooling and needs to be moved to the bulk Sata tier to make space.
Do we employ storage admins to identify data that is in the wrong storage tier and move it? Of course not, it should be automated.
System software in an array or server could track the access frequency over time to files and database records and move high-access rate data up the drive tiers and low-access rate data down the tiers.
Job done? Not quite. The tracking of access rates is a significant burden and the moving of data occupies array resources too. If you move large chunks of data then you reduce the number of move operations, which is good; but you might move inappropriate data, which is bad.
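To make the idea concrete, here is a minimal sketch of automated tiering, assuming the array keeps an access count per chunk of data over some tracking window. The tier names, capacities and function names are all hypothetical, and no real array works this simply; the point is only to show the two costs mentioned above, the tracking table and the list of moves.

```python
TIERS = ["fast", "medium", "bulk"]  # hypothetical tier names, hottest first


def assign_tiers(access_counts: dict, capacities: dict) -> dict:
    """Rank chunks by access count and fill tiers from fastest to slowest.

    access_counts: chunk id -> accesses seen in the last tracking window
    capacities:    tier name -> number of chunks that tier can hold
                   (the last tier takes whatever is left over)
    """
    # Hottest first; ties broken by chunk id so the result is deterministic.
    ranked = sorted(access_counts, key=lambda c: (-access_counts[c], c))
    placement = {}
    start = 0
    for tier in TIERS[:-1]:
        end = start + capacities[tier]
        for chunk in ranked[start:end]:
            placement[chunk] = tier
        start = end
    for chunk in ranked[start:]:
        placement[chunk] = TIERS[-1]
    return placement


def moves_needed(old: dict, new: dict) -> dict:
    """Chunks whose tier changed, as chunk id -> (old tier, new tier)."""
    return {c: (old.get(c), t) for c, t in new.items() if old.get(c) != t}


# Everything starts on the bulk tier; one window of access counts later:
current = {c: "bulk" for c in "abcde"}
counts = {"a": 90, "b": 5, "c": 40, "d": 0, "e": 12}
target = assign_tiers(counts, {"fast": 1, "medium": 2})
print(target)
print(moves_needed(current, target))
```

The size of a chunk is the knob the text describes: larger chunks mean a smaller tracking table and fewer moves, but each move drags along more data that may not be hot at all.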
Sensible sort
A storage array delivers files or blocks. Ideally, a database should have its blocks allocated to different disk tiers according to the access rate of those blocks.
But that requires either the database to know about disk tiers or the storage array to present a single pool of storage to the database although it is spread across different kinds of disk. Clever software is needed whichever route is taken.
The same is true for large files. In principle, the smaller the unit of data moved, the more efficiently a storage array can match access frequency and storage tiers.
Some storage arrays – Dell Compellent, for example – track individual block access rates (you can’t get more granular than this) and move blocks up and down the tiers dynamically. That’s a lot of data moving going on inside the array, and the Compellent array operating system requires multi-core x86 processors to provide the CPU horsepower to do this.
What is the effect of using flash solid state drives in storage arrays on this automated data tiering?
None whatsoever, in principle, flash being just another tier. However, writing data to flash should be minimised in the interests of flash longevity.
Tiering, as provided by features such as EMC’s Fast, has become a standard feature of all modern drive arrays.
It is the best way to reduce the amount of expensive fast storage in arrays, using cheaper bulk storage for infrequently accessed data, medium-cost storage for medium-value data, and gold-standard fast storage for gold-standard data.
No more tiers for flatter networks • The Register
A very interesting article on how the traditional 3-tier networking model that Cisco popularized is beginning to break down under the weight of new cloud, web2.0, and high-speed/low-latency applications. The 3-tier model is beginning to give way to flatter networking paradigms.
There is a disconnect between data centre networks and modern distributed applications, and it is not a broken wire. It is a broken networking model.
The traditional three-tier, hierarchical data centre network, as defined and championed by Cisco Systems since the commercialisation of the internet protocol inside the glass house, no longer matches the systems and applications that are running in those data centres.
Designed for the dotcom era, the hierarchical model is not fast enough or cheap enough for the cloud. And that is why so many companies have been picking at Cisco’s networking lunch and, in turn, been eaten by server makers who know they need to integrate networking in their systems to remain relevant.
Here’s the Cisco view of the networking world, somewhat simplified:
Cisco’s hierarchical network model
The three-tier network design has redundancy built into the core and aggregation layers, which are cross-connected for multi-pathing as well as for high availability.
It works well when end-users inside the firewall want to get at an application, and the networks can be over-subscribed because utilisation on them is generally low.
This low network utilisation goes hand in hand with low server utilisation, which was the norm for two decades. It was more important to isolate workloads on physical servers and give them a permanent home with their slice of the corporate network.
The kinds of data zinging around are fatter and more unpredictable than simple web and email traffic, too.
Saturation point
But what happens when you want to drive up utilisation on servers, usually through server virtualisation, while you keep adding more cores to the chip?
What happens is you saturate your network and the three-tier model starts breaking down – and you start looking at the supercomputing space for some inspiration.
“A lot of the applications coming out today – cloud, Web 2.0 or high-speed financial applications – have concepts from high-performance computing, which means doing things massively and in parallel,” says Dan Tuchler, vice-president of product management for IBM’s system networking division.
More than a decade after selling off its SNA networking business to Cisco and toeing the Cisco three-tier line, Big Blue spent a rumoured $400m to acquire Blade Network Technologies, which makes integrated switches for IBM, Hewlett-Packard, NEC and other blade server manufacturers, as well as top-of-rack switches for rack-based servers.
Tuchler doesn’t think companies will start unplugging all of their core, distribution and access tier gear any time soon because they have made large investments, and the core switches allow them to plug in other features like firewalls and security appliances into them.
Pick a leaf
But for companies that need network traffic to move more efficiently at higher bandwidth and with lower latencies, a leaf-spine network that has a flatter architecture, or perhaps a fat tree network inspired by supercomputers or a Clos network inspired by telecommunications, might be just the ticket.
Despite the extra devices used, they can be managed as one and still provide a lower latency than most core chassis devices.
The leaf-spine network architecture takes a top-of-rack switch that can reach down into server nodes directly and links it back to a set of non-blocking spine switches that have enough bandwidth to allow for clusters of servers to be linked to each other in the tens of thousands.
Generally speaking, a lot of leaf-spine networks don’t do oversubscription like the hierarchical networks do, because of high-bandwidth, low-latency demands. By nature you can easily scale this network design. You can start very small and there is virtually no limit to the number of nodes you can connect.
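A rough sizing sketch shows how the leaf-spine design scales and where oversubscription comes from. It assumes the simplest textbook layout, which the article does not spell out: two tiers, every leaf (top-of-rack) switch has exactly one uplink to every spine switch, and all ports run at the same speed. The function and parameter names are made up for illustration.

```python
def leaf_spine_capacity(leaf_ports: int, spine_ports: int, uplinks_per_leaf: int) -> dict:
    """Size a two-tier leaf-spine fabric.

    Assumes one link from each leaf to each spine and equal port speeds.
    """
    if not 0 < uplinks_per_leaf < leaf_ports:
        raise ValueError("uplinks_per_leaf must leave at least one server port")
    server_ports_per_leaf = leaf_ports - uplinks_per_leaf
    return {
        "spines": uplinks_per_leaf,      # one spine per leaf uplink
        "max_leaves": spine_ports,       # each spine port connects one leaf
        "max_servers": spine_ports * server_ports_per_leaf,
        # Server-facing bandwidth divided by uplink bandwidth; 1.0 is non-blocking.
        "oversubscription": server_ports_per_leaf / uplinks_per_leaf,
    }


# 48-port leaves and 32-port spines, with half of each leaf's ports used as uplinks.
non_blocking = leaf_spine_capacity(leaf_ports=48, spine_ports=32, uplinks_per_leaf=24)
print(non_blocking)

# The same switches with only 8 uplinks per leaf: more servers, but oversubscribed.
oversubscribed = leaf_spine_capacity(leaf_ports=48, spine_ports=32, uplinks_per_leaf=8)
print(oversubscribed)
```

In either layout, traffic between servers on different leaves crosses leaf, spine and leaf, so the number of switch hops stays the same however many racks are added, up to the port count of the spine switches.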
“We see customers with three and sometimes even four tiers in their networks, and it is not very efficient,” says John Monson, vice-president of marketing at Mellanox Technologies.
Mellanox is a networking ASIC, interface card and switch maker with expertise in the InfiniBand switch fabric (which by definition was supposed to be a flat network). It got into the Ethernet switching racket through its $218m buy last November of sometime rival and partner Voltaire.
“Scaling up the same old approach just doesn’t work,” explains Monson.
“I can’t wait forever for a virtual machine to go through four layers of switching to get from one point in the network to another. What matters for these modern workloads is efficiency and bandwidth, and if you have networks oversubscribed and hanging off cores, it doesn’t work.”
Change of direction
The east-west traffic problem is what is really killing the three-tier network in the data centre.
Traditionally, traffic through data centres flowed up and down through the network in a north-south orientation – from access, distribution and core layers and back again.
But no more. According to recent vendor surveys, as much as 80 to 85 per cent of the traffic in virtualised server infrastructure – what we now call clouds – moves from server node to server node.
This is more like supercomputing than serving in the traditional sense and, not surprisingly, the networks are flattening out just like they have in high-performance clusters.
Driving up server utilisation on such compute clusters – whether they run financial trading and risk analysis programs, virtual server clouds or parallel supercomputer applications – requires not just a flatter network, but a faster one.
If you stay with Gigabit Ethernet interconnects on a three-tier network, how can you drive up server utilisation if you move from one to two to four to eight cores per processor? You will add cores, but they will spend more and more of their time waiting for data to come back to them over the network.
By hook or by crook
Mellanox is not seeing data centres tossing out their expensive three-tier networks. But for new applications – setting up a new trading system or a private cloud – they are rethinking the networks that lash the servers and storage together.
They are also, says Monson, podding servers together in pods of 500 or 1,000 machines, and then using bridges and gateways to hook these applications into the older networks.
“On a three-tier network, as you scale up the servers, those servers are spending all of their time waiting,” says Monson.
“When you flatten the network, the CPU utilisation goes up, and throughput in the application goes up in big jumps, like a factor of two or three times.”
That might mean you can support a given application workload with fewer servers on a new leaf-spine network than you would need on an old three-tier network.
And that is music to the ears of chief executives and financial directors.
NoSQL's Next Step Forward: DataStax Makes Cassandra Commercial
A really interesting description of how Cassandra works and what DataStax is doing to make the NoSQL database more mainstream (outside of Facebook)…
The huge problem for online services is that traditional SQL database managers don’t scale up when database sizes approach “exascale” - the tremendous and fast-growing repositories needed by services like Facebook and Twitter. There’s nothing conceptually wrong with SQL; it’s just that the underlying RDBMS architecture does not perform well with these tremendous workloads.
Simpler database constructs can handle bigger workloads, as long as the work they do stays more along the lines of simple storage and retrieval and doesn’t get too analytical. Today, a new vendor named DataStax, whose backers include Rackspace, is launching a commercial rendition of an exascale database manager that marries an open source database manager project launched at Facebook with an open source distributed processing project started at Google.
Facebook as it stands today would be impossible without its engineers having developed a new database construct for its Inbox Search system. Facebook distributed this system, dubbed Cassandra in 2009, to the open source community, perhaps with the understanding that even if Cassandra holds the keys to Facebook’s kingdom, it’s too late for anyone else to use those keys to knock its castle down.
Cassandra - which lies at the core of the new DataStax Enterprise project - is based on three and only three very simple methods:
insert, get, and delete. That’s it; you’ve just learned the entire Cassandra API. Obviously the system omits the “select” construct that typifies SQL; it’s not about finding subsets that match given criteria. But in many Web services, that job isn’t even necessary anyway; the most common function is to retrieve one value from a list, or one node, using only one key as criteria.
While at Facebook, engineers Avinash Lakshman and Prashant Malik had a brilliant revelation: As long as exascale databases didn’t need to be ordinated in rows and tables, perhaps their storage should not be considered on a two-dimensional scale - maybe it’s that two-dimensionality that’s slowing down the process. What if instead the database construct were like a bendy-straw whose head was stretched around and attached to its tail? Each node attached to this base would be like a key attached to a keychain. Who cares where the key is located; a random number generator could make that decision. Such randomization could take care of the whole load-balancing problem; as long as the algorithm is sound, keys are likely to be placed in a fairly well-distributed fashion.
So scaling up is no longer about building a tower on top of an existing foundation. Instead, you simply expand the bendy-straw, making the circle bigger but maintaining the even distribution. The data model can then align these nodes onto any number of simultaneous columns, making the structure essentially multi-dimensional as long as you consider a “dimension” a common attribute. And replication can take place at the per-node level, rather than making sure the entire database gets copied to multiple places. This way, the software can be responsible for resolving node failures and doesn’t have to spend its time compensating for hardware failures which, as storage arrays get larger, will be an everyday occurrence. (A complete rundown of Lakshman’s and Malik’s Cassandra construct appears in this PDF.) …
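The ring idea can be made concrete with a toy sketch in Python. This is not Cassandra's code or its real client API. The `Ring` class, the `owners` and `add_node` helpers, the node names, and the choice of an MD5 hash to stand in for the "random number generator" are all assumptions made for illustration. It shows only the three operations named above, placement of keys around a circle, replication to neighbouring nodes, and growing the circle.

```python
import bisect
import hashlib


class Ring:
    """Toy hash ring: servers sit at pseudo-random points on a circle."""

    def __init__(self, node_names, replicas=2):
        self.replicas = replicas
        self.stores = {name: {} for name in node_names}
        # Sorted list of (position on the circle, server name).
        self.positions = sorted((self._hash(n), n) for n in node_names)

    @staticmethod
    def _hash(text):
        # A hash spreads names and keys evenly, much like a random draw.
        return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)

    def owners(self, key):
        """Walk clockwise from the key's position, taking `replicas` servers."""
        start = bisect.bisect(self.positions, (self._hash(key), ""))
        count = min(self.replicas, len(self.positions))
        return [self.positions[(start + i) % len(self.positions)][1]
                for i in range(count)]

    def insert(self, key, value):
        for name in self.owners(key):
            self.stores[name][key] = value

    def get(self, key, down=()):
        # Skip servers listed in `down` to mimic surviving a failure.
        for name in self.owners(key):
            if name not in down and key in self.stores[name]:
                return self.stores[name][key]
        return None

    def delete(self, key):
        for name in self.owners(key):
            self.stores[name].pop(key, None)

    def add_node(self, name):
        """Grow the circle. For simplicity every key is re-placed; a real
        system would move only the keys whose owners changed."""
        items = {}
        for store in self.stores.values():
            items.update(store)
            store.clear()
        self.stores[name] = {}
        bisect.insort(self.positions, (self._hash(name), name))
        for key, value in items.items():
            self.insert(key, value)


ring = Ring(["node-a", "node-b", "node-c"], replicas=2)
ring.insert("user:42", "inbox index")
print(ring.owners("user:42"))  # two distinct server names
print(ring.get("user:42"))     # inbox index
```

Because each key lives on more than one server, `ring.get("user:42", down=(ring.owners("user:42")[0],))` still returns the value when the first owner is unavailable, which is the per-node replication the article describes.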
In April 2010, with Rackspace’s blessing, two of that company’s engineers, Jonathan Ellis and Matt Pfeil, left to form Riptano, with the initial charter of providing commercial support to Cassandra adopters under the Apache license. Last January, in a makeover move to look more like a database vendor than a social network, Riptano became DataStax. Now, it’s this new company’s mission to put a SQL-free exascale database manager on the map, at the same time competitors such as Oracle are boosting their marketing to cement their customer bases.
“Cassandra is a highly scalable and high-performance distributed database management system that can serve as both a real-time database (the ‘system of record’) for online/transactional applications, and as a read-intensive datastore for business intelligence systems,” reads a DataStax white paper (PDF available here). “Cassandra is built with the assumption that failures can and will occur in a database infrastructure. Therefore, data redundancy to protect against hardware failure and other data loss scenarios is built into and managed transparently by Cassandra. Furthermore, this capability can be configured to be quite sophisticated so that data in a single cluster can be distributed across multiple, geographically dispersed data centers, between different physical racks in a data center, and between public cloud providers and on-premise managed data centers.”
The system is managed using a browser-based tool called OpsCenter, where admins may obtain the status of the system’s ring-shaped clusters, as well as perform Hadoop analytic jobs. With a number of major Cassandra deployments having already taken place, including with well-known names such as WebEx, DataStax’s buildout strategy looks sound and well-supported. The company is also making a Community Edition of the database available to developers and database architects free of charge.
