How I Explained REST to My Wife
A really good primer on what RESTful means —
Ryan: Machines don’t have a universal noun - that’s why they suck. Every programming language, database, or other kind of system has a different way of talking about nouns. That’s why the URL is so important. It lets all of these systems tell each other about each other’s nouns.
…
Wife: What about verbs and pronouns and adjectives?
Ryan: Funny you asked because that’s another big aspect of REST. Well, verbs are anyway.
Wife: I was just joking.
Ryan: It was a funny joke but it’s actually not a joke at all. Verbs are important. There’s a powerful concept in programming and CS theory called polymorphism. That’s a geeky way of saying that different nouns can have the same verb applied to them.
Wife: I don’t get it.
Ryan: Well.. Look at the coffee table. What are the nouns? Cup, tray, newspaper, remote. Now, what are some things you can do to all of these things?
Wife: I don’t get it…
Ryan: You can get them, right? You can pick them up. You can knock them over. You can burn them. You can apply those same exact verbs to any of the objects sitting there.
Wife: Okay… so?
Ryan: Well, that’s important. What if instead of me being able to say to you, “get the cup,” and “get the newspaper,” and “get the remote”; what if instead we needed to come up with different verbs for each of the nouns? I couldn’t use the word “get” universally, but instead had to think up a new word for each verb/noun combination.
Wife: Wow! That’s weird.
Ryan: Yes, it is. Our brains are somehow smart enough to know that the same verbs can be applied to many different nouns. Some verbs are more specific than others and apply only to a small set of nouns. For instance, I can’t drive a cup and I can’t drink a car. But some verbs are almost universal like GET, PUT, and DELETE.
Wife: You can’t DELETE a cup.
Ryan: Well, okay, but you can throw it away. That was another joke, right?
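The point about universal verbs can be sketched in a few lines of Python. This is only an illustration, assuming a hypothetical in-memory `ResourceStore` rather than any real HTTP library: every noun is addressed by a URL-like path, and the same three verbs work on all of them.

```python
class ResourceStore:
    """Toy in-memory store: every noun is addressed by a URL-like path."""

    def __init__(self):
        self._items = {}

    def put(self, path, value):
        # PUT: create or replace whatever lives at this path.
        self._items[path] = value

    def get(self, path):
        # GET: fetch the current representation, or None if nothing is there.
        return self._items.get(path)

    def delete(self, path):
        # DELETE: remove the noun; deleting a missing one is a no-op.
        self._items.pop(path, None)


store = ResourceStore()
for noun in ("cup", "newspaper", "remote"):
    # The same verb applies to every noun on the coffee table.
    store.put(f"/coffee-table/{noun}", {"name": noun})

store.delete("/coffee-table/cup")
print(store.get("/coffee-table/cup"))     # None
print(store.get("/coffee-table/remote"))  # {'name': 'remote'}
```

No new verb had to be invented for the newspaper or the remote; only the path changes.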
On Developers and Technology Adoption – tecosystems
RedMonk argues that R is rapidly becoming a go-to analytics tool thanks to bottom-up advocacy by statisticians who used it extensively in school -
The ongoing promotion of developers from serf to kingmakers has many implications, but perhaps none so important as technology adoption. For decades, we’ve been seeing the manifestation of this trend, as the growth in market share of technologies from Chrome to Linux to Mac to MySQL have been driven at least in part by developer populations that preferred them. This was in evidence yet again last week on a Google Hangout James and I participated in organized by IBM.
In discussing the market for analytics, one of the topics of discussion was people, or more specifically the lack thereof. One of the least controversial statements one can make in the technology industry today is that demand for talent is outstripping the supply, at least in most markets. Analytics are no exception. As a result of this shortage, many organizations are the proverbial beggars now unable to be choosers. Where they may previously have hired for analytical roles only those trained on sanctioned analytical tools, businesses are now compelled to hire outside of these comfort zones. And in the analytical world, this most often benefits the statistical language R.
When developers – or in this case, statisticians – are permitted to choose their own tools, an increasing number turn to R if for no other reason than the fact that it was the basis for their academic training. So prevalent is R in academic environments, in fact, that the datasets associated with the text taught in my second semester of statistics were available as a downloadable package via R’s repository, CRAN.
What this means, then, is that like Linux, MySQL, PHP, AWS and other technologies before it, R is being anointed in bottom up fashion as a fundamentally important technology moving forward – it’s not often we get to watch this in real time. Whether enterprises approve of that or not. Even those conservative enterprises who wish to dictate technology choices to their rank and file will find that increasingly difficult in a tight labor market. Fragmentation is the new reality, and as developers make more choices – if only by default – the number of technologies employed within enterprises will inevitably rise. Those businesses that understand this at a fundamental level and do not seek to oppose it will have significant advantages in both hiring and productivity. Those vendors, meanwhile, that appreciate that heterogeneity is the new norm and optimize for interoperability will, likewise, benefit.
As for the developers? With their newfound influence comes greater responsibility; just as enterprises need to be more flexible in technology adoption, so too should developers be more careful in the selection process. Heterogeneity is fine, even beneficial. Chaos, less so.
High Scalability - Tumblr Architecture - 15 Billion Page Views a Month and Harder to Scale than Twitter
A technical look at the infrastructure that supports Tumblr and how it is designed to scale - very interesting as a real world case study.
With over 15 billion page views a month Tumblr has become an insanely popular blogging platform. Users may like Tumblr for its simplicity, its beauty, its strong focus on user experience, or its friendly and engaged community, but like it they do.
Growing at over 30% a month has not been without challenges. Some reliability problems among them. It helps to realize that Tumblr operates at surprisingly huge scales: 500 million page views a day, a peak rate of ~40k requests per second, ~3TB of new data to store a day, all running on 1000+ servers.
One of the common patterns across successful startups is the perilous chasm crossing from startup to wildly successful startup. Finding people, evolving infrastructures, servicing old infrastructures, while handling huge month over month increases in traffic, all with only four engineers, means you have to make difficult choices about what to work on. This was Tumblr’s situation. Now with twenty engineers there’s enough energy to work on issues and develop some very interesting solutions.
Tumblr started as a fairly typical large LAMP application. The direction they are moving in now is towards a distributed services model built around Scala, HBase, Redis, Kafka, Finagle, and an intriguing cell based architecture for powering their Dashboard. Effort is now going into fixing short term problems in their PHP application, pulling things out, and doing it right using services.
The theme at Tumblr is transition at massive scale. Transition from a LAMP stack to a somewhat bleeding edge stack. Transition from a small startup team to a fully armed and ready development team churning out new features and infrastructure. To help us understand how Tumblr is living this theme is startup veteran Blake Matheny, Distributed Systems Engineer at Tumblr. Here’s what Blake has to say about the House of Tumblr:
Site: http://www.tumblr.com/
Stats
- 500 million page views a day
- 15B+ page views a month
- ~20 engineers
- Peak rate of ~40k requests per second
- 1+ TB/day into Hadoop cluster
- Many TB/day into MySQL/HBase/Redis/Memcache
- Growing at 30% a month
- ~1000 hardware nodes in production
- Billions of page visits per month per engineer
- Posts are about 50GB a day. Follower list updates are about 2.7TB a day.
- Dashboard runs at a million writes a second, 50K reads a second, and it is growing.
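A quick back-of-the-envelope check helps put these figures in proportion. The sketch below uses only numbers from the list above, with two assumptions of mine: a 30-day month and decimal units (1TB = 1000GB).

```python
SECONDS_PER_DAY = 24 * 60 * 60

page_views_per_day = 500_000_000
days_per_month = 30                 # assumption: a 30-day month
posts_gb_per_day = 50
follower_updates_gb_per_day = 2700  # 2.7TB, assuming 1TB = 1000GB

# Average page-view rate implied by the daily total.
avg_page_views_per_second = page_views_per_day / SECONDS_PER_DAY

# The daily and monthly figures should agree with each other.
page_views_per_month = page_views_per_day * days_per_month

# How much faster follower list updates grow than post content.
inbox_to_post_ratio = follower_updates_gb_per_day / posts_gb_per_day

print(round(avg_page_views_per_second))  # 5787
print(page_views_per_month)              # 15000000000
print(inbox_to_post_ratio)               # 54.0
```

The average of roughly 5,800 page views a second sits well below the ~40k requests per second peak, which is plausible since one page view can generate several requests and traffic is not evenly spread over the day.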
Software
- OS X for development, Linux (CentOS, Scientific) in production
- Apache
- PHP, Scala, Ruby
- Redis, HBase, MySQL
- Varnish, HA-Proxy, nginx
- Memcache, Gearman, Kafka, Kestrel, Finagle
- Thrift, HTTP
- Func - a secure, scriptable remote control framework and API
- Git, Capistrano, Puppet, Jenkins
Hardware
- 500 web servers
- 200 database servers (many of these are part of a spare pool we pulled from for failures)
- 47 pools
- 30 shards
- 30 memcache servers
- 22 redis servers
- 15 varnish servers
- 25 haproxy nodes
- 8 nginx
- 14 job queue servers (kestrel + gearman)
Architecture
- Tumblr has a different usage pattern than other social networks.
- With 50+ million posts a day, an average post goes to many hundreds of people. It’s not just one or two users that have millions of followers. The graph for Tumblr users has hundreds of followers. This is different than any other social network and is what makes Tumblr so challenging to scale.
- #2 social network in terms of time spent by users. The content is engaging. It’s images and videos. The posts aren’t byte sized. They aren’t all long form, but they have the ability. People write in-depth content that’s worth reading so people stay for hours.
- Users form a connection with other users so they will go hundreds of pages back into the dashboard to read content. Other social networks are just a stream that you sample.
- Implication is that given the number of users, the average reach of the users, and the high posting activity of the users, there is a huge amount of updates to handle.
- Tumblr runs in one colocation site. Designs are keeping geographical distribution in mind for the future.
- Two components to Tumblr as a platform: public Tumblelogs and Dashboard
- Public Tumblelog is what the public deals with in terms of a blog. Easy to cache as it’s not that dynamic.
- Dashboard is similar to the Twitter timeline. Users follow real-time updates from all the users they follow.
- Very different scaling characteristics than the blogs. Caching isn’t as useful because every request is different, especially with active followers.
- Needs to be real-time and consistent. Should not show stale data. And it’s a lot of data to deal with. Posts are only about 50GB a day. Follower list updates are 2.7TB a day. Media is all stored on S3.
- Most users leverage Tumblr as a tool for consuming content. Of the 500+ million page views a day, 70% of that is for the Dashboard.
- Dashboard availability has been quite good. Tumblelog hasn’t been as good because they have a legacy infrastructure that has been hard to migrate away from. With a small team they had to pick and choose what they addressed for scaling issues.
Old Tumblr
- When the company started on Rackspace it gave each custom domain blog an A record. When they outgrew Rackspace there were too many users to migrate. This is 2007. They still have custom domains on Rackspace. They route through Rackspace back to their colo space using HAProxy and Varnish. Lots of legacy issues like this.
- A traditional LAMP progression.
- Historically developed with PHP. Nearly every engineer programs in PHP.
- Started with a web server, database server and a PHP application and started growing from there.
- To scale they started using memcache, then put in front-end caching, then HAProxy in front of the caches, then MySQL sharding. MySQL sharding has been hugely helpful.
- Use a squeeze everything out of a single server approach. In the past year they’ve developed a couple of backend services in C: an ID generator and Staircar, using Redis to power Dashboard notifications.
- The Dashboard uses a scatter-gather approach. Events are displayed when a user accesses their Dashboard. Events for the users you follow are pulled and displayed. This will scale for another 6 months. Since the data is time ordered sharding schemes don’t work particularly well.
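The scatter-gather read described in the last bullet can be sketched roughly as follows. This is a toy illustration in Python, not Tumblr's PHP code: `posts_by_author`, `following`, and the sample data are all hypothetical, and each dictionary lookup stands in for what would be a query against a shard.

```python
import heapq
from itertools import islice

# Hypothetical toy data: posts per author, newest first, as (timestamp, post_id).
posts_by_author = {
    "alice": [(105, "a3"), (101, "a2"), (100, "a1")],
    "bob": [(104, "b2"), (102, "b1")],
    "carol": [(103, "c1")],
}
following = {"dave": ["alice", "bob", "carol"]}


def scatter_gather_dashboard(user, limit):
    """Build a dashboard at read time by visiting every followed author."""
    # Scatter: one lookup per followed author.
    per_author = [posts_by_author.get(author, []) for author in following[user]]
    # Gather: merge the already-sorted lists, newest first, and keep the top `limit`.
    merged = heapq.merge(*per_author, reverse=True)
    return [post_id for _, post_id in islice(merged, limit)]


print(scatter_gather_dashboard("dave", 4))  # ['a3', 'b2', 'c1', 'b1']
```

The cost of every read grows with the number of users followed, which is one way to see why this approach has limited runway.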
New Tumblr
- Changed to a JVM centric approach for hiring and speed of development reasons.
- Goal is to move everything out of the PHP app into services and make the app a thin layer over services that does request authentication, presentation, etc.
- Scala and Finagle Selection
- Internally they had a lot of people with Ruby and PHP experience, so Scala was appealing.
- Finagle was a compelling factor in choosing Scala. It is a library from Twitter. It handles most of the distributed issues like distributed tracing, service discovery, and service registration. You don’t have to implement all this stuff. It just comes for free.
- Once on the JVM Finagle provided all the primitives they needed (Thrift, ZooKeeper, etc).
- Finagle is being used by Foursquare and Twitter. Scala is also being used by Meetup.
- Liked the Thrift application interface. It has really good performance.
- Liked Netty, but wanted out of Java, so Scala was a good choice.
- Picked Finagle because it was cool, knew some of the guys, it worked without a lot of networking code and did all the work needed in a distributed system.
- Node.js wasn’t selected because it is easier to scale the team with a JVM base. Node.js isn’t developed enough to have standards and best practices, a large volume of well tested code. With Scala you can use all the Java code. There’s not a lot of knowledge of how to use it in a scalable way and they target 5ms response times, 4 9s HA, 40K requests per second and some at 400K requests per second. There’s a lot in the Java ecosystem they can leverage.
- Internal services are being shifted from being C/libevent based to being Scala/Finagle based.
- Newer, non-relational data stores like HBase and Redis are being used, but the bulk of their data is currently stored in a heavily partitioned MySQL architecture. Not replacing MySQL with HBase.
- HBase backs their URL shortener with billions of URLs and all the historical data and analytics. It has been rock solid. HBase is used in situations with high write requirements, like a million writes a second for the Dashboard replacement. HBase wasn’t deployed instead of MySQL because they couldn’t bet the business on HBase with the people that they had, so they started using it with smaller less critical path projects to gain experience.
- Problem with MySQL and sharding for time series data is one shard is always really hot. Also ran into read replication lag due to insert concurrency on the slaves.
- Created a common services framework.
- Spent a lot of time upfront solving operations problem of how to manage a distributed system.
- Built a kind of Rails scaffolding, but for services. A template is used to bootstrap services internally.
- All services look identical from an operations perspective. Checking statistics, monitoring, starting and stopping all work the same way for all services.
- Tooling is put around the build process in SBT (a Scala build tool) using plugins and helpers to take care of common activities like tagging things in git, publishing to the repository, etc. Most developers don’t have to get in the guts of the build system.
- Front-end layer uses HAProxy. Varnish might be hit for public blogs. 40 machines.
- 500 web servers running Apache and their PHP application.
- 200 database servers. Many database servers are used for high availability reasons. Commodity hardware is used and the MTBF is surprisingly low. Much more hardware than expected is lost so there are many spares in case of failure.
- 6 backend services to support the PHP application. A team is dedicated to develop the backend services. A new service is rolled out every 2-3 weeks. Includes dashboard notifications, dashboard secondary index, URL shortener, and a memcache proxy to handle transparent sharding.
- Put a lot of time and effort and tooling into MySQL sharding. MongoDB is not used even though it is popular in NY (their location). MySQL can scale just fine.
- Gearman, a job queue system, is used for long running fire and forget type work.
- Availability is measured in terms of reach. Can a user reach custom domains or the dashboard? Also in terms of error rate.
- Historically the highest priority item is fixed. Now failure modes are analyzed and addressed systematically. Intention is to measure success from a user perspective and an application perspective. If part of a request can’t be fulfilled that is accounted for.
- Initially an Actor model was used with Finagle, but that was dropped. For fire and forget work a job queue is used. In addition, Twitter’s utility library contains a Futures implementation and services are implemented in terms of futures. In the situations when a thread pool is needed futures are passed into a future pool. Everything is submitted to the future pool for asynchronous execution.
- Scala encourages no shared state. Finagle is assumed correct because it’s tested by Twitter in production. Mutable state is avoided using constructs in Scala or Finagle. No long running state machines are used. State is pulled from the database, used, and written back to the database. Advantage is developers don’t need to worry about threads or locks.
- 22 Redis servers. Each server has 8 - 32 instances so 100s of Redis instances are used in production.
- Used for backend storage for dashboard notifications.
- A notification is something like a user liked your post. Notifications show up in a user’s dashboard to indicate actions other users have taken on their content.
- High write ratio made MySQL a poor fit.
- Notifications are ephemeral so it wouldn’t be horrible if they were dropped, so Redis was an acceptable choice for this function.
- Gave them a chance to learn about Redis and get familiar with how it works.
- Redis has been completely problem free and the community is great.
- A Scala futures based interface for Redis was created. This functionality is now moving into their Cell Architecture.
- URL shortener uses Redis as the first level cache and HBase as permanent storage.
- Dashboard’s secondary index is built around Redis.
- Redis is used as Gearman’s persistence layer using a memcache proxy built using Finagle.
- Slowly moving from memcache to Redis. Would like to eventually settle on just one caching service. Performance is on par with memcache.
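The URL shortener arrangement mentioned above (Redis as first level cache, HBase as permanent storage) might look something like the sketch below. Plain dictionaries stand in for both stores, the `ShortUrlStore` class is hypothetical, and the read-through population of the cache is my assumption; the source does not say how the cache is filled.

```python
class ShortUrlStore:
    """Two-tier lookup: a fast cache in front of a permanent store."""

    def __init__(self, cache, permanent):
        self.cache = cache          # stand-in for Redis
        self.permanent = permanent  # stand-in for HBase
        self.permanent_reads = 0    # counts how often the slow tier is hit

    def save(self, code, url):
        # Write to permanent storage first, then warm the cache.
        self.permanent[code] = url
        self.cache[code] = url

    def resolve(self, code):
        # First level: the cache.
        url = self.cache.get(code)
        if url is not None:
            return url
        # Miss: fall back to permanent storage and remember the answer.
        self.permanent_reads += 1
        url = self.permanent.get(code)
        if url is not None:
            self.cache[code] = url
        return url


store = ShortUrlStore(cache={}, permanent={"abc": "http://www.tumblr.com/"})
print(store.resolve("abc"))   # http://www.tumblr.com/
print(store.resolve("abc"))   # http://www.tumblr.com/ (served from the cache)
print(store.permanent_reads)  # 1
```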
Internal Firehose
- Internally applications need access to the activity stream. An activity stream is information about users creating/deleting posts, liking/unliking posts, etc. A challenge is to distribute so much data in real-time. Wanted something that would scale internally and that an application ecosystem could reliably grow around. A central point of distribution was needed.
- Previously this information was distributed using Scribe/Hadoop. Services would log into Scribe and begin tailing and then pipe that data into an app. This model stopped scaling almost immediately, especially at peak where people are creating 1000s of posts a second. Didn’t want people tailing files and piping to grep.
- An internal firehose was created as a message bus. Services and applications talk to the firehose via Thrift.
- LinkedIn’s Kafka is used to store messages. Internally consumers use an HTTP stream to read from the firehose. MySQL wasn’t used because the sharding implementation is changing frequently so hitting it with a huge data stream is not a good idea.
- The firehose model is very flexible, not like Twitter’s firehose in which data is assumed to be lost.
- The firehose stream can be rewound in time. It retains a week of data. On connection it’s possible to specify the point in time to start reading.
- Multiple clients can connect and each client won’t see duplicate data. Each client has a client ID. Kafka supports a consumer group idea. Each consumer in a consumer group gets its own messages and won’t see duplicates. Multiple clients can be created using the same consumer ID and clients won’t see duplicate data. This allows data to be processed independently and in parallel. Kafka uses ZooKeeper to periodically checkpoint how far a consumer has read.
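The rewind and consumer group behaviour described above can be illustrated with a small in-memory model. This is not Kafka's API or Tumblr's Thrift interface; `ToyFirehose` and its method names are hypothetical, and a single shared offset per group is a simplification of how real consumer groups divide work.

```python
import bisect


class ToyFirehose:
    """In-memory stand-in for the firehose: an append-only, time-ordered log."""

    def __init__(self):
        self._timestamps = []
        self._messages = []
        self._offsets = {}  # consumer group ID -> index of the next message to read

    def publish(self, timestamp, message):
        # Assumes timestamps arrive in non-decreasing order.
        self._timestamps.append(timestamp)
        self._messages.append(message)

    def connect(self, group_id, start_time=None):
        # A new group starts at the requested point in time, or at the end of the log.
        if group_id not in self._offsets:
            if start_time is None:
                self._offsets[group_id] = len(self._messages)
            else:
                self._offsets[group_id] = bisect.bisect_left(self._timestamps, start_time)

    def poll(self, group_id, max_messages=1):
        # Clients sharing a group ID share one offset, so none sees a duplicate.
        start = self._offsets[group_id]
        batch = self._messages[start:start + max_messages]
        self._offsets[group_id] = start + len(batch)
        return batch


hose = ToyFirehose()
for ts, event in [(10, "post:1"), (20, "like:1"), (30, "post:2")]:
    hose.publish(ts, event)

hose.connect("indexer", start_time=20)  # rewind to t=20
client_a = hose.poll("indexer")
client_b = hose.poll("indexer")
print(client_a, client_b)  # ['like:1'] ['post:2']
```

Two clients polling under the same group ID split the stream between them, while a second group connecting with its own ID would read the same events independently.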
Cell Design for Dashboard Inbox
- The current scatter-gather model for providing Dashboard functionality has very limited runway. It won’t last much longer.
- The solution is to move to an inbox model implemented using a Cell Based Architecture that is similar to Facebook Messages.
- An inbox is the opposite of scatter-gather. A user’s dashboard, which is made up of posts from followed users and actions taken by other users, is logically stored together in time order.
- Solves the scatter gather problem because it’s an inbox. You just ask what is in the inbox so it’s less expensive than going to each user a user follows. This will scale for a very long time.
- Rewriting the Dashboard is difficult. The data has a distributed nature, but it has a transactional quality, it’s not OK for users to get partial updates.
- The amount of data is incredible. Messages must be delivered to hundreds of different users on average which is a very different problem than Facebook faces. Large data + high distribution rate + multiple datacenters.
- Spec’ed at a million writes a second and 50K reads a second. The data set size is 2.7TB of data growth with no replication or compression turned on. The million writes a second is from the 24 byte row key that indicates what content is in the inbox.
- Doing this on an already popular application that has to be kept running.
- Cells
- A cell is a self-contained installation that has all the data for a range of users. All the data necessary to render a user’s Dashboard is in the cell.
- Users are mapped into cells. Many cells exist per data center.
- Each cell has an HBase cluster, service cluster, and Redis caching cluster.
- Users are homed to a cell and all cells consume all posts via firehose updates.
- Each cell is Finagle based and populates HBase via the firehose and service requests over Thrift.
- A user comes into the Dashboard, users home to a particular cell, a service node reads their dashboard via HBase, and passes the data back.
- Background tasks consume from the firehose to populate tables and process requests.
- A Redis caching layer is used for posts inside a cell.
- Request flow: a user publishes a post, the post is written to the firehose, all of the cells consume the posts and write that post content to post database, the cells lookup to see if any of the followers of the post creator are in the cell, if so the follower inboxes are updated with the post ID.
- Advantages of cell design:
- Massive scale requires parallelization and parallelization requires components be isolated from each other so there is no interaction. Cells provide a unit of parallelization that can be adjusted to any size as the user base grows.
- Cells isolate failures. One cell failure does not impact other cells.
- Cells enable nice things like the ability to test upgrades, implement rolling upgrades, and test different versions of software.
- The key idea that is easy to miss is: all posts are replicated to all cells.
- Each cell stores a single copy of all posts. Each cell can completely satisfy a Dashboard rendering request. Applications don’t ask for all the post IDs and then ask for the posts for those IDs. It can return the dashboard content for the user. Every cell has all the data needed to fulfill a Dashboard request without doing any cross cell communication.
- Two HBase tables are used: one that stores a copy of each post. That data is small compared to the other table which stores every post ID for every user within that cell. The second table tells what the user’s dashboard looks like which means they don’t have to go fetch all the users a user is following. It also means across clients they’ll know if you read a post and viewing a post on a different device won’t mean you read the same content twice. With the inbox model state can be kept on what you’ve read.
- Posts are not put directly in the inbox because the size is too great. So the ID is put in the inbox and the post content is put in the cell just once. This model greatly reduces the storage needed while making it simple to return a time ordered view of a user’s inbox. The downside is each cell contains a complete copy of all posts. Surprisingly posts are smaller than the inbox mappings. Post growth per day is 50GB per cell, inbox grows at 2.7TB a day. Users consume more than they produce.
- A user’s dashboard doesn’t contain the text of a post, just post IDs, and the majority of the growth is in the IDs.
- As followers change the design is safe because all posts are already in the cell. If only follower posts were stored in a cell then the cell would be out of date as the followers changed and some sort of back fill process would be needed.
- An alternative design is to use a separate post cluster to store post text. The downside of this design is that if the cluster goes down it impacts the entire site. Using the cell design and post replication to all cells creates a very robust architecture.
- A user having millions of followers who are really active is handled by selectively materializing user feeds by their access model (see Feeding Frenzy).
- Different users have different access models and distribution models that are appropriate. Two different distribution modes: one for popular users and one for everyone else.
- Data is handled differently depending on the user type. Posts from active users wouldn’t actually be published, posts would be selectively materialized.
- Users who follow millions of users are treated similarly to users who have millions of followers.
- Cell size is hard to determine. The size of cell is the impact site of a failure. The number of users homed to a cell is the impact. There’s a tradeoff to make in what they are willing to accept for the user experience and how much it will cost.
- Reading from the firehose is the biggest network issue. Within a cell the network traffic is manageable.
- As more cells are added cells can be placed into a cell group that reads from the firehose and then replicates to all cells within the group. A hierarchical replication scheme. This will also aid in moving to multiple datacenters.
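The request flow and the two-table layout described above can be modelled in a few lines. This is a rough sketch, not the actual Finagle/HBase implementation: the `Cell` class, the `followers_of` lookup, and the sample users are all hypothetical, and a plain list stands in for the firehose.

```python
class Cell:
    """Toy cell: holds every post, plus inboxes only for users homed here."""

    def __init__(self, homed_users):
        self.posts = {}                                    # post_id -> content (full copy)
        self.inboxes = {user: [] for user in homed_users}  # user -> post IDs, oldest first

    def consume(self, post_id, author, content, followers_of):
        # Every cell stores the post content exactly once.
        self.posts[post_id] = content
        # Only followers homed in this cell get the post ID in their inbox.
        for follower in followers_of.get(author, ()):
            if follower in self.inboxes:
                self.inboxes[follower].append(post_id)

    def dashboard(self, user, limit=10):
        # Served entirely from this cell: newest IDs first, then a local post lookup.
        newest_first = self.inboxes[user][-limit:][::-1]
        return [self.posts[post_id] for post_id in newest_first]


followers_of = {"alice": ["bob", "carol"], "bob": ["carol"]}
cells = [Cell({"alice", "bob"}), Cell({"carol"})]

# Stand-in for the firehose: every cell consumes every post.
firehose = [("p1", "alice", "hello"), ("p2", "bob", "hi there")]
for post_id, author, content in firehose:
    for cell in cells:
        cell.consume(post_id, author, content, followers_of)

print(cells[1].dashboard("carol"))  # ['hi there', 'hello']
print(cells[0].dashboard("bob"))    # ['hello']
```

Each inbox holds only IDs, so the per-user data stays small, and a dashboard read never leaves the cell because every cell already has the post content.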
…
Software Deployment
- Started with a set of rsync scripts that distributed the PHP application everywhere. Once the number of machines reached 200 the system started having problems, deploys took a long time to finish and machines would be in various states of the deploy process.
- The next phase built the deploy process (development, staging, production) into their service stack using Capistrano. Worked for services on dozens of machines, but by connecting via SSH it started failing again when deploying to hundreds of machines.
- Now a piece of coordination software runs on all machines. Based around Func from RedHat, a lightweight API for issuing commands to hosts. Scaling is built into Func.
- Build deployment is over Func by saying do X on a set of hosts, which avoids SSH. Say you want to deploy software on group A. The master reaches out to a set of nodes and runs the deploy command.
- The deploy command is implemented via Capistrano. It can do a git checkout or pull from the repository. Easy to scale because they are talking HTTP. They like Capistrano because it supports simple directory based versioning that works well with their PHP app. Moving towards versioned updates, where each directory contains a SHA so it’s easy to check if a version is correct.
- The Func API is used to report back status, to say these machines have these software versions.
- Safe to restart any of their services because they’ll drain off connections and then restart.
- All features run in dark mode before activation.
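The status reporting step, where hosts say which software version they are running, makes it easy to spot machines left in a mixed state by a deploy. A minimal sketch of that check, with hypothetical host names and SHAs and no real Func or Capistrano calls:

```python
def stale_hosts(reported_versions, expected_sha):
    """Return hosts whose reported release SHA differs from the expected one."""
    return sorted(
        host for host, sha in reported_versions.items() if sha != expected_sha
    )


# Hypothetical status report: host name -> SHA of the release directory in use.
reported = {
    "web-001": "3f2a9c1",
    "web-002": "3f2a9c1",
    "web-003": "b71e0d4",
}

print(stale_hosts(reported, "3f2a9c1"))  # ['web-003']
```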
Development
- Started with the philosophy that anyone could use any tool that they wanted, but as the team grew that didn’t work. Onboarding new employees was very difficult, so they’ve standardized on a stack so they can get good with those, grow the team quickly, address production issues more quickly, and build up operations around them.
- Process is roughly Scrum like. Lightweight.
- Every developer has a preconfigured development machine. It gets updates via Puppet.
- Dev machines can roll changes, test, then roll out to staging, and then roll out to production.
- Developers use vim and Textmate.
- Testing is via code reviews for the PHP application.
- On the service side they’ve implemented a testing infrastructure with commit hooks, Jenkins, and continuous integration and build notifications.
…
Lessons learned
- Automation everywhere.
- MySQL (plus sharding) scales, apps don’t.
- Redis is amazing.
- Scala apps perform fantastically.
- Scrap projects when you aren’t sure if they will work.
- Don’t hire people based on their survival through a useless technological gauntlet. Hire them because they fit your team and can do the job.
- Select a stack that will help you hire the people you need.
- Build around the skills of your team.
- Read papers and blog posts. Key design ideas like the cell architecture and selective materialization were taken from elsewhere.
- Ask your peers. They talked to engineers from Facebook, Twitter, LinkedIn about their experiences and learned from them. You may not have access to this level, but reach out to somebody somewhere.
- Wade, don’t jump into technologies. They took pains to learn HBase and Redis before putting them into production by using them in pilot projects or in roles where the damage would be limited.
What’s in Store for 2012: A Few Predictions – tecosystems
2012 predictions from RedMonk’s Stephen O’Grady. These are always good.
Data & The Last Mile
It is not technically correct to assert that large scale data infrastructure is a solved problem. Decades of innovation remain, as the Cambrian explosion of projects demonstrates. It is nevertheless true that relative to the user interface, data storage and manipulation is a solved problem. Since the original creation of Hadoop in 2006, for example, we have seen multiple user interfaces applied: connectors (e.g. R), standard MapReduce, scripting (e.g. Jaql/Pig), SQL (e.g. Hive), spreadsheets (e.g. BigSheets), client tooling (e.g. Karmasphere). Each has its strengths, none bridges the last mile: putting the power of Big Data in the hands of ordinary users.
Which is perhaps unsurprising; even the mature relational database world uses abstractions of varying levels of complexity to interface with business users. But with data driven decision making on the rise, premiums are being placed on tooling which can expose in sensible fashion data to those without degrees in computer science. Hence, the elevated visibility of startups such as Metamarkets, who excite data scientists with tools like Druid but whose valuation may ultimately depend on its last mile expertise.
At this point in time, whatever my preferred model for data storage and whatever the type, there will be greater than one credible option for a data engine. The same cannot be said for presentation. Which would be less problematic if the market for Big Data talent were not so desperate; outsourcing to shops like Mu Sigma will be an option in some quarters, but comes with its own inefficiencies and risks, not to mention per inquiry premiums.
This, then, will be an area of focus in 2012, for both innovation (look for assisted anomaly and correlation identification, a la Google Correlate) and M&A.
Desktop Importance Declines
The most interesting characteristic of the forthcoming Windows 8 release isn’t the technology, which is curious because it’s revolutionary from a Microsoft standpoint. From the support for ARM to the addition of the Windows Store to the ability to author in JavaScript and HTML5, there is much to digest. Instead, the single most defining characteristic of the pending launch is apathy.
Overall inquiries and discussion of the platform demonstrate curiosity but limited interest; the visibility of the once dominant Windows platform is secondary to mobile platforms like Android and iOS. While this is not a function of any specific or general design failures on the part of Microsoft – indeed, the platform is incorporating important changes while making itself more developer accessible – it is symptomatic of a broader and more difficult to attack problem: the declining role of the desktop.
The desktop is simply not as important as it once was. Mobile usage is eroding the central role PCs once played; while they are still the dominant form of computing, the trendline is declining and there is no reason to expect it to invert. It’s been suggested that mobile computing in general is additive; that it’s being used to extend the usage of computing to areas where PCs were not employed, and is thus non-competitive. But our data as well as Asymco’s indicates that, at least in part, mobile usage is coming at the expense of traditional platforms. General search volume data, as we’ve seen, validates this assertion.
There are two implications here. Most obviously, Microsoft’s ability to generate interest in and thus leverage for its flagship operating system is jeopardized. Worldwide developer populations are not necessarily zero sum as skills overlap, but they tend to be rivalrous; an Android or iOS developer is often a lost potential Windows developer – experiments like BlueStacks aside. We can therefore expect Microsoft to have to expend more effort to attract fewer developers to their platform, a negative cycle which becomes cyclical. Second, as the desktop’s primacy abates, we can expect to see greater competition in the marketplace. As enterprises become by necessity more heterogeneous, incorporating Android and iOS devices, the costs of supporting second operating systems drifts towards marginal, which means that forecasts of greater Apple penetration become more probable.
Monitoring as a Service
We are not oriented around category definitions at RedMonk; we prefer market driven names to those conceived and marketed by the analyst industry. That said, it seems clear that the time of Monitoring-as-a-Service (MaaS) is at hand. New Relic’s growth led to a $15M round in November, Boundary took $4M a year ago this month, Monktoberfest speaker Theo Schlossnagle’s Circonus has been in market for over a year, and virtually every vendor that we speak with today is adding monitoring and management facilities, from 10gen’s MMS to Cloudera’s Cloudera Manager.
The proliferation of these services is a direct response to the increasingly heterogeneous nature of application architecture and the reality that the substrate is frequently network based, rather than local. Given accelerating rather than declining consumption of network resources, we predict a strong increase in interest and adoption of MaaS tools. Much as I don’t care for the term itself.
Intelligent usage of generated telemetry – which we’ll come back to – will further cement adoption, delivering previously unseen value.
Open Source and the Paradox of Choice
Gartner in March of last year asserted that open source had hit a tipping point, saying:
“Mainstream adopters of IT solutions across a widening array of market segments are rapidly gaining confidence in the use of open source software.”
We concur, although we would argue that the tipping point actually occurred ten years or more prior. The Apache web server and MySQL were originally written in 1995. In 1999, we saw the public offering of Red Hat and the creation by IBM – as mainstream a technology brand as there is in the enterprise – of the Linux Technology Center. Firefox was first released in 2003. None of these reached their relative levels of popularity in the past twelve months; they have instead been the de facto infrastructure for the better part of the last decade.
Regardless of when one asserts that open source crossed the chasm, however, it remains that it is a model whose popularity is increasing over time. As understanding of the benefits increases and concerns about the risks abate, more organizations are not only consuming open source but contributing to it. Evidence suggests, in fact, that perceptions of the value of software are in decline – we’ll come back to that too – and that the end result of this is that more proprietary code is being released as open source software.
Widely perceived as a net benefit, however, the influx of new projects does present problems for would be adopters. Specifically, the paradox of choice implies that developers will increasingly be forced to select from a growing sea of projects which may or may not be suitable for their needs. And while the nature of open source guarantees developers the ability to apply this code to their projects without restriction or commercial engagement, this is a process with a limited ability to scale. Consider the NoSQL space, as an example. Presuming for the sake of argument that the developers in question understand the different categories of database – key value stores, document databases, columnar databases, MapReduce engines, graph databases and so on – well enough to understand their high level needs, there are at least two and sometimes as many as half a dozen credible options to consider.
This paradox of choice, or too much of a good thing, will become more problematic over time rather than less as contributions will continue to rise. The net impact is likely to be increased commercial opportunities around selection, and therefore attention to vendors like Black Duck, Open Logic, Palamida and Sonatype.
PaaS: The New Standard
It has been evident for some time that runtime fragmentation – an aggressive diversification of programming languages and frameworks, specifically – will change the development landscape. The market failure of the first generation PaaS providers, in fact, was primarily a function of their over-prescriptive natures. The benefits to outsourcing management and scale were obsoleted by the constraints; Java shops were never likely to rewrite their application stack in Python or Ruby strictly to benefit from a platform. Which is why virtually every relevant PaaS provider today offers a choice of runtimes, so as to maximize their addressable market.
But in a fragmented world, what might emerge as a standard? From a developer’s perspective, the standard is most often the framework they’re deploying to, whether that’s Django, Node.js, Lift, Play, Rails, Spring, the Zend Framework or another. From a vendor perspective, however, the new standard is likely to be one level of abstraction up from individual language frameworks: the platform itself. Certainly this is VMware’s opinion, as they are in Maritz’ words trying to construct “the 21st-century equivalent of Linux” – i.e. the substrate that everything else is built on top of.
In 2012, this will become more apparent. PaaS platforms will emerge as the new standard from a runtime and deployment perspective, the middleware target for a new generation of application architectures.
Service Proliferation
With the inevitable adoption of multiple third party services – varying cloud resources, multiple, possibly overlapping, management and monitoring services and so on – will come challenges in making sense of the whole. Overall, instrumentation and visibility on a per service level is improved, but aggregating these views into a cohesive picture of overall architectural health and performance is likely to be highly problematic. Not least because the services themselves may present conflicting information and data. Google Analytics and New Relic, for example, are frequently at odds over load times and other delivery related performance metrics. Introduce in to that mix services like Boundary or CloudWatch and the picture becomes that much more complex. Connecting their data back to underlying log management and monitoring solutions such as 10gen’s MMS or Splunk is more complicated still.
The challenges of service integration will create commercial opportunities for aggregating services which consume individual performance streams, normalize them and present customers with a single consolidated picture of their network performance. Commercial solutions will not fully deliver on this vision in 2012, but we will see progress and announcements in this direction.
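To make the aggregation idea above a little more concrete, here is a minimal sketch of "consume, normalize, consolidate". Everything in it is an assumption for illustration: the `Reading` type, the source labels, the units and the numbers are made up and do not reflect any vendor's actual API or data.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Reading:
    source: str   # hypothetical label for the reporting service
    value: float  # page load time as reported by that service
    unit: str     # "ms" or "s"; services may not agree on units


# Conversion factors to a common unit (milliseconds).
TO_MS = {"ms": 1.0, "s": 1000.0}


def consolidate(readings):
    """Normalize all readings to milliseconds and summarize them.

    Returns (median_ms, spread_ms), where spread is max - min and gives a
    rough sense of how much the sources disagree.
    """
    values = [r.value * TO_MS[r.unit] for r in readings]
    return median(values), max(values) - min(values)


if __name__ == "__main__":
    # Illustrative, invented numbers; not real measurements.
    readings = [
        Reading("analytics", 2.5, "s"),
        Reading("apm", 1800, "ms"),
        Reading("network", 2100, "ms"),
    ]
    mid, spread = consolidate(readings)
    print(f"median load time: {mid:.0f} ms (sources disagree by {spread:.0f} ms)")
```

A real aggregator would also have to reconcile what each service is actually measuring, which is the harder part of the problem described above.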
Telemetry Usage
Five years ago, we began publicly discussing revenue models based around what we termed telemetry, or product generated datastreams. The context was providing open source commercial vendors with a viable economic model that better aligned customer and vendor needs, but the approach is by no means limited to that category: Software-as-a-Service vendors, as an example, are well positioned to leverage the data because they maintain the infrastructure. In 2011, we finally began seeing vendors besides Spiceworks take the first steps towards incorporating data based revenue models. For products like Sonatype Insight [coverage], data is not a byproduct, but the product.
In 2012, this trend will accelerate as necessary monitoring capabilities are added to product portfolios and industry understanding and acceptance of the model overcomes conservative privacy concerns. Many more vendors will begin to realize that like New Relic, which observed a decline in commercial application server usage, their accumulated data is full of insights on customer behaviors and wider market trends both.
Value of Software Will Continue to Decline
Capital markets have not, traditionally, been overly fond of software firms, perhaps because comparatively few of them eclipse annual revenue marks of a billion dollars – fewer than twenty, by Forbes’ count. Microsoft’s share price has languished for over a decade in spite of having not one but two licenses to print money. The mean age of PwC’s Top 20 software firms by revenue is 47 years; a fact which cannot be encouraging to startups.
Higher valuations instead are being awarded to entities that employ software to some end, rather than attempting to realize revenue from it directly. Startups today realize this, and the value of software in their models has commensurately been adjusted downward. Tom Preston-Werner, for example, describes the GitHub philosophy as “open source (almost) everything.” Facebook, LinkedIn, Rackspace, Twitter and others exhibit a similar lack of protectiveness regarding their software assets, all having open sourced core components of their software infrastructure that would have been even five years ago fiercely guarded.
This is becoming the expectation rather than the exception because it is nothing more or less than an intelligent business strategy. Businesses can and will keep private assets they believe represent competitive differentiation, but it will be increasingly apparent that less and less software is actually differentiating. As a result, 2012 will see even less emphasis on the value of software and more on what the software can be used to achieve.
The memories of a Product Manager: Many Core processors: Everything You Know (about Parallel Programming) Is Wrong!
David Ungar of IBM discusses how programming paradigms will change to accommodate massively multi-core computing environments - pretty mind-boggling.
In the end of the first decade of the new century, chips such as Tilera’s can give us a glimpse of a future in which manycore microprocessors will become commonplace: every (non-hand-held) computer’s CPU chip will contain 1,000 fairly homogeneous cores. Such a system will not be programmed like the cloud, or even a cluster because communication will be much faster relative to computation. Nor will it be programmed like today’s multicore processors because the illusion of instant memory coherency will have been dispelled by both the physical limitations imposed by the 1,000-way fan-in to the memory system, and the comparatively long physical lengths of the inter- vs. intra-core connections. In the 1980’s we changed our model of computation from static to dynamic, and when this future arrives we will have to change our model of computation yet again.
If we cannot skirt Amdahl’s Law, the last 900 cores will do us no good whatsoever. What does this mean? We cannot afford even tiny amounts of serialization. Locks?! Even lock-free algorithms will not be parallel enough. They rely on instructions that require communication and synchronization between cores’ caches. Just as we learned to embrace languages without static type checking, and with the ability to shoot ourselves in the foot, we will need to embrace a style of programming without any synchronization whatsoever.
In our Renaissance project at IBM, Brussels, and Portland State, we are investigating what we call “anti-lock,” “race-and-repair,” or “end-to-end nondeterministic” computing. As part of this effort, we have built a Smalltalk system that runs on the 64-core Tilera chip, and have experimented with dynamic languages atop this system. When we give up synchronization, we of necessity give up determinism. There seems to be a fundamental tradeoff between determinism and performance, just as there once seemed to be a tradeoff between static checking and performance.
The obstacle we shall have to overcome, if we are to successfully program manycore systems, is our cherished assumption that we write programs that always get the exactly right answers. This assumption is deeply embedded in how we think about programming. The folks who build web search engines already understand, but for the rest of us, to quote Firesign Theatre: Everything You Know Is Wrong!
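The Amdahl's Law remark in the excerpt above ("the last 900 cores will do us no good") is easy to check with a few lines of arithmetic. The sketch below uses the standard formula, speedup = 1 / ((1 - p) + p / n); the 1% serial fraction is an assumed figure chosen for illustration, not a number from Ungar's abstract.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup under Amdahl's Law: 1 / ((1 - p) + p / n)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    p = 0.99  # assumption: only 1% of the work is serialized
    for n in (100, 1000):
        print(f"{n:>5} cores: at most {amdahl_speedup(p, n):.1f}x")
```

Under that assumption, 100 cores top out at roughly 50x and 1,000 cores at roughly 91x, so the extra 900 cores buy less than a doubling. That is why even tiny amounts of serialization matter at this scale.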
Revisiting the 2011 Predictions, Part 2 – tecosystems
More from O’Grady.
ARM Will Emerge as a Server Player
Whether they will ultimately emerge as a credible mainstream alternative remains to be seen, but ARM is indeed emerging as a server player. Though virtually all of them discuss it privately, HP (via Calxeda) this year became the first major systems player to publicly detail plans for ARM servers – perhaps banking on the fact that the upcoming A15 processor is more server friendly.
Intel is predictably skeptical of ARM’s viability in its core markets, with CEO Paul Otellini bluntly dismissive: “It ain’t gonna work.” And while it certainly hasn’t proven to work thus far, and there are real architectural and software issues to address, the power profile continues to pique the interest of server manufacturers and customers alike. Even marginal power savings mean real dollars at scale.
I count this as a hit…
The NoSQL Marketplace Will Experience Consolidation
The merger of CouchOne and Membase into CouchBase in February provided some evidence that the long anticipated wave of consolidation in this space was beginning, but the balance of the year provided little evidence to support this aside from the acceleration of a few individual players such as MongoDB [coverage]. I remain convinced that the marketplace will be unable to sustain the current volume of would be commercial entities, but from our conversations with both those in a position to potentially impact consolidation and those interested in partnering with various NoSQL players, it is clear that consolidation will depend on clearer winners and losers to proceed. This should occur in 2012.
I’ll count this as a push in light of the CouchBase merger which subtracted one player but otherwise saw very few exits.
NoSQL Will Look More Like Pro-SQL
The implicit rejection of the Structured Query Language in the NoSQL term is ironic in light of the fact that a variety of projects are now adding similar features. Continuing in the proud tradition of Hive and Pig, which provide query language interfaces to Hadoop, DataStax announced CQL in June while CouchBase and SQLite announced UnQL in July [coverage].
Whether we’ll see a unified interface or a variety of engine-specific implementations as Alex Popescu would prefer remains to be seen, but query languages will be coming to the majority of NoSQL stores one way or another.
I count this as a hit.
Open Source of Non-Strategic Infrastructure Assets Will Increase
From Twitter open sourcing the Storm assets it acquired via the BackType transaction to the New York Stock Exchange’s donation of OpenMAMA to the Linux Foundation, it is increasingly clear even to traditional parties that the release of non-strategic code as open source has multiple benefits. GitHub’s Tom Preston-Werner’s list of same is difficult to improve upon:
- “Open sourcing code is great advertising for you and your company.”
- “If your code is popular enough…you will have created a force multiplier that helps you get more work done faster and cheaper.”
- “When you open source useful code, you attract talent.”
- “If you’re hiring, the best technical interview possible is the one you don’t have to do because the candidate is already kicking ass on one of your open source projects.”
- “Dedication to open source code is an amazingly effective way to retain that talent.”
- “[Assuming code will be open sourced] leads to effortless modularization.”
- “By getting code out in the public we can drastically reduce duplication of effort.”
- “It’s the right thing to do.”
It may or may not be beneficial to open source core strategic assets, as VMware did with Cloud Foundry, but it is increasingly hard to justify protecting those that are purely tactical in nature. The benefits in many if not most cases will outweigh the costs, which is why we’re seeing an increase in contributions to open source projects.
The data from the annual Eclipse surveys is one example of this. If we examine the percentage of organizations that contribute back to open source versus those that do not from 2007 to 2011, it is clear that comfort levels with open source generally are rising.
I count this as a hit.
What if IBM Software Got Simple? – James Governor's Monkchips
A rambling post from James Governor on convention versus configuration with some interesting thoughts about IBM’s new approach to simplifying software deployments.
Last month I attended IBM’s 10th Annual [Steve] Mills event, when industry analysts converge to hear what Software Group has been up to, and where it’s going. There is always a ton of content, which makes it hard to summarize, so I won’t even try. But there are a couple of key narratives I want to capture, and I will use separate posts for them.
The first narrative is Get Simple.
Generally IBM doesn’t understand simple. IBM likes to create systems that are infinitely configurable, to meet every possible “need” that an enterprise might ever think of. As I like to say: IBM never met a requirement it didn’t like. But configuration is expensive – it requires consultants (go IGS!) and a lot of time and unnecessary pain.
I had a number of conversations at the Mills event however, which indicated IBM is getting a feel for the new simple. Rather than telling customers they can have all their old complexity and cloud operations too IBM is going to start being opinionated about system images. One of the first IBM products built to this way of thinking is the new Intelligent Operations Center, used as the basis for IBM Smarter City and water management plays. Customers can basically acquire the IOC in four different versions… large on-prem, large SaaS, small on-prem, small SaaS, and that’s pretty much it. But more products are likely to take the same approach.
When I have spoken to IBM Distinguished Engineers and senior managers in the past they have tended to believe that complexity could be abstracted, but after the failure of models such as Ensembles, there seems to be a new pragmatism at work. I talked to Jason McGee and Rob High, both DEs, and they both talked to the new simplicity as a better way of doing things.
IBM of course isn’t the only one with the config problem – it’s practically definitional for Enterprise software. In 2006 I said “Microsoft servers are a configuration fetishists’ wet dream”.
Unlike the enterprise however the Web thrives on simplicity – certainly on the config and operations side. Ruby on Rails, the favorite framework of many web developers, is based on a core concept – Convention over Configuration.
The idea is summed up pretty nicely here: “Design a framework so that it enforces standard naming conventions for mapping classes to resources or events. A programmer only needs to write the mapping configurations when the naming convention fails.” …
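As a rough illustration of the quoted idea, here is a small sketch of convention over configuration. It is not how Rails implements it: the `table_name_for` helper, the `TABLE_OVERRIDES` mapping and the naive "add an s" pluralization are all assumptions made up for this example.

```python
import re

# Hypothetical explicit configuration: only written where the convention fails.
TABLE_OVERRIDES = {"Person": "people"}


def table_name_for(cls: type) -> str:
    """Map a class to a table name by convention, unless overridden."""
    name = cls.__name__
    if name in TABLE_OVERRIDES:
        return TABLE_OVERRIDES[name]
    # Convention: CamelCase class name -> snake_case, naively pluralized.
    snake = re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()
    return snake + "s"


class BlogPost:
    pass


class Person:
    pass


if __name__ == "__main__":
    print(table_name_for(BlogPost))  # follows the naming convention
    print(table_name_for(Person))    # falls back to the explicit override
```

The point is that `BlogPost` needs no configuration at all, while `Person` needs a single line, which is the only place a programmer has to write a mapping.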
Now I don’t want to go down a Rails is more maintainable than Java rathole, especially because Web Companies seemingly turn into Java shops when they grow up (Twitter, for example, recently took a seat on the JCP). As Eric Baldeschweiler, founder of Hortonworks and the guy that hired Doug Cutting to build Hadoop for Yahoo, told me recently he learned to love Java precisely because it allowed Hadoop to evolve, because of its maintainability.
But it’s inarguable that reducing configuration options makes support and maintenance easier. You can’t automate your way out of complexity.
That’s the real lesson of the web. How does Twitter support a user population of hundreds of millions of people with such a small ops team? The answer is limiting the range of deployment options in terms of servers. The same is true of any of the major web firms.
As I wrote recently in “lessons of the web: on vmware, cloud and what comes next”, VMware seems to understand this trend. I have to say it’s good news to see another client, in the shape of IBM, thinking the same way. The Web can teach the enterprise something about Cloud Computing – after all, that’s where it came from, right?
Am I saying that all IBM products are now easy to install? Of course not. This will be a multiyear effort with missteps along the way. But consider: when you build a new data center you don’t retrofit it with a bunch of old Intel gear. You build out for scale, with the latest hardware. That’s the kind of model we need to move to in software.
As much as anything, I think it’s worth calling out that IBM is thinking this way.
Red Hat signs giants to anti-VMware open-source project • The Register
Red Hat has been joined by some major heavyweights - including IBM, Cisco, and Intel - to push a more full-featured version of KVM as an alternative to VMware.
Red Hat is taking on VMware with five enterprise heavyweights through a vendor-neutral virtualisation community project based on its RHEV-M stack.
Red Hat has been joined by Cisco, IBM, Intel, NetApp and SuSE to lead oVirt Project, planning on building a pluggable hypervisor management framework along with an ecosystem of plug-in partners around its virtualisation management tool for KVM.
To seed the project and motivate the community, Red Hat is releasing the RHEV-M code to oVirt under an Apache Software Foundation (ASF) licence.
oVirt’s official launch is in early November, at a workshop day that will be hosted by networking giant Cisco at its San Jose, California, campus. The ASF’d RHEV-M code will be released to Git repositories at this event. Today oVirt consists of 13 projects with the expectation for another two to be added by November…
The Linux distro vendor now hopes the OVA will become heavily involved in marketing oVirt. Joining Red Hat at the OVA launch earlier this year were IBM, Hewlett-Packard, Intel, Novell, BMC, and Eucalyptus Systems; today OVA claims more than 200 members.
The idea behind oVirt is to give people who want an alternative to VMware more than just the choice of going with a Red-Hat subscription for RHEV-M or using Microsoft’s Hyper-V, Red Hat told us.
“This will make things radically different,” Red Hat technical director Carl Trielof told The Reg. “Massive adoption of the technology as an alternative to VMware can only be a good thing.”
“This is going to be the first open and openly governed community that’s an alternative to VMware,” he continued. The watch word here is “OpenStack”, the open-cloud project Red Hat didn’t participate in because it disagreed with a governance model and direction dominated by a single company: Rackspace.
oVirt has been in development for three months, with Red Hat working on the governance model with the other five, according to Trielof.
“We are not owning the community as Rackspace did around OpenStack. If we really believed there should be an open virtualisation solution then the first question is not what’s best for Red Hat, but it’s what’s best for the community and the user base. If the project thinks in those terms… then it’ll be good for us and everybody.”
Red Hat won’t lose out by putting RHEV-M into the community, he claimed, because the company will continue to sell enterprise subscriptions to the hypervisor and management software. He drew an analogy between Red Hat’s patronage of Fedora and its use as part of the company’s Linux distro.
Trielof said the oVirt Project is being set up as a hybrid of Apache and Eclipse: the former being a place that hot-houses projects and elevates individuals to leadership roles based on their contributions to projects while the latter is based on the ability to build a technology framework that’s enriched through a lively ecosystem of plug-ins from ISVs and individual developers. ASF founding member Jim Jagielski is helping on the oVirt board to get it established.
Red Hat went with an independent organisation rather than dropping the oVirt projects into Apache because some features in the KVM are under the GPL and LGPL.
Also, there’s no opportunity in Apache to release the hoped-for 15 oVirt projects as an integrated release – projects would have to be developed separately.
Trielof added that Red Hat wanted to create a brand and a location for the community rather than simply shunting it into Apache.
The vision is to emulate Eclipse, not just from the perspective of having an ecosystem of plug-in providers, but also in terms of delivering an integrated platform composed of projects and technologies.
Eclipse now provides an annual synchronised update for many of its projects to remove bugs and give those using and developing products on Eclipse a reliable platform. Eclipse in June put out its biggest single release so far: 62 projects, or 46 million lines of code.
Eclipse was kick-started by IBM in November 2001 as a community and technology project. IBM primed the project by donating $40m worth of source code from its Websphere Studio Workbench to create an open-source framework for Java and C++ server-side development.
The idea, on paper, was to build an open IDE framework that didn’t lock in tools partners and that saved ISVs from having to constantly re-invent the basic building blocks of IDE such as syntax editors.
Red Hat was among the founding members of Eclipse with IBM, SuSE, Borland Software – now owned by MicroFocus – TogetherSoft, borged by Borland, and others. The group expanded quickly, hitting 80 members by the end of 2003, ultimately pulling in Oracle and SAP. By the late 2000s when it came to tools, the world was represented by Eclipse on Java and Microsoft on .NET.
Eclipse consolidated IBM’s WebSphere while putting some tools vendors, like Borland, out of business and driving others into the arms of IBM. Eclipse broke out of being just an IDE project for the server into PC and web development, business intelligence and even runtimes.
If oVirt is successful the Eclipse model might prove a burden for Red Hat as well as hurt VMware. For all the success of Eclipse and its long list of community members, it’s been IBM that over the years has consistently provided most of the manpower and code commits.
That has meant while Eclipse has helped IBM’s Java business, IBM has been left to pick up the bill on development while struggling to sell paid-for tools against Eclipse, whose price point starts at zero dollars.
NoSQL is Catching On in the Enterprise
NoSQL is becoming less and less exotic every day. A recent Evans study charts NoSQL adoption in the enterprise.
An Evans Data survey of 1,200 developers, 400 of whom are enterprise developers, found that 56% of enterprise developers already use schemaless databases and 63% plan to use one in the next two years. Adoption is particularly strong in the APAC countries.
Among the general developer community, only 43% of respondents plan to use NoSQL in the next two years.
A few other findings from the survey, according to the announcement:
- Although Mac OS is now more popular than Linux as a development desktop environment, Windows continues to dominate with over 80% using some version of Windows as their primary platform.
- Almost 40% of North American developers are now working on apps for a wireless device.
- Eighty percent of North American developers expect to be writing multi-threaded apps in the next two years.