SQL Server on Amazon Web Services: 2014 in review

I have a particular interest in the combination of Microsoft SQL Server and the cloud services of Amazon Web Services, for the following reasons:

  • For almost all web applications (and for SaaS applications in particular), the database system is by far the most business-critical component, where absolutely nothing should go wrong, in terms of both performance and reliability.
  • Microsoft SQL Server has always been an incredibly robust and user-friendly relational database system, and of course a no-brainer if you work within the Microsoft ecosystem. Its (unavoidable) move to core-based licensing has driven up licensing costs (and lost it some friends), but the licensing costs of SQL Server Web Edition in particular are still very low (the Enterprise Edition is far too expensive, though, keeping it and its features out of reach of many companies).
  • Amazon Web Services (AWS) dwarfs all other cloud-service providers, and is in most cases the default choice, even when using Microsoft products. 2014 has seen a very rapid pace of innovation from Amazon. The only time AWS wasn’t the leader was in March 2014, when Google drastically reduced the prices for its cloud storage (to $0.026/GB/month). Only a day later, AWS followed with a similar price reduction of around 60% for cloud storage.

So, in review, here are some of the important features and innovations of AWS in 2014 relating to SQL Server:

  1. Cloud storage price reduction (March): The prices of Amazon S3 have been reduced by around 60% (to $0.03/GB/month). This makes it even easier to store a very generous amount of SQL Server backups and snapshots in S3; it will still be a negligible amount on your monthly bill from AWS. My backup strategy is now to make EBS snapshots of the primary SQL Server data disks every 10 minutes, take transaction-log backups every 15 minutes (and snapshot the disks holding the log backups), and make nightly full SQL Server backups, storing the full backups separately in S3 (see the sketch after this list).
  2. New memory-optimized instances (April): The R3 instance types use new Intel Ivy Bridge processors, and go up to 32 cores and 244 GB RAM. These instances should now be the default choice when setting up your own SQL Server instance. There are some important caveats with these instances, though:
    • The R3 instances have no direct competition from Google, so they have not been included in any price reductions and are not very cheap. An r3.2xlarge, with 8 cores, 61 GB RAM, and SQL Server Web Edition, is $1.555 per hour.
    • SQL Server Web Edition (the best choice for starting web companies) can only use 64 GB of RAM, so any instance type bigger than r3.2xlarge is wasted.
  3. Multi-Availability-Zone (mirroring) support for SQL Server in Amazon’s Relational Database Service (RDS) (May): Although I’m personally moving away from Amazon RDS (Amazon’s RDBMS PaaS solution), mainly because of its lack of backup options at the database level and its lack of storage scalability, RDS is still a great way to start with Amazon’s cloud services. This Multi-AZ support is based on SQL Server database mirroring, which brings a few issues:
    • SQL Server mirroring is not supported for the SQL Server Web Edition, so no Multi-AZ if you use this version of SQL Server.
    • Microsoft seems to be moving away from mirroring as a disaster-recovery method, towards AlwaysOn availability groups.
  4. SSD-backed Elastic Block Store (June): Previously, only the instance storage (which is lost on stopping and starting an instance) could be an SSD. Now all virtual drives can (and should) be SSD storage. This is a huge gain for SQL Server, which benefits tremendously from the much lower seek times of SSDs. When using SSD drives on Amazon I don’t even bother anymore with provisioned IOPS, because an SSD of 1 TB (currently the maximum size) already gives you 3,000 IOPS.
  5. New Amazon region in Frankfurt (October): Until now, Ireland was the only Amazon region (a set of datacenters) within the EU. With this second region, it has become an option to spread your SQL Server instances, backups, and snapshots over multiple regions, while still keeping all data within the EU.
  6. Private DNS (November): You can now give your SQL Server instance easy-to-use network names.
  7. New payment options for reserved instances (December): In most cloud solutions, the database servers are by far the most expensive instances. This also makes the upfront costs for reserving these instances a big hurdle, even if the costs over the full term are lower. There are now payment options for reserved instances without upfront payments, where you still get a 30% price reduction if you reserve the instance for a year.
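
To make point 1 concrete, here is a minimal sketch of the snapshot-and-S3 part of that backup strategy, written with the boto3 AWS SDK. The volume ID, bucket name, region, and file path are hypothetical placeholders, and the 10-minute/nightly scheduling is left to your task scheduler; treat this as a sketch under those assumptions, not a drop-in implementation.

```python
import boto3

# Hypothetical identifiers -- replace with your own volume ID, bucket, and region.
DATA_VOLUME_ID = "vol-0123456789abcdef0"
BACKUP_BUCKET = "my-sqlserver-backups"
REGION = "eu-west-1"

ec2 = boto3.client("ec2", region_name=REGION)
s3 = boto3.client("s3", region_name=REGION)

def snapshot_data_disk():
    """Snapshot the primary SQL Server data disk (scheduled every 10 minutes)."""
    snapshot = ec2.create_snapshot(
        VolumeId=DATA_VOLUME_ID,
        Description="SQL Server data disk, 10-minute snapshot",
    )
    return snapshot["SnapshotId"]

def store_full_backup(local_path, s3_key):
    """Upload a nightly full .bak file to S3 (scheduled after the full backup)."""
    s3.upload_file(local_path, BACKUP_BUCKET, s3_key)

if __name__ == "__main__":
    print(snapshot_data_disk())
    store_full_backup(r"D:\Backups\mydb_full.bak", "full/mydb_full.bak")
```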

Announced, but not yet available:

  • EBS volumes larger than 1TB. I have always been skeptical of software-based RAID-ing of virtual drives in a cloud environment, so it’s great to see that EBS volumes of up to 16TB have been announced, making life a lot easier for administrators of large SQL Server databases.

Curious about:

  • Amazon Aurora – a new “cloud-optimized” relational database engine. I’m always a bit skeptical when cloud companies feel the need to build their own “cloud-optimized” relational database engine. As we have seen with Azure SQL, companies are reluctant or unable to re-architect their applications to use these database engines effectively, and they really need the full feature set of the full database product (such as database backups, point-in-time restores, etc.). But we’ll see how this turns out.

Wishlist for 2015:

  • SQL Server Enterprise as “license-included”. Currently, Amazon does not offer server instances that include SQL Server Enterprise. You have to buy your own licenses, which need to include Microsoft Software Assurance, and somehow certify these licenses for use in the cloud. The upfront cost and hassle of this is simply too great. Microsoft Azure offers instances with SQL Server Enterprise included, so is there a reason why Amazon can’t, or is not allowed to?
  • Faster EBS snapshots. I really like the EBS snapshot mechanism, but at the moment snapshots often take so long that in practice you can take them (at best) only every 15 minutes (I would prefer every 5 minutes), making them less than ideal as a disaster-recovery solution.
  • Make RDS SQL Server more useful. Currently, RDS SQL Server has some limitations that make it impractical for quite a few use cases:
    1. Make storage scalable. Currently, the storage used by a SQL Server RDS instance cannot be increased.
    2. Make database-level backups available.
    3. Lift the limit of 30 databases.
  • Improve the AWS console UI. This is a common complaint from almost all AWS users: The web-based console UI is still very primitive (especially for such crucial cases as IAM security), and it’s always a guess which feature works in which browser.

All in all, it’s been a pretty good year for all of us running SQL Server in AWS, with the general availability of SSD drives as the highlight. Happy 2015 everyone!

Why are there no out-of-the-box high-availability solutions for SQL Server in the cloud?

When selecting any cloud service, you would like to get some guarantees about the reliability (uptime) and durability (what is the chance that your data is lost) of the service. Now, let’s see what we can find on cloud relational database services, either as a PaaS (Platform as a Service), or installed as part of an IaaS (Infrastructure as a Service, i.e. virtual machines) configuration.

  • Windows Azure SQL Database gives an uptime SLA of 99.9%. The database is replicated threefold (within the same datacenter), but Microsoft gives no SLA for durability of the data. It indicates that to secure your data against widespread datacenter failure, you will need to export your complete database yourself (as a SQL script, or a BACPAC file).
  • Amazon RDS (Relational Database Service) has no SLA at all for uptime or durability. In the past, RDS has turned out to be the most unreliable of Amazon’s services, because it is built on top of the EC2 service (virtual machines) and the EBS service (virtual drives). RDS uses SQL Server log files to be able to restore a complete server instance (not separate databases) to any point in time up to 5 minutes ago. It also maintains daily snapshots of the instance, which can be retained for a maximum of 35 days. RDS also offers live replication and failover to another datacenter (Multi-AZ), but this is not available for SQL Server.
  • SQL Server running on the virtual machines of Amazon (EC2) or Microsoft (Azure Virtual Machines) has no SLA. There is only an SLA for the underlying virtual machines, from both Amazon and Microsoft, of 99.95% uptime.

So, from this I have to conclude that there’s really a lot of uncertainty about the reliability and durability of any database service in the cloud. It also means that for SaaS applications where you cannot afford to have a few hours (or more) of downtime, you have to look into separate high-availability solutions. What is available in this area, specifically for SQL Server?

  • Windows Azure SQL Database has no built-in high-availability solution. You can copy a database to a new database in the same datacenter, but to copy it to another datacenter you have to export/import all data to a new database. Failover to a database copy will have to be managed manually.
  • Amazon RDS has a Multi-Availability-Zone option (Multi-AZ), where a database replica is maintained in another datacenter in the same region. This option, however, is only available for Oracle and MySQL, not for SQL Server. For SQL Server, your only option is to make instance snapshots, but you cannot copy RDS snapshots to another datacenter.
  • On Amazon or Microsoft Azure virtual machines, you can implement any of the standard high-availability solutions of SQL Server (SQL Server 2012 AlwaysOn, database mirroring, log shipping). But keep in mind the following:
    • SQL Server 2012 AlwaysOn, and asynchronous database mirroring, are only available in the Enterprise Edition of SQL Server. You will have to buy your own license for this, which is extremely pricey ($7,000 per core).

    All of these high-availability solutions that you implement yourself on top of cloud virtual machines require a lot of expertise and ongoing management, and more than double the cost of the solution.

Conclusion: In general, for SaaS suppliers durability of data is much more important than uptime. In most cases 99.9% uptime would be sufficient (less than 1 hour of downtime each month), but data durability has to be “Nine Nines” (99.9999999%), because any permanent data loss is unacceptable. Currently, none of the cloud relational database suppliers can offer this reliability “out of the box” for SQL Server. Amazon RDS’s Multi-AZ solution comes closest (at almost double the price), but it is not available for SQL Server. Suggested manual solutions (based on export/import of data) are in many cases completely unrealistic for large amounts of data, because it could take days to complete an import of terabytes of data.
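
For reference, the quick arithmetic behind those percentages:

```python
# Downtime allowed per 30-day month at a given availability level.
MONTH_MINUTES = 30 * 24 * 60  # 43,200 minutes in a 30-day month

# 99.9% uptime: allowed downtime per month
print(MONTH_MINUTES * (1 - 0.999))        # 43.2 minutes -- under an hour

# "Nine Nines" (99.9999999%) applied to the same scale, for contrast:
print(MONTH_MINUTES * (1 - 0.999999999))  # ~0.00004 minutes
```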

With all this, are cloud relational database services currently an option for business-critical applications? I think they are, but the situation with regards to high availability needs to improve rapidly, so that high availability is simply an option to check. In the meantime, especially when you need to keep costs in check, we will need to rely on old-fashioned methods like log shipping and data export to safeguard our own data.

Amazon RDS for SQL Server has some ridiculous storage constraints

When you think about cloud computing, you think scalability, flexibility, etc. Right? Not really when you look at the Amazon Relational Database Service (RDS) for SQL Server, at least the storage part of that service. RDS is Amazon’s database PaaS (Platform as a Service) offering for relational databases, and it does a lot of things right: new instances running within a few clicks, fully-automated backups with restore-to-point-in-time functionality, good monitoring tools, etc. However, when it comes to storage for the SQL Server flavor of RDS, there are some serious constraints:

  • You cannot change the allocated data storage after creating the SQL Server instance. That’s right, you have to allocate, and pay for, the maximum storage that you will ever use in the future. Not only is this impossible to predict, but it means that you pay for a lot of unused storage, something “the cloud” was supposed to prevent.
  • When selecting guaranteed IO performance (provisioned IO per second, or IOPS), there are additional constraints:
    • The storage must be an exact multiple of 100GB. Storage costs $0.138 per GB/month.
    • The selected IOPS must be exactly 10 times the storage in GB (so with 200GB storage you must select 2000 IOPS). IOPS cost $0.11 per IOPS/month.

Now, for any serious database use, you need reliable performance, which means that you almost always have to select the provisioned-IOPS option. So if I create a database instance with 1 TB storage (the maximum, which I have to select because later expansion is not possible), I have to select the 10,000 IOPS option, and I am paying $138 per month for the storage and $1,100 per month for the IOPS. In addition to that, of course, you also pay for the instance itself.
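
To show how completely the constraints pin down your storage bill, here is the arithmetic from the paragraph above as a small Python sketch (prices as quoted above; the function name is mine):

```python
# Monthly storage + IOPS cost for RDS SQL Server with provisioned IOPS,
# under the constraints quoted above.
STORAGE_PRICE = 0.138  # $ per GB-month
IOPS_PRICE = 0.11      # $ per provisioned IOPS-month

def rds_sqlserver_storage_cost(storage_gb):
    if storage_gb % 100 != 0:
        raise ValueError("storage must be an exact multiple of 100 GB")
    iops = storage_gb * 10  # IOPS must be exactly 10x the storage in GB
    return storage_gb * STORAGE_PRICE + iops * IOPS_PRICE

# The 1 TB example from the text: $138 storage + $1,100 IOPS per month.
print(rds_sqlserver_storage_cost(1000))  # 1238.0
```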

Strangely enough, the storage constraints when using Oracle and MySQL are much less strict: You can scale the storage up at any time, without downtime, and when selecting provisioned IOPS the storage can be any value between 100GB and 3TB, and the IOPS/storage factor can be any value between 3 and 10.

So what is going on here? Do Windows Server and/or SQL Server have such inherent constraints that the same storage flexibility as for Oracle/MySQL can never be achieved? Or has the Amazon RDS team just not yet had the time to fully come to grips with the Microsoft systems?

Some important gotchas when starting with Amazon RDS SQL Server

RDS is the relational database service of Amazon Web Services. It is a ready-to-use cloud service: no OS or RAID to set up; you just specify the type of RDBMS you want and the memory/storage sizes, and a few minutes later you have a database instance up and running. The instance has its own (very long) DNS name, and you can access this database server from any place in the world using the standard SQL Server tools, provided you give the client IP addresses access through the “Security Group” linked to the database instance. Currently RDS is offered for Microsoft SQL Server, Oracle, and MySQL.
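
To make that access path concrete, here is a minimal connection sketch in Python with pyodbc. The endpoint, user, and password are hypothetical placeholders; you need a SQL Server ODBC driver installed on the client, and your client IP must be allowed in the instance’s security group.

```python
import pyodbc

# Hypothetical RDS endpoint and credentials -- use the DNS name shown in
# the RDS console, and open your client IP in the instance's security group.
conn = pyodbc.connect(
    "DRIVER={SQL Server Native Client 11.0};"
    "SERVER=mydb.abcdefghijkl.eu-west-1.rds.amazonaws.com,1433;"
    "UID=masteruser;PWD=secret;DATABASE=master"
)
cursor = conn.cursor()
cursor.execute("SELECT @@VERSION")
print(cursor.fetchone()[0])
```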

My company (Yuki) is currently using RDS for some smaller projects, and I’m investigating if/how our main database infrastructure could be moved to RDS, so that we can achieve improved scalability, world-wide reach, and lower maintenance costs on our database servers (which are by far the biggest bottlenecks for Yuki’s web applications). In this process I have discovered some important gotchas when using RDS SQL Server that are not well advertised, but can be big stumbling blocks:

  1. The sysadmin server role is not available. That’s right, you specify a master user/password when creating the RDS instance, but this user is not in the sysadmin role; it does have specific rights to create users and databases and such. Amazon has, of course, done this to lock down the instance. However, this can be a big problem when installing third-party software (such as Microsoft SharePoint) that requires the user to have sysadmin rights on the SQL Server (see the sketch after this list).
  2. The server time is fixed to UTC. The date/time returned by the SQL Server function GETDATE is always in UTC, with no option to change this. This can cause a lot of problems if you have columns with GETDATE defaults, or queries that compare date/time values in the database to the current date/time. For us, this is currently a big problem, which would require quite extensive changes to our software.
  3. No SQL Server backup or restore. Because you have no access to the file system (and the backup/restore rights are currently locked down in RDS), you cannot move your data to RDS by restoring a backup. You have to use BCP or other export/import mechanisms. It also means that you can only use the backup/restore that Amazon offers for the complete instance, meaning that you cannot back up or restore individual databases. This point could easily be the biggest hurdle for many companies moving to RDS SQL Server.
  4. No storage scaling for SQL Server. Both Oracle and MySQL RDS instances can be scaled to larger storage without downtime, but for SQL Server you are stuck with the storage size you specify when you create the instance. This is a huge issue, since you have to allocate (and pay for!) the storage right at the start, when you have no idea what your requirements will be in a year’s time. This greatly undermines the whole scalability story of AWS.
  5. No failover for SQL Server. Again, both Oracle and MySQL can be installed with “Multi-AZ Deployment”, meaning there is automatic replication and failover to a server in another datacenter in the same Amazon region. No such option for SQL Server, meaning your only option in failure situations is to manually restore a backup that Amazon made of your instance.
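
Gotchas 1 and 2 are easy to verify for yourself on a fresh instance. Below is a small sketch that reuses the pyodbc connection object `conn` from the earlier example (an assumption), checking the sysadmin membership of the master user and showing that the server runs in UTC; IS_SRVROLEMEMBER, GETDATE, and GETUTCDATE are standard T-SQL.

```python
# Verify gotchas 1 and 2 against a fresh RDS instance, reusing the
# pyodbc connection `conn` from the earlier sketch.
cursor = conn.cursor()

# Gotcha 1: the master user is NOT in the sysadmin role (returns 0).
cursor.execute("SELECT IS_SRVROLEMEMBER('sysadmin')")
print("sysadmin:", cursor.fetchone()[0])

# Gotcha 2: GETDATE() equals GETUTCDATE(), i.e. the server time is UTC.
cursor.execute("SELECT GETDATE(), GETUTCDATE()")
local_time, utc_time = cursor.fetchone()
print("GETDATE:", local_time, "GETUTCDATE:", utc_time)
```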

All in all, there are still quite a few shortcomings, some of which can be insurmountable hurdles for deployment. Personally, I love the service, as setting up the hardware and software for a reliable database server is a really complex task, of which not very many people have any serious knowledge. Let’s hope that Amazon keeps up its quick pace of innovation to improve the above points for RDS SQL Server.
