Amazon EC2 monthly price tables for SaaS-relevant configurations, in euros

EC2 is Amazon’s cloud service for flexible use of virtual machines. One of the big advantages of this service is that Amazon’s hourly prices also include the cost for the Windows and/or SQL Server license. Amazon has just announced lower prices for Windows EC2 instances.

It’s not always easy to calculate total monthly costs for various configurations, so in this post I will try to give an indication of montly costs for configurations that would be relevant for SaaS applications with 1,000 – 50,000 customers.
Note: In the monthly prices below I have incorporated the upfront costs for 1- and 3-year contracts in the monthly prices (spread out over the full term).
All prices are calculated for the EU region (Ireland).

Windows web server. No additional storage.

Configuration Resources On-demand (hourly) p/m 1-year term p/m 3-year term p/m
M1 large instance 2 cores, 7.6 GB memory €200 €131 €99
M1 extra large instance 4 cores, 15 GB memory €400 €262 €198
M3 double extra large instance 8 “second-generation” cores, 30 GB memory €859 €554 €419

Traffic out (to the internet) is an additional €92 per TB.

Load-balanced high-activity Windows web server.

Amazon Elastic Load Balancer
3 M3 double extra large EC2 instance on 3-year terms
5 TB of internet traffic per month
Total cost per month = €1763

Database server, EC2 instance with Windows server 2008/2012 and SQL Server Web Edition 2008/2012.

Prices below include 1TB of database storage on EBS (Elastic Block Storage, Amazon’s virtual drives). For instances without guaranteed I/O I have included 500 million I/O requests per months. Upfront costs have been spread over the months of the term.

Configuration Resources Guaranteed I/O On-demand (hourly) p/m 1-year term p/m 3-year term p/m
High-memory extra-large instance 2 cores, 17 GB memory (no guaranteed I/O) €463 €292 €249
High-memory double extra-large instance 4 cores, 34 GB memory 500 Mbps €798 €467 €384
High-memory quadruple extra-large instance 8 cores, 68 GB memory 1000 Mbps €1491 €829 €666

Note that you would also need additional storage for database backups, which could add €100-500 per month, depending on backup methods and backup retention.

Meta-database design in the cloud

When designing the (relational) database infrastructure for a SaaS (Software as a Service) application, one of the most important decisions is how to store the data of each SaaS customer:

  1. Store the data of each customer in a separate database
  2. Store the data of all customers in a single database
  3. Store the data of each customer in a separate schema in the same database
  4. Store groups of customers in the same database, using several (but a relatively small number of)  databases

Here I’d like to discuss my thoughts and preferences about some of these options.

Database terminology

I am using the terminology as it is used for Microsoft SQL Server (which mostly is the same for MySQL),  but which is different for Oracle:

  • A database-server instance is the complete database service. In Oracle this is called a database.
  • A database  is a group of data with its own database schema, storage, security, and backup/recovery. By default in SQL Server each database has as its storage one database file and one log file.
    Oracle doesn’t have the concept of a database: what comes closest is a schema in Oracle, but this is not exactly the same. Nevertheless, for simplicity most people use: “database in SQL Server/MySQL” = “schema in Oracle”.
  • A schema is a group of data-objects within a database, with its own definition. A database can contain multiple schemas. Usually is schema is tied to a user (in Oracle the correspondence between a user and a schema is still strict), but in SQL Server (since SQL Server 2005) a schema is not strictly tied to a specific user anymore. MySQL doesn’t have the concept of multiple schemas within a database.

Advantages of having a separate database for each customer

Now, in my experience, this subject of how to design the meta-database structure for a SaaS application is something that is not always well thought out, or not even thought about at all. Also, in internet discussions no agreements or guidelines on this subject can be found at all. Personally, I am in favour of solution 1 above, giving each customer a separate database, for these reasons:

  1. Security of database queries. The data of each customer is separated for all other customers, and there is no need to think about that each SQL query or statement must always include a customer identifier. This is a huge issue, because small mistakes, like adding OR-clauses to an SQL query without adding the correct parentheses, can have a dramatic impact, if, for example, you delete data for all customers instead of just one.
  2. Straightforward scalability. The databases of each customer can be spread out of over as many servers as you like (if the SaaS applications are designed in such a way that they use a separate database connection for each database), resulting in very easy scaling when the number of customers grows. Also, it is very easy to redistribute databases to different servers.
  3. Easy backup and restore of data of a single customer. With each customer having its own database, restoring data is simply a case of restoring the backup of a single database (which is pretty easy). Having all customers in a single database, this is very complicated, needing custom tools, or even impossible.
  4. Separate SQL query plans for each customer. Having all customers in the same database can give huge problems with the query plans the database system sets up. Basically, the first query of a specific type run after startup of the server determines the query plan. If other customers have a different data-distribution, this will not work well at all.
  5. Perfomance tuning per customer. There a customers with just a few MBs of data, and customers with TBs of data. Data will be distributed differently for each customer. In each separate database you can tune the indexes, views, and stored procedures to give optimal results for that customer. For example, in SQL Sever, very small databases might need no indexes at all, whereas very large databases might need extensice index tuning, or even things like materialized views.
  6. Version-migration. Updating the SaaS software to a newer version almost always requires changes to the database schema. Having all customers in the same database requires that all customers are always migrated simultaneously. Having a separate database for each customer, you can upgrade customers as you see fit, giving, for example, options such as piloting changes on specific customers.
  7. Database customization. Usually, application customizations for specific customers also require extra tables or columns in the database. Again, having a separate database for each customer, this is much easier to realize, than when all customers are in the same database (and share exactly the same database schema).

Drawbacks of having a separate database for each customer

Of course, almost just as many drawbacks can be found:

  1. Maintenance on many databases. If you are used to having a DBA (Database Administrator) that does lots of manual, daily, maintenance tasks on a single, large, database, this will not work if you have hundreds or thousands of databases. All database-maintenance tasks will need to be full automated (which is a good thing anyway, I think).
  2. Limit on number of databases per server. SQL Server has a hard limit of 32,767 databases per server instance. Long before you reach this number, the number of databases will be limited by the amount of memory and other server resources. However, I have had 30,000 databases (lightly used) running on a single server with 8GB of RAM, so this hard limit is not as theoretical as some people think.
  3. Cross-customer queries. Having all customers in the same database, makes it easier to do queries over all customer (or groups if customers, such as all customers managed by the same accountant). With separate databases you will need to have a separate data-warehouse that collects all data needed over all customers (but having a separate data-warehouse for this is not a bad idea anyway).
  4. Database mirroring. Trying to do real-time database-mirroring on large amount of databases will not work (because of network resources and threads needed for each databases). Microsoft gives approximately 200 as the maximum number of databases you can mirror on a single server. If you have thousands of separate databases, fully-automated log-shipping is currently probably the best way to minimize data-loss.
  5. Direct SAN-level snapshots can not be used. Many SANs (or similar network-based storage solutions) offer quick storage backups on the block-level of the storage. For this to give consistent backups of a database server, all database I/O to the storage needs to stop for a moment. SAN snapshot tools achieve this in cooperation with the Windows Volume Shadow-copy Service (VSS). However, this will stop all I/O to the volume for a few seconds for each database. For thousands of database this is not an options.
    Best solution for this, when having thousands of separate databases, is to first make standard database backups to a separate storage volume, and then do a SAN-snapshot of that backup volume.

Cloud-based relational database service options in regards to number and size of databases

OK, let’s now look at how all of this is relevant when moving the database infrastructure to the cloud. Currently, the cloud-based relational database options are:

  • Amazon Relational Database Service (RDS). This is a service build on top of Amazon’s EC2 (virtual machines) and EBS (virtual hard drives). With it, you get a ready-to-use SQL Server, Oracle, or MySQL instance, but without access to the underlying OS or file system.
    • Maximum number of databases per instance is 30.
    • Maximum storage per instance is 1TB for SQL Server, and  3TB for MySQL or Oracle
    • Default maximum number of instances is 10. Need to ask Amazon for more.
    • Price (for SQL Server Web edition) is €100-1500 per instance per month (depending on CPU and memory)
  • Microsoft Azure SQL Database. This cloud-service offers “databases as a service”. So you don’t get a full server instance, but you get completely separate databases.
    • Maximum number of databases is 150. Credit check by Microsoft for more.
    • Maximum database size is 150GB.
    • Price is €4 per month for a database < 100 MB to €170 per month for a database of 150GB.
  • Cloud-based virtual machines running a database server. You can, of course, always run a full database server instance on any cloud-base virtual machine. You then have to set up the storage (RAID), backup, and failover systems yourself, which requires a lot of expertise.

With this, it is clear that the current cloud-based database services have problems for any SaaS meta-database infrastructure:

  • The number of databases you can have in a cloud-database service is quite low, and/or prohibitivly expensive for very many (thousands) of databases.
  • The maximum database size is also quite low, so putting all SaaS customers in a single database is also not an option (besides negating the advantages of having a separate database for each customer that I have mentioned above).

Then, what should you do, if you have thousands of customers for your SaaS application, and want to move your database infrastructure to the cloud? Right now the solution would be:

  • Put your biggest databases (databases with master data, central customer data, or databases of biggest customers) into a cloud-based database service, for biggest reliability and performance.
  • Run all other customer databases on various cloud-based virtual machines with your own DMBS (Database Management System)  configuration.

Conclusion

All in all, the situation with respect to cloud-based relational databases services I not ideal yet, and I hope that both the manageability and pricing of having very many databases in a cloud-based relational database service improves soon.

Iain Banks, world’s greatest living science-fiction writer, is terminally ill

Most of his readers probably already know this, but I only read about it yesterday: Iain Banks, in my opinion, the greatest science-fiction writer of the last 25 years, has announced that he’s terminally ill, with possibly only a few months to live. Sad news. Best wishes too all.
Having discovered his books 4 years ago, Banks’ books made me read science-fiction again after many, many years of disinterest (I read all the classic science-fiction works by Heinlein, Herbert, Vance, etc. in my teens). Nobody in human history has described in so much detail a utopian society, with all its challenges and possibilities as Banks has done with his “Culture”. His books are in turn very clever, witty and uplifting, and bleak and even hopeless at other times.

Some Banks trademarks, which have become science-fiction standards:

  • The AI characters (in the form of spaceships or drones) are the funniest and smartest in the book.
  • The spaceship names are poetry in themselves. Some of my favorite spaceship names in Banks’ books:
    • “Attitude Adjuster”
    • “All through with this Niceness and Negotiations Stuff”
    • “Me, I’m Counting”

    Too many to mention, this is a full list.

My favorite Banks novels:

  • Against a Dark Background. Great adventure (a classic quest, the search for a superweapon, the “Lazy Gun”), superb heroine and supporting characters. Great, if bleak ending.
  • Use of Weapons. Banks’ masterpiece. Very complex (but extremely rewarding) story. Has Banks’s funniest AI (the drone Skaffen-Amtiskaw), and his most enduring heroine (Diziet Sma). A must-read.

Why are so few people using Google’s 2-factor authentication?

Google’s 2-factor authentication makes it exponentially harder for your Google account to be hacked, by requiring, after entry of your password, an extra code generated by a smartphone app on a smartphone that was previously linked to your Google account. Since securing your e-mail account is crucial (especially since your e-mail account is used for password retrieval/reset of very many websites and applications), I view it as something everyone with a Google account simply must use, to be able to sleep easy. Especially now you only have to enter the generated security code only once per device ever (and not every 30 days anymore, as was the case previously), I find it pretty painless to use.

However, of all the people I know with a Google account (almost everyone), only one person (besides myself) is actually using Google 2-factor authentication. Why is that? When asked (and some guessing from my side), the following reasons emerge:

  1. Never heard of it. Google 2-factor authentication isn’t very well advertised. It’s not like you get a message every time you log in without using it.
  2. Don’t care about security. Most people I know simply don’t care about online security. They vaguely hear some things about it, but it never comes up that you can actively do something about it. If some hacking event happens to themselves, it’s treated as a fact of life, that you simply can not help.
  3. Too much effort. For many people even using different, strong, passwords for every website, and using a password manager, is already way too much of an effort. Using a smartphone during login, and typing an extra code, is unthinkable.
  4. Setting up Google’s 2-format authentication is too complex. The process is pretty straightforward (you use your smartphone to take a picture of a QR code, to link your smartphone), but still a big hurdle for many people to even consider.
  5. Re-authentication after loss of phone is cumbersome. If you lose or reset your phone (or buy a new one), you first have to unlink your previous phone (using one of the recovery codes, that you hopefully printed out), before you can link your new phone. After having this done once, many people come to the conclusion never to do that again, and don’t re-activate 2-factor authentication.
  6. Application-specific passwords are hard to find and use. Some applications need to have access to your Google account without you being present to login interactively (think mail and calendar applications on desktops and devices). For this, Google has so-called application-specific passwords, which are 16-letter passwords that can be (or at least should be) used for only one application, and are displayed only once by Google (after having used them you can not view them again, you would have to generate a new one).
    Not only is the place to generate these passwords very hard to find (hidden somewhere in your Google account “Security” settings), but the whole concept of what these passwords are, and how they should be used is foreign to most users. Also, because using an application-specific password for only one application is not (and can not be) enforced, it can be a cause of security loopholes.

I am certainly not a security evangelist, but I think that Google has spend a lot of effort in making 2-factor authentication as easy and painless to use as possible (as opposed to other companies, such as Blizzard, where you have to call support, and supply credit-card info etc. to link your account to a new phone), and I think that everyone with a Google account should use it. Still, as the points above indicate, there’s still a long way to go until everyone understands the need for it, and the process becomes easy enough for absolutely everyone to use it.

Some important gotchas when starting with Amazon RDS SQL Server

RDS is the relational database service of Amazon Web Services. It is a ready-to-use cloud service: No OS or RAID to set up; you just specify the type of RDBMS you want, the memory/storage sizes, and a few minutes later you have a database instance up and running. The instance has its own (very long) DNS name, and you can access this database server from any place in the world using the standard SQL Server tools, if you give the client IP-addresses access through the “Security Group” linked to the database instance. Currently RDS is offered for Microsoft SQL Server, Oracle, and MySQL.

My company (Yuki) is currently using RDS for some smaller projects, and I’m investigating if/how our main database infrastructure could be moved to RDS, so that we can achieve improved scalability, world-wide reach, and lower maintenance costs on our database servers (which are by far the biggest bottlenecks for the web applications of Yuki). In this process I have discovered some important gotchas when using RDS SQL Server, that are not that well advertised, but can be big stumbling blocks:

  1. The sysadmin server role is not available. That’s right, you specificy a master user/password when creating the RDS instance, but this user is not in the sysadmin role; this user does have specific rights to create users and databases and such. Amazon has, of course, done this to lock down the instance. However, this can be a big problem when installing third-party software (such as Microsoft SharePoint) that requires the user to have sysadmin rights on the SQL Server.
  2. The server time is fixed to UTC. The date/time returned by the SQL Server function GETDATE is always in UTC, with no option to change this. This can give a lot of problems if you have columns with GETDATE defaults, or queries that compare date/time values in the database to the current date/time. For us, this currently is a big problem, which would require quite extensive changes in our software.
  3. No SQL Server backup or restore. Because you have no access to the file system (and the backup/restore rights are currently locked down in RDS), you can not move your data to RDS by restoring a backup. You have to use BCP or other export/import mechanisms. It also means, that you can only use the backup/restore that Amazon offers for the complete instance, meaning that you can not backup or restore individual databases. This point could easily be the biggest hurdle for many companies to move to RDS SQL Server.
  4. No storage scaling for SQL Server. Both Oracle and MySQL RDS instance can be scaled to larger storage without downtime, but for SQL Server you are stuck with the storage size you specify when you create the instance. This is a huge issue, since you have to allocate (and pay for!) the storage right at the start, when you have no idea what your requirements will be in a year’s time. This greatly undermines the whole scalability story of AWS.
  5. No failover for SQL Server. Again, both Oracle and MySQL can be installed with “Multi-AZ Deployment”, meaning there is automatic replication and failover to a server in another datacenter in the same Amazon region. No such option for SQL Server, meaning your only option in failure situations is to manually restore a backup that Amazon made of your instance.

All in all, still quite a few shortcomings, and some of which can be hurdles for deployment that can not be overcome. Personally, I love the service, as setting up the hardware and software for a reliable database server is a really complex task, which not very many people have any serious knowledge of. Let’s hope that Amazon keeps up its quick pace of innovation to improve the above points for RDS SQL Server.

Introduction

My name is Sebastian Toet, I live in the Netherlands, and I’m the Chief Software Architect at Yuki, a SaaS (Software as a Service) company that offers online accounting and general business administration services. I was trained as a theoretical physicist (at the University of Technology in Eindhoven, and the Department of Theoretical Physics of the University of Amsterdam). I’m active in software development since 1993, first at Exact Software in Delft (The Netherlands), and since 2005 at the company I co-founded, that was first called FamilyWare, and is now called Yuki.

Since 1995 I have worked on web-based ERP/accounting systems, and some of my main interests lie in the following areas:

  • Relational database systems and structures for complex data, complex reporting needs, and high number of concurrent users.
  • Optimum strategies in Software Release Management for SaaS providers.
  • Workflow systems that deliver the actual service in SaaS.

The last point is pretty important to me, as many SaaS providers don’t actually provide a real service, but just provide online access to software. At Yuki my main attention goes to building great workflow systems, that deliver the actual (accounting) service that really allow users to leave the work to others.

In this blog I want to talk about, and discuss, some of the experiences and insights I have had in the last 15 years in creating web-based ERP and accounting systems, and especially about those subjects that are generally not discussed very much, or at all, live macro-scale database infrastructure  and design, internal workflow system for SaaS providers, patch management for SaaS providers , etc.

SQL Server Blog

Thoughts about SaaS software architecture

Brent Ozar Unlimited®

Thoughts about SaaS software architecture

Visual Studio Blog

Thoughts about SaaS software architecture

Microsoft Azure Blog

Thoughts about SaaS software architecture

AWS News Blog

Thoughts about SaaS software architecture

%d bloggers like this: