Monthly Archives: August 2016


How to choose between Database Options in Amazon AWS

Image From Amazon

Every DB admin knows how painful it is to patch and update database servers, not to mention maintaining backups and high availability for the database.

The subject matters because your data is probably the most important of your IT assets. If you lose servers, instances, or network devices, there is a good chance you can replace them, but if you lose your data the loss can be irreparable.

Another limitation of running an in-house database is the pressure to stick to one platform, because maintaining and licensing multiple platforms is expensive. In doing so you deliberately impose a constraint on your application.

You also need to consider scalability and availability for each platform before you start your deployment, because a database server is not something you can easily shut down for an upgrade or reconfiguration.

A better way is to use AWS managed services, which give you a database server that is ready to run. Using a managed database service gives you several benefits:

  • Easy to set up through a wizard
  • Simple to operate, with fewer admin tasks
  • Cost efficient
  • Easy to scale to a larger instance
  • Built-in disaster recovery across multiple availability zones
  • A choice between SQL (RDS) and NoSQL (DynamoDB) database options
  • A choice of RDS platforms (Amazon Aurora, Oracle, SQL Server, PostgreSQL, MySQL, MariaDB)

Amazon provides fully managed relational (Amazon RDS) and NoSQL (Amazon DynamoDB) database services, along with in-memory caching (Amazon ElastiCache) and a data warehouse service (Amazon Redshift).

Which database to choose depends heavily on your application and data types. Do you need to maintain data models? How many concurrent users do you want to support? What is the size and type of your objects? How big is your data and what is your growth rate?

 

Amazon Relational Database Service (RDS)

A relational database organises data into tables (or “relations”) made up of columns and rows, with a unique key identifying each row. Relational databases provide many benefits:

  • You can run complex queries and have flexible indexing
  • Data duplication is reduced by design, through normalisation
  • More granular security, as access can be controlled per table
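To make the table, key and query ideas concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in chosen so the example runs anywhere; on RDS you would connect to MySQL, PostgreSQL and so on with the matching driver. The customers/orders schema and the function names are hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two related tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (
            id   INTEGER PRIMARY KEY,   -- unique key identifying each row
            name TEXT NOT NULL
        );
        CREATE TABLE orders (
            id          INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id),
            amount      REAL NOT NULL
        );
        -- An index to speed up lookups of orders by customer.
        CREATE INDEX idx_orders_customer ON orders(customer_id);
    """)
    conn.executemany(
        "INSERT INTO customers (id, name) VALUES (?, ?)",
        [(1, "Alice"), (2, "Bob")],
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
        [(1, 10.0), (1, 15.5), (2, 7.0)],
    )
    return conn


def total_per_customer(conn):
    """Join the two tables and aggregate; each customer name is stored only once."""
    rows = conn.execute("""
        SELECT c.name, SUM(o.amount)
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.id
        ORDER BY c.name
    """)
    return rows.fetchall()
```

Calling total_per_customer(build_demo_db()) returns one row per customer with the summed order amount, without the customer name ever being repeated in the orders table.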

With Amazon RDS you can launch your choice of database platform in minutes and stop worrying about maintenance and operational overhead. Here are the RDS options from Amazon:

  • Amazon Aurora: A MySQL-compatible relational database engine with the simplicity and cost-effectiveness of open source databases which, according to Amazon, delivers up to five times better performance than MySQL.
  • Amazon RDS for MySQL: A managed MySQL database with the full features and capabilities of MySQL.
  • Amazon RDS for MariaDB: Scalable and resizable managed MariaDB deployments in the cloud.
  • Amazon RDS for PostgreSQL: A full featured PostgreSQL database with all the capabilities of the open source installation.
  • Amazon RDS for Oracle: Deploy multiple editions of Oracle Database in minutes with cost-efficiency and resizable hardware capacity.
  • Amazon RDS for SQL Server: Deploy multiple editions of SQL Server (2008 R2, 2012 and 2014), including Web, Standard and Enterprise (Enterprise on 2008 R2 and 2012 only).

If you need more capacity, you can scale your Amazon RDS deployment vertically or horizontally:

  • Vertical Scaling: Move to a larger instance and/or faster storage.
  • Horizontal Scaling: Create read-only replicas of your production database to spread the read load. If you want to distribute the write load across multiple instances as well, you may need a data partitioning approach, and your application has to be aware of that configuration.
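As a rough illustration of what "application awareness" can mean with read replicas, the sketch below sends writes to the primary and rotates reads across replicas. The class name and endpoint strings are hypothetical, and treating every statement that starts with SELECT as a read is a simplification. Reads served by a replica may also lag slightly behind the primary, since RDS read replicas are updated asynchronously.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the primary and spread reads across read replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary when no replicas are configured.
        self._read_cycle = itertools.cycle(replicas or [primary])

    def endpoint_for(self, sql):
        """Return the endpoint that should run this statement."""
        # Simplification: anything that is not a SELECT is treated as a write.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._read_cycle) if is_read else self.primary
```

A caller would ask the router for an endpoint before opening a connection, for example router.endpoint_for("SELECT * FROM orders"), and connect to whatever hostname comes back.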

All Amazon instances, including databases, run on highly available and durable infrastructure. If you want higher availability at the data centre level, you can run your database as an Amazon RDS Multi-AZ deployment, which maintains a synchronously replicated standby copy of your production database in another availability zone (in the same region). Amazon RDS automatically fails over to this standby instance should your primary database experience an outage.

 

Amazon NoSQL database (DynamoDB)

NoSQL is a type of database that doesn’t use the relational tables you see in relational databases. NoSQL databases are mainly used in big data and real-time web applications. They support a variety of data models, such as key-value pairs, documents and graphs.

Compared with relational databases, here are some benefits of using NoSQL databases:

  • Can handle large volumes of structured, semi-structured and unstructured data
  • Can be designed and deployed easily, as the data does not need a fixed structure
  • Low latency and high performance when accessing large data sets (document and key-value stores)
  • Highly scalable through horizontal scaling

Amazon DynamoDB is Amazon's fully managed NoSQL database offering, which provides the benefits of a NoSQL database in a service that is flexible, fast and scalable.

Scalability of a NoSQL database is achieved through data partitioning, which scales read/write capacity by adding more instances horizontally. DynamoDB's scalability is built into the service and grows with your database load.

High availability in Amazon DynamoDB is achieved by synchronising data replicas across three facilities in an AWS region.
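The general idea of partitioning by key can be sketched in a few lines of Python. This is an illustration of hash partitioning only and not DynamoDB's internal algorithm; the function names, the choice of MD5 as a stable hash and the partition count are all assumptions made for the example.

```python
import hashlib


def partition_for(key: str, partition_count: int) -> int:
    """Map a partition key to a partition index using a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then wrap it
    # into the range [0, partition_count).
    return int.from_bytes(digest[:8], "big") % partition_count


def group_by_partition(keys, partition_count):
    """Show which partition each key would land on."""
    partitions = {index: [] for index in range(partition_count)}
    for key in keys:
        partitions[partition_for(key, partition_count)].append(key)
    return partitions
```

Because the same key always hashes to the same partition, reads and writes for that key can be served by one node, and adding partitions spreads the overall load across more nodes.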

 

Data Warehouse

A data warehouse is a type of database used for analysis and reporting on large amounts of data. It is a core component of business intelligence: it collects and integrates data from different sources in the business (for example IT, Sales and Marketing) and provides the data your reporting service needs.

A data warehouse collects and processes data from multiple sources, so in most situations data growth and the need to scale the server are unavoidable. Because a data warehouse is relational by nature, scaling it can be a complicated and costly task.

Amazon Redshift is a fully managed data warehouse system which removes all the pains of maintaining an in-house data warehouse system.

Here are some features/benefits of using Amazon Redshift:

  • It is Fast: It is optimised for data warehousing with a parallel processing architecture, and it reduces the I/O required for queries through data compression, zone maps and similar techniques.
  • It is Scalable: Adding nodes or increasing capacity is a matter of a few clicks. Your database stays in read-only mode during the upgrade, after which your data warehouse is running with more capacity.
  • It is Cheap: There is no upfront cost and you pay only for the resources/instances that you use.
  • It is Fully managed: It is easy to start, operate and maintain. Built-in fault tolerance and automated backups give you peace of mind.

How to Design for a Loosely Coupled System

You may have heard the term Loose Coupling in SOA (Service Oriented Architecture) discussions in software design, or alongside IaaS (Infrastructure as a Service) and PaaS (Platform as a Service) in cloud automation topics. For me, it was a bit hard to grasp at the beginning, but this simple explanation helped me better understand the actual meaning of Loose Coupling:

If your Apple iPod’s battery dies, you need to replace the whole device, as you can’t simply change the battery! That is an example of a tightly coupled system, in which the health of your iPod depends entirely on a battery that is not replaceable. As for a loosely coupled example in that respect, I’m sure you can come up with plenty.

When designing in the cloud (or any complex system), Loose Coupling plays an important role in the scalability and reliability of your applications, which in turn makes automation easier. Loose Coupling is an approach that avoids one big, complex (IT) system or application; instead, you design smaller and simpler elements that cooperate with each other to provide the same service.

In a loosely coupled system:

  • Each component/element can scale independently if needed.
  • Each can be modified separately.
  • Failure in one element does not affect the rest of the system.
  • Recovery from a failure is far easier than in a tightly coupled, complex system.

All of this makes a loosely coupled system more manageable, reliable and scalable, and as a result operational tasks can be automated more easily.

A very simple example of decoupling is removing a database service from an application server and deploying it on a dedicated server. The same concept applies when you design a loosely coupled application or decouple an existing complex system.

If you want to create a loosely coupled application, you first need to Separate the Components, then define Standard Communication Interfaces for your elements. You may need to design Automatic Service Discovery at each layer, or base your design on an Asynchronous Integration method. Another aspect of a loosely coupled system is how Gracefully a Failure is handled, so that it has minimal or no effect on the rest of the system. These concepts are explained in this Amazon best practice document, and here is a summary:

 

Loosening as a Mindset

Whenever you build a component (physical, virtual, or a piece of code), stop for a moment and ask these questions:

  • What happens if this element fails and what would be the effect on the service?
  • Can I scale/change/recover this object without touching another part of the system?

Based on your answers to the questions above, you may need to rethink your design. Here are some strategies that can help you create loosely coupled applications:

 

Build on top of a Standard Communication Interface

Instead of creating heavily customised connections between your components, create standard interfaces and APIs (such as a RESTful API) and route communication through them. This way, you reduce or remove dependencies between your components, since they all communicate through the same standard APIs. Because they are less dependent on each other, a failure in one part doesn’t prevent the others from operating normally.

One of the tools that can help you build standard interfaces is Amazon API Gateway. With this fully managed service, you can create APIs that act as a “front door” for your applications to access their data, and decouple them from the rest of the system.
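A small Python sketch can show the shape of this idea: callers talk to a REST-style entry point and an agreed interface, never to a component's internals. Everything here (the OrderService interface, the in-memory implementation, the /orders path and the handle_request function) is hypothetical and only meant to illustrate the pattern.

```python
import json
from typing import Protocol


class OrderService(Protocol):
    """The contract callers depend on (hypothetical interface)."""

    def get_order(self, order_id: str) -> dict: ...


class InMemoryOrderService:
    """One possible implementation; it could be swapped without touching callers."""

    def __init__(self):
        self._orders = {"42": {"id": "42", "status": "shipped"}}

    def get_order(self, order_id: str) -> dict:
        return self._orders[order_id]


def handle_request(service: OrderService, method: str, path: str):
    """Translate a REST-style request into a call on the interface.

    Returns a (status_code, json_body) pair.
    """
    parts = path.strip("/").split("/")
    if method == "GET" and len(parts) == 2 and parts[0] == "orders":
        try:
            return 200, json.dumps(service.get_order(parts[1]))
        except KeyError:
            return 404, json.dumps({"error": "order not found"})
    return 400, json.dumps({"error": "unsupported request"})
```

Because handle_request only relies on get_order, the storage behind it can change (a different database, a remote service) while every consumer keeps using GET /orders/<id>.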

 

Create a Built-in Service Discovery

When your components are decoupled and separated, they need a way to find and communicate with each other. In this situation you have two options:

  • Hard code the communication information (e.g., the static IP address of the database server in the application server)
  • Develop an automatic service discovery built into your system

The latter is preferable, as each sub-application can consume other services without prior knowledge of where they run; as a result, you can add or remove components without a service outage or a configuration change.

One way to achieve service discovery is through Amazon ELB (Elastic Load Balancing). Each load balancer has a unique DNS name, which gives your application/web servers a stable endpoint whenever they want to consume another service (e.g., a database). You can add another layer of abstraction by creating a DNS CNAME record, which decouples the name your application uses from the load balancer’s own DNS name.
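The difference between the two options can be sketched with a tiny in-process registry: components ask for a service by name instead of carrying a fixed address. The ServiceRegistry class, the service name and the endpoints are hypothetical; in practice this role is played by DNS, a load balancer or a dedicated discovery service.

```python
class ServiceRegistry:
    """A minimal name-to-endpoint lookup (illustrative only)."""

    def __init__(self):
        self._endpoints = {}

    def register(self, name, endpoint):
        """Announce that `endpoint` can serve requests for `name`."""
        self._endpoints.setdefault(name, []).append(endpoint)

    def deregister(self, name, endpoint):
        """Remove an endpoint, ignoring ones that were never registered."""
        endpoints = self._endpoints.get(name, [])
        if endpoint in endpoints:
            endpoints.remove(endpoint)

    def lookup(self, name):
        """Return the first available endpoint for a service name."""
        endpoints = self._endpoints.get(name)
        if not endpoints:
            raise LookupError("no endpoint registered for " + name)
        return endpoints[0]
```

An application server would call registry.lookup("database") each time it connects, so replacing the database host only requires a deregister and a register call, with no change to the application itself.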

 

Asynchronous Integration

If you have applications that don’t need an immediate response when they communicate, you can decouple them from interacting directly with each other by using an intermediate storage layer (like Amazon SQS). In this kind of communication, one component normally generates events and another one consumes them. The source system sends the message to the external queuing system instead of directly to the target application, and the target (consumer) reads the message from the queue:

Image From Amazon

As you can see, if any of the controllers fails, the other controllers can continue operating as normal by putting messages on and getting messages from the queues. You can also scale each controller up or down without affecting the other layers.
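The producer/consumer pattern described here can be tried locally with Python's standard queue and threading modules. This is an in-process stand-in for a hosted queue such as Amazon SQS, not SQS itself; the run_pipeline function, the None stop signal and the upper-casing "work" are assumptions made for the example.

```python
import queue
import threading


def run_pipeline(events):
    """Push events through a queue to a separate consumer thread."""
    work_queue = queue.Queue()
    results = []

    def consumer():
        # The consumer only knows about the queue, never about the producer.
        while True:
            message = work_queue.get()
            if message is None:  # hypothetical stop signal
                break
            results.append(message.upper())  # stand-in for real processing

    worker = threading.Thread(target=consumer)
    worker.start()

    # The producer side: enqueue and move on without waiting for a reply.
    for event in events:
        work_queue.put(event)
    work_queue.put(None)

    worker.join()
    return results
```

The producer never calls the consumer directly, so either side can be slowed down, restarted or replaced as long as both agree on the message format.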

 

Graceful Failure (Detect/Recovery)

A loosely coupled system can tolerate faults by recovering gracefully from failure. Here are some strategies for building graceful failure handling into your system:

  • The Amazon Route 53 DNS failover feature can monitor and detect a failed server and stop routing traffic to the failed component.
  • You can use front-end caching systems, which can redirect users to cached content when the main web site fails.
  • A failed task can be stored in a queue to be processed later when the system is healthy.
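Two of these strategies, serving cached content and parking failed tasks for later, can be sketched together in Python. The GracefulClient class and the backend callable (which is assumed to raise ConnectionError while the service is down) are hypothetical names used only for illustration.

```python
from collections import deque


class GracefulClient:
    """Fall back to cached reads and queue failed tasks for later."""

    def __init__(self, backend):
        # Hypothetical callable backend(action, argument) that raises
        # ConnectionError while the service is unavailable.
        self._backend = backend
        self._cache = {}
        self.pending = deque()

    def read(self, key):
        """Serve fresh data when possible, otherwise the last cached copy."""
        try:
            self._cache[key] = self._backend("read", key)
        except ConnectionError:
            pass  # backend is down: fall through to the cache
        return self._cache.get(key)

    def submit(self, task):
        """Run a task now, or park it if the backend is down."""
        try:
            self._backend("run", task)
            return True
        except ConnectionError:
            self.pending.append(task)
            return False

    def retry_pending(self):
        """Replay parked tasks in order, stopping at the first failure."""
        while self.pending:
            try:
                self._backend("run", self.pending[0])
            except ConnectionError:
                break
            self.pending.popleft()
```

Users keep getting the last known content during an outage, and no submitted task is lost: calling retry_pending() once the backend is healthy drains the queue in the original order.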