Daily Archives: July 21, 2016

Automating the IT Operations Tasks in Amazon AWS! Reality or Illusion!

Image from Amazon

Automation is one of the Principles of Architecting in the Cloud, and for most IT operations engineers it is the end goal. But can we really get to the point where all operations tasks become automated?

The answer is yes!

When you properly design your applications and infrastructure and make them stateless, reusable resources with the techniques we’ve discussed here and here, you’ve already started your automation journey:

  • By making your applications and components stateless, you can easily scale them with minimal or no manual process involved, which you can count as automation.
  • When you make your servers stateless and reusable, you’ve already utilised some automation to bootstrap your computing resources.
  • Having your infrastructure available as code is also a step towards automating your infrastructure resources.

Now that you’ve done your homework and designed automation-ready applications, servers and infrastructure, you can sit back and let AWS do most of the manual work for you.

Auto Scaling, AWS Elastic Beanstalk, Amazon CloudWatch, AWS OpsWorks and AWS Lambda are some of the amazing Amazon tools that can help you increase the efficiency and reliability of your IT service as well as the productivity of your IT operations.

These services can help you automate responses to events that you would have to handle manually in traditional IT environments.

Auto-Scaling

Scalability has always been an important factor in design at any layer of your IT infrastructure. If your applications and infrastructure have been designed for scalability, you can move to the next step and automate your scaling tasks.

Using the Auto Scaling feature in AWS you can dynamically add server instances when there is demand for them (triggered by CPU, memory or load metrics) so you maintain the availability of your service. You can also remove resources automatically during quiet times and save some money.

I believe this is one of the coolest AWS features: it gives you peace of mind about your application’s availability and reliability, and it also removes the cost of underutilised servers.
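As a sketch of what such a rule looks like in practice, here is the shape of a simple scaling policy as you might submit it with boto3. The group name, policy name and cooldown value are made-up examples; the dict is built locally and the actual API call is left commented out.

```python
# Sketch of a simple scaling policy: add one instance when a CloudWatch
# CPU alarm fires. Group and policy names are hypothetical; the keys
# mirror the parameters of boto3's put_scaling_policy call.
scale_out_policy = {
    "AutoScalingGroupName": "web-asg",      # hypothetical group name
    "PolicyName": "scale-out-on-high-cpu",
    "AdjustmentType": "ChangeInCapacity",   # add/remove whole instances
    "ScalingAdjustment": 1,                 # +1 instance per alarm breach
    "Cooldown": 300,                        # wait 5 minutes between actions
}

# With AWS credentials configured you would submit it like this (not run here):
# import boto3
# boto3.client("autoscaling").put_scaling_policy(**scale_out_policy)
```

A mirror-image policy with `"ScalingAdjustment": -1` handles the quiet times, removing capacity you no longer pay for.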

AWS Elastic Beanstalk

Let’s imagine you have an application and you want to deploy it and provision the resources it requires. Yes, you need to provision the servers, load balancers, database and application servers (PHP, Java, Python, …), configure all of the components and connections, patch your servers and start uploading your application code to your environment. This will take hours if not days, assuming you have all the resources available.

With AWS Elastic Beanstalk, all you need to do is upload your application code and Elastic Beanstalk does the hard work for you. It handles all the resource deployment, capacity provisioning, load balancing, auto-scaling and health monitoring, so your application will be up and running in minutes.

You may ask which application platforms are supported. The answer: almost all of the well-known platforms are supported by AWS Elastic Beanstalk:

Java, .NET, PHP, Node.js, Python, Ruby, Go and Docker

And it’s free! There is no extra charge for Elastic Beanstalk; you just pay for the AWS resources needed to store and run your application.
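To give a feel for how little input Elastic Beanstalk needs, here is a sketch of the request for launching an environment via boto3. The application name, environment name, version label and solution stack string are all placeholder examples (real stack names come from the `list_available_solution_stacks` call); the dict is only built locally here.

```python
# Sketch of the inputs for launching an Elastic Beanstalk environment.
# All names below are hypothetical examples; the keys mirror boto3's
# elasticbeanstalk create_environment parameters.
env_request = {
    "ApplicationName": "my-app",        # hypothetical application
    "EnvironmentName": "my-app-prod",
    "SolutionStackName": "64bit Amazon Linux running Python",  # example value
    "VersionLabel": "v1",               # the code bundle you uploaded
}

# With credentials configured you would launch it like this (not run here):
# import boto3
# boto3.client("elasticbeanstalk").create_environment(**env_request)
```

Everything else (load balancer, auto-scaling group, health checks) is derived by the service from these few fields.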

Amazon CloudWatch

CloudWatch is a monitoring service for AWS Cloud services. You can collect and track metrics and logs from AWS resources and set alarms based on the status of your service. Most importantly, you can automatically respond to any change in your AWS resources.

For example, you can Automatically Recover your EC2 instance in case of failure. An EC2 instance is the equivalent of a (virtual) server in the AWS world. You can create a CloudWatch alarm to monitor the health of your EC2 instance and automatically recover it in case of a hardware or software failure. The recovered instance is identical to the original one, with the same instance ID and private/public IP addresses. Note that this feature is not supported by all instance types.

You can also call or run functions (through Amazon SNS and AWS Lambda) when a specific CloudWatch alarm triggers as a result of a metric change.
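Here is a sketch of what the auto-recover alarm above might look like as boto3 parameters. The instance ID is a placeholder and the region in the action ARN is an example; the dict is built locally and the API call is left commented out.

```python
# Sketch of a CloudWatch alarm that auto-recovers a failed EC2 instance.
# Instance id and region are placeholder assumptions; the keys mirror
# boto3's put_metric_alarm parameters, and the alarm action uses the
# ec2:recover automate action.
recover_alarm = {
    "AlarmName": "auto-recover-web-1",
    "Namespace": "AWS/EC2",
    "MetricName": "StatusCheckFailed_System",   # system status check metric
    "Dimensions": [{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    "Statistic": "Minimum",
    "Period": 60,
    "EvaluationPeriods": 2,                     # two failed checks in a row
    "Threshold": 1.0,
    "ComparisonOperator": "GreaterThanOrEqualToThreshold",
    "AlarmActions": ["arn:aws:automate:us-east-1:ec2:recover"],
}

# With credentials configured (not run here):
# import boto3
# boto3.client("cloudwatch").put_metric_alarm(**recover_alarm)
```

Swapping the recover ARN for an SNS topic ARN is how the same alarm fans out to notifications or a Lambda function instead.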

AWS OpsWorks

One of the features of AWS OpsWorks is that you can automatically update your configuration based on instance lifecycle events. For example, when a new database instance is created in your server farm (a lifecycle event), OpsWorks can run a Chef recipe responsible for updating your application servers so that they can use the new database server. This is a very common example of continuous configuration of instances.

Why Your Servers are Not Important Anymore

Image from Amazon AWS

Unlike the old days, when you had to build a server from scratch and spend hours installing software, patching the server and setting up static configuration (IP, name, …), in cloud computing you can dynamically create lots of servers with all the required components and configurations pre-deployed. As a result:

Your servers become just temporary computing resources that do the processing for you.

Need an update, a patch or a fix for a server? Not a problem in the cloud! The old server is removed and replaced with a new, updated, patched and healthier version.

So making your servers Disposable Compute Resources is one of the Principles of Architecting in the Cloud. But how do you convert the time-consuming task of a server build into a smooth, reusable process?

 

Bootstrapping

Even with dynamic provisioning of resources in the cloud, your servers still come with a default configuration and a default set of installed applications. Bootstrapping is:

Pushing scripts to the server to customise your OS from top to bottom, including installing and configuring software.

There are multiple ways to bootstrap servers:

  • You can push PowerShell/Bash scripts to the server, or
  • You may use configuration management tools like Puppet manifests or Chef recipes;
  • cloud-init and user data scripts are other ways to auto-configure servers during the boot process.
  • AWS CloudFormation and AWS OpsWorks are the two main tools in AWS that can help you bootstrap your servers.
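As a small, hedged illustration of the user-data route: the script below is a made-up example of a first-boot bootstrap, and the snippet shows the base64 encoding the raw EC2 API expects for user data (boto3’s `run_instances` accepts the plain string and encodes it for you).

```python
import base64

# A minimal example user-data bootstrap script (contents are illustrative):
# cloud-init runs it as root on the instance's first boot.
user_data = """#!/bin/bash
yum -y update            # patch the fresh server
yum -y install nginx     # install our web server
service nginx start
"""

# The raw EC2 API carries user data base64-encoded; verify it round-trips.
encoded = base64.b64encode(user_data.encode()).decode()
assert base64.b64decode(encoded).decode() == user_data

# With boto3 you would pass the plain string (not run here):
# import boto3
# boto3.client("ec2").run_instances(
#     ImageId="ami-example", MinCount=1, MaxCount=1,
#     InstanceType="t2.micro", UserData=user_data)
```

The same script could equally be handed to a CloudFormation `UserData` property, which is how the two bootstrapping tools above plug into each other.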

 

Golden Images

If you need a faster approach with fewer dependencies on external components, you can apply all your nice customisations to a server and prepare it to be your golden image.

Now, with your golden image, you can deploy as many servers as you want from that image, each including all the pre-built software and configurations.
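Baking the golden image itself is one API call away. Below is a sketch of the inputs, with a placeholder instance ID and made-up names; the keys mirror boto3’s EC2 `create_image` parameters, and the call itself is commented out.

```python
# Sketch of baking a golden image (AMI) from a configured instance.
# The instance id and names are placeholders for illustration.
image_request = {
    "InstanceId": "i-0123456789abcdef0",   # the hand-tuned template server
    "Name": "web-golden-2016-07-21",
    "Description": "nginx and app dependencies pre-installed",
    "NoReboot": False,                     # reboot first for a clean filesystem
}

# With credentials configured (not run here):
# import boto3
# ami_id = boto3.client("ec2").create_image(**image_request)["ImageId"]
# New servers launched from ami_id come up with everything pre-baked.
```

The returned AMI ID is what you would then reference from an Auto Scaling launch configuration or a CloudFormation template.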

 

Hybrid (Bootstrapping & Golden Image)

You can use a combination of the two methods to get the best out of your auto-provisioning. The question is when to use each method over the other. It all depends on your deployment, but as a general rule:

  • Things that are less likely to change between your instances (e.g., software installations) are the best items to put in your golden image; installing software, even automatically, can be a time-consuming job.
  • On the other hand, things that are more likely to change between deployments are better deployed by bootstrapping (e.g., minor software updates and application-specific configuration such as database connection settings).

A good example of this is AWS Elastic Beanstalk, which provides pre-configured servers with all the required software but also lets you use bootstrapping to customise your environment variables.

 

Bootstrapping, golden images, or a combination of the two are part of your server instantiation approach, by which you make server provisioning an automated, repeatable process.

Let’s move one step further. What if you want to extend your automation beyond your servers and make your entire infrastructure act as programmable resources that can themselves be turned into a reproducible process?

 

Infrastructure as Code

When you transform your whole infrastructure into code, a new window of possibilities opens to you:

Anything that can be converted to software can be programmed, and anything that can be programmed can be reused, reproduced and automated.

An example of a tool that can help you move towards an infrastructure-as-code-enabled environment is AWS CloudFormation. With AWS CloudFormation you can create, manage and evolve your AWS resources as code (networks, load balancers, security policies, …). Multiple AWS resources can be programmed together and attached to your application, enabling you to create a reusable end-to-end environment that includes your server resources as well as your infrastructure resources.
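To make "infrastructure as code" concrete, here is a tiny CloudFormation template expressed as a Python dict: a single security group opening HTTP. The resource name and stack name are illustrative; the structure follows the standard template anatomy (a `Resources` map of `Type`/`Properties` entries), and the stack-creation call is commented out.

```python
import json

# A minimal CloudFormation template as data: one security group that
# allows inbound HTTP. The logical name "WebSecurityGroup" is our choice.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        "WebSecurityGroup": {
            "Type": "AWS::EC2::SecurityGroup",
            "Properties": {
                "GroupDescription": "Allow HTTP from anywhere",
                "SecurityGroupIngress": [
                    {"IpProtocol": "tcp", "FromPort": 80,
                     "ToPort": 80, "CidrIp": "0.0.0.0/0"}
                ],
            },
        }
    },
}

template_body = json.dumps(template, indent=2)

# With credentials configured (not run here):
# import boto3
# boto3.client("cloudformation").create_stack(
#     StackName="web-stack", TemplateBody=template_body)
```

Because the template is plain data, it can live in version control and be re-deployed identically into any account or region, which is exactly the reusability the paragraph above describes.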

 

How to Design Scalable Applications in Amazon AWS

Image from Amazon AWS

Almost every application and system grows over time (users, data, traffic, …). You may have different applications with different growth rates, but how do you make sure your infrastructure can keep up as your application and data get bigger and bigger?

You must build your applications on top of a scalable infrastructure that can grow with your system and handle extra loads over time.

This is why building a highly scalable system is one of the Important Factors when Designing in the Cloud.

When thinking about scalability in the IT world, you have two general options:

  • Vertical
  • Horizontal

Vertical Scaling (Scale Up)

Vertical scaling is basically adding more CPU, RAM, I/O or networking capacity to a physical or virtual server, storage or networking device. But there is a problem with this kind of scaling:

You have a limit on the amount of resources you can add to a server (memory and CPU, for example). Plus, you put your application at the risk of one big server going down.

Horizontal Scaling (Scale Out)

When you scale horizontally you increase the number of resources (for example, adding more servers or network devices). The benefit of this kind of scaling is:

Technically you can scale with no limit and start building an elastic system that can scale up or down based on your application’s requirements.

But you need to consider the fact that not all applications support distributing their load across multiple resources; in other words, your application needs to be Stateless.

Stateless Application

A stateless application doesn’t need to know about a user’s session or any of the user’s previous interactions with the application, so any node can be gracefully added or removed with no end-user disruption. For example, a static website that doesn’t provide any user login feature doesn’t need to keep user session information.

Stateless Component

It would be great to have everything stateless, but in reality you always have some sort of session or state you need to maintain in your applications. Here are two common situations:

  • You may have your user’s session/login information in your web application that you need to maintain.
  • In multi-step data processes, you need to keep track of previous tasks/activities in your process flow.

In these situations you can move the state data off your nodes and store it somewhere else, letting the components be stateless:

  • You can save user session data in a managed database (Amazon DynamoDB), detaching session data from your servers and making them stateless.
  • If you need to keep your users’ files (pictures, data, batch-processing results, …), you can put them on highly available managed shared storage (Amazon S3, Amazon EFS).
  • You can use a managed workflow service (Amazon Simple Workflow Service (SWF)) to store execution history in a central shared location when you want to keep track of a multi-step workflow process and make it stateless.
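For the first bullet, here is a sketch of what a session record pushed to DynamoDB might look like. The table name, attribute names and values are all assumptions for illustration; the item uses DynamoDB’s typed attribute format as accepted by boto3’s low-level `put_item`, and the call is commented out.

```python
# Sketch of externalising session state to DynamoDB so web nodes stay
# stateless. Table and attribute names below are hypothetical.
session_item = {
    "TableName": "user-sessions",
    "Item": {
        "session_id": {"S": "abc123"},        # partition key
        "user_id": {"S": "u-42"},
        "cart": {"SS": ["sku-1", "sku-2"]},   # example session payload
        "expires_at": {"N": "1469145600"},    # expiry as epoch seconds
    },
}

# With credentials configured (not run here):
# import boto3
# boto3.client("dynamodb").put_item(**session_item)
```

Since any node can now fetch the session by `session_id`, the load balancer is free to route each request to whichever server is least busy.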

Stateful Applications/Components

What if you can’t (or don’t want to) make your components stateless?

  • Some legacy applications don’t support this by nature.
  • Or you may not want your users to move between nodes (e.g., some multi-player gaming applications require very low latency while users are playing the game).

There is a way to scale these stateful applications/components: what we know as Session Affinity strategies. When using session affinity there are limitations on how well you can distribute the load when you add or remove nodes from the cluster, because users are attached to a specific node. As a result of users not being able to move between servers, you may not end up with fully load-balanced servers.

After you create your beautifully designed scalable system, you need to distribute the load to its nodes. There are two high-level load distribution strategies you can use: the Push Method and the Pull Method.

Distributed Processing

Imagine a case in which you need to process a massive amount of data and you can hardly find a single server capable of handling the processing load. Here another type of horizontal scaling comes into play:

Distributed processing is splitting a big task (and its big data) into many smaller jobs and processing them in parallel across multiple (or many) servers.

Here are two general solutions to handle distributed processing:

  • You can use a distributed data processing engine (e.g., Apache Hadoop, Amazon Elastic MapReduce (EMR)) to manage and process a massive amount of distributed data.
  • If you have a large stream of real-time data, you can use Amazon Kinesis to divide your data into multiple portions and process them with multiple computing resources in your server farm.
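The split/process/merge pattern those services apply at cluster scale can be shown in miniature on one machine. This toy word count is not how Hadoop or EMR are used; it only illustrates the idea, with local worker threads standing in for the servers in the farm.

```python
from concurrent.futures import ThreadPoolExecutor
from collections import Counter

def count_words(chunk):
    """Map step: count the words in one slice of the data."""
    return Counter(chunk.split())

def distributed_word_count(text, workers=4):
    """Split the big job into smaller jobs, run them in parallel, merge."""
    lines = text.splitlines()
    # One chunk per worker, striping lines across chunks.
    chunks = ["\n".join(lines[i::workers]) for i in range(workers)]
    total = Counter()
    # Threads stand in for the "many servers"; a real engine would
    # ship each chunk to a different machine.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_words, chunks):
            total += partial   # Reduce step: merge the partial results
    return total

counts = distributed_word_count("big data\nbig jobs\nbig servers")
print(counts["big"])   # -> 3
```

The key property is that `count_words` needs only its own chunk, so adding more workers (or machines) needs no coordination beyond the final merge.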