HA Architecture: An Introduction to High Availability
Damage to reputation can be as crippling as financial losses when an organization’s computer systems and applications become inaccessible due to unplanned downtime. This is especially true if it takes hours for the affected systems to come back online. To prevent extended downtime, you should implement a high availability (HA) architecture for your IT infrastructure, allowing you to achieve as much as 99.999% uptime and keep service disruptions to a minimum. This article explains HA architecture in more detail.
Definition of HA Architecture
There are organizations that require their systems to be operational 24/7. For these organizations, HA architecture is essential. While HA does not guarantee that systems will not be hit by unplanned interruptions, it minimizes the impact of such interruptions on your operations. A more responsive system is another benefit of HA.
HA architecture ensures that your systems are up and running and accessible to your users in the face of unforeseen circumstances such as hardware and software failures. With it, you use multiple components to ensure continuous and responsive service. You must make sure that these components complement each other, or you will just be adding potential points of failure to your applications, increasing the probability of downtime.
Below are the four traits that an HA architecture must possess:
- Redundant hardware: Without redundant hardware, no requests can be served after a server crash until it is restarted, making downtime inevitable. Your HA architecture must therefore include backup hardware, such as servers or server clusters, that takes over automatically when production hardware crashes.
- Redundant software and applications: To prevent potential downtime whenever there are failures in the software and applications used in your production environment, it is crucial that your HA architecture includes backup software and applications.
- Redundant data: Database servers that go offline for one reason or another can wreak havoc on your production environment. Your HA architecture should include provisions for backup database servers to which processing can be shifted whenever a production database server goes offline.
- No single point of failure: A failure in a single component should not crash your entire infrastructure. With redundancy in hardware, software, and data, single points of failure are eliminated.
For organizations where continuous operation is not essential, HA may not be necessary, especially since it requires investment in new hardware and software and can drive up maintenance and other related costs. Before deciding on an HA architecture, factor in the costs of adding more components to your infrastructure and make sure the returns justify the investment. If you do decide to integrate HA into your infrastructure, choosing the cloud over on-premises infrastructure can help your organization save on costs.
How to Attain High Availability
The ideal HA architecture should include elements to ensure redundancy, data backup and recovery, automatic failover, and load balancing.
Redundancy
As discussed above, redundancy is a crucial trait of HA, although it can drive up the costs of achieving HA. When adopting redundancy, you can choose from five models, with each model progressively more costly as more components are required.
- N+1 model: This adds a single backup component to the N components the system needs to run. It can be active/passive, meaning that the backup component is on standby and ready to take over when a main component goes down, or active/active, meaning the backup component runs simultaneously with the main components. Although this is the least costly model, it is not fully redundant and may not be suitable for large systems.
- N+2 model: This is like the N+1 model but requires two backup components for each main component. If a backup component also fails, the other backup component takes over.
- 2N model: This doubles the number of resources required to run the system. For example, if a system requires four servers to run, a 2N model adds another four servers to the system, putting the total number of servers at eight. Thus, the system always has the capacity to run, even if multiple components go down.
- 2N+1 model: This is like the 2N model except that it adds another backup component that can take over when there are downtime issues with the additional capacity.
- Geographic redundancy: This is the most expensive model as it distributes systems across servers in multiple locations. When one location goes down, another site takes over, keeping your operations running. Given its costs, signing up with a cloud services provider with datacenters around the world is your best option if you decide to go with this model.
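To make the cost comparison concrete, the component counts implied by the first four models can be sketched as follows (the `total_components` helper is hypothetical, and `n` is the number of components the system needs to run):

```python
def total_components(n, model):
    """Total components required under a given redundancy model,
    where n is the number of components needed to run the system.
    (Hypothetical helper, for illustration only.)"""
    if model == "N+1":
        return n + 1        # one shared backup component
    if model == "N+2":
        return n + 2        # two backup components
    if model == "2N":
        return 2 * n        # a full duplicate of every component
    if model == "2N+1":
        return 2 * n + 1    # full duplicate plus one extra backup
    raise ValueError(f"unknown model: {model}")

# A system needing four servers: N+1 uses 5, while 2N uses 8.
print(total_components(4, "N+1"), total_components(4, "2N"))  # 5 8
```

This mirrors the example above: a four-server system doubles to eight servers under the 2N model.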
Data Backup and Recovery
Regular, full data backups are required to ensure HA and should be included in your disaster recovery planning. Test your data backups regularly and verify that they can be restored quickly when needed.
Moreover, you should replicate your data by storing copies on secondary servers or standby instances across multiple locations. The data in these locations should always be synchronized with the data in your primary location, so the other locations are ready to take over when disaster strikes your primary location.
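A minimal sketch of this replication pattern, assuming a simple in-memory key-value store (the `ReplicatedStore` class and its methods are hypothetical, for illustration only; production systems rely on their database's built-in replication):

```python
class ReplicatedStore:
    """Sketch of synchronous replication: every write is applied to the
    primary and to all replicas before it is acknowledged, so a replica
    can be promoted at any time without data loss."""

    def __init__(self, num_replicas):
        self.primary = {}
        self.replicas = [{} for _ in range(num_replicas)]

    def write(self, key, value):
        self.primary[key] = value
        for replica in self.replicas:  # keep every secondary in sync
            replica[key] = value

    def failover(self):
        """Promote the first replica when the primary location is lost."""
        self.primary = self.replicas.pop(0)

store = ReplicatedStore(num_replicas=2)
store.write("user:1", "alice")
store.failover()
print(store.primary["user:1"])  # alice
```

Because the replicas were synchronized on every write, the promoted replica serves the same data the primary held.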
Automatic Failover with Failure Detection
In cases of failure, backup systems should be ready to take over instantly in a process known as automatic failover. Timely failure detection is crucial for this system to work.
HA architecture with automatic failover looks like the following:
- There is a main system and a backup system known as the hot spare.
- Constant monitoring goes on between the main and backup systems.
- When the main system goes down, the backup system, or hot spare, takes over automatically. New requests are now handled by the backup system, which is now acting as if it is the main system.
- When the issues in the main system are resolved, it comes back online and resumes its original role. The hot spare goes back to being the backup.
Users are not aware of any changes throughout the above process. When handled properly, the failover is transparent to them.
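The failover flow above can be sketched as follows, assuming a simple health check decides where each request is routed (the `Server` and `FailoverPair` names are hypothetical):

```python
class Server:
    """Minimal stand-in for a system that can serve requests."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.is_healthy = healthy

    def healthy(self):
        return self.is_healthy

    def handle(self, request):
        return f"{self.name} handled {request}"


class FailoverPair:
    """Sketch of a main system with a hot spare: a health check is run
    before routing, and the spare takes over automatically on failure."""

    def __init__(self, main, spare):
        self.main = main
        self.spare = spare

    def serve(self, request):
        # Constant monitoring: probe the main system before routing.
        if self.main.healthy():
            return self.main.handle(request)
        # Automatic failover: the hot spare takes over transparently.
        return self.spare.handle(request)

pair = FailoverPair(Server("main", healthy=False), Server("spare"))
print(pair.serve("GET /"))  # spare handled GET /
```

From the caller's point of view nothing changes: `serve()` returns a response either way, which is what makes the failover transparent to users.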
Load Balancing
HA architecture ensures better and more reliable application performance using load balancing, a process that involves distributing network traffic across multiple servers using either a hardware- or software-based solution.
You should configure your load balancer to use an algorithm suitable for your requirements. Common load balancing algorithms include:
- Round robin: The load balancer directs incoming requests to the first server, then the second, and so on.
- Least connection: The load balancer finds the server with the least number of connections and directs incoming traffic to that server.
- Source Internet Protocol (IP) hash: The load balancer hashes the source’s IP address to select a server, so requests from the same client are consistently routed to the same server.
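A simplified sketch of the three algorithms, assuming a fixed list of servers (the `LoadBalancer` class is hypothetical; real load balancers also track server health, weights, and timeouts):

```python
import hashlib
from itertools import cycle

class LoadBalancer:
    """Sketch of three common load balancing algorithms."""

    def __init__(self, servers):
        self.servers = servers
        self._rotation = cycle(servers)           # round-robin state
        self.connections = {s: 0 for s in servers}  # active connections

    def round_robin(self):
        """Direct each request to the next server in turn."""
        return next(self._rotation)

    def least_connection(self):
        """Direct the request to the server with the fewest connections."""
        return min(self.servers, key=lambda s: self.connections[s])

    def source_ip_hash(self, client_ip):
        """Hash the client IP so the same client always reaches the same server."""
        digest = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
        return self.servers[digest % len(self.servers)]

lb = LoadBalancer(["web1", "web2", "web3"])
print(lb.round_robin(), lb.round_robin())  # web1 web2
```

Note that `source_ip_hash` is what gives session persistence: as long as the server list is unchanged, a given client IP always maps to the same server.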
By itself, load balancing does not assure HA since it can still be a single point of failure. To resolve this potential issue, you should implement redundancy for your load balancing solution.
How to Measure High Availability
While often used interchangeably, availability and uptime are not the same. A system can be up and running yet still unavailable to users due to factors such as network issues. Moreover, uptime is only one component of availability; downtime is the other.
Availability refers to the probability of a system working during a specific period. Expressed as a percentage, it is an important metric to consider when assessing service level agreements (SLAs) with potential vendors of hosted solutions.
To calculate availability, consider the following:
- Annual operating time
- Standby time
- Total time devoted to annual preventive maintenance
- Total time devoted to annual corrective maintenance
- Time spent waiting on administrative and logistical delays
The ideal measurement of availability is termed the five 9s, or 99.999% availability over a given period. This translates to slightly over five minutes of downtime per year. If 24/7 operations are crucial to your organization, you should strive for 99.999% availability.
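The five-9s figure follows directly from the standard formula, availability = uptime / (uptime + downtime). A quick sketch of the arithmetic (the helper names are hypothetical):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes

def allowed_downtime_minutes(availability):
    """Annual downtime budget implied by an availability target (0-1)."""
    return (1 - availability) * MINUTES_PER_YEAR

def availability_pct(uptime_minutes, downtime_minutes):
    """Availability = uptime / (uptime + downtime), as a percentage."""
    return 100 * uptime_minutes / (uptime_minutes + downtime_minutes)

# Five 9s allows roughly 5.26 minutes of downtime per year.
print(round(allowed_downtime_minutes(0.99999), 2))  # 5.26
```

Run the same arithmetic against your own SLA targets: four 9s (99.99%), for instance, permits roughly 52.6 minutes of downtime per year, ten times the five-9s budget.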
Types of High Availability Clusters
Server clusters are groups of servers that support your HA architecture. The health of each node in the cluster is monitored constantly using dedicated network connections. When a node goes down, another node takes over its operations.
When designing your HA architecture, you can choose from different cluster types, including the following:
- Active/passive cluster: In this cluster type, there is a main node that is always active, meaning it handles all requests coming from your users. The backup node is passive or inactive, meaning it cannot handle incoming requests. However, when the main node goes down, the passive node takes over. When the issue with the main node is resolved, it resumes handling all requests while the backup node goes back to being passive again.
- Active/active cluster: In this cluster type, the nodes are all active and can handle incoming requests. When a node goes down, one of the other active nodes takes over. When the issue with the other node is resolved, the requests are again distributed to all nodes within the cluster.
- Shared-nothing vs shared-disk cluster: With a shared-nothing cluster, each node has its own database, which is synchronized with the databases on the other nodes, so when a node goes down, the other nodes remain operational. This makes a shared-nothing cluster crucial to achieving HA. In contrast, in a shared-disk cluster all the nodes share a single database, meaning that when the database goes down, all nodes go down as well.
Parallels RAS: High Availability at No Additional Cost!
Parallels® Remote Application Server (RAS) features High Availability Load Balancing (HALB) out of the box. With HALB, Parallels RAS removes traffic restrictions on multiple gateways, allowing any active gateway to handle incoming traffic. Moreover, since it allows multiple HALB appliances to run simultaneously, Parallels RAS maximizes throughput and reduces the potential for downtime.
Parallels RAS lets you choose from two methods for handling incoming traffic, namely, resource-based and round-robin. Resource-based load balancing involves routing incoming requests to the gateway handling the least traffic. Round-robin load balancing means that requests are routed to available gateways in sequential order.
Want to explore how you can use Parallels RAS within your HA architecture?