{"id":8525,"date":"2023-07-29T11:03:51","date_gmt":"2023-07-29T09:03:51","guid":{"rendered":"https:\/\/tekmart.co.za\/t-blog\/?p=8525"},"modified":"2023-07-29T11:03:52","modified_gmt":"2023-07-29T09:03:52","slug":"what-is-fault-tolerance-in-a-computer-system-a-simple-definition","status":"publish","type":"post","link":"https:\/\/tekmart.co.za\/t-blog\/what-is-fault-tolerance-in-a-computer-system-a-simple-definition\/","title":{"rendered":"What is fault-tolerance in a computer system?-a simple definition."},"content":{"rendered":"<span class=\"span-reading-time rt-reading-time\" style=\"display: block;\"><span class=\"rt-label rt-prefix\">Reading Time-approximately:<\/span> <span class=\"rt-time\"> 3<\/span> <span class=\"rt-label rt-postfix\">minutes<\/span><\/span>\n<h2 class=\"wp-block-heading\"><strong>Fault-tolerant technology is a capability of a computer system, electronic system or\u00a0network\u00a0to deliver uninterrupted service, despite one or more of its\u00a0components\u00a0failing. Fault tolerance also resolves potential service interruptions related to software or logic errors. The purpose is to prevent\u00a0catastrophic failure\u00a0that could result from a\u00a0single point of failure.<\/strong><\/h2>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/cdn.ttgtmedia.com\/rms\/onlineImages\/kranz_garry.jpg\" alt=\"Garry Kranz\"\/><\/figure>\n\n\n\n<p>By <a href=\"https:\/\/www.techtarget.com\/contributor\/Garry-Kranz\">Garry Kranz<\/a><\/p>\n\n\n\n<p>VMware vSphere 6 Fault Tolerance is a branded, continuous data availability architecture that exactly replicates a VMware\u00a0virtual machine\u00a0on an alternate physical\u00a0host\u00a0if the main host\u00a0server\u00a0fails.<\/p>\n\n\n\n<p>Fault-tolerant systems are designed to compensate for multiple failures. Such systems automatically detect a failure of the\u00a0computer processor unit, I\/O subsystem, memory cards, motherboard, power supply or network components. The failure point is identified, and a\u00a0backup\u00a0component or procedure immediately takes its place with no loss of service.<\/p>\n\n\n\n<p>To ensure fault tolerance, enterprises need to purchase an inventory of formatted computer equipment and a secondary\u00a0<a href=\"https:\/\/tekmart.co.za\/data-center-infrastructure\/ups-uninterruptible-power-supply\/uninterrupted-power-supply-ups\">uninterruptible power supply\u00a0device<\/a>. The goal is to prevent the crash of key systems and networks, focusing on issues\u00a0related to uptime and downtime.<\/p>\n\n\n\n<p>Fault tolerance can be provided with&nbsp;software&nbsp;embedded in&nbsp;hardware, or by some combination of the two.<\/p>\n\n\n\n<p>In a software implementation, the operating system (<a href=\"https:\/\/tekmart.co.za\/software\/microsoft-server-products-dsp\" data-type=\"URL\" data-id=\"https:\/\/tekmart.co.za\/software\/microsoft-server-products-dsp\">OS<\/a>) provides an interface that allows a programmer to\u00a0checkpoint\u00a0critical data at predetermined points within a transaction. In a hardware implementation (for example, with Stratus and its Virtual Operating System), the programmer does not need to be aware of the fault-tolerant capabilities of the machine.<\/p>\n\n\n\n<p>At a hardware level, fault tolerance is achieved by\u00a0duplexing\u00a0each hardware component.\u00a0Disks\u00a0are mirrored. Multiple processors are lockstepped together and their outputs are compared for correctness. When an anomaly occurs, the faulty component is determined and taken out of service, but the machine continues to function as usual.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Fault tolerance vs. high availability<\/strong><\/h3>\n\n\n\n<p>Fault tolerance is\u00a0closely associated with maintaining\u00a0business continuity\u00a0via\u00a0highly available\u00a0computer systems and networks. Fault-tolerant environments are defined as those that restore service instantaneously following a service outage, whereas a high-availability environment strives for five nines of operational service.<\/p>\n\n\n\n<p>In a high-availability\u00a0cluster, sets of independent servers are loosely coupled together to guarantee system-wide sharing of critical data and resources. The clusters monitor each other&#8217;s health and provide fault recovery to ensure applications remain available. Conversely, a fault-tolerant cluster consists of multiple physical systems that share a single copy of a computer&#8217;s OS. Software commands issued by one system are also executed on the other system.<\/p>\n\n\n\n<p>The trade-off between fault tolerance and high availability is cost. Systems with integrated fault tolerance incur a higher cost due to the inclusion of additional hardware.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>What is graceful degradation?<\/strong><\/h3>\n\n\n\n<p>Fault tolerance is often used synonymously with\u00a0graceful degradation, although the latter is more aligned with the more holistic discipline of\u00a0fault management, which aims to detect, isolate and resolve problems pre-emptively. A fault-tolerant system swaps in backup componentry to maintain high levels of system availability and performance. Graceful degradation allows a system to continue operations, albeit in a reduced state of performance.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Matching data protection and fault tolerance<\/strong><\/h3>\n\n\n\n<p>Fault tolerance hinges on\u00a0redundancy. Namely, information is redundantly protected via data\u00a0Replication\u00a0or\u00a0synchronous mirroring\u00a0of\u00a0volumes\u00a0to an off-site\u00a0data center. For physical redundancy, extra hardware equipment remains on standby for\u00a0failover\u00a0of operational systems.<\/p>\n\n\n\n<p>Data backup is frequently combined with redundancy. Both strategies are intended as a safeguard against data loss, although backup tends to focus on\u00a0point-in-time recovery, including granular\u00a0recovery\u00a0of a discrete data\u00a0object. Redundant systems are engineered specifically for application\u00a0workloads\u00a0that tolerate very little downtime.<\/p>\n\n\n\n<p>When implementing fault tolerance, enterprises should match\u00a0data availability\u00a0requirements to the appropriate level of data protection with redundant array of independent disks (RAID). The\u00a0RAID technique\u00a0ensures data is written to multiple\u00a0hard disks, both to balance\u00a0I\/O\u00a0operations and boost overall system performance.<\/p>\n\n\n\n<p>Organizations that prioritize fault tolerance above speed and performance would be best served by\u00a0RAID 1\u00a0disk mirroring or\u00a0RAID 10, which combines disk mirroring and\u00a0disk striping. If fault tolerance and system performance are equally important, an enterprise may find it worthwhile to spend a little extra money combining RAID 10 with\u00a0RAID 10 with RAID 6, or double-parity RAID, which tolerates the loss of two disk failures before data is lost. Aside from higher cost, the other drawback is data writes occur more slowly to the RAID set.<\/p>\n\n\n\n<p>Aside from hardware, a fault-tolerant architecture should be coordinated with regularly scheduled backups of critical data, perhaps including a mirrored copy at a secondary or alternate location.\u00a0Security\u00a0needs to be part of the planning to prevent unauthorized access, and to apply\u00a0antivirus tools\u00a0and the most recent version of the computing system OS.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Which industries depend on system fault tolerance?<\/strong><\/h3>\n\n\n\n<p>Fault tolerance refers not only to the consequence of having redundant equipment, but also to the ground-up methodology computer makers use to engineer and design their systems for\u00a0reliability. Fault tolerance is a required design specification for computer equipment used in\u00a0online transaction processing\u00a0systems, such as airline flight control and reservations systems. Fault-tolerant systems are also widely used in sectors such as distribution and logistics, electric power plants, heavy manufacturing,\u00a0industrial control systems\u00a0and retailing.<\/p>\n","protected":false},"excerpt":{"rendered":"<p><span class=\"span-reading-time rt-reading-time\" style=\"display: block;\"><span class=\"rt-label rt-prefix\">Reading Time-approximately:<\/span> <span class=\"rt-time\"> 3<\/span> <span class=\"rt-label rt-postfix\">minutes<\/span><\/span>Fault-tolerant technology is a capability of a computer system, electronic system or\u00a0network\u00a0to deliver uninterrupted service, despite one or more of its\u00a0components\u00a0failing. Fault tolerance also resolves potential service interruptions related to software or logic errors. The purpose is to prevent\u00a0catastrophic failure\u00a0that could result from a\u00a0single point of failure. By Garry Kranz VMware vSphere 6 Fault Tolerance is a branded, continuous data<\/p>\n<p><a class=\"more-link\" href=\"https:\/\/tekmart.co.za\/t-blog\/what-is-fault-tolerance-in-a-computer-system-a-simple-definition\/\">Read More<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[38,35,8,4,30,3,265,9],"tags":[],"class_list":["post-8525","post","type-post","status-publish","format-standard","hentry","category-best-practices-for-data-center-operations","category-data-center-facilities","category-data-center-hardware","category-datacenter-news","category-expert-advise-and-opinion","category-industry-news-and-expert-advise","category-tech-definition-updates","category-tech-definitions"],"_links":{"self":[{"href":"https:\/\/tekmart.co.za\/t-blog\/wp-json\/wp\/v2\/posts\/8525","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/tekmart.co.za\/t-blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/tekmart.co.za\/t-blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/tekmart.co.za\/t-blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/tekmart.co.za\/t-blog\/wp-json\/wp\/v2\/comments?post=8525"}],"version-history":[{"count":1,"href":"https:\/\/tekmart.co.za\/t-blog\/wp-json\/wp\/v2\/posts\/8525\/revisions"}],"predecessor-version":[{"id":8526,"href":"https:\/\/tekmart.co.za\/t-blog\/wp-json\/wp\/v2\/posts\/8525\/revisions\/8526"}],"wp:attachment":[{"href":"https:\/\/tekmart.co.za\/t-blog\/wp-json\/wp\/v2\/media?parent=8525"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/tekmart.co.za\/t-blog\/wp-json\/wp\/v2\/categories?post=8525"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/tekmart.co.za\/t-blog\/wp-json\/wp\/v2\/tags?post=8525"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}