{"id":3199,"date":"2020-07-27T17:54:09","date_gmt":"2020-07-27T15:54:09","guid":{"rendered":"https:\/\/tekmart.co.za\/t-blog\/?p=3199"},"modified":"2020-07-27T17:54:09","modified_gmt":"2020-07-27T15:54:09","slug":"how-to-prevent-and-recover-from-server-failure","status":"publish","type":"post","link":"https:\/\/tekmart.co.za\/t-blog\/how-to-prevent-and-recover-from-server-failure\/","title":{"rendered":"How to prevent and recover from server failure"},"content":{"rendered":"<span class=\"span-reading-time rt-reading-time\" style=\"display: block;\"><span class=\"rt-label rt-prefix\">Reading Time-approximately:<\/span> <span class=\"rt-time\"> 3<\/span> <span class=\"rt-label rt-postfix\">minutes<\/span><\/span>\n<h2 class=\"wp-block-heading\"><strong>Hardware, software and facility issues can lead to server failure. With the right protocol and preventive maintenance, you can reduce both the number of failures and the time spent troubleshooting.<\/strong><\/h2>\n\n\n\n<p>By<\/p>\n\n\n\n<p><a href=\"https:\/\/www.techtarget.com\/contributor\/Jacob-Roundy\">Jacob Roundy<\/a><\/p>\n\n\n\n<p>Published:&nbsp;<strong>02 Jun 2020<\/strong><\/p>\n\n\n\n<p>Server failure is a common issue that affects organizations of all types and sizes, and the cost of server downtime can range from days without system access to the loss of critical business data. This can lead to operational issues, service outages and repair costs.<\/p>\n\n\n\n<p>Failures can originate in the server hardware, the software or the data center facility. 
If you understand what can cause server failures, you can head off issues before they develop and often avoid downtime altogether, but it&#8217;s still best to have a contingency plan in place in case a server does fail.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>What can cause a server to fail?<\/strong><\/h3>\n\n\n\n<p>If you receive an alert or notice something off, the first step in resolving a server failure is to identify how and why the server failed; how quickly you do this can be the difference between\u00a0minutes and days of downtime. Common reasons for server failure include:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><strong>Overheating.<\/strong>\u00a0If a server runs at too high a temperature, it can lead to poor performance or complete failure.<\/li><li><strong>Hardware issue.<\/strong>\u00a0Sometimes, a hardware component simply breaks. This could be due to a failure in the component itself, such as a battery or disk failure; a malfunction in the cooling system; or the equipment&#8217;s age.<\/li><li><strong>Software issue.<\/strong>\u00a0An outdated OS can collapse under high-traffic operations, and unvetted patches can lead to bugs or data corruption.\u00a0Software upgrades and updates\u00a0can also fail and cause new issues.<\/li><li><strong>System overload.<\/strong>\u00a0Peak traffic periods and full server logs can result in system overload and failure.<\/li><li><strong>Cyberattack.<\/strong>\u00a0A lack of network security or an outdated, unsupported OS can leave servers\u00a0vulnerable to cyberattacks\u00a0that can paralyze or crash the server.<\/li><li><strong>Natural disaster.<\/strong>\u00a0Earthquakes, fires, flooding and thunderstorms can wreak havoc on network systems and cause service outages.<\/li><\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>How to prevent common server failures<\/strong><\/h3>\n\n\n\n<p>Constant reboots and sudden slowness can indicate a faulty server. 
The sooner you spot these signs, the faster you can act.\u00a0Server monitoring\u00a0software can help you keep tabs on servers, closely monitor critical systems and get alerts for potential issues.<\/p>\n\n\n\n<p>Along with a monitoring tool set, there are preventive maintenance steps you can take to protect server uptime and health.<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li><strong>Ensure optimal environment temperature.<\/strong>\u00a0Servers need proper ventilation and\u00a0temperature control\u00a0to avoid overheating. Check for dirt and dust buildup on both interior and exterior surfaces and adjust temperature settings as needed.<\/li><li><strong>Conduct routine maintenance.<\/strong>\u00a0Hardware issues tend to be the most difficult to predict and prevent because they can happen at random. Pay attention to the age of each server, perform routine disk checks and regularly update\/upgrade the system. When the time comes, replace outdated parts or the machine altogether. Predictive analytics can also help identify when parts might fail.<\/li><li><strong>Regularly install updates.<\/strong>\u00a0Install software and OS updates and patches on a regular basis. This keeps performance up and protects servers from easily exploitable software vulnerabilities.<\/li><li><strong>Maintain strict access control and detailed event logs.<\/strong>\u00a0Human error is nearly impossible to eliminate. Automation can minimize human error, but human intervention is still required. To lower risk, maintain strict records of who can access the server room and management software. You should also\u00a0keep detailed event logs\u00a0and review them on a regular basis.<\/li><li><strong>Monitor performance trends.<\/strong>\u00a0By continuously reviewing performance monitoring data, you can better predict the resources required for peak periods and identify sluggish performance, which might be a sign of an imminent failure. 
These trends might also reveal potential hardware and software issues or areas of a server room that require additional cooling. Make sure you also perform housekeeping tasks, such as maintaining log files, emptying the recycle bin, deleting files in temporary folders and defragmenting hard drives, to preserve performance levels and avoid system overload.<\/li><li><strong>Develop a server contingency plan.<\/strong>\u00a0Redundancy is a key part of preventing downtime from server failure. A server contingency plan should identify the secondary hardware available, such as multiple power sources,\u00a0redundant RAM\u00a0and backup servers.<\/li><li><strong>Design a disaster and data recovery plan.<\/strong>\u00a0In the event of a natural disaster or security breach, a\u00a0disaster recovery plan\u00a0and a data recovery plan will save you from long periods of downtime and catastrophic data loss. Having a backup plan is essential for worst-case scenarios.<\/li><\/ol>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>How to resolve and recover from server failure<\/strong><\/h3>\n\n\n\n<p>If your servers fail despite preventive maintenance, there are steps you can take to recover effectively. Aside from a restart, you can use visual cues and diagnostic software to narrow down a possible cause.<\/p>\n\n\n\n<p>Once you&#8217;ve identified the root cause, you can switch to a backup server and take the requisite steps to repair the failed machine.<\/p>\n","protected":false},"excerpt":{"rendered":"<p><span class=\"span-reading-time rt-reading-time\" style=\"display: block;\"><span class=\"rt-label rt-prefix\">Reading Time-approximately:<\/span> <span class=\"rt-time\"> 3<\/span> <span class=\"rt-label rt-postfix\">minutes<\/span><\/span>Hardware, software and facility issues can lead to server failure. With the right protocol and preventive maintenance, you can reduce both the number of failures and the time spent troubleshooting. 
By Jacob Roundy Published:&nbsp;02 Jun 2020 Server failure is a common issue that affects all organization types and sizes, and the cost of server downtime can include days without system access to loss of critical<\/p>\n<p><a class=\"more-link\" href=\"https:\/\/tekmart.co.za\/t-blog\/how-to-prevent-and-recover-from-server-failure\/\">Read More<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[38,39,16,57],"tags":[],"class_list":["post-3199","post","type-post","status-publish","format-standard","hentry","category-best-practices-for-data-center-operations","category-data-center-systems-management-2","category-how-tos-and-other-useful-tips-and-tricks","category-server-hardware-strategy"],"_links":{"self":[{"href":"https:\/\/tekmart.co.za\/t-blog\/wp-json\/wp\/v2\/posts\/3199","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/tekmart.co.za\/t-blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/tekmart.co.za\/t-blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/tekmart.co.za\/t-blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/tekmart.co.za\/t-blog\/wp-json\/wp\/v2\/comments?post=3199"}],"version-history":[{"count":1,"href":"https:\/\/tekmart.co.za\/t-blog\/wp-json\/wp\/v2\/posts\/3199\/revisions"}],"predecessor-version":[{"id":3200,"href":"https:\/\/tekmart.co.za\/t-blog\/wp-json\/wp\/v2\/posts\/3199\/revisions\/3200"}],"wp:attachment":[{"href":"https:\/\/tekmart.co.za\/t-blog\/wp-json\/wp\/v2\/media?parent=3199"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/tekmart.co.za\/t-blog\/wp-json\/wp\/v2\/categories?post=3199"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/tekmart.co.za\/t-blog\/wp-json\/wp\/v2\/tags?post=3199"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":tr
ue}]}}