So an entire server can go down, and the other servers pick up the slack, but if one hard drive fails it throws a wrench in the "cloud" and the whole network goes down?
Or in this latest instance, did the entire SAN fail and need to be replaced? I presume that the SAN is also redundant, or is that an incorrect assumption?
I really do want to understand, not just bash you guys when things are down. I simply expected that the sites would always be up, because if one part failed it could be replaced while the other parts of the system took over the load of the damaged piece. I assumed that would include hard drives, since they do fail.
Thread: Server down again?
01-25-2012, 10:35 PM #11
- Join Date
- Feb 2006
01-27-2012, 03:30 PM #12
The offending hard drive was part of the SAN, not one of the servers within the Cloud that provide resources like CPU and RAM. If the offending hard drive had been in one of the servers in the Cloud, the load would have failed over to another server. In this particular case, because the SAN controls the Cloud, the accounts controlled by that particular SAN saw intermittent downtime or poor performance until the SAN array was rebuilt.
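To put rough numbers on the point above, here is a minimal sketch of why redundant servers do not help when everything depends on one storage array. The availability figures and the node count are hypothetical, not WestHost measurements, and the model assumes components fail independently of each other.

```python
def parallel_availability(per_node: float, nodes: int) -> float:
    """Availability of a pool that stays up while at least one node is up."""
    return 1.0 - (1.0 - per_node) ** nodes


def series_availability(*components: float) -> float:
    """Availability of a chain in which every component must be up."""
    result = 1.0
    for availability in components:
        result *= availability
    return result


# Hypothetical figures chosen only for illustration.
SERVER_AVAILABILITY = 0.99   # one compute server
SAN_AVAILABILITY = 0.999     # the single, unmirrored storage array
SERVER_COUNT = 4             # servers that can fail over to each other

compute_pool = parallel_availability(SERVER_AVAILABILITY, SERVER_COUNT)
whole_system = series_availability(compute_pool, SAN_AVAILABILITY)

print(f"compute pool: {compute_pool:.8f}")
print(f"whole system: {whole_system:.8f}")
```

However many servers are added to the pool, the combined figure can never exceed the SAN's own availability, which matches the behaviour described in this thread: a server failure is absorbed, a storage failure is not.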
The SAN isn't mirrored. When we had the mirrored SAN, restored drives had to be re-synced with the main SAN for the system to run correctly, and the amount of time that took was monstrous -- longer than a standard drive replacement and data restore [which you saw the other day]. For overall performance, the SAN runs better with one set of drives [which we keep backups of].
There should not be repeated SAN issues. Not all accounts were taken down when one of the SAN drives failed, but the accounts that were affected did need to wait until the drive was replaced and the backup data restored. Usually a hard drive failure can be caught before the drive stops responding entirely [and has been caught in the SAN before], and we will strive to do better at preventing these issues going forward.
01-27-2012, 03:49 PM #13
This might help some understand what a SAN is: http://en.wikipedia.org/wiki/Storage_area_network It is a bit technical, but from it I gather that our actual data is on the SAN.
01-27-2012, 04:57 PM #14
Correct. The RAM, CPU, and general function of the account are maintained by a series of servers working in harmony. Data needs to be stored in one place, and our current configuration runs with a single SAN where all the content is physically stored. Not all accounts are on one hard drive, but if a drive fails outright before we can replace it more smoothly, an outage like the one you experienced can occur. I do not see it occur often in our Cloud hosting environment, but I do apologize whenever something system-wide like this causes downtime for anyone's website. I hope that we can continue to improve all our hosting systems for the benefit of all clients at WestHost.
04-05-2012, 03:26 PM #15
- Join Date
- Oct 2011
The only thing you can do is find a competent web hosting company! I'm looking myself ...
04-06-2012, 06:59 AM #16
- Join Date
- Oct 2011
I've never seen such gross incompetence in the 15+ years I've been in the business.
04-06-2012, 04:47 PM #17
I gave you a call and left my direct line so you can get hold of me. I would be happy to help in any way I can. Here at WestHost we want to make sure our clients are taken care of. Please feel free to PM me.