Server down again?



richardleggett
01-16-2010, 06:23 AM
The other day one of Westhost's servers was down for a couple of hours. You couldn't even log into SSH.

This seems to be happening again for me, although the netstatus.westhost.com page shows everything is fine.

My domain is richardleggett.co.uk

When I try to SSH in I get a connection closed immediately. Same for FTP.

This is really becoming a problem. Westhost, can you do anything about this? Is anyone else having problems?

scain
01-16-2010, 06:35 AM
Dear Richard,

I sincerely apologize for the problems you have been experiencing on that server, and I am glad you brought this issue to our attention this morning. We strive for 99.9% uptime, so any time you notice any of your services not responding, please do not hesitate to let us know. In this situation it was indeed a server issue, and an outage was posted shortly after our chat conversation. I have included below the notice that was posted at http://netstatus.westhost.com, and I have also forwarded this post to our administrators to see whether they have any further information about this particular outage.

Anytime any of our servers experience any issues we take the matter very seriously and do everything we can to ensure a repeat issue does not occur. We appreciate your understanding in this matter and if you have any other concerns please do not hesitate to ask.

Information from http://netstatus.westhost.com


This server is undergoing emergency maintenance and will be available again as soon as possible.

RoxyLopez
10-24-2011, 10:13 PM
I also cannot access my site; it has been down for over an hour. This is happening a lot more often, and I would like to know why and what I can do about it, please.


Thanks,
Roxy

wildjokerdesign
10-25-2011, 05:09 AM
Roxy,
Have you contacted WH yet? They need to have your information so they can look into the issue.

adamcroshaw
10-25-2011, 12:31 PM
Roxy

We would love to help you! Do you have a ticket number that we can reference to make sure your issue is taken care of? You are also welcome to PM me, and I will make sure you are taken care of.

Adam

RoxyLopez
10-25-2011, 07:36 PM
Thanks, it took four hours, but WH fixed the problem. Thank you all for your help!!!!

Roxy
www.thetruthdenied.com

Andy435345345
01-19-2012, 08:27 AM
I'm still down - going on six hours. http://netstatus.westhost.com/ shows no servers down .... Someone please explain why there is no way to switch servers when this happens. Under no circumstances should a site EVER be down for more than a few minutes.

On the home page under 'Cloud' hosting it states 'Self-Healing'. How do you people sleep at night selling such horse shit?

adamcroshaw
01-19-2012, 03:22 PM
Andy

We are sorry that you experienced downtime today. We have been updating our http://netstatus.westhost.com/ page throughout the day with information regarding the performance issues that the HSI Cloud is experiencing. We understand this is frustrating, and we are doing everything we can to resolve the issues as quickly as possible.

Adam

barry
01-19-2012, 07:19 PM
Quote from Andy435345345:
> I'm still down - going on six hours. http://netstatus.westhost.com/ shows no servers down .... Someone please explain why there is no way to switch servers when this happens. Under no circumstances should a site EVER be down for more than a few minutes.
>
> On the home page under 'Cloud' hosting it states 'Self-Healing'. How do you people sleep at night selling such horse shit?

I'd like to understand the Self-Healing concept myself. This appears to be a downright lie. My sites are still down.

ifurniss
01-20-2012, 09:29 AM
Quote from barry:
> I'd like to understand the Self-Healing concept myself. This appears to be a downright lie. My sites are still down.

The self-healing concept relates to the servers running all of the Cloud accounts: when one of those goes down, the rest of the servers in the Cloud pick up the slack and continue providing the necessary resources to the accounts.

However, the Clouds have to be controlled by something. For the data, this is a machine called the SAN, or Storage Area Network. There are a couple running at any given time, but if they fail, get out of sync, etc., that can cause an outage like the one you were seeing.

In this instance, we replace the offending hardware, but then we need to re-sync the SAN and copy all the data over to the replacement hardware. Since the SAN controls the data for the Cloud, this can take some time to complete, and you may continue to experience some slowness or intermittent connectivity issues during the process.

barry
01-25-2012, 10:35 PM
So an entire server can go down, and the other servers pick up the slack, but if one hard drive fails it throws a wrench in the "cloud" and the whole network goes down?

Or, in this latest instance, did the entire SAN fail and need to be replaced? I presume that the SAN is also redundant, or is that an incorrect assumption?

I really do want to understand, not just bash you guys when things are down. I simply expected that the sites would always be up, because if one part failed, it could be replaced while the other parts of the system took over the load of the damaged piece. I assumed that would include hard drives, since they do fail.

ifurniss
01-27-2012, 03:30 PM
The offending hard drive was part of the SAN, not one of the servers within the Cloud providing resources like CPU and RAM. If the offending hard drive had been on one of the servers in the Cloud, it would have failed over to another server. In this particular issue, because the SAN controls the Cloud, the accounts controlled by that particular SAN saw intermittent downtime or poor performance until the SAN array was rebuilt.

The SAN isn't mirrored. When we had a mirrored SAN, restored drives had to be re-synced with the main SAN for the system to run correctly, and the time that took was monstrous, longer than a standard drive replacement and data restore [which you saw the other day]. For overall performance, the SAN runs better with one set of drives [which we keep backups of].

There should not be repeated SAN issues. Not all accounts were taken down when one of the SAN drives failed, but the affected accounts did need to wait until the drive was replaced and the backup data restored. Usually a hard drive failure can be caught [and has been in the SAN before] before the drive stops responding entirely, and going forward we will strive to do better at preventing these issues.

wildjokerdesign
01-27-2012, 03:49 PM
This might help some understand what a SAN is: http://en.wikipedia.org/wiki/Storage_area_network It is a bit technical, but from it I gather that our actual data is on the SAN.

ifurniss
01-27-2012, 04:57 PM
Correct. The RAM, CPU, and general function of the account are maintained through a series of servers working in harmony. Data needs to be stored in one place, and our current configuration runs a single SAN where all the content is physically stored. Not all accounts are on one hard drive, but if one drive fails outright before we can replace it more smoothly, an outage like the one you experienced can occur. I do not see it occur often in our Cloud hosting environment, but we do apologize whenever something system-wide like this causes downtime for anyone's website. I hope that we can continue to improve all our hosting systems for the benefit of all clients at WestHost.

Andy435345345
04-05-2012, 03:26 PM
The only thing you can do is find a competent web hosting company! I'm looking myself ...

Andy435345345
04-06-2012, 06:59 AM
I've never seen such gross incompetence in my 15+ years in the business.

adamcroshaw
04-06-2012, 04:47 PM
Andy

I gave you a call and left my direct line so you could get a hold of me. I would be happy to help any way I can. Here at WestHost we are more than happy to make sure our clients are taken care of. Please feel free to PM me; I am more than happy to help you.

Thanks

Adam