Cloud hosting experience?



ali_shah
10-29-2012, 04:11 PM
Hi,
I've been a WH customer for many years (at least 8-9) and in general have been very happy with their service and support! I started with a VPS and later moved to a dedicated server as well.

Recently, I moved to a Cloud Server (Reseller Plus) package, as it seemed to give more power at a good price (and my dedicated server was getting too expensive compared with what I could get elsewhere). As usual, support and migration went well! No complaints here!

But now, with the new cloud-based server, I have been experiencing a lot of downtime and have been forced to restart the server several times a week, and Apache almost every day. However, when I look at the process table, I don't see much memory being used, CPU utilization remains very low (normally single digits), and I can't see many incoming HTTP requests either. At the same time I see spikes in 'load' (using uptime or top to check), where it can go up to over 100 at times. The system grinds to a halt at these times and normally requires a restart.

If I look at my traffic and load, it has actually gone down because of all these issues, compared to when it ran on my puny dedicated server with less than half the RAM and CPU power, where I never faced such issues. I had uptimes of months there without any intervention.

The support staff at WH have been very helpful in trying to resolve the problem, but weeks later I still see these spikes, which I cannot explain, and the system grinds to a halt, giving me downtime. Why does the load spike while CPU, memory, and disk I/O are all low and under-utilized?

Am I the only one who has problems like this? Or have others had similar issues on their cloud accounts?
When I searched the web for VPS.NET issues (VPS.NET seems to use the same system as WH; I believe they belong to the same group of companies), I did see people facing similar issues, and those were related to the network and disk/SAN setups within the cloud, which showed up as problems on the individual accounts. Those reports were about a year old, so it's not certain that the same causes are to blame here.

Regardless, my situation is that I still have problems, and I'm losing customers, reputation, and sleep over this! WH support has been helpful, but my problem remains unsolved. I would like to find a solution, and maybe someone else has had or solved similar issues and can guide me/WH to one?


thanks,
Ali

wildjokerdesign
10-29-2012, 04:31 PM
You are right that vps.net is under the same umbrella company as WH. They do use some of the same servers, but the technology is a bit different, as I understand it.

There were some issues at one time when they first added the Cloud services, but that was sorted out pretty fast; it was simply a server they were trying something new on that did not work out. I would just stick with WH support and keep feeding them as much information as you can: what you see going on when the spikes occur or when you have to restart. That info may help them narrow down what is going on.

ali_shah
10-30-2012, 03:50 AM
Hi, believe me, I've been working with them for a long time and am desperate to solve my issues. I'm suffering another spike in load right now that I cannot explain, and all my websites are down.
I am willing to 'upgrade' to the next cloud package, but I'm not seeing any of my existing resources being exhausted - memory is not swapping, CPU utilization is very low.

The last few times, the support team said that the maximum number of Apache processes had been hit, and they increased the limit twice. But when I look at my Apache logs from Munin, I don't see that many active processes. I guess my frustration is that I can't see why the load is going up, as the normal OS tools I'm used to, like ps and top, do not give me any guidance on which processes are the culprits.

Hence, I'm looking for those of you who have experience with hosting on the cloud servers to share your ideas/tips if you've seen similar issues.
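
One way to look for culprits that top's CPU sorting hides is to list processes in uninterruptible sleep (state D), since on Linux these count toward the load average even though they use no CPU. The Python sketch below is only an illustration: the function name 'blocked_processes' is made up, and the sample text is hypothetical output in the shape produced by 'ps -eo pid,stat,wchan:20,args', not data from this server.

```python
def blocked_processes(ps_output):
    """Return (pid, wchan, command) for every row whose state starts with 'D'."""
    blocked = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        pid, stat, wchan, command = line.split(None, 3)
        if stat.startswith("D"):  # D = uninterruptible sleep, usually waiting on I/O
            blocked.append((int(pid), wchan, command))
    return blocked


# Hypothetical sample, for illustration only.
SAMPLE = """\
  PID STAT WCHAN                COMMAND
  101 Ss   ep_poll              /usr/sbin/sshd
  202 D    sync_page            /usr/bin/php /home/example/cron.php
  303 D<   sync_buffer          [kjournald]
  404 R+   -                    ps -eo pid,stat,wchan:20,args
"""

for pid, wchan, command in blocked_processes(SAMPLE):
    print(pid, wchan, command)
```

On a live machine the same function could be fed the real output of that ps command. A long list of D-state processes during a spike would suggest tasks stuck waiting on storage rather than a CPU or Apache limit.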

ali_shah
10-30-2012, 04:04 AM
Take a look at the problem here. Here's output from 'top':
My load average is through the roof (125!!).
My CPU is doing almost no work (top shows about 98% of its time waiting on I/O, with hardly any user or system time).
My memory usage does not look like an issue either.


1:Blk - 05:54:41 up 9 days, 41 min, 2 users, load average: 125.24, 111.52, 90
Tasks: 447 total, 1 running, 442 sleeping, 0 stopped, 4 zombie
Cpu(s): 0.4%us, 1.2%sy, 0.0%ni, 0.0%id, 98.4%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1925120k total, 1895480k used, 29640k free, 6020k buffers
Swap: 1048568k total, 621624k used, 426944k free, 117796k cached

1   PID USER     S WCHAN     %CPU %MEM SWAP   TIME+ COMMAND
    921 root     D sync_page  0.0  0.0    0 0:25.40 [loop0]
    922 root     D sync_buff  0.0  0.0    0 0:03.65 [kjournald]
   4898 snganet  D -          0.0  1.7 121m 0:00.25 /usr/bin/php /home/snganet/
   4903 autodeal D -          0.0  0.8 135m 0:00.15 /usr/bin/php /home/autodeal
   4917 autodeal D sync_page  0.0  0.8 135m 0:00.12 /usr/bin/php /home/autodeal
   4939 autodeal D -          0.0  0.6 135m 0:00.18 /usr/bin/php /home/autodeal
   4985 autodeal D -          0.0  0.6 135m 0:00.15 /usr/bin/php /home/autodeal

2   PID USER     %CPU %MEM SWAP S  PR  NI   TIME+ COMMAND
  10044 root      0.6  0.1  11m R  15   0 0:00.16 top
  18135 root      0.2  0.1  17m S  15   0 0:01.85 dovecot-auth
    921 root      0.0  0.0    0 D   0 -20 0:25.40 loop0
    922 root      0.0  0.0    0 D  10  -5 0:03.65 kjournald
   4898 snganet   0.0  1.7 121m D  16   0 0:00.25 php
   4903 autodeal  0.0  0.8 135m D  16   0 0:00.15 php
   4917 autodeal  0.0  0.8 135m D  16   0 0:00.12 php

3   PID USER       TIME     TIME+ %CPU %MEM SWAP S COMMAND
   1935 mysql    331:12 331:12.42  0.0  2.2 460m S /usr/sbin/mysqld --basedir=/ -
    104 root       0:46   0:46.09  0.0  0.0    0 S [kswapd0]
   2075 root       0:39   0:39.85  0.0  0.2 255m S /usr/local/apache/bin/httpd -k
    921 root       0:25   0:25.40  0.0  0.0    0 D [loop0]
    272 root       0:21   0:21.21  0.0  0.0    0 S [kjournald]
   1796 root       0:14   0:14.48  0.0  1.3 789m S /usr/local/jdk/bin/java -Djava
   2784 tomcat     0:11   0:11.17  0.0  0.7 770m S jsvc.exec -user tomcat -cp ./b
   2679 root       0:05   0:05.38  0.0  0.2  68m S cpsrvd (SSL) - waiting for c -

4   PID USER     VIRT SWAP %MEM  RES CODE DATA  SHR nFLT nDRT COMMAND
   1796 root     814m 789m  1.3  25m   36 720m 2188 2340    0 /usr/local/jdk/bin
   2784 tomcat   784m 770m  0.7  13m   40 695m 1700 1046    0 jsvc.exec -user to
   1935 mysql    501m 460m  2.2  41m 6136 470m 3760 3642    0 /usr/sbin/mysqld -
   5291 snganet  280m 267m  0.7  13m 7512 9948 7468  375    0 /usr/bin/php /home
   8755 fasic    288m 266m  1.1  21m 7512  14m 7728   59    0 /usr/bin/php /home
   4987 nobody   259m 257m  0.1 2248 1472  84m  948   11    0 /usr/local/apache/
   5471 nobody   259m 257m  0.1 2248 1472  84m  948    0    0 /usr/local/apache/
   5862 nobody   259m 257m  0.1 2252 1472  84m  952    2    0 /usr/local/apache/
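
A note on reading the summary above: in top's 'Cpu(s)' row, 'id' is idle time and 'wa' is time spent waiting on I/O, and Linux counts processes in uninterruptible sleep (state D) toward the load average. The Python sketch below, with a made-up helper name 'parse_cpu_line', simply splits that row from the output above into fields so the division between busy, idle, and I/O-wait time is explicit. It is an illustration, not a diagnosis.

```python
import re


def parse_cpu_line(line):
    """Return {'us': 0.4, 'sy': 1.2, ...} from a top 'Cpu(s):' summary line."""
    return {name: float(value)
            for value, name in re.findall(r"([\d.]+)%(\w+)", line)}


cpu = parse_cpu_line(
    "Cpu(s): 0.4%us, 1.2%sy, 0.0%ni, 0.0%id, 98.4%wa, 0.0%hi, 0.0%si, 0.0%st"
)
busy = cpu["us"] + cpu["sy"] + cpu["ni"]   # time actually spent running code
dominant = max(cpu, key=cpu.get)           # field with the largest share

print(f"busy={busy:.1f}% idle={cpu['id']:.1f}% iowait={cpu['wa']:.1f}%")
# prints: busy=1.6% idle=0.0% iowait=98.4%
```

Read this way, almost all of the time is I/O wait rather than idle. That fits the D-state processes (loop0, kjournald, php) in the first window and may mean the load comes from tasks blocked on storage rather than from CPU demand.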

ali_shah
10-30-2012, 04:14 AM
And finally, when I contact support in situations like this, I often get a message like:
'we are currently experiencing some slowness with our cloud servers, I'll restart your account and it will be working here shortly'

To me, this points to something deeper that is beyond my control.
Is a GHz or a GB on the cloud really the same as a GHz or a GB on a non-cloud server?

wildjokerdesign
10-30-2012, 05:01 AM
Do you know if your problem has been escalated so that a senior tech is looking at it? Make sure you are contacting them via the ticket system so there is a good record of what is going on. I think you may be right that this is something beyond your control, and it may take a system administrator to get things worked out. The first-line techs you get hold of when you use chat or phone support may not have the system access needed to figure this out.

J_M
10-30-2012, 07:12 AM
I agree, and I experience the same problems. They seem to be increasing again recently. I went through a long discussion with tech support when they were initially playing with the settings. Everyone was extremely nice, but way too much time was spent pointing the problem back at our sites, and that is not the problem. Our sites use very few resources; our situation sounds very similar to yours.

Westhost - what is the status of the cloud server setup, excess resources, etc.? I don't have the patience or the time right now to enter the tech queue, and this is not an individual issue; it affects everyone. I'm also on the East Coast, the storm has just left, and our access to the net is limited.

Thanks in advance. Please post any information on the problem and on what WH plans to do to fix it.

ali_shah
10-30-2012, 03:28 PM
Finally, I am not alone! I thought so too, but good to get confirmation on this!

wildjokerdesign, yes, I've been raising tickets all the time. I've raised more tickets in a month than in all the previous years combined, and I've been a customer for a long time. The response to my last ticket was that I should try to find someone on live chat to look at the system while the problem is occurring, so I did that today. It was not very helpful: the night-time tech staff seemed busy, so I got much the same answer, to restart it and that 'we're experiencing slowness'. I was online for 45 minutes but didn't make much progress. How does one get the attention of a senior support guy? Do I have to threaten to leave WH?

They were upgrading WHM today, and that was the explanation given for the slowness. But my issues have been around for much longer, so I'm not very optimistic that this will solve them - I wish it would, though!

Another thing I've noticed: in my timezone it's around 12 PM, but in Utah it's 4 AM, and I see the server load spike a lot during the 4-6 AM Utah timeslot. I wonder if it's some maintenance cron jobs or backups kicking in and eating up system resources?
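
One way to test this timing hunch would be to record the load average at regular intervals and average it per hour of the day. A minimal Python sketch follows; the function names and the sample readings are hypothetical, and it assumes timestamps are recorded in the server's local time.

```python
import os
from collections import defaultdict
from datetime import datetime


def sample_now():
    """Return (ISO timestamp, 1-minute load average) for this machine."""
    return datetime.now().isoformat(timespec="seconds"), os.getloadavg()[0]


def mean_load_by_hour(samples):
    """Average the load readings that fall within each hour of the day."""
    buckets = defaultdict(list)
    for stamp, load in samples:
        buckets[datetime.fromisoformat(stamp).hour].append(load)
    return {hour: sum(loads) / len(loads) for hour, loads in sorted(buckets.items())}


# Hypothetical readings, for illustration only.
SAMPLES = [
    ("2012-10-30T04:10:00", 90.0),
    ("2012-10-30T04:40:00", 110.0),
    ("2012-10-30T13:00:00", 1.0),
]

by_hour = mean_load_by_hour(SAMPLES)
peak_hour = max(by_hour, key=by_hour.get)
print(by_hour, peak_hour)
```

If the peak hours consistently line up with a scheduled cron job or backup, that would support the maintenance theory; if they do not, the cause is more likely outside the account.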

wildjokerdesign
10-30-2012, 05:40 PM
I'll see if I can get the attention of some of the senior staff to see if they can give you some answers.

rcardon
10-31-2012, 03:10 PM
Ali,

I am sorry to hear you have been having such issues with your Cloud Reseller account. I believe I was the one that suggested you contact us while the issue is occurring, as it should be easier for us to see the issue.

If you could PM me a list of the ticket numbers you have opened on this issue, as well as any chat transcripts, I will be happy to dig deeper into this issue for you.

ali_shah
11-01-2012, 01:41 AM
Reed,
Just PMed you!