Vendor vs. Whitebox – Reliable vs. Unreliable – Redux vs. The “unnamed”
As an organization premised on stability and reliability, we opt for vendor-built, tested, distributed AND supported server solutions. Is there a premium to be paid? Without question. Is it worth it? Most certainly.
Unnamed web hosts (which will remain unnamed), “leading the industry” in unreliable hardware architecture, wholeheartedly disregard the importance of reliable server builds. Perhaps in the realm of 6,000% oversold bottom lines, brewing Napoleon Dynamited web 2.1 websites (credit to firefallpro for the term), or reinventing the CRM wheel, an additional $800-$1200 on a server purchase for Gold Level 24/7/365 2-hour TAM support is something to scoff at. In the Redux world of redundant services, these extra dollars go the extra distance when a RAID array alerts on predictive-failure thresholds over the Christmas holidays (been there, and unfortunately done that).
A vendor-branded server (in our case, a Dell PowerEdge), coupled with exceptional support services (Silver/Gold 2-hour guaranteed parts and onsite-engineers-flown-in-from-Austin assistance if needed), is truly the only way to offer a reliable, honest, and decent service to your customer base. Filling up datacenters with low-cost, bottom-of-the-barrel components, sometimes assembled onsite, offers absolutely no level of protection or guarantee to the customer.
I do not want to hear the arguments about housing 1,000s of cloned systems where replacement parts are literally a dollar a dozen. We are not in the year 2250, and all your bases do not belong to anyone. You cannot tell your customer base in good faith that you are going to maintain an acceptable level of service when you aren’t even willing to expend resources on the data integrity and protection of RAID 1/5. Reliable services should not be premised on a provider’s (again, no names) ability to copy data from one failed OEM drive to what will eventually be another in 6 months, or to swing a complete chassis replacement at 4AM. Rather, they should be (and are) premised on internal redundancies, pre-detection mechanisms, and an internal array of brilliance that only qualified server vendors (Dell/HP/IBM) are capable of providing. Quite honestly, if your provider is reporting that your server is being taken offline in the middle of the night to deal with “bad RAM,” this translates to “we have no idea what the problem is, the team at Newegg told us to do this.” (An unfair jab at Newegg: if you ever need low-cost components to build a gaming rig, they are the place to go.) But again, the Microsoft Windows mentality of “let’s reimage the entire system and set a job/cron to reboot it nightly” is ever so present in this regard.
The next item of importance is the toolsets provided by vendors (HP OpenView, Dell OpenManage) to assist in server management. The primary purpose of an OpenManage-ish suite of hardware and system management/diagnostic utilities is to keep an administrative team informed as to what is going on inside the box. Do we really care what speed each of the 27 fans inside our systems is running at, or what the temperature is around the system backplane? No, but this briefly describes the levels to which vendor management utilities will go. Pre-detection allows a sysadmin to address a problem before it occurs; diagnostic utilities allow for in-depth diagnosis of the actual issue prior to performing the far too frequent “chassis replacement.” By no means am I declaring a redundant system immune from disaster (see Massachusetts logs from December 2005: http://noc.networkredux.net).
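To make the pre-detection idea a little more concrete, here is a minimal sketch of threshold-based checking in Python. It is not how OpenManage or OpenView actually work internally; the sensor names, readings, and thresholds are all made up for illustration, and a real suite would pull these values from the hardware rather than from a hard-coded list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    sensor: str                   # hypothetical sensor name
    value: float                  # current measurement
    warn: float                   # assumed threshold for a warning
    fail: float                   # assumed threshold for a critical alert
    higher_is_worse: bool = True  # False for sensors like fans, where low is bad


def classify(r: Reading) -> str:
    """Return 'ok', 'warning', or 'critical' for a single reading."""
    # Flip the sign for sensors where low values are the bad ones,
    # so the same comparisons handle both directions.
    sign = 1 if r.higher_is_worse else -1
    if sign * r.value >= sign * r.fail:
        return "critical"
    if sign * r.value >= sign * r.warn:
        return "warning"
    return "ok"


def alerts(readings):
    """Return (sensor, status) pairs for every reading that is not 'ok'."""
    result = []
    for r in readings:
        status = classify(r)
        if status != "ok":
            result.append((r.sensor, status))
    return result


# Illustrative values only; none of these come from a real system.
readings = [
    Reading("backplane_temp_c", 41.0, warn=45.0, fail=55.0),
    Reading("fan_07_rpm", 2100.0, warn=2400.0, fail=1200.0, higher_is_worse=False),
    Reading("disk_1_reallocated_sectors", 3.0, warn=1.0, fail=50.0),
]

for sensor, status in alerts(readings):
    print(f"{sensor}: {status}")
```

The point of the two-level threshold is the same one made above: the warning fires while the part is still working, which is what gives a sysadmin time to act before anything actually fails.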
How can a provider truly know that a memory module is the reason for server irregularity when there is no legitimate testing or diagnostic probing of the system? And how can anyone expect this level of service and manageability out of a whitebox server which cost these unnamed providers $200-$350 to assemble and put on a LAN rack? Countless gigE links BGPified, in turn solidified by a network of desktop PCs and 250GB OEM disk drives in RAID non-existent?
I’m not going to argue in favor of one vendor or the other; all have varying degrees of pros and cons. What I am standing on a pedestal to argue for is a higher threshold of server architecture builds and components across the web hosting industry. Spend less time developing useless in-house applications that your customers have little to no interest in, or figuring out a flippin’ sweeter way to web 2.0ize your production site while handing out free Vote for Pedro t-shirts to new customers. Let’s do the consumers a favor and redirect these resources toward hardware reliability and redundancy.
This entry was posted on April 19, 2006, 7:46 am and is filed under Comments. You can follow any responses to this entry through RSS 2.0.