Supporting Role?

If you work for a vendor and after reading this you feel that this entry is about your company, then you are right! And if you work for a vendor and after reading this you feel that this entry has nothing to do with your company, then you are totally wrong!

I've been working in IT now for more than twenty years; I've done a variety of roles but a big chunk of my career has been running support teams and I've done this on both sides of the fence, i.e. both at an end-user and at a reseller. I know a lot of people who have worked in support on both sides of the fence and pretty much all of us have come to the same conclusion; over the past ten years or so, vendor support has got markedly worse! Of course, it may be that we are looking at the past through rose-tinted glasses but I do not believe so. I would also say that this is pretty much universal; this is not just a storage problem, it impacts hardware and software support across the board.

There are a variety of reasons for this decline in quality and ironically, it is some of the services that we as end-users have demanded that may have driven the decline in service. 

The Internet is a big driver in the decline of quality; there are huge amounts of support information and detail online and we as end-users have demanded that the vendor support databases are put online; unfortunately, this information is often presented in a manner that makes it pretty much impossible to search effectively and you find yourself trawling through screens of unrelated sludge before you find the answer you are looking for. But because all of this information has been provided in a 'self-service' portal, this appears to have been used as an excuse to reduce the number of qualified people in the support centre.

The Internet has also further enabled the stock holding questions from the support centre and having worked both sides, I know that these are holding questions in general.

1) Are you at the latest level? 

2) Can you send me the logs?

3) Can you send me the configuration files?

Now all of these are valid questions in many circumstances but too often it feels that you are dealing with a robot who is working off a script! This is a Support Centre not a Call Centre; there is a difference! And my (and many others') responses to the questions are:

1) Why? Will it fix my problem? Where in the release notes does it talk about something even vaguely related to my problem? And if I'm in a really sceptical mood, can you send me the piece of the code which fixes the problem? And you realise that this is a live production system and just applying fixes is unacceptable; I will get asked all the questions I've just asked you by the change board. And if it makes it worse, it's my job on the line.

2) Why? Will that log help you diagnose the problem because I've already looked at the log and I can't see anything. Do you have any idea how frustrating it is when you have got a disk off-line and marked failed to be asked for logs from the array?

3) Which files do you want to see and why? 

And often, I've already attached the information to the ticket; so asking for it again shows that you have not read the ticket or are just working off the script.

In conversations with our various hardware engineers from all vendors, we constantly discover reductions in headcount, experienced engineers being retired early and territory coverage per engineer being increased. We are not buying less hardware, we are buying more and there are more systems out there to go wrong. Now arguably, systems are becoming more reliable but there are more of them. And in the world of storage, we have lots of moving parts; disks spin, tapes spool, robots move and these are all things which wear out and break. Disks get bigger and the potential impact of a disk failure and the resultant rebuild times gets ever larger.

Talking to people who work in the support centres, it appears to be more important to keep the queue within the targets than to solve the customer's problem at point of first contact. There is no longer time to do follow-up calls; for example, calling the customer who had the severity 1 call to ensure that they are happy with the fix.

This is just the tip of the iceberg and I could rant on and on about this subject; I was ranting about the decline in service years ago and yet it really is not getting better. For example, I am personally aware of four companies this year who have experienced major outages due to problems caused by vendor support; it may be that now I have a fairly high profile in the industry people tell me their war stories, but it seems we are on an upward trend.

I think it's about time that the vendors started to review what and how they are providing support (fix your websites or at least put a decent search capability on it, it is pretty awful that generally I find myself using generic search engines and the site: directive to search your site); it is also about time that they started treating support centre staff with respect and giving them time to do a great job. 


21 Comments

  1. Hi Martin.
    Great rant… I think you raise many valid points of concern.
    One comment, you state in point 1 that:
    “And you realise that this is a live production system and just applying fixes is acceptable”
    I suspect you meant: “unacceptable” ?

  2. Martin G says:

    Edited to reflect what I really meant. It is really unbelievable that after all these years we still get this a lot, with no appreciation of change control within a large environment.
    Even if it would fix the problem, 100% guaranteed; there is still often the problem in a complex environment about the impact that it will have on interoperation.

  3. When I worked for a vendor selling multiprocessor UNIX systems, a 3-year support contract for a 4-socket system cost more than 10,000 dollars. Support for the same service level on a 4-socket x86 system today costs less than 2,000 dollars. That is a decrease in service pricing by a factor of 5. I don’t say that your observation is wrong, but you shouldn’t compare only the service that you get but also the price that you pay.

  4. Martin G says:

    Okay, if I look at the support costs as a percentage spend based on my capital budget; they have not declined especially. If I look at the number of software calls that my teams raise these days; they have declined and we really only raise them when we are in trouble; so in fact my cost per call is at best constant but probably in all reality, it has increased.
    I think that it is also true that I raise fewer hardware calls. And although there are more systems which are customer maintainable, we generally pay the uplift to have an engineer as our data centres are remote from the support staff and in some cases, they have oceans between them and the support staff.
    The more mundane problems, we generally handle ourselves and obviously, there is no longer the calling up and requesting the latest fix pack; we can download those. This means that we actually need a better quality of support person on the end of the phone; without denigrating the people that we speak to, it often takes a lot longer to get to the expert and we are filtered through a number of levels before we get there.
    Generally when we call, we want Batman…not his answering service.

  5. SR says:

    I agree with these, we have had almost exactly the same experience of a tech drone asking for 50 pieces of information completely unrelated to the trouble ticket in question.
    The fact that this happened with a Tier 1 storage vendor is even more disheartening.
    We have not experienced support costs going down. They are still at 8-12% of hardware costs, and have stayed that way for at least the past 7 years.
    SR

  6. Support Engineer says:

    Wow. A bash vendor support article. Here’s the real reasons behind the 3 questions as someone who has worked support by choice for a long time. I’m fully qualified to do many other jobs in IT, but I enjoy support.
    Q1 – It’s impossible for me to keep up with all the fixes that go in to a new release. How about we start out with the best foot forward and upgrade to rule out one of the ones I don’t know about. Your change management restrictions are your own doing. If you pay us enough money or gripe loudly enough we might eventually find the specific bug before you upgrade. Guess what you pay for not taking us up on this? Hope you said a quick resolution if it was something already fixed in the latest version.
    Q2 – Because no one would ever put in the logs something that you couldn’t understand or you would never miss anything in the logs. It’s a very arrogant response to this request from someone who wants to help you.
    Q3 – There are a few reasons to request this. 1. The people who wrote the code that support interacts with often want these questions answered. 2. Misconfiguration is a common cause of the problem you’re having. But, of course you never misconfigure anything. 3. It helps to reproduce the problem in the lab exactly as you have it, to test fixing it before running you through a ton of changes on a production system that might not fix the problem.

  7. Martin G says:

    Okay, let’s just take your first point about change management. You have obviously never worked in a large, complex environment; going to the latest release can potentially have huge knock-on impacts across a whole environment. For example, going to the latest level on a storage array can mean recertifying every server, SAN switch etc…Do you really think I am going to go through all that work if you can’t guarantee that your fix will work?
    Just send us the release notes if you can’t be bothered to read them yourself.

  8. Roger Luethy says:

    Hi Martin
    Unfortunately, you are right. The support sites are mostly a nightmare to navigate and finding the right information is purely a game of luck.

  9. Support Engineer says:

    I have worked only on large, complex environments for over 10 years.
    You’re naive if you think any major company puts every change in the product in the release notes. Release notes are marketing materials, not software audit reports.

  10. Martin G says:

    Come back after you’ve been doing this for 20 years then but obviously 10 years is not long enough for you to understand good change management. Or perhaps, you’ve been working in support for too long and it’s time to change path.
    Release notes btw are part of the QA cycle for any release…all software should have good release notes which do document all the changes. You may not release those to the end-users but they should exist and if your internal developers are not briefing you properly, I’d suggest you speak to your management.
    [I run a QA team as well as a storage team, we use release notes as part of our testing; I suspect yours do too]

  11. Support Engineer says:

    Your post and your responses to me make it very obvious to me why you have a hard time with support.
    You’d rather believe I’m too lazy or incompetent to read release notes, I need 10 more years of experience to understand how difficult it is to make changes (never mind that 90%+ of the software and hardware in IT is less than 10 years old), or every problem that pops up in your environment is well documented in some mythical internal document, just go look for it. Oh and I shouldn’t ask for logs or config details either. I’m just supposed to mind read the problem from you and presto give you a fix.
    What world do you live in? Very revealing discussion. Thanks.

  12. Martin G says:

    Remind me who you work for again? You see I know..
    And I never said you shouldn’t ask for logs and config; actually most of the time, you will already have the logs and the configurations, most of the time they are already attached to the problem record. And most of the time, you are working from a script. And most of the time, you’d rather not be…actually, I generally get very apologetic support people who want to do a better job…but the process that they have to follow gets in the way.
    If you read the whole post, you will find that I am pretty sympathetic to the support teams; you generally aren’t given enough time to do a good job and the quality has gone down because of this.
    If you took the time to talk to experienced customers, you would find that this is not exactly uncommon tho’.
    And yes, you do need experience working in a large end-user as opposed to a vendor; you do need to sit in change-boards and learn how complex systems hang together, you do need to know that C-level sometimes want proof that the fix you are going to put in is not going to have adverse impact and take the whole environment down (yes it’s happened).
    And yes, you should read your own bug-tracker; you will find that on several occasions going to the latest fix would have made the problem worse and actually broken things even more.

  13. Scott H says:

    I can understand sending the logs and that’s pretty much done by our teams when we open cases with vendors so they don’t even have to ask. But the notion that we have to automatically upgrade to the latest version is absurd for the reasons Martin stated. In a large enterprise environment we have strict change control and version control policies for very good reasons. If a support agent is going to require us to upgrade to some new OS version, they’re going to have to show me a reference (release notes, previous customer cases, etc) that shows that 1) the issue I’m reporting has been reported before and 2) that it is fixed in the version they are recommending we upgrade to.
    And if a vendor is not putting out comprehensive release notes, I consider that a major knock against that vendor for future purchase consideration.

  14. Support Engineer says:

    With the e-mail address of mine that you have, it’s not difficult to figure out who I am or who I work for. Hell, if you e-mailed me and asked I could have saved you the googling. I’m confident I’ve helped your company recently (e-mail me if you want details). I just don’t think it’s fair to overgeneralize support, and stereotyping only makes resolving the issue, which I genuinely want to do, harder. I still need the exact same information. I can tell you with certainty, your 3 dreaded questions from support are not just holding questions.

  15. Support Engineer says:

    Scott H, The question from support about upgrading can usually be answered exactly as I’m sure you do.
    “It’s a production environment and we can’t upgrade without knowing for certain it’ll fix this specific problem.”
    Sometimes when we ask the question it’s preprod or a lab and that’s not the answer. That saves all of us time and possibly money. BTW, most of the time I want to find the problem first as you do. I discourage support staff from taking the just upgrade approach for every problem. You’re right that it can cause more problems than it fixes. Total time to ask question and get that answer is about 30 seconds. If I was using that to buy time, it didn’t work. 🙂 I do what I do for the rush that comes from really resolving an issue. I don’t want to just dump you off the phone or close a case. I think my company is a little different about support though which is why I still do it.

  16. Martin G says:

    Actually, I think the first question is to just see how stupid the caller is! If they’ll blindly upgrade to the latest version, you’ve got a naive muppet on the call and they’ll probably believe whatever else you suggest.
    The second two can be legitimate but not always…and should not be used to hold off sending out an engineer with replacement parts. This is absolutely being done, I’ve had discussions with field engineers who agree and actually, they really enjoy working with us because we understand them and the organisational issues that they are suffering.
    And we always ask why you want particular information; we don’t just blindly handover information. I expect you to know the answer and not just flannel with some meaningless process-related crap. Or at least be honest and say, ‘it’s process related crap…’
    Support has gone backwards; believe me….there are still good people working in support; they don’t have enough time to do a decent job.

  17. Support Engineer says:

    “Actually, I think the first question is to just see how stupid the caller is!”
    You don’t see how that’s over the top on cynical?
    If you’d have started your post with some of that Martin I could explain even better why we do what we do.
    Many problems that log errors that look like hardware problems can be attributed to software issues. My company has worked very hard to make sure we address the problem the first time and not spend a lot of time and money on chasing hardware for what we know to be software problems. We need logs to say for certain. I think you’d also rather we fix the issue instead of blindly shipping hardware only to find out it’s a software bug. See your own comments and post about the difficulty of getting a maintenance window. There’s never enough time to do it right the first time, but there’s always enough time to do it again.
    Asking why we want some specific information is fair and proactive on your part. For some support staff and issues it’s required information to engage a certain group or expert. Some experts in support won’t look at an issue without having certain information to piece together what’s broken based on the experience they have that you need.

  18. Martin G says:

    Of course I’m cynical…I deal with vendors on a day to day basis; I explain to them on a day to day basis why we can’t just upgrade to the latest level. I spend time explaining to them what change management is, what problem management is; I explain to them what a critical system is.
    Yes, I’d prefer that you fixed the problem first time; which is why installing the latest levels on your say so without evidence is something I’m not willing to do. When I’ve had an array running stable at a particular level and nothing has changed; a failed disk is often just a failed disk. Dispatching a hardware engineer should be the first response…and actually swapping redundant hardware is generally a damn sight easier than arranging a firmware/OS/patch. I’ve recently had a spate of cases where vendors have been working very hard to avoid doing the sensible thing until they’ve got enough staff in to handle things.
    I will stick by my statement that support has got worse; a mixture of call-centre mentality with understaffing by the powers that be has caused this.
    And I quote from this blog
    ” I think it’s about time that the vendors started to review what and how they are providing support (fix your websites or at least put a decent search capability on it, it is pretty awful that generally I find myself using generic search engines and the site: directive to search your site); it is also about time that they started treating support centre staff with respect and giving them time to do a great job.”
    You will see that I am standing up for the people who actually work on the coal-face.

  19. Scott H says:

    I know who Support Engineer works for based on Martin’s tweet earlier and I have to say that at least for us, his company has been less guilty of the “support by rote-script” behavior Martin is rightfully protesting. Just last week I opened an issue with your company and was pleasantly surprised at how quickly we were engaged and the fact that the very first person I spoke to was the right person, was engaged with the issue, and was capable of doing an in-depth analysis and asking the right kinds of support questions.
    I was surprised because he wasn’t engaging in the behavior Martin discusses, which has absolutely become the norm in the industry. When we call for support, the issues are typically complex and are coming from a relatively complex environment. More often than not, we go through at least 1 front-line agent if not 2 before any real diagnosis and support starts. So, yeah, I was pleasantly surprised when the first person I spoke to got right down to it and rendered solid support. It’s simply not the current industry norm.
    And, across the industry, the problem is generally getting worse. Martin’s analysis of vendor support site search capabilities is also dead-on accurate in my experience.

  20. Martin G says:

    It’ll be interesting to see if NetApp can continue with such quality as they continue to undergo rapid growth; or will they fall into the trap that so many of their competitors have? They are also fortunate that they have a single product (more or less), so their people only have to be experts in a single product line. This is not the case for many other vendors.
    I remember a time when I could call pretty much any vendor and get to speak to the right person pretty much immediately. Now, far too often we find ourselves raising calls and then going round the process to get the support we need. It doesn’t happen all the time but it’s with increasing frequency.

  21. Scott H says:

    Very true. One of NetApp’s differentiators is that OnTap is OnTap regardless of hardware. At other orgs, there is a wide disparity between platforms and that makes their support scenario much more complex as well as expensive for them.
    I also had a surprise recently calling in to Oracle for support on an ACSLS software issue. Again, I immediately got distributed to the right person on the initial call. But I think this was due to two factors
    1) I was calling right in the middle of a US business day
    2) ACSLS is rather unique. Your call is going to get routed to a relatively small field of experts to deal with an issue for that item. It’s a whole lot different than calling in with some sort of Solaris OS issue.
