2010 September

Remember the 3 D’s when dealing with ISPs

It’s no secret that I am a big fan of Pathview (both Premise & Cloud), and since we became a partner last year it has really made a difference in our ability to assess, troubleshoot and monitor our customers’ networks. When first watching a few of the YouTube infomercials about Pathview, one of the terms that caught my attention was ‘MTI’ – Mean Time to Innocence. I thought immediately that the speaker must have had experience as a phone guy somewhere in his past and his soul, like mine, had been scarred for life!

Coming from a voice background, one of the first things you learn is that you are guilty of all things until you prove – often to the satisfaction of completely clueless people (see figure 2) – that neither you nor the equipment you installed was at fault for something not working. Furthermore, just because you correctly identified the problem as not being yours does not a) make you popular or b) excuse you from having to fix it anyway. Our customers have greatly benefited from Pathview, and one would think that, because of the highly technical nature of networks, ISPs would welcome the assistance of someone armed with such an application. Amazingly, this is not so.

Figure 1. Customer reported that the Internet seemed to suddenly slow to a crawl. ISP response - “We are testing good to the modem at 3Megs. Contact your IT person. It’s obviously your equipment.” After hours of arguing with them (Comcast), it turned out to be their faulty modem.

True story

Those of you familiar with Wireshark know it to be a very popular packet sniffer. If you have been looking at Wireshark recently you will know the name of Laura Chappell and her very thorough book “Wireshark Network Analysis”, released last March. One of her case studies describes a very amusing yet oh-so-true situation.

It was a familiar story of a user having trouble connecting to the corporate office and the IT department pointing the finger towards the ISP. The ISP calmly assured them that they were not blocking any traffic. When confronted with a packet capture showing otherwise, the ISP confirmed that they were – as they put it – blocking “ports in question”. Begrudgingly, the ISP allowed these ports through but with a few caveats. In an almost whimsical way, the author ended his report as follows:

“We now have a happy user, but I can’t help but wonder how many other customers of this ISP are encountering similar issues and wondering why it takes so many attempts to get connected to their corporate network.”

If you haven’t had an experience like this with an ISP, you are either very new, very lucky or VERY oblivious. In the words of Morpheus to Neo …. “Welcome to the real world.”

Here’s the fact: ISPs have for years held – and lorded over end users and vendors alike – their two trump cards, i.e. “We’re more technical than you could ever hope to be”, followed closely by “Prove it …!”, usually said with a subtle yet detectable sneer either on the phone or through an email. Unfortunately, things like “it seems slow …” or “I don’t think we are getting the bandwidth we are paying for …”, or even “my voice is definitely choppy at certain times of the day …” just don’t cut it with these guys.

Baselining – Getting Prepared for the 3 D’s

When a customer approaches us with their story of woe, the first thing we do is establish a baseline with Pathview, using targets both inside their network and outside, to one of our micro appliances. The more elusive or intermittent the problem, the longer the baseline needs to be, and by that I mean a minimum of one day and as much as a week or even longer. It is also worth seeing what their system looks like while backups are running.
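
To make the idea concrete, here is a minimal sketch of what a baseline boils down to: a long run of timestamped test packets, summarized per hour so that intermittent trouble (a nightly backup window, say) stands out. This is not how Pathview works internally. The `Probe` record, the `hourly_baseline` function and the sample numbers are all made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from statistics import mean
from typing import Optional


@dataclass
class Probe:
    """One test packet sent to a target (hypothetical record layout)."""
    sent_at: datetime
    rtt_ms: Optional[float]  # None means the probe was lost


def hourly_baseline(probes):
    """Group probes by hour of day and report loss % and average RTT."""
    buckets = defaultdict(list)
    for p in probes:
        buckets[p.sent_at.hour].append(p)

    summary = {}
    for hour, group in sorted(buckets.items()):
        answered = [p.rtt_ms for p in group if p.rtt_ms is not None]
        lost = len(group) - len(answered)
        summary[hour] = {
            "probes": len(group),
            "loss_pct": 100.0 * lost / len(group),
            "avg_rtt_ms": mean(answered) if answered else None,
        }
    return summary


# Illustrative data only: a quiet 10:00 hour and a lossy 23:00 hour.
samples = (
    [Probe(datetime(2010, 9, 1, 10, m), 20.0) for m in range(10)]
    + [Probe(datetime(2010, 9, 1, 23, m), 80.0) for m in range(8)]
    + [Probe(datetime(2010, 9, 1, 23, m), None) for m in (8, 9)]
)
baseline = hourly_baseline(samples)
```

With a week of rows like these, “it seems slow” turns into a loss percentage for a specific hour of the day, which is the kind of statement a carrier has a much harder time waving off.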

Field note – if the customer can’t tell you when the system gets backed up, the backup window probably coincides with the times they are having some type of network problem.

Figure 2. The weekend staff kept complaining about the Internet being slow even when only 4 of them were watching the ball games … on their PCs.

I usually tell the customer that if the problem looks like the carrier, it will take a minimum of 4 calls to their Customer Care/Support center to get to someone who can actually do something.  If you are a consultant, this is something that you charge for (what?! you don’t think a lawyer would charge for his time to do this?).  Once contact with the carrier has been made, you will begin with the first of the 3 D’s.

DENY

The way ISPs decide to address a customer’s problem depends on the technical level of whoever picks up in the Customer Service Center, and generally falls under one of the following two axioms:

Axiom 1 – The less technical they are, the more they will want to sell you additional bandwidth.

Slow Internet? More bandwidth.

Getting cut off altogether?  More bandwidth.

PC making strange noises? More bandwidth.

Think some of your ports are being blocked? Get more bandwidth and, oh by the way, there may be an extra charge for unblocked ports (whatever they are).

Axiom 2 – The more technical they are, the likelier they are to put all the blame on the user’s equipment.

Usually they will say something like “Well, I’m logged into your router now and I don’t see any packet loss at this time or in the last 48 hours …”.  One time with the NOC guy on the line, I told him that I was going to make a programming change in the router and actually unplugged the carrier’s equipment.

“How does it look now?”  I asked after about a minute.

“Looks the same, no packet loss.”

No surprise, he was looking at the wrong circuit all along. That one was easy.

The hard ones are when they are actually looking at the right circuit and for some reason insist they are not seeing what you are seeing. This, by the way, is what sold me on Pathview and led me to an almost X-Files motto:

“The truth is out there … somewhere, but you will need Pathview to find it.”

(I should probably copyright that before Jim grabs it!).

In the chart below, the customer had installed an IP-based PBX and had just converted to SIP trunks. They kept losing calls and their voice quality was very choppy – other than that, it was great …

Figure 3. Using Pathview Cloud looking towards the SIP provider

The SIP provider, Broadvox in this case, insisted that the problem was not at their end and to contact their local carrier.  The carrier, a wireless outfit, claimed that it could not be anything at their end so it had to be the customer’s equipment.

A quick look at their connection from our point of view (outside-looking-in to their PBX using Pathview Premise) and then to the SIP provider from the customer’s point of view (inside-looking-out using Pathview Cloud) shockingly revealed that the problem wasn’t the customer’s equipment at all but was instead the hop (in this case, the wireless tower) just past the customer’s router. Granted, the customer was in a remote area (northeast Texas), which is why they HAD to go with wireless for a while (but this was about to change). The customer and I called the carrier and, incredibly, got a hold of the operations manager on the first try. Thus began the second “D” when dealing with the ISP.

DEFLECT

He had us on the speakerphone and it seemed as though he was trying to demonstrate to someone(s) in the room how to deal with a complaining end user.  In rapid fire he began to drill us –

“How much packet loss? I will need to know the exact percentage and where it is occurring”

“When did this start?”

“Why are you just noticing it now?”

“How come we’re not seeing it?”

“Are you sure you have power?”

“Have you replaced your equipment?”

“Is it raining there?” ← That was an interesting one!

“We have VoIP sets and we’re not having problems.”

There were a few others whose relevance I questioned, but it was clear that customer sensitivity was not in the field ops handbook. The only question he didn’t ask was whether we thought we needed more bandwidth, but then again he was obviously an Axiom 2 guy. He finally ended it with an ‘and-don’t-call-me-back-until-you-have-all-this-information’ “Okay?”.

For a moment it seemed as though he was leaning back in his chair while looking at his understudies with the air of  “ … and THAT’S how you handle that!”.  At the same time I heard my customer softly chuckle since he had already seen the Pathview charts and tests.  Needless to say, we were quickly placed on hold and picked up elsewhere – not on speakerphone  – and he asked us to email the information.  I don’t know if he actually looked at it or not but within moments a service call was scheduled with the results in figure 4.

Figure 4. “Okay, should be working fine now. The tech ran a few DSL Reports tests and then a traceroute for good measure. Looks great!” Are you kidding me?!

I was stunned.  Yes, it looked better but the customer was, well, let’s just say he was still experiencing problems.  This time the customer forwarded the reports to the ops manager and anyone whose email address he had.  On to the third “D”.

DELAY

The response was that they did not see any problems – which also falls under “DENY” – but they promised to continue monitoring it and get back to us. After a few days of unreturned emails and phone calls it was clear they were going to leave it as is and just wait us out. Long story short, the customer switched Internet providers the next week and, while not perfect, things greatly improved. I also heard that the customer raised the rent for the carrier’s repeater that was on their property.

Figure 5. “We don’t measure MOS in this department but that looks normal.” Is there someone I can go over this with? Hello? … Hello?
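
For readers who have not run into it, MOS (Mean Opinion Score) rates call quality on a scale from 1 to roughly 4.5, and it can be estimated from latency, jitter and packet loss. The sketch below uses the R-factor to MOS conversion from the ITU-T G.107 E-model. The way latency, jitter and loss are folded into R is a rough rule of thumb, not the full E-model and not necessarily what Pathview computes. The function name and the sample values are illustrative only.

```python
def estimate_mos(latency_ms, jitter_ms, loss_pct):
    """Rough MOS estimate from latency, jitter and packet loss."""
    # Rule-of-thumb assumption: jitter hurts about twice as much as latency,
    # plus a 10 ms allowance for codec processing.
    effective_latency = latency_ms + 2.0 * jitter_ms + 10.0

    # Assumed simplification of the E-model: start near the best R-factor
    # (93.2) and subtract for delay, more steeply past 160 ms.
    if effective_latency < 160.0:
        r = 93.2 - effective_latency / 40.0
    else:
        r = 93.2 - (effective_latency - 120.0) / 10.0

    # Assumed penalty of 2.5 R-factor points per percent of packet loss,
    # with R clamped to the 0..100 range the conversion is defined for.
    r = max(0.0, min(100.0, r - 2.5 * loss_pct))

    # R-factor to MOS conversion as given in ITU-T G.107.
    return 1.0 + 0.035 * r + 7.0e-6 * r * (r - 60.0) * (100.0 - r)


# Illustrative inputs: a healthy link and a congested, lossy one.
clean_link = estimate_mos(latency_ms=20, jitter_ms=2, loss_pct=0)
bad_link = estimate_mos(latency_ms=150, jitter_ms=40, loss_pct=10)
```

The point is not the exact number but that “looks normal” can be checked: the same loss and jitter figures that show up on the graph translate into a call-quality score that either holds up or doesn’t.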

In the very old days of T-1 (for both voice and data), and to a much lesser extent today, the only way to really troubleshoot the circuit was to go on premise with a T-Berd and do a head-to-head test, which meant disconnecting the circuit altogether from any customer equipment and running tests directly back to the central office. This was an after-hours adventure known as a “vendor/telco meet” and had to be scheduled a few days in advance unless you happened to have your own T-Berd (which was not cheap), in which case you could almost do it on the fly. The upshot was that there was usually a conclusion one way or the other: the guilty were prosecuted and sentenced while the innocent were absolved.

Figure 6. “We’re not seeing anything but go ahead and send us your graphs and we’ll forward them up the line.” Two days and 4 calls later I got a hold of a sharp router tech who found the problem in 15 minutes. “User provided graphs? No …. I’m not sure what they do with that stuff.”

But that was when the playing field consisted of AT&T, GTE/Verizon and then everyone else. Nowadays, it’s all about how to repackage the same service that everyone else has and only worry about the larger customers. The smaller the fish, the more they are going to have to put up with – “ ….just let them try to get out of our contract!” Ironically, this is actually good for companies like us because the SMBs of the world simply don’t know who to turn to.

Parting Shots

When we started using Pathview, aside from the occasional ticker tape parade, I was looking forward to the time and aggravation we would save ourselves and the customer. What we got was all of the above along with the revelation that ISPs:

1) Don’t look at packets the way they are actually used

2) Don’t want your help when troubleshooting

3) Often don’t have the software tools, training or even the inclination to look beyond the single port of a router.

4) Would rather put you or your customer through the 3 D’s than actually fix the problem. How that happened is for another blog.

Copyright Eric Knaus 2010