Posts Tagged ‘network’

Remember the 3 D’s when dealing with ISPs

It’s no secret that I am a big fan of Pathview (both Premise and Cloud), and since becoming a partner last year it has really made a difference in our ability to assess, troubleshoot and monitor our customers’ networks. When I first watched a few of the YouTube infomercials about Pathview, one of the terms that caught my attention was ‘MTTI’ – Mean Time to Innocence. I thought immediately that the speaker must have had experience as a phone guy somewhere in his past and that his soul, like mine, had been scarred for life!

Coming from a voice background, one of the first things you learn is that you are guilty of all things until you prove – often to the satisfaction of completely clueless people (see Figure 2) – that you or the equipment you installed was not at fault for something not working. Furthermore, just because you correctly identified the problem as not being yours does not a) make you popular or b) excuse you from having to fix it anyway. Our customers have greatly benefited from Pathview, and one would think that, because of the highly technical nature of networks, ISPs would welcome the assistance of someone armed with such an application. Amazingly, this is not so.

Figure 1. Customer reported that the Internet seemed to suddenly slow to a crawl. ISP response - “We are testing good to the modem at 3Megs. Contact your IT person. It’s obviously your equipment.” After hours of arguing with them (Comcast), it turned out to be their faulty modem.

True story

Those of you familiar with Wireshark know it to be a very popular packet sniffer. If you have been looking at Wireshark recently, you will know the name of Laura Chappell and her very thorough book “Wireshark Network Analysis,” released last March. One of her case studies described an amusing yet oh-so-true situation.

It was a familiar story of a user having trouble connecting to the corporate office and the IT department pointing the finger at the ISP.  The ISP calmly assured them that they were not blocking any traffic.  When confronted with a packet capture showing otherwise, the ISP confirmed that they were – as they put it – blocking the “ports in question”.  Begrudgingly, the ISP allowed these ports through, but with a few caveats.  In an almost whimsical way, the author ended his report as follows:

“We now have a happy user, but I can’t help but wonder how many other customers of this ISP are encountering similar issues and wondering why it takes so many attempts to get connected to their corporate network.”

If you haven’t had an experience like this with an ISP, you are either very new, very lucky or VERY oblivious.  In the words of Morpheus to Neo … “Welcome to the real world.”

Here’s the fact: ISPs have for years held – and lorded over – two trump cards against end users and vendors alike, i.e. “We’re more technical than you could ever hope to be,” followed closely by “Prove it …!”, usually said with a subtle yet detectable sneer either on the phone or through an email.  Unfortunately, things like “it seems slow …” or “I don’t think we are getting the bandwidth we are paying for …”, or even “my voice is definitely choppy at certain times of the day …” just don’t cut it with these guys.

Baselining – Getting Prepared for the 3 D’s

When a customer approaches us with their story of woe, the first thing we do is establish a baseline with Pathview using targets both inside their network and outside to one of our micro appliances.  The more elusive or intermittent the problem, the longer the baseline needs to be – by that I mean a minimum of one day and as much as a week or even longer.  It is also a good idea to see what their system looks like when things are being backed up.
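Even without a dedicated tool, you can capture a rough latency baseline with a few lines of script. Below is a minimal Python sketch (not Pathview – just an illustration of the idea) that pings a target once a minute and appends timestamped round-trip times to a CSV file; let it run for a day or a week and you have something concrete to show the carrier. The target address and interval here are placeholders.

```python
import csv, re, subprocess, time
from datetime import datetime

TARGET = "203.0.113.10"   # placeholder: the device or appliance to baseline against
INTERVAL = 60             # seconds between samples

def ping_once(host):
    """Return the round-trip time in ms, or None if the ping failed."""
    # Linux/macOS-style ping; on Windows use ["ping", "-n", "1", host]
    out = subprocess.run(["ping", "-c", "1", host],
                         capture_output=True, text=True).stdout
    match = re.search(r"time[=<]([\d.]+)\s*ms", out)
    return float(match.group(1)) if match else None

with open("baseline.csv", "a", newline="") as f:
    writer = csv.writer(f)
    while True:
        rtt = ping_once(TARGET)
        writer.writerow([datetime.now().isoformat(), rtt])
        f.flush()
        time.sleep(INTERVAL)
```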

Field note – if the customer can’t tell you when the system gets backed up, it is probably related to when they are having some type of network problem.

Figure 2. The weekend staff kept complaining about the Internet being slow even when only 4 of them were watching the ball games … on their PCs.

I usually tell the customer that if the problem looks like it is on the carrier’s side, it will take a minimum of 4 calls to their Customer Care/Support center to get to someone who can actually do something.  If you are a consultant, this is something that you charge for (what?! you don’t think a lawyer would charge for his time to do this?).  Once contact with the carrier has been made, you will begin with the first of the 3 D’s.

DENY

The way ISPs decide to address a customer’s problem depends on the technical level of the Customer Service Center, which generally falls under one of the following two axioms:

Axiom 1 – The less technical they are, the more they will want to sell you additional bandwidth.

Slow Internet? More bandwidth.

Getting cut off altogether?  More bandwidth.

PC making strange noises? More bandwidth.

Think some of your ports are being blocked?  Get more bandwidth and, oh by the way, there may be an extra charge for unblocked ports (whatever they are).

Axiom 2 – The more technical they are, the likelier they are to put all the blame on the user’s equipment.

Usually they will say something like “Well, I’m logged into your router now and I don’t see any packet loss at this time or in the last 48 hours …”.  One time with the NOC guy on the line, I told him that I was going to make a programming change in the router and actually unplugged the carrier’s equipment.

“How does it look now?”  I asked after about a minute.

“Looks the same, no packet loss.”
No surprise, he was looking at the wrong circuit all along.  That one was easy.

The hard ones are when they are actually looking at the right circuit and for some reason insist they are not seeing what you are seeing.  This, by the way, is what sold me on Pathview and led me to an almost X-Files motto,

“The truth is out there … somewhere, but you will need Pathview to find it.”

(I should probably copyright that before Jim grabs it!).

In the chart below, the customer had installed an IP-based PBX and had just converted to SIP trunks.  They kept losing calls and their voice quality was very choppy – other than that, it was great …

Figure 3. Using Pathview Cloud looking towards the SIP provider

The SIP provider, Broadvox in this case, insisted that the problem was not at their end and to contact their local carrier.  The carrier, a wireless outfit, claimed that it could not be anything at their end so it had to be the customer’s equipment.

A quick look at their connection from our point of view (outside looking in to their PBX using Pathview Premise) and then to the SIP provider from the customer’s point of view (inside looking out using Pathview Cloud) shockingly revealed that the problem wasn’t the customer’s equipment at all but was instead the hop (in this case, the wireless tower) just past the customer’s router.  Granted, the customer was in a remote area (northeast Texas), which is why they HAD to go with wireless for a while (but this was about to change).  The customer and I called the carrier and, incredibly, got a hold of the operations manager on the first try.  Thus began the second “D” when dealing with the ISP.

DEFLECT

He had us on the speakerphone and it seemed as though he was trying to demonstrate to someone(s) in the room how to deal with a complaining end user.  In rapid fire he began to drill us –

“How much packet loss? I will need to know the exact percentage and where it is occurring.”

“When did this start?”

“Why are you just noticing it now?”

“How come we’re not seeing it?”

“Are you sure you have power?”

“Have you replaced your equipment?”

“Is it raining there?” (That was an interesting one!)

“We have VoIP sets and we’re not having problems.”

There were a few others whose relevance I questioned, but it was clear that customer sensitivity was not in the field ops handbook.  The only question he didn’t ask was whether we thought we needed more bandwidth, but then again he was obviously an Axiom 2 guy. He finally ended it with an ‘and-don’t-call-me-back-until-you-have-all-this-information’ “Okay?”.

For a moment it seemed as though he was leaning back in his chair while looking at his understudies with the air of “… and THAT’S how you handle that!”.  At the same time I heard my customer softly chuckle, since he had already seen the Pathview charts and tests.  Needless to say, we were quickly placed on hold and picked up elsewhere – not on speakerphone – where he asked us to email the information.  I don’t know if he actually looked at it or not, but within moments a service call was scheduled, with the results in Figure 4.

Figure 4. “Okay, should be working fine now. The tech ran a few DSL Reports tests and then a traceroute for good measure. Looks great!” Are you kidding me?!

I was stunned.  Yes, it looked better but the customer was, well, let’s just say he was still experiencing problems.  This time the customer forwarded the reports to the ops manager and anyone whose email address he had.  On to the third “D”.

DELAY

The response was that they did not see any problems – which also falls under “DENY” – but they promised to continue monitoring it and get back to us.  After a few days of unreturned emails and phone calls, it was clear they were going to leave it as is and just wait us out.  Long story short, the customer switched Internet providers the next week and, while not perfect, things greatly improved.  I also heard that they raised the rent for the carrier’s repeater that was on their property.

Figure 5. “We don’t measure MOS in this department but that looks normal.” Is there someone I can go over this with? Hello? … Hello?

In the very old days of T-1 (for both voice and data), and to a much lesser extent today, the only way to really troubleshoot the circuit was to go on premise with a T-Berd and do a head-to-head test, which meant disconnecting the circuit altogether from any customer equipment and running tests directly back to the central office.  This was an after-hours adventure known as a “vendor/telco meet” and had to be scheduled a few days in advance, unless you happened to have your own T-Berd (which was not cheap), in which case you could almost do it on the fly.  The upshot was that there was usually a conclusion one way or the other: the guilty were prosecuted and sentenced while the innocent were absolved.

Figure 6. “We’re not seeing anything but go ahead and send us your graphs and we’ll forward them up the line.” Two days and 4 calls later I got a hold of a sharp router tech who found the problem in 15 minutes. “User provided graphs? No …. I’m not sure what they do with that stuff.”

But that was when the playing field consisted of AT&T, GTE/Verizon and then everyone else.  Nowadays, it’s all about how to repackage the same service that everyone else has and only worry about the larger customers.  The smaller the fish, the more they are going to have to put up with – “… just let them try to get out of our contract!”  Ironically, this is actually good for companies like us, because the SMBs of the world simply don’t know who to turn to.

Parting Shots

When we started using Pathview, aside from the occasional ticker tape parade, I was looking forward to the time and aggravation we would save ourselves and the customer.  What we got was all of the above, along with the revelation that ISPs:

1) Don’t look at packets the way they are actually used

2) Don’t want your help when troubleshooting

3) Often don’t have the software tools, training or even the inclination to look beyond the single port of a router.

4) Would rather put you or your customer through the 3 D’s than actually fix the problem.  How that happened is for another blog.

Copyright Eric Knaus 2010

Are You VoIP Ready? – The Road to China: Content Filtering to the Max

ChinaNET is managed by the Data Communications Bureau of the Ministry for Posts and Telecommunications, and provides Internet service in all 31 provincial capitals in mainland China. It is one of the two major commercial networks approved by the State Council, the other being ChinaBGN. For this reason, the site in Figure 6 is one of my favorites to watch, not because it has great VoIP possibilities – because it does not – but because you can capture the business heartbeat of a nation, along with the ideology of a government, just by viewing this graph over a week’s time. The target site is in a town just south of Shanghai called Hangzhou.

The part I find most interesting is that you can tell the moment you hit mainland China (hop 13) because the latency skyrockets from 62ms to 449ms. This is a classic example of the content filtering that ChinaNet does in order to keep certain things out of the country. Fortunately, ICMP packets are not one of them, so once we get past the censors, you can see that even within the mainland there is an overall increase in latency to the final destination – this hints at content filtering within the borders as well.

Overall, you can see what the average Chinese Internet user experiences in terms of latency over a week’s time. The graph shows a 7-day cycle, and within each day’s cycle you can see a consistent dip a little past halfway, which I found out later was when they – you guessed it – took the equivalent of lunch. This also shows that the basic internal data-transport infrastructure is under a severe load, and the chances of VoIP running well WITHIN China are slim if you have to go more than a handful of hops. This might be another reason the Chinese government blocks most of the incoming Internet traffic – the network just could not handle it!
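You can spot this kind of border jump yourself without any special tooling. Below is a minimal Python sketch (an illustration only, not how Pathview works) that shells out to the system traceroute, parses the per-hop round-trip times, and flags any hop where latency rises by more than a chosen threshold over the previous hop. The destination address and threshold are placeholders.

```python
import re
import subprocess

DEST = "203.0.113.50"    # placeholder destination (e.g., a host in Hangzhou)
JUMP_THRESHOLD_MS = 100  # flag hops whose latency rises this much over the previous hop

# Run the system traceroute (Linux/macOS); on Windows the command is "tracert"
output = subprocess.run(["traceroute", "-n", DEST],
                        capture_output=True, text=True).stdout

prev_rtt = None
for line in output.splitlines()[1:]:          # skip the header line
    times = [float(t) for t in re.findall(r"([\d.]+) ms", line)]
    if not times:
        continue                              # hop did not reply to any probe
    hop = line.split()[0]
    rtt = min(times)                          # best of the probes for this hop
    if prev_rtt is not None and rtt - prev_rtt > JUMP_THRESHOLD_MS:
        print(f"Hop {hop}: {rtt:.0f} ms (jumped {rtt - prev_rtt:.0f} ms over previous hop)")
    else:
        print(f"Hop {hop}: {rtt:.0f} ms")
    prev_rtt = rtt
```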
As an aside, when I first started watching this site about a year ago, I was able to see ChinaNet in the DNSName column. About 7 months ago they removed any identification other than the IP address.

The client originally asked me to see if they could have a telephone connected via VoIP from California to this site, and the answer was an emphatic “No!”. We considered a satellite solution but found out that there were restrictions on this as well. Besides, satellite in general has very high latency (too high for decent voice, in my opinion) and is susceptible to bad weather. So as of this article, they are simply resorting to email and regular PSTN connections to communicate.

Are You VoIP Ready? – What Stable Connections Look Like

Figure 5 is a good example of a connection that should work for your VoIP application. In this screenshot, showing a 24-hour segment, you can see that there are only a total of 5 samples that go into the yellow area. The vast majority are in the lower middle of the green band, with the average latency at 73ms and jitter of 4ms. As such, I would tell the customer that this circuit meets my criteria of VoIP readiness. The only thing I do not know for sure is how much bandwidth they have. Sometimes the customer will know, and other times they will think they know. If it is a T-1 connection, then you can be fairly certain that you are getting 1.544 Mbps up AND down and can base your voice session calculations off of that. If it is a DSL or cable connection, you will most likely experience swings in latency as usage (voice and data) goes up.
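If you are collecting your own latency samples (say, from the baseline script shown earlier), a quick summary against this kind of criteria is easy to produce. The sketch below is only an illustration of the check I am describing: it computes the average latency, a simple jitter figure (mean absolute change between consecutive samples), and a count of outliers, then compares them against example thresholds of 80ms latency and 10ms jitter. The thresholds and sample values are placeholders you should adjust to your own criteria.

```python
def summarize(samples_ms, max_avg_latency=80.0, max_jitter=10.0, outlier_ms=120.0):
    """Summarize a list of round-trip times (ms) against example VoIP-readiness thresholds."""
    avg = sum(samples_ms) / len(samples_ms)
    # Simple jitter: average change in latency from one sample to the next
    deltas = [abs(b - a) for a, b in zip(samples_ms, samples_ms[1:])]
    jitter = sum(deltas) / len(deltas) if deltas else 0.0
    outliers = sum(1 for s in samples_ms if s > outlier_ms)
    ready = avg <= max_avg_latency and jitter <= max_jitter
    return {"avg_ms": round(avg, 1), "jitter_ms": round(jitter, 1),
            "outliers": outliers, "voip_ready": ready}

# Example with made-up samples; in practice, load a day or a week of data from baseline.csv
print(summarize([72, 74, 71, 78, 73, 75, 102, 73]))
```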

Are You VoIP Ready? – The “X” Factor

One X factor you will need to consider when looking at a VoIP solution is your network’s vulnerability to viruses, worms and Trojans. The first thing I caution customers about when they want to go all out and purchase a pure IP telephony solution is that, as a general rule, you want to keep your local voice and local data traffic separate. In practice, this means that if you already have a voice infrastructure (i.e. jacks specifically for telephone sets that home-run to a main telephone room or the IT Head End room without connecting to the LAN wiring), put your voice on that with TDM sets instead of abandoning the wires that are in place.

Voice and data packets going out to other offices that are linked to you through a point-to-point or VPN connection will inevitably share time on the same Internet circuit. However, TDM sets are not affected by anything on the LAN until they have to connect to some off-site device through VoIP. Regular local traffic such as voicemail retrieval, intercom calls, paging to the warehouse or calling a supplier over the PSTN will occur with or without a data switch or server in place.

If you do decide to go all IP, then do it when you can make a fresh wiring start and run separate cables for the voice and data sets. Also, keep the voice devices on their own subnet, separated from the other LANs by a decent router. RonEK is not a Cisco reseller, but we like their routers and recommend them in cases like this.

Many IT people would take issue with this approach, but I have personally witnessed and been victim to what can happen, as shown in Figure 4. As you can see, everything looked great until the main server on the LAN was hit by a very aggressive virus. For about 2 hours it created such havoc on the LAN in the form of broadcast storms that all network traffic was reduced to a crawl and VoIP was stopped cold.

Are You VoIP Ready? – QoS (Quality of Service)

Normally, when packets are serialized out the router to the Internet, they are sent in a first-come, first-served fashion. If your router is equipped with QoS, packets from your PBX or SIP server* can be prioritized ahead of the other non-voice packets, thereby keeping the flow of voice traffic relatively smooth. Most carriers that offer a combined package of voice and data services do just that. RonEK is a partner with several ISPs, and one of the first questions on the vendor checklist is the IP address of the PBX. Most of the higher-end carriers will provide an “end-to-end managed circuit,” which means that they can control the connection from your office to their space in the central office. From there, packets are routed out on their backbone or someone else’s backbone, depending on the carrier’s capacity and the final destination.

Keep in mind that QoS prioritizes packets going out to the other end. Once a packet leaves your premises or your provider’s backbone, it is no longer prioritized and is subject to the winds and tides of the Internet just like all the other packets. So, when designing a multi-site network, try to stick with one provider and, if possible, try to do it on an MPLS platform. Usually, if you stay within a provider’s backbone from end to end, the prioritization will be maintained throughout the connection. Also, there is no way to prioritize packets coming to you until they actually get to you. Many times customers think that if they simply implement QoS, all their voice issues will go away, not realizing that they have only addressed half of the potential problem. Your connection and QoS are just the part of the overall voice session that YOU control. The rest is in the hands of the intermediaries (often there is more than one) that control the path of the packets and then, finally, the ISP and equipment at the final destination.
* More about SIP servers in coming articles.
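As a concrete illustration of what “prioritization” keys off of: most QoS schemes look at the DSCP field in the IP header, and voice traffic is commonly marked Expedited Forwarding (EF, DSCP 46). The Python sketch below only shows how an application could set that marking on its own UDP socket via the IP_TOS option (on Linux/macOS); whether any router along the path actually honors the marking is entirely up to your equipment and your carrier, and in practice a PBX or SIP phone does this for you. The address and port are placeholders.

```python
import socket

# DSCP 46 (Expedited Forwarding) occupies the upper six bits of the ToS byte: 46 << 2 = 0xB8
DSCP_EF_TOS = 46 << 2

# Hypothetical voice endpoint; RTP media is carried over UDP
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, DSCP_EF_TOS)

# Datagrams sent from this socket now carry the EF marking in their IP header,
# which a QoS-enabled router can use to queue them ahead of ordinary data traffic.
sock.sendto(b"voice payload", ("192.0.2.20", 5004))  # placeholder address and port
```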

Are You VoIP Ready? – Latency

Every IT staff member knows that when you ping something, in addition to confirming that a device is connected to the network, the reply will give you the round-trip time in milliseconds from the device you are pinging. A ping is a type of ICMP packet (along with the commonly used traceroute command) that you can use to determine just how much delay your packets will experience from point A to point B and back. All of the graphs you see in this article are latency-based, and bandwidth availability – or lack thereof – has to be extrapolated from that.

Realistically, there is no way to know how much bandwidth someone has by simply looking at their connection from the “outside” unless you are sitting in the central office looking directly at the connection. That said, there are several things that latency will tell you. I usually try to get a week’s worth of data before I’m comfortable with the circuit – if there is a problem, it will likely show up within that sample.
The magic latency number I like to see when testing for VoIP usability between sites is 80ms or lower. Another term that is directly related to latency is “jitter.” Jitter is caused when packets that leave a source in a certain order and with a certain spacing arrive at the destination in the same order (usually) but with different spacing. It is essentially the difference in latency from one packet to the next. When jitter is high – anything over 15% variance between samples – it usually points to a bandwidth problem. To get an idea of the impact of high jitter, imagine the sound of an audio CD played while alternating between pause and fast-forward. The garbled sound is characteristic of jitter.
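For those who want a number rather than an analogy: the RTP specification (RFC 3550) defines a running estimate of interarrival jitter that many VoIP endpoints report. The sketch below implements that smoothing formula over a list of transit-time samples; it is illustrative only, and the sample values are made up.

```python
def rfc3550_jitter(transit_times_ms):
    """Running interarrival jitter estimate per RFC 3550: J += (|D| - J) / 16."""
    jitter = 0.0
    for prev, cur in zip(transit_times_ms, transit_times_ms[1:]):
        d = abs(cur - prev)            # change in transit time between consecutive packets
        jitter += (d - jitter) / 16.0  # smoothed update, as specified in RFC 3550
    return jitter

# Example with made-up transit times (ms): steady spacing produces a low jitter estimate
print(round(rfc3550_jitter([40, 41, 40, 42, 55, 41, 40, 43]), 2))
```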
If you are going to network offices within the same city, you should see ping times of around 30ms or less and jitter under 5ms. The further across the country you go, the higher the latency tends to get. As of this article, a typical ping time from Houston to Los Angeles is between 54ms and 67ms. Surprisingly, latency from Houston to Hong Kong is only 62ms to 87ms. I was in Switzerland not too long ago and the ping time was only 75ms from Bern to Los Angeles. My point is that, while geographical distance is an issue, it is not going to be the determining factor in whether your VoIP project works. The things that will affect whether your voice packets get there in reasonable time are QoS (Quality of Service), the ISP or carrier, bandwidth, hardware and hardware configuration.
To get an idea of what misconfigured hardware looks like, take a look at Figure 3. The IT manager had just moved into a new facility and users were complaining to him about slow Internet speed. As you can see, the latency is pretty good but the number of dropped packets was very high. The poor guy spent the better part of 3 weeks arguing with the carrier about the problem. Their contention was that the source of the problem was at his end, as they showed everything good when they tested up to the NIU. They also ran diagnostics on the router – the one they supplied – and that also produced nothing. Technically, they were right, except for one important setting that would not show up on any diagnostic. The “Clock Source” setting for the router was configured as “Internal” – i.e. it was referencing itself as a clock source – instead of clocking off the network (sometimes referred to as “Recovered” mode). For the most part it worked, but any time the router’s or carrier’s clock drifted slightly, packets would be dropped.