At its core, a communication protocol is just the ‘language’ that devices speak, right? Just a collection of rules that govern how components of a system are going to interact with each other, and what the capabilities of the system might look like.
In this post, I will discuss why you should bother learning anything about communication protocols in the first place, the value of knowing what a communication standard is, and why some people (myself included) commit so much time to learning them.
The true benefit to be found in studying communication protocols is that it allows us to approach troubleshooting by treating the application we are working with as a ‘black box’, agnostic of the application that is actually being used. Take, for example, the system below. If we know the rules governing communications on both sides of the box, we need to know nothing (or very little at least) about the internal logic that is driving our application.
We can make educated decisions based on how incoming requests on one side of the box are converted to outgoing messages on the other side of the box and, thereby, gain valuable insight into the logic driving the application. In most systems the mystery box above represents the OPC Server (or even a proprietary or custom driver). The left arrow represents communications between the server and your client application and the right arrow covers the connections out to your devices.
At this point you might be thinking to yourself “Well that’s great, but so what? How does this help me troubleshoot a problem any more efficiently?”
Let’s consider a sample system where our OPC Client is ‘getting’ information from our OPC Server and our server is configured to communicate with a Modbus TCP Device. Everything is configured but we’re not getting any data in our client. Where does that put us?
Well, we’ve tried nothing and nothing works, so is it time to start guessing? No, let’s break it down further. We know that our OPC Client is making a subscription to the Server, so at some point the client must be creating a Group, specifying the update rate at which it wants the data, adding items to that group, and then setting the group active to begin the subscription. We know that the return from the server will be an OnDataChange event, telling us that the value or quality of the tag we are monitoring has changed. The updated diagram would then look something like the following:
Now we are getting somewhere, but we can add some more detail concerning the Server-Device communications. We know the general structure of our Modbus TCP request frame is going to take the form of:
[Transaction ID][Protocol Identifier][Length][Unit ID][Function Code][Offset][Number of Registers]
We can break this down even further since we know our PLC has an ID of 1, we are requesting a single holding register, and that register has an offset of 7:
[2 bytes Trans ID] 00 00 00 06 01 03 00 07 00 01
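To make the byte layout concrete, here is a minimal Python sketch that assembles the same request from its fields. The function name `build_read_holding_request` and the transaction ID of 1 are our own choices for illustration. The unit ID, function code, offset, and register count come from the example above.

```python
import struct

def build_read_holding_request(transaction_id, unit_id, offset, count):
    """Build a Modbus TCP 'Read Holding Registers' (function code 0x03) request."""
    function_code = 0x03
    # PDU: function code (1 byte), starting offset (2 bytes), register count (2 bytes)
    pdu = struct.pack(">BHH", function_code, offset, count)
    # The Length field counts every byte that follows it: the unit ID plus the PDU
    length = 1 + len(pdu)
    protocol_id = 0  # always 0 for Modbus
    header = struct.pack(">HHHB", transaction_id, protocol_id, length, unit_id)
    return header + pdu

# Transaction ID 1 is an arbitrary value chosen for this illustration
frame = build_read_holding_request(transaction_id=1, unit_id=1, offset=7, count=1)
print(frame.hex(" "))  # 00 01 00 00 00 06 01 03 00 07 00 01
```

Everything after the first two bytes matches the frame we wrote out by hand, which is exactly what we would expect to see leaving the server in a capture.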
Likewise, we know that the response is going to follow a similar format:
[Transaction ID][Protocol Identifier][Length][Unit ID][Function Code][Number of bytes][1 Register of Data]
Which breaks down to:
[2 bytes Trans ID] 00 00 00 05 01 03 02 [2 bytes Data]
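Going the other way, a short Python sketch can pull the same fields back out of a captured response. The function name `parse_read_holding_response`, the transaction ID of 1, and the register value `0x1234` are made up for illustration. The sketch also assumes a normal (non-exception) reply to a holding register read.

```python
import struct

def parse_read_holding_response(frame):
    """Split a Modbus TCP 'Read Holding Registers' response into its fields."""
    # 7-byte header: transaction ID, protocol ID, length, unit ID
    transaction_id, protocol_id, length, unit_id = struct.unpack(">HHHB", frame[:7])
    # The PDU starts with the function code and the number of data bytes
    function_code, byte_count = struct.unpack(">BB", frame[7:9])
    data = frame[9:9 + byte_count]
    # Each register is a 2-byte big-endian value
    registers = list(struct.unpack(f">{byte_count // 2}H", data))
    return {
        "transaction_id": transaction_id,
        "protocol_id": protocol_id,
        "length": length,
        "unit_id": unit_id,
        "function_code": function_code,
        "byte_count": byte_count,
        "registers": registers,
    }

# Sample response: transaction ID 1 and register value 0x1234 are illustrative only
sample = bytes.fromhex("00 01 00 00 00 05 01 03 02 12 34")
parsed = parse_read_holding_response(sample)
print(parsed)
```

Note that the Length field reads 5 here: one byte of unit ID, one of function code, one of byte count, and two of data.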
Now that we know the expected structure of our Modbus communications, we have a pretty complete picture of communications:
Now it's just a matter of comparing the expected result to the actual result. If we see that our server is sending out an offset of 6 instead of 7, then we know that our 0-based versus 1-based addressing setting is likely wrong. If we see that our server is sending an OnDataChange to our client with the correct value update, we know the server side is working and that our client might be configured incorrectly.
I could go on listing all the issues we could identify with just the basic image above and a Wireshark capture, but you probably get the picture. The best thing of all? At no point did we have to guess: we knew what was supposed to be there, observed what was actually happening, and were able to isolate the problem based on the information at hand.
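The expected-versus-actual comparison can itself be sketched in a few lines of Python. The helper name `describe_request_mismatch` and the captured bytes below are hypothetical. The capture simply reproduces the offset-of-6 scenario just described, and the off-by-one hint is only a rule of thumb.

```python
import struct

def describe_request_mismatch(observed, expected_unit_id, expected_offset, expected_count):
    """Compare a captured read request against the values we expected to see."""
    # 12 bytes: transaction ID, protocol ID, length, unit ID, function code, offset, count
    _tid, _pid, _length, unit_id, _function_code, offset, count = struct.unpack(
        ">HHHBBHH", observed[:12]
    )
    findings = []
    if unit_id != expected_unit_id:
        findings.append(f"unit ID {unit_id}, expected {expected_unit_id}")
    if offset != expected_offset:
        hint = ""
        if abs(offset - expected_offset) == 1:
            # A difference of exactly one usually points at the addressing setting
            hint = " (off by one: check 0-based vs 1-based addressing)"
        findings.append(f"offset {offset}, expected {expected_offset}{hint}")
    if count != expected_count:
        findings.append(f"register count {count}, expected {expected_count}")
    return findings

# Hypothetical capture in which the server asked for offset 6 instead of 7
captured = bytes.fromhex("00 01 00 00 00 06 01 03 00 06 00 01")
findings = describe_request_mismatch(captured, expected_unit_id=1, expected_offset=7, expected_count=1)
for finding in findings:
    print(finding)
```

An empty list means the request on the wire matches what we expected, so the next place to look is the response or the client side.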
So if approaching troubleshooting from this perspective is so great, why doesn’t everyone do it?
- Guess and check troubleshooting has worked so far.
We’ve all done it, and we’ve all guessed correctly at some point, but the law of averages doesn’t apply here. Modern automation systems are steadily growing in complexity, and eventually there will be a problem where guess and check doesn’t work. With the amount of money at stake and the number of resources available, there is no excuse for relying on guessing at solutions.
- The upfront time investment required is not insubstantial.
Reading long documents and taking the time to process and understand what is written does take time; there is no denying it. It all comes down to whether the time is better spent learning a protocol before there is a problem, or whether that time should be spent guessing at a solution when a problem occurs.
- You don’t really care what the problem is.
We have all found ourselves in situations where time or technical constraints don’t allow for a full analysis of a situation. Rather than resorting to guessing and checking, this is when external resources should be brought in to lend a hand. Gaining some basic knowledge, like what we are offering in this and our other blog posts, can still help. Even if you don’t know all the ins and outs in detail, having some knowledge can lead to faster solutions and better diagnostics capture when working with outside technical resources, such as our staff here at Software Toolbox.
As if it wasn’t already clear how vital it is to have a good understanding of the protocols in play, knowing the protocol also plays a huge role in the planning stages of an automation system. It informs everything from ideal uses, limitations, and considerations to actually picking the controller and protocol best suited for the task. I invite you to subscribe to our blog and watch out for our coverage of additional industrial communications protocols.