Your operations visibility relies on real-time data from machinery, systems, and field devices. When communication issues cause lost or inaccurate data, addressing them promptly is critical. Whether you are seeing stale, poor quality, or missing data in your SCADA or HMI application, a well-defined troubleshooting procedure is a must. Like all our products, the TOP Server OPC Server offers an arsenal of tools to inform you about what is happening and streamline your troubleshooting process, summarized in our previous blog post, Minimize Downtime with TOP Server Troubleshooting Tools.
In this post, we'll take a deep dive into some of those tools, with an emphasis on the device connection side of TOP Server, teaching you a step-by-step process to pinpoint communication issues at their source. While this specific example uses the Modbus Ethernet Driver and an OPC DA Client connection, the same principles can be applied when troubleshooting various other scenarios.
The initial step in troubleshooting almost any problem is to collect information. This information could include the potential causes of unexpected occurrences, associated symptoms, or specific conditions necessary to replicate the issue. TOP Server's comprehensive Event Log is a valuable troubleshooting resource that logs informational messages, warnings, and errors when issues arise. This holds true for configuration issues, client-side problems, and device-side communication challenges. A thorough examination of the TOP Server Event Log can offer valuable context for the issue at hand, making it an ideal starting point.
As covered in our previous blog post, the Event Log records four types of events: Errors, Warnings, Information, and Security. By default, all possible events will be logged. If you don't really need to see some event types, you can easily hide them by right-clicking on the Event Log and unchecking the event type as shown below. This does not stop the logging of the events, but filters the view.
A common use case would be to filter on errors and warnings when trying to solve a communication problem. It is recommended that you leave the view to show all events to start so that you get a full view and then narrow your focus.
The Event Log also provides the Date, Time, and Source of each logged entry, which can be used to determine answers to questions like...
The source of the error shows which TOP Server component is reporting the issue. Below we can see four examples of error types with explanations of each.
As you might have experienced, the reasons behind some messages are more obvious than others. It can be difficult to know where to start with errors like "device is not responding", but they may be preceded by other notable messages, some of which may only appear upon the initial device connection or under other specific circumstances. This is why it is important to review the Event Log in its full context, and why we ask you to send us event logs that cover the time the issue occurred. Our team will look at events both before and after the problem started to gain insights.
For projects with many tags/devices or rapid communications, messages relevant to the issue can go unnoticed when Event Log traffic is moving quickly. You can turn off automatic scrolling of the Event Log by right-clicking and unchecking "Autoscroll", and you can also save the events to a text file by right-clicking and selecting "Save As Text File..." for further analysis.
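Once the events are saved to a text file, a small script can do the same error and warning filtering offline. The sketch below is only an illustration: it assumes each exported line is delimited (tab by default) and that one field holds the event type as a plain word such as "Error" or "Warning". We have not confirmed the exact export layout here, so check your own file and adjust the delimiter. The function names are hypothetical.

```python
from typing import Iterable, List, Set

# Event types to keep; compared case-insensitively against each field.
WANTED_LEVELS: Set[str] = {"error", "warning"}


def filter_events(lines: Iterable[str], delimiter: str = "\t",
                  levels: Set[str] = WANTED_LEVELS) -> List[str]:
    """Return the lines that contain a field equal to one of the wanted levels.

    ASSUMPTION: the saved Event Log is delimited text with the event type
    in a field of its own. Verify this against your own export.
    """
    kept = []
    for line in lines:
        text = line.rstrip("\r\n")
        fields = [field.strip().lower() for field in text.split(delimiter)]
        if any(field in levels for field in fields):
            kept.append(text)
    return kept


def filter_event_log_file(path: str, delimiter: str = "\t") -> List[str]:
    """Read a saved Event Log text file and keep only errors and warnings."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return filter_events(handle, delimiter)


# Example usage (hypothetical file name):
#   for entry in filter_event_log_file("event_log.txt"):
#       print(entry)
```

Matching on a whole field rather than searching the entire line avoids false hits when the word "error" merely appears inside an informational message.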
While the Event Log can provide some insight into your issues, you may need more in-depth testing to resolve your problem. Next we will set up tests to control certain variables and observe the effects of induced changes. Specifically, we will look at how to troubleshoot when a client is not receiving any data from TOP Server.
Although this post is focused on a device side issue, when developing a good troubleshooting procedure it is important to isolate the source of a problem through the process of elimination. If you see no data flowing to your client, you need to determine if the problem lies between TOP Server and the device or between TOP Server and your client configuration. The diagram below helps illustrate this concept and the tools this post will focus on. There are other tools for more challenging situations, so this diagram is not all inclusive.
With the installation of TOP Server comes a lightweight OPC DA test client which can be used to rule out external clients as part of the issue. It is good practice to isolate communication elements as much as you can to observe only relevant factors. Completely disconnecting the external clients from TOP Server is recommended when isolating communication between just TOP Server and the Quick Client. If this is not possible, disabling them from requesting data on the problem device is the next best option. Please refer to your client documentation to accomplish this. If not possible, the troubleshooting tips here will still be valuable.
To launch the Quick Client from the TOP Server configuration, you can go to Tools > Launch OPC Quick Client or you can click the Quick Client symbol in the icon menu.
Launching the Quick Client from the TOP Server configuration window will, by default, automatically add all tags that TOP Server is exposing on its OPC DA interface. This can be disabled in the Tools > Options menu in the Quick Client.
For further details on navigating the Quick Client interface, check out our blog post on 5 Tips for Using OPC Quick Client.
To reduce the number of problem variables as much as possible, we will monitor communications to a single tag that...
For some protocols, the data type of a tag is dictated by its address syntax or location, but it’s good practice to confirm what we expect for the tag we are observing. It’s not required that the chosen tag has read/write permissions (unless the issue you are experiencing is specific to write requests), but it does give us more control in our tests. For example, if the issue only appears with a specific client, we can compare behaviors and determine if data changes are captured.
As covered in our previous post, the Quick Client application returns OPC DA quality codes as well as values for tags or items requested.
The provided OPC DA quality codes can also be helpful in determining where the communication issue lies. If you see values with good quality, you can rule out a problem with the connection between TOP Server and the device as well as the device specific settings in TOP Server. Instead, you can focus on the connection between TOP Server and your external client. If the opposite is true, as in our case, you can direct your troubleshooting efforts downstream.
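If your client shows the quality as a raw number rather than a label, it helps to know how that number is built. In OPC DA, the top two bits of the low byte of the quality word carry the major status (good, uncertain, or bad). The following is a minimal sketch of that decoding; the function name is our own for illustration and is not part of TOP Server or the Quick Client.

```python
# The two most significant bits of the low byte hold the major quality.
QUALITY_MASK = 0xC0


def classify_quality(code: int) -> str:
    """Map a raw OPC DA quality value to its major status."""
    major = code & QUALITY_MASK
    if major == 0xC0:
        return "good"
    if major == 0x40:
        return "uncertain"
    if major == 0x00:
        return "bad"
    return "unknown"  # 0x80 is not an assigned major status


for raw in (192, 64, 24, 0):
    print(raw, classify_quality(raw))
```

For example, 192 decodes as good, while 24 (a bad quality whose substatus indicates a communication failure) and 0 both decode as bad, which is the pattern that points the investigation toward the device side.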
Now that we've ruled out a client issue, we can focus on communication between TOP Server and the device. As previously mentioned, the TOP Server Modbus Ethernet driver will be the focus for this post, but the troubleshooting process we use will be applicable for other drivers as well.
First, does your Event Log show Device Not Responding errors like those shown earlier? If so, recall that this means the driver sent a request to the device and received no response within the timeout period, multiplied by the number of retries. In most drivers that is 3 retries with a 3000 ms timeout, or 9 seconds total.
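As a quick worked example of that arithmetic, the sketch below multiplies the timeout by the number of tries to estimate how long the driver waits before giving up on a request. The function and parameter names are ours for illustration; substitute the values configured on your own device.

```python
def worst_case_wait_ms(timeout_ms: int, tries: int) -> int:
    """Total time spent waiting when every try runs to its full timeout."""
    return timeout_ms * tries


# Values quoted in this post: a 3000 ms timeout, tried 3 times.
total_ms = worst_case_wait_ms(3000, 3)
print(f"{total_ms / 1000} seconds")  # prints "9.0 seconds"
```

Keeping this number in mind helps when reading the Event Log: the error is logged only after the full wait has elapsed, so the actual loss of communication began several seconds before the timestamp of the message.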
Before we go into deeper tools, ask yourself some questions. We know you're a skilled engineer or technician, but we are all human and have probably all missed the "simple stuff" before. You will see these questions again when we discuss communications diagnostics and the evidence each of these problems leaves behind.
Assuming you've thought of all the items above, then a good next step is to determine if TOP Server is receiving any response from the device using the Communications Diagnostics tool. This feature provides a streaming protocol view and real-time performance data for your communications driver. All read and write operations between the server and the device can be viewed in the diagnostic display window in the form of transmits (TX) and receives (RX).
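Since our example uses the Modbus Ethernet driver, it can help to know roughly what a transmit looks like on the wire before reading the TX lines. The sketch below builds a Modbus TCP "Read Holding Registers" request following the public Modbus specification: a 7-byte MBAP header followed by the function code, starting address, and register count. The function name and the example values (transaction 1, unit 1, address 0, one register) are illustrative assumptions, not something taken from TOP Server.

```python
import struct


def build_read_holding_registers(transaction_id: int, unit_id: int,
                                 start_address: int, quantity: int) -> bytes:
    """Build a Modbus TCP request frame for function code 0x03."""
    # PDU: function code (1 byte), start address (2), quantity (2), big-endian.
    pdu = struct.pack(">BHH", 0x03, start_address, quantity)
    # MBAP header: transaction id, protocol id (always 0),
    # length of what follows (unit id + PDU), unit id.
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu


frame = build_read_holding_registers(1, 1, 0, 1)
print(" ".join(f"{byte:02x}" for byte in frame))
# prints: 00 01 00 00 00 06 01 03 00 00 00 01
```

A matching receive echoes the same transaction id in its first two bytes, which is how a request and its response can be paired when reading a trace.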
The TOP Server Communication Diagnostics can be found by right-clicking on the configured device or channel as shown below.
As you can see below, each send and receive is timestamped. This helps with analyzing round trip communication time with a device to diagnose responsiveness. You can easily see how long it took a device to respond to the last request, allowing you to confirm if an issue exists with the device or something else. That device response time is very helpful when there are performance issues. You might be surprised at how fast or slow some devices respond. Remember control devices are there to control. Communications is secondary. A heavily loaded PLC might not be responding as fast as you think it is. Or network issues may show up as highly variable response times.
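If you copy timestamps out of a capture, the round trip time is simply the receive time minus the preceding transmit time. The sketch below shows that calculation on a list of (direction, timestamp) pairs. How you get timestamps into this form depends on how you export the diagnostics, so the input structure and function name here are assumptions for illustration only.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


def response_times_ms(events: Iterable[Tuple[str, datetime]]) -> List[float]:
    """Pair each TX with the next RX and return the elapsed milliseconds."""
    results = []
    pending_tx = None
    for direction, timestamp in events:
        if direction == "TX":
            # A TX that never got an RX is overwritten by the next TX.
            pending_tx = timestamp
        elif direction == "RX" and pending_tx is not None:
            results.append((timestamp - pending_tx) / timedelta(milliseconds=1))
            pending_tx = None
    return results


# Made-up timestamps purely to exercise the function.
start = datetime(2024, 1, 1, 12, 0, 0)
sample = [
    ("TX", start),
    ("RX", start + timedelta(milliseconds=35)),
    ("TX", start + timedelta(seconds=1)),  # no response to this request
    ("TX", start + timedelta(seconds=4)),
    ("RX", start + timedelta(seconds=4, milliseconds=120)),
]
print(response_times_ms(sample))  # prints [35.0, 120.0]
```

A wide spread between the smallest and largest values is the "highly variable response times" pattern that often points to a network issue rather than the device itself.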
The bottom of the diagnostic window also shows the number of sends and receives and if they are successful or not. These values can also be monitored in system tags in your client application and even be reset via a special system tag. Click here for more information on how to harness the power of System and Statistics tags.
There are 3 common scenarios you might encounter in the Communication Diagnostics:
In our example case, we are experiencing scenario number one, No Tx or Rx. This tells us we need to troubleshoot our TCP connection to the device itself. Note the communications diagnostics window screenshot above is NOT representative of this use case but was provided to show you what that tool looks like.
Because we do not see any evidence of communication with the device in our Channel Diagnostics, our next step should be to determine if our device is even reachable. Is it even plugged in? There are many tools available for troubleshooting Ethernet connections. Today we will cover Ping, TNC Command, and Netstat.
When troubleshooting TCP connections, users often reach for the Ping command to determine if a remote host is reachable. This command is meant to tell a user if a particular destination IP address exists and if it can accept ICMP requests. Its shortcoming is that it simply does not give us enough information to draw a sound conclusion about why a controller or network node is not communicating.
If a Ping is successful, we now know that there is a “black box” at the given IP destination. What we now need to know is whether there is a TCP port actively listening for our connection attempts. This brings us to the Test Network Connection (TNC) command in PowerShell, whose full cmdlet name is Test-NetConnection.
The TNC command helps you determine whether a TCP connection can be established with the device over a specified port. You will receive the response TcpTestSucceeded: True or False. This response indicates whether the TCP Connection can be made to the device/remote host on the given port.
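If PowerShell is not available on the machine you are testing from, the same kind of check can be approximated in a few lines of Python. This is a rough equivalent we wrote for illustration, not a TOP Server utility: it simply attempts a TCP connection and reports whether it succeeded. The host address in the usage comment is a placeholder from the documentation address range, and port 502 is the standard Modbus TCP port; use your device's actual IP address and port.

```python
import socket


def tcp_port_open(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection performs the TCP handshake; the socket is
        # closed again as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example usage (placeholder address, standard Modbus TCP port):
#   print("TcpTestSucceeded:", tcp_port_open("192.0.2.10", 502))
```

Like TNC, a False result does not say why the connection failed, only that nothing accepted a TCP connection on that port within the timeout.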
Lastly, the netstat command can also be used to show the current network status of the device. The output of this command will show:
For our use case, we first ran the Ping command and found there was no response. This told us our device was not reachable. To confirm, we decided to take the extra time and run the TNC command from Windows PowerShell. The result was TcpTestSucceeded: False. This again verified our device was not reachable.
After examining the device, we saw the Ethernet cable was dislodged from its port. After fixing the connection, we received successful results from all the above commands and data started to flow to all clients connected to TOP Server. Success!
We highly recommend that you also consult the Interactive TOP Server Troubleshooting Flowchart available in our knowledgebase as a tool to help logically walk through what we just covered.
We hope this information has been helpful in getting you started tackling communication issues. Check out our TOP Server Learning Resources for more information and tutorial videos to help you get started collecting real-time data from your devices using TOP Server.
As always, please contact our support team with any questions and don't forget to subscribe to our blog to find out about the latest updates to TOP Server.
Ready to try TOP Server with your devices? Download the fully-functional free trial.