Sync via Distributed Clock with Redundancy #621
Comments
Can you show the output of slaveinfo in single and redundant mode? It shows the ports that are active in each slave.
OK, thinking about this again, it of course makes sense: the propagation delays calculated during calibration are no longer valid, so synchronisation can no longer be guaranteed. In my case the redundant channel can then only be used for something like a fast shutdown when a cable break is detected. I found an interesting paper: https://ieeexplore.ieee.org/document/7005297. It proposes calculating the propagation delays for all possible configurations on both Ethernet ports and storing them in a list. If a cable break happens, you only have to determine the new slave chain(s), look up the appropriate delays and apply them. But I don't know how easily this method could be integrated into SOEM.
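The lookup idea described above could be sketched roughly as follows. This is only an illustration of "precompute, then look up on a cable break"; it is not the algorithm from the paper and not SOEM API. The names `delay_entry_t`, `find_delays` and `MAX_SLAVES` are hypothetical, and the table contents are assumed to come from a calibration step that is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SLAVES 16 /* hypothetical upper bound for this sketch */

/* One precomputed entry (hypothetical layout): the propagation delays that
 * are valid when the link after slave `break_after` is down.
 * break_after == 0 means "no break, original chain". */
typedef struct {
    int     break_after;
    int32_t delay_ns[MAX_SLAVES]; /* indexed by slave position - 1 */
} delay_entry_t;

/* Search the precomputed list for the entry matching the detected break.
 * Returns NULL if this break location was never calibrated. */
const delay_entry_t *find_delays(const delay_entry_t *table, size_t n,
                                 int break_after)
{
    assert(table != NULL);
    for (size_t i = 0; i < n; i++) {
        if (table[i].break_after == break_after) {
            return &table[i];
        }
    }
    return NULL;
}
```

After a break is detected, the application would call `find_delays()` with the detected break location and write the returned delays to the slaves. How the break location is determined and how the delays are written is left open here.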
The combination of cable redundancy and DC is a fundamental problem of observability. Cable redundancy in EtherCAT is nothing more than the creation of two separate EtherCAT rings out of one, where the separation can be at any link. Let's assume an EtherCAT configuration:
where Mn = master n, rsn = DC reference slave n, sn = slave n. The arrows give the direction of the packet flow. M1 sends regular packets with PDO, SDO and MAC address A. M2 sends NOP packets with MAC address B. M2 receives packets of M1 and M1 receives packets of M2. When no link is broken this system is stable and observable. All slaves can see the reference clock rs1, and the propagation delay between each slave and the reference is fixed. As stated in the post above the measurement needs to be done without M2. Now the link between s4 and s5 breaks:
M1 now receives packets from M1, and M2 from M2. SOEM detects this from the received MAC address. Instead of sending NOP packets on M2, SOEM now copies the packets received on M1 and sends them over M2. When they are received back, SOEM uses the result as a completed slave traversal. The work counters should now be as complete as in the first case.
What is different is the timing. The time from M1 rx to M2 tx (Tred) is now added to all slaves after s4. For PDO and SDO transfers this is of little consequence, but for clock distribution it is a problem. Each slave clock is corrected by comparing the internal clock with the value of the reference clock corrected by the propagation delay. Because slaves s5 and s6 now have Tred added, this correction is no longer sufficient. To make things worse, Tred is not constant because of how packets are handled in the master. In standard Linux systems the jitter can be many microseconds. So a simple solution like adding a fixed number to the propagation delay of s5 and s6 is not sufficient. The problem is that slaves s5 and s6 lose their ability to observe rs1 directly: some unknown and varying quantity of software time is added.
Solution 1 (software) : Assuming Tred has some known (Gaussian) probability function and sigma is narrow enough to satisfy the timing constraints of slaves s5 and s6, we could calculate the optimum correction time to the slave propagation delay. Slaves s5 and s6 would have a less stable DC, but on average they would still track rs1. Pro: Simple. Con: OS timing jitter is not Gaussian, and it changes with system load and configuration.
Solution 2 (hardware) : Leverage the hardware time-stamping functions of NICs. Some cards have hardware-supported time stamps for received and transmitted packets. Tred then becomes measurable, and as a consequence rs1 is again observable for s5 and s6. The problem is that OS support for time-stamping is lacking, so we are stuck with writing special drivers.
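Solution 1 could be sketched roughly as follows. This is a minimal illustration under the simplified model above, where Tred is simply added to the calibrated propagation delay of the slaves behind the break. It assumes Tred samples are already available in nanoseconds (how to obtain them is not shown), and all names (`tred_stats_t`, `tred_estimate`, `tred_is_usable`, `corrected_delay_ns`) are hypothetical, not SOEM functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of measured Tred samples (M1 rx to M2 tx). */
typedef struct {
    int64_t mean_ns;      /* average forwarding time */
    int64_t variance_ns2; /* population variance of the samples */
} tred_stats_t;

/* Compute mean and variance of the samples using integer arithmetic. */
tred_stats_t tred_estimate(const int64_t *samples_ns, size_t n)
{
    assert(samples_ns != NULL && n > 0);
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += samples_ns[i];
    }
    int64_t mean = sum / (int64_t)n;
    int64_t sq = 0;
    for (size_t i = 0; i < n; i++) {
        int64_t d = samples_ns[i] - mean;
        sq += d * d;
    }
    return (tred_stats_t){ mean, sq / (int64_t)n };
}

/* Returns 1 if sigma is within max_sigma_ns, i.e. the spread is narrow
 * enough for the slaves' timing constraints, else 0.
 * Compares variances to avoid needing sqrt(). */
int tred_is_usable(const tred_stats_t *s, int64_t max_sigma_ns)
{
    return s->variance_ns2 <= max_sigma_ns * max_sigma_ns;
}

/* Assumed correction: calibrated delay plus the mean of Tred. */
int32_t corrected_delay_ns(int32_t calibrated_delay_ns, const tred_stats_t *s)
{
    return (int32_t)(calibrated_delay_ns + s->mean_ns);
}
```

Using the mean as the correction is only optimal if the distribution is roughly symmetric. As noted under "Con", real OS jitter is not Gaussian, so a median or another robust estimator might be a better choice in practice.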
Solution 3 (hardware) : Use of a special reference slave. The configuration now looks like this:
Our reference slave is special in that it has two ESC instances that have a common clock source. One ESC is rs1a, the other ESC is rs1b. They both reside on the same PCB. Because they share a common clock, their internal counters can never deviate. The special locations at which the two ESCs are connected to M1 and M2 ensure that the default propagation mechanism (FRMW) works as intended. s2 to s4 reference from rs1a, s5 and s6 reference from rs1b, and rs1a and rs1b are connected and synchronous.
I intentionally kept the example simple. Complex structures are possible: slaves can have up to 4 ports, and not all slaves need to have the standard input->output port order. Sometimes a link disappears for a few cycles and then comes back online. It is also possible that we lose not only a link but slaves too. The master software needs to take all these complications into account.
P.S.: I also do consultancy for high-availability systems based on EtherCAT. Often these are triple-redundant systems with special (FPGA) hardware support, but they still use COTS hardware for master(s) and slaves. PM me for details. <edit : changed solution 3 drawing>
Hello, I just started learning SOEM and ran into a similar problem: how do I update the slave DC clock in the running state, and how is switching handled in the redundant slave configuration? How is this part of the code written, and could you share a test demo?
@chenguang3312, please do not hijack an issue. Although your question might seem similar, the discussion gets convoluted very quickly. If you have specific questions please open a new issue.
Hi Arthur,
Hi,
For a highly demanding control application I'd like to use distributed clocks with redundant operation. But as already discovered in #288, there are some issues. I am currently trying to find a workaround, but the suggested fix of deactivating the output/loop-back port of the last device in the chain does not seem to work for me.
Instead of modifying the SOEM source, I added the commands to my application like this:
But I still get a wrong measurement of the propagation delay of the last slave, and the sync pulses don't get activated.
If anyone has an idea how to get it working, I'd be very glad.