Sync via Distributed Clock with Redundancy #621

unamehere · 2022-06-23T08:21:23Z

Hi,
for a highly demanding control application I'd like to use distributed clocks with redundant operation. But as already discovered in #288, there are some issues. I am currently trying to find a workaround, but the suggested fix of deactivating the output / loop-back port of the last device in the chain does not seem to work for me.
Instead of modifying the SOEM source, I added the commands to my application like this:

uint8_t portSetting = 0xfc;
/* last slave: close ports 1-3, leave port 0 on auto (loop settings, register 0x101) */
ec_FPWR(ec_slave[ec_slavecount].configadr, 0x101, sizeof(portSetting), &portSetting, EC_TIMEOUTRET);
/* configure DC options for every DC capable slave found in the list */
ec_configdc();
/* last slave: set port 1 back to auto */
portSetting = 0xf0;
ec_FPWR(ec_slave[ec_slavecount].configadr, 0x101, sizeof(portSetting), &portSetting, EC_TIMEOUTRET);

But I still get a wrong propagation delay measurement for the last slave, and the sync pulses don't get activated.

If anyone has an idea how to get it working, I'd be very glad.
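
For readers decoding the two values written to 0x101: a minimal sketch, assuming the DL Control layout from the ESC datasheet, where this byte holds a 2-bit loop setting per port with port 0 in the lowest bits. The enum and function names are made up for illustration and are not part of SOEM.

```c
#include <assert.h>
#include <stdint.h>

/* 2-bit loop setting per port (illustrative names). */
enum loop_mode {
    LOOP_AUTO       = 0x0, /* follows the link state */
    LOOP_AUTO_CLOSE = 0x1,
    LOOP_OPEN       = 0x2, /* open regardless of link state */
    LOOP_CLOSED     = 0x3  /* closed regardless of link state */
};

/* Pack the settings of ports 0..3 into the byte written to register 0x101. */
static uint8_t dl_loop_byte(enum loop_mode p0, enum loop_mode p1,
                            enum loop_mode p2, enum loop_mode p3)
{
    return (uint8_t)((p0 & 0x3) | ((p1 & 0x3) << 2) |
                     ((p2 & 0x3) << 4) | ((p3 & 0x3) << 6));
}

/* Extract the setting of one port from the register byte. */
static enum loop_mode dl_loop_mode(uint8_t reg, unsigned port)
{
    assert(port < 4);
    return (enum loop_mode)((reg >> (2 * port)) & 0x3);
}
```

Under that assumption, 0xfc leaves port 0 on auto and closes ports 1 to 3, and 0xf0 puts port 1 back on auto.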


ArthurKetels · 2022-06-24T21:56:04Z

Can you show the output of slaveinfo in single and redundant mode? It shows the ports that are active in each slave.
A better method is to bring the redundant port link down, then call ec_configdc(), then bring the redundant port link back up. In auto port mode (the default) the slaves figure out by themselves which port to close.
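
The sequence described above could be sketched roughly as follows. This is not SOEM code: the struct, the hook names and the dry-run helpers are hypothetical, and how the redundant link is actually taken down is left to the application (it is OS and NIC specific). In a SOEM application the `measure_delays` hook would presumably wrap ec_configdc().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hooks supplied by the application (all names hypothetical). */
struct dc_calib_ops {
    bool (*set_redundant_link)(bool up); /* bring the secondary port link up or down */
    void (*wait_ms)(unsigned ms);        /* give the slaves time to react to the link change */
    bool (*measure_delays)(void);        /* e.g. a wrapper around ec_configdc() */
};

/* Measure propagation delays with the redundant link down, then bring it back up.
   If taking the link down fails, nothing else is attempted. */
static bool dc_calibrate_single_path(const struct dc_calib_ops *ops, unsigned settle_ms)
{
    assert(ops && ops->set_redundant_link && ops->wait_ms && ops->measure_delays);

    if (!ops->set_redundant_link(false))
        return false;
    ops->wait_ms(settle_ms);
    bool measured = ops->measure_delays();
    bool restored = ops->set_redundant_link(true);
    ops->wait_ms(settle_ms);
    return measured && restored;
}

/* Dry-run hooks that only record the call order, useful for checking the sequence. */
static char dry_log[32];
static void dry_note(const char *s)
{
    strncat(dry_log, s, sizeof dry_log - strlen(dry_log) - 1);
}
static bool dry_link(bool up)     { dry_note(up ? "U" : "D"); return true; }
static void dry_wait(unsigned ms) { (void)ms; dry_note("w"); }
static bool dry_measure(void)     { dry_note("M"); return true; }
static const struct dc_calib_ops dry_ops = { dry_link, dry_wait, dry_measure };
```

Running `dc_calibrate_single_path(&dry_ops, 10)` records link down, wait, measure, link up, wait. The settle time is a guess that has to be tuned per setup.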

unamehere · 2022-06-27T10:08:07Z

Thank you very much, Arthur. This approach works for me.

Without shutting the port down I got this response:

Slave:1 Name:TMC8462-BOB_SPI_M Output size:136bits Input size: 72bits State: 4 delay:0.1
         Out:4daff1c0,  17 In:4daff1f3,   9
Slave:2 Name:TMC8462-BOB_SPI_M Output size:136bits Input size: 72bits State: 4 delay:780.1
         Out:4daff1d1,  17 In:4daff1fc,   9
Slave:3 Name:TMC8462-BOB_SPI_M Output size:136bits Input size: 72bits State: 4 delay:213038954.1
         Out:4daff1e2,  17 In:4daff205,   9

But when I shut down the redundant port, wait a few milliseconds, run ec_configdc() and activate the port again, the measured delays are correct:

Slave:1 Name:TMC8462-BOB_SPI_M Output size:136bits Input size: 72bits State: 4 delay:0.1
         Out:b5b331c0,  17 In:b5b331f3,   9
Slave:2 Name:TMC8462-BOB_SPI_M Output size:136bits Input size: 72bits State: 4 delay:780.1
         Out:b5b331d1,  17 In:b5b331fc,   9
Slave:3 Name:TMC8462-BOB_SPI_M Output size:136bits Input size: 72bits State: 4 delay:1600.1
         Out:b5b331e2,  17 In:b5b33205,   9
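
A bad calibration like the first listing can be caught automatically with a plausibility check. The sketch below is only an illustration: it assumes the integer part of the printed delay is a value in nanoseconds, that the slaves are listed in line order so delays never decrease, and that `max_ns` is a generous bound picked for the installation. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the index of the first implausible delay, or -1 if all look sane.
   A delay is flagged when it is smaller than the previous slave's delay
   or larger than max_ns. */
static int first_bad_delay(const int32_t *delay_ns, size_t n, int32_t max_ns)
{
    assert(delay_ns != NULL || n == 0);

    int32_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        if (delay_ns[i] < prev || delay_ns[i] > max_ns)
            return (int)i;
        prev = delay_ns[i];
    }
    return -1;
}
```

With a bound of 100000 ns, the delays 0, 780, 213038954 from the first listing are flagged at the third slave, while 0, 780, 1600 pass. Topologies with branches would need a different ordering rule.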

But if I test redundant operation by unplugging the Ethernet cable between slave 2 and slave 3, the time synchronisation seems to get lost.
First image: normal operation, all slaves connected in an unbroken ring (yellow: sync slave 0 (reference clock), blue: sync slave 3, red: spi_cs (end-of-frame indicator)).

Second image: link between slave 2 and slave 3 is open (yellow: sync slave 0 (reference clock), blue: sync slave 3, red: spi_cs (end-of-frame indicator)).

Now the frame gets sent twice with a strange offset, and the sync signal of slave 3 also gets an offset.
It makes sense that sending the frame on the redundant port introduces an offset in the frame send timing, but it makes no sense to me that the slave loses the reference time.

Do you have an idea what could cause this behavior?

unamehere · 2022-06-27T13:22:29Z

OK, thinking about it again, this of course makes sense: the propagation delays calculated during calibration are no longer valid, so synchronisation can no longer be guaranteed. In my case the redundant channel can therefore only be used for something like a fast shutdown when a cable break is detected.

I found an interesting paper: https://ieeexplore.ieee.org/document/7005297 . It proposes calculating the propagation delays for all options on both Ethernet ports and storing them in a list. If a cable break happens, you only have to determine the new slave chain(s), look up the appropriate delays and apply them.

But I don't know how easily this method could be integrated into SOEM.
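
To make the lookup idea concrete, here is a toy sketch. It is not the paper's algorithm and not SOEM code: it assumes a simple line of slaves, hypothetical per-link cable delays, and it ignores everything except cable delay. All names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SLAVES 16

/* delay_ns[b][s]: delay of slave s from the port feeding its segment,
   when the link between slave b and slave b+1 is broken. */
struct delay_table {
    size_t n;
    int32_t delay_ns[MAX_SLAVES - 1][MAX_SLAVES];
};

/* link_ns[i] is the delay of the link between slave i and slave i+1. */
static void build_delay_table(struct delay_table *t, const int32_t *link_ns, size_t n)
{
    assert(t != NULL && link_ns != NULL && n >= 2 && n <= MAX_SLAVES);
    t->n = n;
    for (size_t brk = 0; brk + 1 < n; brk++) {
        /* Segment fed from the primary port: accumulate from slave 0. */
        int32_t acc = 0;
        t->delay_ns[brk][0] = 0;
        for (size_t s = 1; s <= brk; s++) {
            acc += link_ns[s - 1];
            t->delay_ns[brk][s] = acc;
        }
        /* Segment fed from the redundant port: accumulate from the last slave. */
        acc = 0;
        t->delay_ns[brk][n - 1] = 0;
        for (size_t s = n - 1; s > brk + 1; s--) {
            acc += link_ns[s - 1];
            t->delay_ns[brk][s - 1] = acc;
        }
    }
}

/* On a detected break, look up the precomputed row instead of re-measuring. */
static const int32_t *delays_for_break(const struct delay_table *t, size_t brk)
{
    assert(t != NULL && brk + 1 < t->n);
    return t->delay_ns[brk];
}
```

The table is built once after calibration, so reacting to a break is a single array lookup followed by writing the new delays to the affected slaves.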

ArthurKetels · 2022-06-27T20:15:35Z

The combination of cable redundancy and DC is a fundamental problem of observability. Cable redundancy in EtherCAT is nothing more than the creation of two separate EtherCAT rings out of one, where the separation can be at any link.

Let's assume an EtherCAT configuration:

M1 -> rs1 - s2 - s3 - s4 - s5 - s6 -> M2
 |                                    |
 +------------------<-----------------+

where Mn = master n, rsn = DC reference slave n, sn = slave n. The arrows give the direction of the packet flow.

M1 sends regular packets with PDO, SDO and MAC address A. M2 sends NOP packets with MAC address B. M2 receives packets of M1 and M1 receives packets of M2.

When no link is broken this system is stable and observable. All slaves can see the reference clock rs1, and the propagation delay between each slave and the reference is fixed. As stated in the post above the measurement needs to be done without M2.

Now the link between s4 and s5 breaks:

M1 -> rs1 - s2 - s3 - s4   s5 - s6 -> M2
 |                    |    |          |
 +----------<---------+    +----<-----+

M1 now receives packets from M1, and M2 from M2. SOEM detects this from the received MAC address. Instead of sending NOP packets on M2, SOEM now copies the packets received on M1 and sends them over M2. When they are received back, SOEM uses the result as a completed slave traversal. The work counters should now be as complete as in the first case. What differs is the timing: the time from M1 rx to M2 tx (Tred) is now added for all slaves after s4.
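
The detection rule described above can be written as a small decision function. This is only a sketch of the logic as explained in this comment, not the actual SOEM receive path; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Which master port originally sent a received frame, judged by its source MAC. */
enum frame_origin {
    FROM_NONE, /* nothing received (timeout) */
    FROM_M1,   /* MAC address A */
    FROM_M2    /* MAC address B */
};

enum ring_state { RING_INTACT, RING_BROKEN, RING_UNKNOWN };

static enum ring_state classify_ring(enum frame_origin rx_on_m1,
                                     enum frame_origin rx_on_m2)
{
    assert(rx_on_m1 <= FROM_M2 && rx_on_m2 <= FROM_M2);

    /* Frames crossed the whole line: each port sees the other port's frame. */
    if (rx_on_m1 == FROM_M2 && rx_on_m2 == FROM_M1)
        return RING_INTACT;
    /* Frames were looped back at the break: each port sees its own frame. */
    if (rx_on_m1 == FROM_M1 && rx_on_m2 == FROM_M2)
        return RING_BROKEN;
    /* Timeouts or other combinations are not covered by this sketch. */
    return RING_UNKNOWN;
}

/* Only in the broken case must the frame received on M1 be re-sent on M2. */
static bool must_relay_on_m2(enum ring_state s)
{
    return s == RING_BROKEN;
}
```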

For PDO and SDO transfers this is of little consequence, but for clock distribution this is a problem. Each slave clock is corrected by comparing the internal clock with the value of the reference clock corrected with the propagation delay. Because slave 5 and 6 now have Tred added this correction is no longer sufficient. To make things worse Tred is not constant because of how packets are handled in the master. In standard Linux systems jitter can be many microseconds. So a simple solution like adding a fixed number to the propagation delay of s5 and s6 is not sufficient.
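
The effect on the clock correction can be written out as a short derivation. This is a sketch under assumptions: $t_0[k]$ is the reference time carried in the frame in cycle $k$, $d_i$ the propagation delay stored in slave $i$ during calibration, $d_i'$ the actual delay over the new path through M2, and $T_{red}[k]$ the master relay time in that cycle. The symbols $t_0$, $d_i$, $d_i'$ and $k$ are introduced here for illustration only.

```latex
\begin{align*}
\hat{t}_i[k] &= t_0[k] + d_i
  && \text{reference time as slave } i \text{ estimates it} \\
t_i[k] &= t_0[k] + d_i' + T_{red}[k]
  && \text{reference time when the frame actually arrives} \\
e_i[k] &= t_i[k] - \hat{t}_i[k] = (d_i' - d_i) + T_{red}[k]
  && \text{resulting error for slaves after the break}
\end{align*}
```

The first term is constant for a given break and could in principle be compensated. The second term changes from cycle to cycle with the jitter of the master.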

The problem is that slaves s5 and s6 lose their ability to observe rs1 directly. Some unknown and varying quantity of software time is added.

Solution 1 (software) : Assuming Tred has some known (Gaussian) probability function and sigma is narrow enough to satisfy the timing constraints of slaves s5 and s6, we could calculate the optimum correction time to the slave propagation delay. Slave 5 and 6 would have a less stable DC but on average they still track rs1. Pro: Simple. Con: OS timing jitter is not Gaussian, and changes with system load and configuration.
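
A minimal sketch of Solution 1, with assumptions stated up front: Tred samples are available in nanoseconds, the sample mean is used as the correction (a simplification, not necessarily the optimum), and the spread is judged against a user-chosen limit. The function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Estimate a fixed correction from measured Tred samples.
   Writes the rounded sample mean to *correction_ns and returns true when the
   sample standard deviation is within max_sigma_ns. */
static bool tred_correction(const int32_t *tred_ns, size_t n,
                            int32_t max_sigma_ns, int32_t *correction_ns)
{
    assert(tred_ns != NULL && correction_ns != NULL && n >= 2 && max_sigma_ns >= 0);

    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += tred_ns[i];
    double mean = sum / (double)n;

    double sq = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = tred_ns[i] - mean;
        sq += d * d;
    }
    double variance = sq / (double)(n - 1);

    /* Tred is a non-negative duration, so adding 0.5 rounds to nearest. */
    *correction_ns = (int32_t)(mean + 0.5);

    /* Compare variances to avoid needing sqrt(). */
    return variance <= (double)max_sigma_ns * (double)max_sigma_ns;
}
```

If the function returns true, the correction would be added to the propagation delay of the slaves behind the break. As noted in the "Con" above, real OS jitter is not Gaussian, so the mean and sigma are only a rough guide.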

Solution 2 (hardware) : Leverage the hardware time-stamping functions of NICs. Some cards can time-stamp received and transmitted packets in hardware. Tred then becomes measurable, and as a consequence rs1 is again observable for s5 and s6. The problem is that OS support for time-stamping is lacking, so we are stuck with writing special drivers.

Solution 3 (hardware) : Use of a special reference slave. The configuration now looks as:

M1 -> rs1a - s2 - s3 - s4   s5 - s6 -> M2
 |                     |    |          |
 +-----------<---------+    +-<--rs1b<-+

Our reference slave is special in that it has two ESC instances that share a common clock source. One ESC is rs1a, the other ESC is rs1b. They both reside on the same PCB. Because they share a common clock, their internal counters can never deviate. Because the two ESCs sit directly next to M1 and M2, the default propagation mechanism (FRMW) works as intended. s2 to s4 reference from rs1a, s5 and s6 reference from rs1b, and rs1a and rs1b are connected and synchronous.

I intentionally kept the example simple. More complex structures are possible: slaves can have up to 4 ports, and not all slaves need to have the standard input->output port order. Sometimes a link disappears for a few cycles and then comes back online. It is also possible that we lose not only a link but slaves too. The master software needs to take all these complications into account.

P.S.: I also do consultancy for high-availability systems based on EtherCAT. Often these are triple-redundant systems with special (FPGA) hardware support that still use COTS hardware for master(s) and slaves. PM me for details.

chenguang3312 · 2022-06-28T03:37:32Z

int8_t portSetting = 0xfc;
ec_FPWR(ec_slave[ec_slavecount].configadr, 0x101, sizeof(portSetting), &portSetting, EC_TIMEOUTRET);
/* configure DC options for every DC capable slave found in the list */
ec_configdc();
portSetting = 0xf0;
ec_FPWR(ec_slave[ec_slavecount].configadr, 0x101, sizeof(portSetting), &portSetting, EC_TIMEOUTRET);

Hello, I have just started learning SOEM and have run into a similar problem. How do I update a slave's DC clock while it is in the running state, and how is the switching controlled in the redundancy configuration? How is this part of the code written, and could you share a test demo?

ArthurKetels · 2022-06-28T06:26:49Z

@chenguang3312, please do not hijack an issue. Although your question might seem similar, the discussion gets convoluted very quickly. If you have specific questions, please open a new issue.

unamehere · 2022-06-28T06:49:04Z

Hi Arthur,
thank you very much for your great explanation.
I will first take a look at the software solution.
Until I get the synchronisation to work reliably, I'll use the redundant port just for emergency shutdown if a link failure is detected.
I don't think there is much to add to this issue, so I'll close it.

unamehere closed this as completed Jun 28, 2022

ArthurKetels mentioned this issue Nov 24, 2022

Question about SEOM Redundancy Mechanism #664

Closed

jamwaffles mentioned this issue Feb 14, 2025

Redundant network interface support ethercrab-rs/ethercrab#98

Open
