Practical tutorial on packet loss and retransmission

Retransmission of lost packets is an essential function in data communication. In wireless communication especially, it is one measure of how advanced a communication protocol is.

First, how to detect packet loss:
To retransmit a lost packet, you must first detect the loss; if nothing is lost, nothing needs to be retransmitted. In wireless communication there are usually two ways to detect packet loss.

Carrier sense
Carrier sensing is a common way to detect packet loss, and CSMA/CA is built on top of it. CSMA/CA can also be regarded as a retransmission mechanism, and it is used by the Wi-Fi and Zigbee protocols we use every day. Before sending a message, a communication device turns on its receiver for a short time, during which the radio checks whether other signals are present in the same frequency band. For example, when a Zigbee device performs carrier sense, it checks whether Wi-Fi or Bluetooth is transmitting and whether another Zigbee device is transmitting. If the interfering signal comes from Wi-Fi or Bluetooth, the Zigbee device measures whether that signal's power is as high as its own; if it is not, the device transmits anyway and overpowers it. If the signal comes from another Zigbee device, the device actively drops its own packet and lets the other device go first, regardless of whether the other device's power is higher than its own.

Response mechanism
Another way to detect packet loss is to add a response (acknowledgment) mechanism. Communication protocols are usually described with the seven-layer OSI model, and every layer from the link layer upward can add a response mechanism. The lower the layer and the closer it is to the hardware, the faster its acknowledgment.

IOT wireless protocol

OSI seven layer model

We again take Zigbee's response mechanism as an example. In Zigbee's protocol stack, built-in response mechanisms currently exist only at the MAC layer (data link layer) and the APS layer (transport layer). In practice, however, a response mechanism is often added at the application layer as well. The MAC layer response is the fastest. It is also called the MAC-ACK and is usually generated automatically by the hardware of the Zigbee radio transceiver: 120 microseconds after receiving a Zigbee data frame, the receiving device sends the MAC-ACK as a broadcast. The MAC-ACK is also the shortest frame in Zigbee. The frame itself is only 5 bytes long, or 11 bytes in total once the preamble and synchronization header are included. At Zigbee's 250 kbps data rate each byte takes 32 microseconds, so a MAC-ACK frame occupies the air for 352 microseconds. This means the sender receives the MAC-ACK 120 + 352 = 472 microseconds after it finishes sending a MAC frame. Accordingly, the Zigbee MAC layer stipulates that if the sender has not received the corresponding MAC-ACK within 540 microseconds, the packet is considered lost.
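The timing arithmetic in this paragraph can be checked with a few lines of Python. This is only a worked calculation using the figures quoted in the text; the constant and function names are made up for illustration and do not come from any Zigbee stack.

```python
# Worked check of the MAC-ACK timing figures quoted in the text.
# All names are illustrative; the numbers are the article's own.
BIT_RATE_BPS = 250_000      # Zigbee data rate
ACK_ON_AIR_BYTES = 11       # 5-byte ACK frame plus preamble and sync header
ACK_TURNAROUND_US = 120     # delay before the receiver sends the ACK
ACK_TIMEOUT_US = 540        # limit after which the frame counts as lost


def byte_time_us(bit_rate_bps: int = BIT_RATE_BPS) -> float:
    """Time to transmit one byte, in microseconds."""
    return 8 * 1_000_000 / bit_rate_bps


def ack_air_time_us() -> float:
    """Time the MAC-ACK frame occupies the air."""
    return ACK_ON_AIR_BYTES * byte_time_us()


def ack_arrival_us() -> float:
    """Time from the end of the data frame until the ACK is fully received."""
    return ACK_TURNAROUND_US + ack_air_time_us()


def ack_lost(elapsed_us: float) -> bool:
    """True once the sender has waited past the ACK timeout."""
    return elapsed_us > ACK_TIMEOUT_US
```

With these figures the expected arrival time of 472 microseconds leaves a margin of 68 microseconds before the 540 microsecond limit.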

The MAC-ACK is sent as a broadcast for two reasons. First, this removes the address fields from the MAC-ACK frame, which shortens the frame and its time on air; the sender can tell whether a MAC-ACK is its own from the frame sequence number it carries. Second, a broadcast MAC-ACK also tells other Zigbee devices that a communication is in progress, so any device performing carrier sense at that moment can actively avoid the communicating devices. At the Zigbee MAC layer, carrier sense and MAC-ACK work together to keep packet loss detection accurate. Note also that Zigbee broadcast messages do not generate a MAC-ACK.

Diversified packet loss detection mechanism

In addition to the response mechanism at the MAC layer, Zigbee also has response mechanisms at the transport layer and the application layer. Zigbee is a multi-hop mesh network, and a MAC layer acknowledgment covers only a single hop, so Zigbee also acknowledges at the transport layer; this is called the APS-ACK. A message from a Zigbee sender to a Zigbee receiver may be forwarded by several Zigbee router nodes along the way. After receiving the message, the receiver sends an APS-ACK back to the sender along the same routing path, and the sender treats the message as delivered once the APS-ACK arrives. If the sender has not received the APS-ACK after 6 seconds (the default value), it considers the data packet lost.

Zigbee systems usually expose interfaces only to the application layer. The most common one for loss detection is called "AF Data Confirm", which aggregates MAC layer packet loss (both MAC-ACK loss and carrier sense loss), network layer packet loss, transport layer packet loss, and other lower-layer loss information. Through it, a Zigbee application can find out whether the message it just sent was lost.

Application layer response

Packet loss detected at the MAC layer and the transport layer is system-layer packet loss. Another kind is loss at the application layer. For example, suppose a dimmer switch sends the instruction "set the brightness to 50%" to an air conditioner. What happens if this instruction arrives without any packet loss? The air conditioner is expected to "set the brightness to 50%", but an air conditioner has a temperature and no brightness, so the command has been sent to the wrong target. An application layer response is needed to catch this. For example, after receiving "set the brightness to 50%", the air conditioner can reply to the sender with an application layer response along the lines of "look, I am not a light bulb".
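The dimmer and air conditioner example can be sketched as a tiny command handler. The device types, command names, and response strings below are all hypothetical and chosen only for illustration; they are not identifiers from any Zigbee profile.

```python
# Hypothetical capability table: which commands each device type understands.
SUPPORTED_COMMANDS = {
    "air_conditioner": {"set_temperature"},
    "light_bulb": {"set_brightness"},
}


def handle_command(device_type: str, command: str) -> str:
    """Return an application-layer response for a command that was delivered.

    The message reached the device (no system-layer loss), but the device
    still has to say whether the command makes sense for it.
    """
    if command in SUPPORTED_COMMANDS.get(device_type, set()):
        return "RSP_OK"
    return "RSP_UNSUPPORTED"  # the "I am not a light bulb" reply
```

Here `handle_command("air_conditioner", "set_brightness")` yields the "unsupported" reply even though every lower layer reported a successful delivery.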

Second, the system layer retransmission mechanism:
Where there is packet loss, there is retransmission, and different kinds of loss call for different retransmission strategies. We again use Zigbee as the entry point for analyzing the retransmission mechanisms of a communication protocol.

CSMA/CA mechanism:
CSMA/CA is the retransmission mechanism used together with carrier sense. As described under carrier sense, the principle is to listen for a period of time before sending; the retransmission mechanism of CSMA/CA works by controlling that listening time.
When Zigbee's MAC layer sends a message, it first listens for a random period of time, and this random time is chosen with care. Zigbee takes 32 microseconds to transmit one byte, and the MAC layer defines the transmission time of 10 bytes, 320 microseconds, as one "backoff period". On the first attempt to send data, the MAC layer listens to the carrier for a random 1 to 8 backoff periods, that is, for between 320 microseconds and 2.56 milliseconds. If 2 or 3 Zigbee devices send MAC layer messages at the same time, then by the probability distribution they will most likely not detect each other's carrier, so each has a chance to obtain a sending window. However, as the number of Zigbee devices sending at the same time increases, some of them will inevitably fail to grab a sending window. The result is a carrier sense packet loss, and a retransmission is required.


CSMA/CA

CSMA/CA retransmission is also chosen with care. Because collisions can still occur within 1 to 8 random backoff periods, the range is simply doubled: on retransmission the device listens for a random 1 to 16 backoff periods, which lowers the probability of another collision. If that is still not enough, the next retransmission uses 1 to 32 random backoff periods, and so on. But if the channel is always in conflict, retransmission cannot go on forever, and widening the random backoff range on every attempt would be a bottomless pit. Therefore, after three retransmissions caused by carrier sense packet loss, the Zigbee MAC layer usually tells the application layer through "AF Data Confirm", in effect, "I tried my best, but I can't do it", and the application layer decides what to do.
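The doubling backoff described above can be sketched roughly as follows. This follows the article's simplified description (random listening for 1 to 8, then 1 to 16, then 1 to 32 backoff periods, and so on) rather than the exact standard algorithm. The `channel_is_clear` callback is a hypothetical stand-in for the radio, and all names are illustrative.

```python
import random

BACKOFF_PERIOD_US = 320   # 10 bytes at 32 us per byte, from the text
INITIAL_MAX_PERIODS = 8   # first attempt: 1 to 8 backoff periods
MAX_CSMA_RETRIES = 3      # retransmissions before giving up, from the text


def max_periods(attempt: int) -> int:
    """Upper bound of the random range; attempt 0 is the first send."""
    return INITIAL_MAX_PERIODS * (2 ** attempt)


def listen_time_us(attempt: int, rng: random.Random) -> int:
    """Random listening time for this attempt, in microseconds."""
    return rng.randint(1, max_periods(attempt)) * BACKOFF_PERIOD_US


def csma_ca_send(channel_is_clear, rng=None) -> bool:
    """Try to win a sending window; False means carrier sense packet loss.

    channel_is_clear(listen_us) -> bool is a hypothetical radio callback
    that listens for listen_us microseconds and reports the result.
    """
    rng = rng or random.Random()
    for attempt in range(MAX_CSMA_RETRIES + 1):
        if channel_is_clear(listen_time_us(attempt, rng)):
            return True   # channel free: the frame can be transmitted
    return False          # report the failure upward (AF Data Confirm)
```

A `False` result corresponds to the point where the MAC layer stops widening the range and hands the decision to the application layer.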

Response packet loss retransmission at the MAC layer:
In the Zigbee protocol, if no MAC-ACK is received after a MAC frame is sent, the MAC layer automatically retransmits the frame up to 3 times. Unlike carrier sense retransmission, which widens the interval on each attempt, MAC layer retransmission does not increase the interval. If all three retransmissions fail, "AF Data Confirm" is likewise used to inform the application layer that the packet is lost and cannot be recovered.

MAC lost packet retransmission
However, carrier sense is still performed before each retransmission of the MAC frame. If a MAC layer retransmission runs into a carrier sense conflict, it triggers CSMA/CA retransmission as well.
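How MAC retransmission and carrier sense nest can be sketched as a small loop. Both callbacks are hypothetical placeholders for the radio, and the status strings are invented for illustration; they are not the status codes of any real stack.

```python
MAC_MAX_RETRIES = 3  # automatic MAC retransmissions, from the text


def mac_send(csma_send, wait_for_ack) -> str:
    """Send one MAC frame with automatic retransmission.

    csma_send() -> bool    hypothetical: runs carrier sense (with its own
                           CSMA/CA retries) and transmits if it wins a window
    wait_for_ack() -> bool hypothetical: True if the MAC-ACK arrived in time
    """
    for _ in range(1 + MAC_MAX_RETRIES):      # first send plus 3 retries
        if not csma_send():                   # carrier sense before every try
            return "CHANNEL_ACCESS_FAILURE"   # carrier sense packet loss
        if wait_for_ack():
            return "SUCCESS"
    return "NO_ACK"                           # MAC-ACK packet loss
```

Either failure value would be what the application layer eventually sees through "AF Data Confirm".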

APS-ACK packet loss retransmission:
Zigbee's transport layer retransmission is used to ensure that the message has been delivered to the final device. After sending a message, the APS layer waits 6 seconds and retransmits if no APS-ACK has been received. The APS layer usually retransmits twice: the first retransmission comes 6 seconds after the original send and the second at 12 seconds. If the last retransmission also fails, the APS layer reports a "death notification" to the application layer through "AF Data Confirm".
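The APS retransmission schedule can be written out as simple arithmetic. The sketch assumes that each retransmission is followed by another 6-second wait, so the failure would be reported at 18 seconds; the text does not state that last figure, and all names are illustrative.

```python
APS_ACK_WAIT_S = 6   # default APS-ACK wait, from the text
APS_RETRIES = 2      # usual number of APS retransmissions, from the text


def aps_retransmit_times_s() -> list:
    """Seconds after the original send at which each retransmission occurs."""
    return [APS_ACK_WAIT_S * n for n in range(1, APS_RETRIES + 1)]


def aps_give_up_time_s() -> int:
    """Assumed time of the failure report: one more wait after the last retry."""
    return APS_ACK_WAIT_S * (APS_RETRIES + 1)
```

Under that assumption, an application waiting on an APS-acknowledged message may not learn of a failure until well over ten seconds after the send.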

Third, the retransmission strategy of the application layer:
Retransmission at the system layer, whether CSMA/CA, MAC retransmission, or APS retransmission, is a mechanical and rigid strategy. Its hallmark is that it retransmits 2 to 3 times and, if that fails, reports the failure to the application layer through "AF Data Confirm".
The application layer, however, is the most intelligent layer in wireless transmission, and it is the layer where a retransmission strategy can actually be designed. The application layer should design its packet loss and retransmission strategies according to the application environment and the importance of each message. We take Zigbee applications as an example of designing such a strategy.


Retransmission after CSMA/CA failure:

When a Zigbee device fails CSMA/CA, usually either the environmental interference is too strong, or too many Zigbee devices are sending messages at the same time and the maximum CSMA/CA window is exceeded.

If too many devices are sending at the same time, a "manual avoidance" mechanism must be added. For example, suppose many nodes upload messages to the coordinator at the same time, and one node learns of a carrier conflict packet loss through "AF Data Confirm". The node can record the lost message and then delay for a random period of time, so that the retransmission falls outside the sending peak. This random time range is, of course, much larger than the random time of CSMA/CA at the MAC layer. The application layer can retransmit after a random delay of 1 to 4 seconds, in steps of 0.1 second. If a carrier conflict occurs again, the random time range can be doubled. Note that the number of nodes in a Zigbee network application varies dynamically. When there are many nodes, real-time delivery matters less than making sure that every node's message gets through, and the logic of "the more devices, the more time it takes" is perfectly reasonable.
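A minimal sketch of this "manual avoidance" delay is shown below. It assumes that "doubling the range" means doubling the upper bound (1 to 4 seconds, then 1 to 8, then 1 to 16) while keeping the 1-second minimum; that reading, and every name in the code, is an assumption made for illustration.

```python
import random

MIN_DELAY_TENTHS = 10   # 1 second, in units of 0.1 s
BASE_MAX_TENTHS = 40    # 4 seconds, in units of 0.1 s


def app_retry_delay_s(conflicts: int, rng: random.Random) -> float:
    """Random delay before the next application-layer retransmission.

    conflicts: carrier conflicts reported so far for this message
               (1 means this is the first conflict).
    """
    if conflicts < 1:
        raise ValueError("conflicts must be at least 1")
    # Assumed interpretation: the upper bound doubles on each new conflict.
    max_tenths = BASE_MAX_TENTHS * 2 ** (conflicts - 1)
    return rng.randint(MIN_DELAY_TENTHS, max_tenths) / 10
```

A node would call this each time "AF Data Confirm" reports a carrier conflict, then schedule the recorded message for retransmission after the returned delay.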


The conflict detection mechanism provided by the Zigbee system can only detect conflicts; it cannot distinguish between transmission conflicts and malicious signal interference. It is therefore possible for the application layer to keep retransmitting while "AF Data Confirm" keeps reporting carrier conflicts. At this point the application layer needs to be "smart" and recognize that the Zigbee device may be facing a continuous interference signal. Continuous interference can only be dealt with by forceful means: locating the source with radio positioning equipment and eliminating it.

Retransmission after system response packet loss:
The following strategies can be adopted for system response packet loss, which includes MAC-ACK packet loss and APS-ACK packet loss. The Zigbee system layer has a mesh routing design and looks for the shortest routing path during data transmission. When MAC-ACK packet loss occurs and MAC retransmission fails, Zigbee's routing algorithm calculates a new path. APS-ACK packet loss, by contrast, means the message failed to reach its final destination. In this case, most likely the destination device is broken or does not exist at all. In addition, if the final destination can be reached without routing, the Zigbee system sends the MAC frame directly and reports a MAC-ACK packet loss directly.
When response packet loss occurs and retransmission has failed, the system layer has already performed MAC retransmission and APS retransmission. Therefore, rather than retransmitting immediately, the application can wait for a period of time (10 seconds to 1 minute) before retrying. If the retry succeeds, the target device is fine and the earlier failure was an accident. If the retry fails, the target device can be suspected of having a problem, and no further messages should be sent to it. When there is a faulty device in the Zigbee network, other devices consume network resources on routing and addressing whenever they send messages to it, which causes long delays and achieves nothing. Therefore, once a device in the network is found to be faulty, the application layer should avoid sending any messages to it.
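The "wait, retry once, then stop sending" policy can be sketched as a small bookkeeping class. The class, its method names, and the use of a uniform random delay within the 10 to 60 second window are assumptions made for illustration; the text only gives the window itself.

```python
import random


class FaultTracker:
    """Tracks which target devices the application should stop messaging."""

    def __init__(self, rng=None):
        self.rng = rng or random.Random()
        self.suspect = set()   # devices with one failed delivery so far
        self.faulty = set()    # devices that also failed the delayed retry

    def can_send(self, addr) -> bool:
        """False once a device has been marked faulty."""
        return addr not in self.faulty

    def on_confirm(self, addr, delivered: bool):
        """Process a delivery result for addr.

        Returns the delay in seconds before a delayed retry, or None if
        no retry should be scheduled.
        """
        if delivered:
            self.suspect.discard(addr)   # earlier failure was an accident
            return None
        if addr in self.suspect:         # the delayed retry failed as well
            self.suspect.discard(addr)
            self.faulty.add(addr)
            return None
        self.suspect.add(addr)
        return self.rng.uniform(10.0, 60.0)   # wait 10 s to 1 min, then retry
```

A real application would probably also need some way to clear the faulty mark, for example when the device rejoins the network, which the text does not cover.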

How to handle the response from the application layer:
A system-layer response only indicates whether the sent message reached the target. An application-layer response indicates that the message reached the target and also reports the result of executing it. In English, the system-layer response is called an "acknowledge", abbreviated "ACK", and the application-layer response is called a "response", abbreviated "RSP". The ACK is generated as soon as the target device receives the message, whereas the RSP is the reply sent after the target device has processed it, so the wait for an RSP includes the target device's processing time, which is highly unpredictable. For this reason, Zigbee control applications that send a short message usually enable only MAC-ACK, leave APS-ACK disabled, and then wait for the RSP. Besides detecting packet loss, the RSP can determine which message to send next, which makes a closed-loop control system possible.

Closed-loop negative feedback system, RSP response can be used as a negative feedback channel
When the receiving end sends an RSP back to the sending end after receiving a message, it usually turns on APS-ACK to make sure the RSP gets through. That way, even if the RSP packet is lost, the transport layer's retransmission mechanism automatically retransmits it. In addition, Zigbee 3.0 stipulates that a Zigbee device can report any of its own status values to another device as a heartbeat packet. When the heartbeat packet is sent is controlled by the system layer without intervention from the application layer, and the application layer does not concern itself with whether the heartbeat packet is lost. Therefore, heartbeat packets are usually sent with APS-ACK enabled and RSP not enabled.
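The choice of acknowledgments for each kind of message can be summarized in a small lookup table. The message kinds and field names are hypothetical. The table also assumes MAC-ACK stays on for responses and heartbeats because they are unicast frames, which the text does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TxOptions:
    mac_ack: bool        # single-hop acknowledgment at the MAC layer
    aps_ack: bool        # end-to-end acknowledgment at the APS layer
    wait_for_rsp: bool   # application waits for an RSP from the target


# Illustrative policy following the text.
TX_POLICY = {
    "command":   TxOptions(mac_ack=True, aps_ack=False, wait_for_rsp=True),
    "response":  TxOptions(mac_ack=True, aps_ack=True,  wait_for_rsp=False),
    "heartbeat": TxOptions(mac_ack=True, aps_ack=True,  wait_for_rsp=False),
}


def options_for(kind: str) -> TxOptions:
    """Look up the transmit options for a message kind."""
    return TX_POLICY[kind]
```

The pattern is that each message is covered by exactly one end-to-end confirmation: the RSP for commands, and the APS-ACK for responses and heartbeats.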