Dear PlatON node partners and community members,
Thank you for your great support in troubleshooting the block production bug on the new Baleyworld test net ever since it occurred on Mar. 9, 2020. We are excited to announce that, through the joint effort of all node partners, community members and PlatON staff, the upgrade of the PlatON Baleyworld test net has succeeded and block production has resumed!
All node partners and community members are the watchmen of the whole PlatON ecosystem and have always been our most reliable partners. We at PlatON express our sincere gratitude for what you have done for us. The detailed event log follows.
Mar. 9, 2020, 6:56 (GMT+8)
The PlatON test net failure occurred. Node partners including Bit Cat, Nodeasy and Wetez provided their error logs, and Cobo provided the error message, which helped locate the issue and made troubleshooting more efficient.
Mar. 9, 2020, 13:59 (GMT+8)
PlatON notified all community members of the cause of the network failure: an abnormal governance-related event put a node into an error state, and the block production issue then occurred when the election among the 101 candidates was performed. PlatON also proposed a community-based governance upgrade, which was approved by all community members.
Mar. 9, 2020, 19:36 (GMT+8)
PlatON updated its explanation of the cause of the network failure: an exception occurred in the validators' on-chain status data, so the block production logic could not work normally when the epoch election ran to select the next candidate. As a result, block production and verification could not work normally.
In addition, PlatON presented the potential risks of the upgrade and an upgrade proposal based on the risk evaluation: set a time interval, let the validators that are not participating in verification finish the upgrade first, and then have the verifying validators upgrade together at a specific time by running an upgrade script.
PlatON received plenty of suggestions on the upgrade proposal.
Mar. 10, 2020, 11:22 (GMT+8)
In the Galaxy Rally group on WeChat, PlatON initiated a vote on the starting time of the script upgrade for the 25 nodes that had block production issues and on the starting time of the regular upgrade for the remaining nodes. Many nodes voted right away.
Mar. 10, 2020, 18:16 (GMT+8)
The vote finished, with the following results:
According to the votes, the 25 nodes that had block production issues would start their upgrade on Mar. 11, 2020, 14:00-16:00 (GMT+8), while the remaining nodes would start their upgrade on Mar. 11, 2020, 8:00-14:00 (GMT+8).
Meanwhile, PlatON solicited suggestions online on the upgrade guide for v0.10.1 and on the script upgrade for the 25 nodes that had block production issues.
Also, we received plenty of suggestions on the upgrade proposal.
Mar. 10, 2020, 22:07 (GMT+8)
The PlatON team had received 15 upgrade suggestions from node partners and replied to all of them.
“We sincerely welcome all kinds of suggestions and even criticisms,” said Mr. Kai Yu, the Head of PlatON Community. Mr. Shenglin Li, PlatON Chief Architect, added, “Please feel free to contact us if you have any questions or problems during the Galaxy Rally.”
Mar. 11, 2020, 3:17 (GMT+8)
After 6 upgrade suggestions from node partners were incorporated, the node upgrade script was finalized. The author of the script sent the upgrade guide and the script's download link to the PlatON team by email.
Mar. 11, 2020, 8:02 (GMT+8)
The PlatON team sent the upgrade guide to all node partners, and the remaining nodes started the upgrade. We received many reports and suggestions during the upgrade and recorded them for further improvement.
Mar. 11, 2020, 14:00 (GMT+8)
The 25 nodes that had block production issues started running the script. To ensure that all of these nodes upgraded at the same time, the script was set to start on Mar. 11, 2020, 16:00 (GMT+8).
Mar. 11, 2020, 16:00 (GMT+8)
The 25 scripts started running; however, the network still failed to run.
The PlatON team checked the error logs for troubleshooting. According to the error logs from 2 validators, the nodes' views differed greatly and were out of sync. As a result, each node produced blocks independently, and the nodes did not recognize or sign one another's blocks.
Mar. 11, 2020, 16:38:41 (GMT+8)
Block production on the new PlatON Baleyworld test net resumed, and a great cheer went up in the Galaxy Rally group on WeChat!
Here is what happened on Mar. 11, 2020, from 16:00 to 16:40 (GMT+8). While block production was failing, some of the validators went offline, and their view numbers stopped increasing. The online validators, however, kept switching views, so their view numbers kept increasing. As a result, the gap in view numbers between online and offline validators kept growing and had reached 2000 by Mar. 11, 2020, 16:00 (GMT+8). Because the validators were not all on the same view, block production could not resume, and the validators had to sync their views one by one. The timeout of each view is 40s, which is also the time needed to synchronize a view. About 40 minutes later, 18 validators finally reached the same view and block production resumed.
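The condition described above, that block production resumes only once enough validators share the same view, can be sketched as follows. This is a minimal illustration, not PlatON's actual implementation. It assumes that the 25 nodes form the full validator set and that the required threshold is the standard BFT quorum of 2f + 1, where f = (n - 1) // 3; neither assumption is stated in this log. The function names and the view numbers in the example are hypothetical.

```python
from collections import Counter


def bft_quorum(n: int) -> int:
    """Standard BFT quorum 2f + 1 with f = (n - 1) // 3 (assumed threshold)."""
    f = (n - 1) // 3
    return 2 * f + 1


def largest_view_group(views):
    """Return (view_number, count) for the view shared by the most validators."""
    return Counter(views).most_common(1)[0]


def can_resume(views, total_validators: int) -> bool:
    """True if a quorum of validators currently sits on one and the same view."""
    if not views:
        return False
    _, count = largest_view_group(views)
    return count >= bft_quorum(total_validators)


# Hypothetical view numbers: 18 validators on the same view, 7 still behind.
views_after_sync = [2000] * 18 + [3, 5, 9, 11, 20, 40, 41]
print(bft_quorum(25))                    # quorum size under the assumed rule
print(can_resume(views_after_sync, 25))  # whether 18 aligned validators suffice
```

Under these assumptions the quorum for 25 validators is 17, which is consistent with block production resuming once 18 validators had reached the same view.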
Together, we have witnessed a significant upgrade of the new PlatON Baleyworld test net, made possible by the deep engagement and contributions of the PlatON community. We at PlatON express our sincere gratitude to all the watchmen who engaged so deeply and showed such great support during this upgrade.