DISCUSS: Network Issues w/ Shade Airdrop - 2/21/22

I just want to say that the Active Validators have been amazing at cross-coordinating resources like strong peers and troubleshooting hardware. There have been lots of vocal and active teams contributing to stabilizing the backbone of the network as we churn through the txs. It would be great to acknowledge in the post-mortem the validators who were actively troubleshooting. At least a few validators had low uptime but were actively trying to optimize their nodes for their delegators. In the meantime there is a lot of activity behind the scenes to ensure that things run smoother minute by minute.

10 Likes

100%. Communication in the TG group has been fantastic, and node uptime is now actively improving across the board. Hopefully, everyone sharing peers provides at least enough "ummph" to get us through the rest of the Shade claiming process; I'm keeping a close watch on this from our end.

Agree that those actively engaged in resolving this should get some credit in the post-mortem! Well done.

Little by little, this is how we improve things and continue to blaze that Secret Trail!

4 Likes

I've been monitoring the TG validator channel, but can't really contribute because I'm not that technical. Laura is on a road trip (10 hours!) and isn't able to join in, but I'm keeping her up-to-date as much as I can, especially about the peer list. I'd like to know, since the peer list is making a big improvement, why hasn't it been shared? (Julia | SecretChainGirl)

3 Likes

It's unlikely that everyone can use the exact same peers since there are 70 validators. There have been suggestions to create a 'public' peer list for a while (mainly by @mohammedpatla), and it seems that is now gaining traction, as it contains a fair number of nodes. Once that list gets large enough it should spread out the load nicely.

@dylanschultzie suggested creating around five batches of validators that share peers with each other, which would give groups of 14 validators. That's definitely an approach to consider.

I will add this point of discussion to today's (Wednesday) governance call, which starts at 16h00 UTC on the official Discord. Hope to see you there :slight_smile:

2 Likes

One way to improve peering is to require everyone to turn on seed mode for their nodes.
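For the less technical: seed mode is a node-level toggle in Tendermint's `config.toml` (under `[p2p]`). Below is a minimal sketch of flipping it on; the `~/.secretd` path is an assumption about a typical node layout, so adjust it to your own setup before using anything like this.

```python
# Minimal sketch: enable seed mode in a node's Tendermint config.toml.
# The config path is an assumption for a typical Secret node install;
# back up the file and adjust the path for your own setup.
import re
from pathlib import Path

config_path = Path.home() / ".secretd" / "config" / "config.toml"  # assumed location

text = config_path.read_text()
# Flip the default "seed_mode = false" line under [p2p] to true.
text = re.sub(r"^seed_mode\s*=\s*false", "seed_mode = true", text, flags=re.MULTILINE)
config_path.write_text(text)
```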

1 Like

As mentioned in the Validator TG, I support a temporary reduction in block size.

Here is an explanation for the less technical, so everyone can follow along with what this means and how it could help.

TPS (transactions per second) is a function of average gas per tx, total gas per block (block size), and block time.

If we assume 100k gas for the average tx, Secret Network currently has about 16-17 TPS || 10,000,000 / 100,000 / 6 ≈ 16.67 || (block size / avg. gas per tx / block time).

The network currently peaks at about 30k transactions per day; you can see this on the node explorer: https://secretnodes.com/secret/chains/secret-4/transactions

30k tx per day is an average of 0.35 tx/sec || 30,000 / (24 * 60 * 60) ≈ 0.35 ||. In other words, we aren't anywhere near our current maximum TPS.

The congestion is a result of a sudden concentrated spam of txs, usually associated with the launch of an NFT or DApp. Some nodes aren't able to process the intense computational load and they start falling out of sync. The consensus algorithm then increases the block time to let nodes catch up, ensuring enough nodes stay online to reliably verify blocks. With the SHD airdrop, the block time increased from 6 seconds to 100+ seconds.

The result of this increased block time is that the TPS drops from 16.7 to 1 || 16.7 * (6/100) ≈ 1 ||. This lower TPS comes at the moment the network needs it most.

If we lower the block size we reduce the TPS of the network overall (temporarily, until we have more permanent fixes). Even if we halve the block size, it shouldn't impact the network, as we currently only use about 2% of the available TPS || 0.35 / 16.7 ≈ 2% ||. However, a lower block size means that more nodes can keep up with the network in times of peak load, so the consensus algorithm won't throttle the block time (as much) during those peaks. That in turn should increase the effective TPS in the moments we need it most (compared to the TPS we have now at times of peak load).
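To make the arithmetic above easy to replay, here is a small sketch using the same assumed numbers (10M gas per block, 100k gas per average tx, 6s vs. 100s block times). None of these are measured values, just the estimates from this post.

```python
# Back-of-the-envelope numbers from the explanation above; block_gas_limit
# and avg_tx_gas are assumptions, not measured values.
block_gas_limit = 10_000_000   # max gas per block ("block size")
avg_tx_gas = 100_000           # assumed average gas per tx
normal_block_time = 6          # seconds
congested_block_time = 100     # seconds observed during the SHD airdrop

def max_tps(gas_limit, avg_gas, block_time):
    """Theoretical transactions per second for a given block size and block time."""
    return gas_limit / avg_gas / block_time

print(max_tps(block_gas_limit, avg_tx_gas, normal_block_time))      # ~16.7 TPS
print(max_tps(block_gas_limit, avg_tx_gas, congested_block_time))   # ~1 TPS

# Observed peak demand: ~30k txs per day
daily_txs = 30_000
print(daily_txs / (24 * 60 * 60))                                   # ~0.35 tx/sec

# Halving the block size still leaves plenty of headroom at normal load
print(max_tps(block_gas_limit / 2, avg_tx_gas, normal_block_time))  # ~8.3 TPS
```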

10 Likes

Excellent explanation @Stefan_DomeriumLabs regarding the rationale behind it, and a shout-out to @baedrik for bringing this up in the first place. secretSauce also supports a temporary lowering of the maximum gas per block.

3 Likes

Whispernode supports. +1 on the great explanations as well!

3 Likes

Hello all - Order of Secrets supports lowering the maximum gas per block.

We fully support sharing peers and will be adding to the list ourselves ASAP. We understand that the list has not grown as quickly as hoped so far; if there is a reluctance to share peers with the whole world, perhaps setting up sub-lists allocated to validators who agree to keep them private would encourage more to share?

Either way, we think that splitting a (sizeable) list of peers into batches that different validators are allocated to is a sensible way to ensure they don't all get tapped out. It seems that quality definitely trumps quantity, and everyone benefits in the end. Distributing evenly across rankings might work as a fair method of allocating the peers (e.g. Batch 1 - 1, 11, 21, 31, 41, 51, 61; Batch 2 - 2, 12, 22, 32, 42, 52, 62; etc. into 7 batches that circulate a peer list between them). Of course rankings change, but this would be a reasonably simple method to get it off the ground.
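Purely as an illustration of the "distribute evenly across rankings" idea, here is a tiny sketch that strides through the validator ranks so each batch spans the whole ranking range. The numbers match the example above and would need to be re-generated as rankings change.

```python
# Illustrative only: assign validator ranks to peer-sharing batches by
# striding through the rankings, so each batch mixes high- and low-ranked
# validators. Rankings shift over time, so this would be re-run periodically.
def allocate_batches(num_validators: int, batch_size: int):
    """Batch i gets ranks i, i + stride, i + 2*stride, ... where stride = num_validators // batch_size."""
    stride = num_validators // batch_size
    return [list(range(start, num_validators + 1, stride)) for start in range(1, stride + 1)]

for i, batch in enumerate(allocate_batches(70, 7), start=1):
    print(f"Batch {i}: {batch}")
# Batch 1: [1, 11, 21, 31, 41, 51, 61]
# Batch 2: [2, 12, 22, 32, 42, 52, 62]
# ...
```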

We'd also like to express our appreciation for the efforts that so many people have made to deal with a challenging couple of days on the network. Special thanks to @dylanschultzie, @jamama2354 and @pmuecke, and of course to @anon60841010 for throttling the flow of claims while we get through this!

1 Like

The peer list is maintained on this public list.

3 Likes

Our experience has been that because the shared peer list (the same one @mohammedpatla has just posted) is quite short and lots of people are connecting to those peers, they were actually getting tapped out. Also, during the heavy volumes yesterday, we tried adding the whole list and it made things worse, not better, because connections kept timing out; so quality is (arguably) better than quantity, at least for public peers.

3 Likes

Yeah, just to second this. Jamama (myself) and the RC DAO validator support lowering the max gas per block. We also definitely see the need for high-quality peers. Partitioning them into smaller groups would be great, since sharing the same peers across the board seems to have some negative amplifying effects when nodes start to drop. I'm also for controlled stress testing of the network by adjusting the front-end bottleneck; it might help us identify other pinch points and optimizations we can make ahead of other launches or high-traffic periods.

2 Likes

I'd like to make a suggestion; forgive me if it doesn't make sense, as I am not a Secret Network expert.

Validators should NOT be fielding queries. Keep their system resources focused on writing blocks to the chain, running contracts, etc. The read layer should be a separate set of nodes, kind of like "read-only slaves" to use a MySQL term. This allows the two node types to scale independently as needed, without one affecting the other.

This is the design pattern that THORChain has been using, and it's been great!
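As a toy illustration of that read/write split (not how THORChain or Secret actually implement it), the idea is simply to route read-only RPC traffic to dedicated query nodes and reserve the validator/sentry path for tx broadcasts. The endpoints below are hypothetical placeholders, not real infrastructure.

```python
# Toy sketch of the read/write split: queries hit dedicated "query nodes"
# while tx broadcasts go to the sentries in front of the validator.
# All endpoints are hypothetical placeholders.
import random

QUERY_NODES = ["http://query-1.example:26657", "http://query-2.example:26657"]
SENTRY_NODES = ["http://sentry-1.example:26657"]

READ_METHODS = {"abci_query", "block", "tx_search", "status"}

def pick_endpoint(rpc_method: str) -> str:
    """Send read-only RPC methods to query nodes; everything else
    (e.g. broadcast_tx_sync) to the sentry layer."""
    pool = QUERY_NODES if rpc_method in READ_METHODS else SENTRY_NODES
    return random.choice(pool)

print(pick_endpoint("abci_query"))         # routed to a query node
print(pick_endpoint("broadcast_tx_sync"))  # routed to a sentry node
```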

1 Like

A bit of context before making suggestions:

First, the good news… the lowered gas limit appears to have had the stabilizing effect we were hoping for.
[Screenshot, 2022-03-03]

The beginning of the flat-lining at the end corresponds exactly with the parameter change going into effect

But I think 4 mill might end up being too low. I forgot that the cost of a compute store tx did not go down with Supernova like the cost of a compute execute tx did, so we are likely going to see contracts that use permits, and especially ones that build off of SNIP-20s and SNIP-721s, requiring more than 4 mill gas to store. SNIP-721 in particular is pretty large to begin with, although projects could shave off functions they don't need like Reveal, SetMinters, functions that are redundant and only there for strict CW-721 compliance, etc. (They'd definitely get it under 4 mill by removing permits, but I don't think that is a viable option.)

But depending on how complicated their use case is, they might be adding even more than they are shaving, so I'm wondering if we actually need to set the limit at 5 mill. Another alternative would be to apply similar gas savings to compute store txs as was done with other txs, but that might not be feasible, or worth the work, if we think a bump up to 5 mill would still give us noticeable block time stability (although likely somewhat higher times than what we are currently seeing). Unfortunately it's hard to guess whether any current projects need even more than 5 mill to store, so it might be worth trying to get some input from them too.

3 Likes

As an FYI for the community, Secret DreamScape's largest gas usage is 300,000, and NFT minting is 1M for a box of 10 mints.

Those are executes. This is in reference to storing the contract code (tx compute store). Executes underwent drastic gas reductions with Supernova, but the tx to store your contract was not made cheaper (I think even just storing the counter template contract is over 1 million gas).

Welp. Less than 12 hours later, and this has already been proven to be the case. We've already seen a team have to delay their launch this morning because the gas limit is now too low to store a contract. I did some digging and found 108 (out of 286) compute store txs that had a "gas used" of over 4 million, with 4 (including a couple of Alter contracts) over 5 million. I think submitting a proposal to increase the gas limit to 6 million would be a good compromise, but I'm open to other suggestions.
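For anyone who wants to repeat that kind of tally, the check is just counting store txs whose gas used exceeds candidate limits. The sample values below are made up for illustration; the real figures above came from on-chain compute store txs.

```python
# Rough sketch of the tally described above: count compute store txs whose
# gas_used exceeds each candidate block gas limit. The gas values are
# invented sample data, not real chain data.
sample_store_tx_gas = [2_800_000, 4_300_000, 3_900_000, 5_200_000, 4_750_000]

for limit in (4_000_000, 5_000_000, 6_000_000):
    over = sum(1 for gas in sample_store_tx_gas if gas > limit)
    print(f"> {limit:,} gas: {over} of {len(sample_store_tx_gas)} store txs")
```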

Hmmm, if only 4 are over 5 million, I could even see raising it to just 5 mill, and having teams that need to store larger contracts notify us when they will be ready. Then we create a proposal to raise it to what they need (with a buffer, obviously), and create another proposal 3 or 4 days later to lower it back to 5. That way they have a 3- or 4-day window to do their single store tx, and we just inform teams so that no launches happen while the network is vulnerable with the higher limit.

1 Like