DISCUSS: Network Issues w/ Shade Airdrop - 2/21/22

Hello to the Secret community!

Yesterday afternoon (Feb 21, 2022) we started seeing impacted network performance stemming from the launch of the Shade airdrop. Things are already improving, but we want to explain what happened and what users can now expect. In this post, we’ll outline:

  1. What exactly happened
  2. Why these issues arose
  3. What fixes are being made now
  4. What improvements will be made longer term

Thanks to the SCRT Labs team for their contributions to this post. @Cashmaney @assafmo

What’s Happening?

On Feb 21, 2022 at around 11pm UTC the Shade protocol airdrop was launched, drawing a lot of attention to Secret Network.

As a part of their airdrop mechanism, Shade heavily utilized secp256k1 signature verification in their contracts, which is very computationally expensive.

These transactions are slowing block production because each block takes longer to compute. As a result the mempool fills up, which delays the execution of transactions. Long block computations also slow down queries, because a node currently cannot compute a block and serve a request at the same time.
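As a rough illustration of the backlog effect described above, here is a minimal queue sketch. The arrival rate, per-block capacity, and block times are made-up numbers chosen for illustration, not measurements from the network, and the model ignores mempool size limits and tx expiry.

```python
def simulate_mempool(arrivals_per_sec, txs_per_block, block_time_sec, duration_sec):
    """Return the mempool backlog after each block in a simple queue model."""
    backlog = 0
    history = []
    elapsed = 0
    while elapsed + block_time_sec <= duration_sec:
        # Transactions keep arriving while the block is being computed.
        backlog += arrivals_per_sec * block_time_sec
        # The finished block removes at most txs_per_block transactions.
        backlog -= min(backlog, txs_per_block)
        history.append(backlog)
        elapsed += block_time_sec
    return history


# Hypothetical load: 5 tx/s arriving, 40 txs fit in a block, over 10 minutes.
fast = simulate_mempool(arrivals_per_sec=5, txs_per_block=40,
                        block_time_sec=6, duration_sec=600)
slow = simulate_mempool(arrivals_per_sec=5, txs_per_block=40,
                        block_time_sec=60, duration_sec=600)

print("backlog with 6 s blocks: ", fast[-1])
print("backlog with 60 s blocks:", slow[-1])
```

With the same incoming load, the 6 second chain keeps up while the 60 second chain accumulates a backlog every block, which is the pattern users experienced as delayed transactions.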

It’s important to realize that these were known issues, and we are already in the process of making improvements on both fronts.

Why Did This Happen?

The reason for the abnormal behavior is mostly due to nodes running an outdated WebAssembly engine, which does not handle long computations very efficiently. Also, gas calculations do not account for this inefficiency, which further compounds the issue.

What’s Being Done?

Firstly, we are in the final stages of testing a release that will greatly improve query performance. It will allow nodes to serve many more requests and will lessen the impact of long block computations, which will help services like Keplr stay available during network-wide events. It will not require a hard fork, so node runners will be able to apply it immediately.

Secondly, we are working to improve execution performance. Expensive functions like secp256k1 verification will be exposed to contracts (instead of being executed inside the contract) which will make them much more efficient. We are also replacing our WASM engine with a newer, more performant one.

Lastly, we will re-evaluate gas calculation and pricing, and try to adjust gas costs so they more accurately reflect the computational cost of each contract.

What Happens Next?

Short term:

  1. Query node upgrade will be released in a few days.

  2. Working more closely with airdrops & large projects to make sure things are smooth and efficient given network limitations

Longer term:

  1. Network upgrade (hard fork) that will let contracts take advantage of fast, built-in building-blocks for contracts and a better contract execution engine

Discuss!

This is an open thread for discussion by validators, users, and developers on their observations, possible next steps, open questions, and other comments. Please be respectful and constructive - our goal is to get everything working at full capacity ASAP (and to expand that capacity!)

29 Likes

Can we drop the instructions or details of said WebAssembly engine here for people that may need it? Might be better suited than getting lost in the telegram or discord chats.

10 Likes

The current WASM engine used by Secret Network is wasmi. Vanilla CosmWasm (and thus Terra & Juno) uses wasmer.io, which is much faster than wasmi but doesn’t work inside SGX. A big chunk of the work that SCRT Labs did back in 2020 was to rewrite the CosmWasm engine to use wasmi instead of wasmer.io.

So essentially wasmi enabled us to have Secret Contracts back in 2020, and now we have to make wasmer.io work inside SGX to get to the next evolution of Secret Network. This is not an easy task, to say the least.

11 Likes

One ask we at SCRT Labs have – if you’re planning a big drop, please talk to us first to brainstorm on best tactics. Also, ideally, consider SCRT Labs timezone (EU) and avoid doing things when we’re sleeping. We can’t help or respond that way :)

12 Likes

Because the network upgrade (and resulting hardfork) that would enable validators to better handle the level of contract executions we are currently seeing will not come for some time (it is listed as a long term item), I think we should consider whether lowering the gas limit per block could work as a band-aid while we wait for the new code.

Right now the gas limit per block is 10 million, but as we have seen from previous launches, a noticeable number of validators start struggling around the 5-6 million mark. If we lower the gas limit (which makes each block smaller) so that the majority of validators can keep up, we would likely see better network stability, especially if it is lowered to a point where 67% of the staking weight is able to sign the block within the target 6 second block time. For reference, yesterday, with a limit much higher than most nodes were capable of handling, block times stayed over 100 seconds for 6 hours, and currently block times are 60+ seconds.
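To make the stake-weight argument concrete, here is a minimal sketch of when a block can gather enough signatures. The validator powers and processing times are hypothetical, and the model ignores network latency and consensus rounds. It uses the Tendermint rule that a commit needs strictly more than 2/3 of voting power, which is the "67%" mentioned above.

```python
def time_to_two_thirds(validators):
    """validators: list of (voting_power, seconds_to_process_block).

    Returns the processing time of the validator whose signature pushes the
    signed voting power above 2/3 of the total, or None if never reached.
    """
    total = sum(power for power, _ in validators)
    signed = 0
    # Validators sign in the order in which they finish processing the block.
    for power, seconds in sorted(validators, key=lambda v: v[1]):
        signed += power
        # Integer comparison avoids rounding issues: signed / total > 2 / 3.
        if 3 * signed > 2 * total:
            return seconds
    return None


# Hypothetical validator set: (voting power, seconds needed for a heavy block).
heavy_block = [(30, 4), (25, 5), (20, 9), (15, 40), (10, 90)]
print(time_to_two_thirds(heavy_block))
```

In this made-up set the block time is driven by the third-fastest validator, not the slowest one, so the goal of a lower limit would be to bring that threshold validator under the 6 second target.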

I believe the gas limit is just a consensus parameter change, which if true, means the limit could be lowered without needing any code update or hardfork, and could easily be changed back if no noticeable improvement is seen.

A limit that is significantly higher than most nodes can reach isn’t actually a limit, since the real limit is their own capabilities. Personally I think it is worth trying. Either we do nothing and wait for the improved tx code and struggle any time blocks have more than 5-6 million gas. Or we try a possible band-aid while we wait the same amount of time for the improved code, and ensure we don’t see blocks of that size.

As a reminder, Supernova reduced tx gas cost to roughly 1/5 of what it was before. So even if we drop the block gas limit to 4 or 5 million, a block would still allow over twice the computation that was possible before Supernova. We were OK with that computation limit for over a year, so dropping to just double the previous limit for however long it takes to get the improved code doesn’t sound like a huge loss, and it has significant potential upside.
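The arithmetic behind that comparison can be sketched as follows. The factor of 5 comes from the "roughly 1/5" figure above. The pre-Supernova block gas limit of 10 million is an assumption inferred from the "double the previous limit" wording, not a confirmed number.

```python
# From the post: the same computation costs roughly 1/5 of the gas after
# Supernova, so one unit of gas now buys about 5x the computation.
COMPUTE_PER_GAS_MULTIPLIER = 5

# Assumption (not confirmed in this thread): the block gas limit was also
# 10 million before Supernova.
PRE_SUPERNOVA_BLOCK_GAS_LIMIT = 10_000_000


def relative_compute(new_block_gas_limit):
    """Computation per block relative to a full pre-Supernova block."""
    pre_supernova_equivalent = new_block_gas_limit * COMPUTE_PER_GAS_MULTIPLIER
    return pre_supernova_equivalent / PRE_SUPERNOVA_BLOCK_GAS_LIMIT


for limit in (4_000_000, 5_000_000, 10_000_000):
    print(f"{limit:>10,} gas -> {relative_compute(limit)}x pre-Supernova")
```

Under these assumptions a 4 or 5 million limit still allows 2x to 2.5x the computation of an old full block, while the current 10 million limit allows 5x.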

Obviously if anyone thinks it would be detrimental, please state your reasons so everyone can discuss.

12 Likes

One issue that’s worth raising is our peer-sharing. Right now there isn’t a consolidated list of peers to attach to, which has resulted in poor connections. Indeed, the only provided peer in the docs is:

perl -i -pe 's/persistent_peers = ""/persistent_peers = "971911193b09a17c347565d311a3cc4f6004156d\@peer.node.scrtlabs.com:26656"/' ~/.secretd/config/config.toml

How do I know this is an issue? This morning I coordinated with several other validators to share our peers and expand the inbound peers parameter.


The only change made was the peers. It’s worth noting there was extended downtime during the re-discovery phase, but all else being equal, there has been an exceptional improvement in block signing.

There’s a key action item here that @mohammedpatla had already sought to create previously: creating a stable, unified peer list for others to pull from.
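As a sketch of what consuming such a shared list could look like, the helper below merges peer entries into the comma-separated `id@host:port` string that `persistent_peers` expects in config.toml. The function name and the second peer are hypothetical, and only the first peer comes from the docs command above.

```python
def build_persistent_peers(peers):
    """Join (node_id, host, port) tuples into a persistent_peers value.

    Duplicate entries are dropped while keeping the original order.
    """
    seen = set()
    entries = []
    for node_id, host, port in peers:
        entry = f"{node_id}@{host}:{port}"
        if entry not in seen:
            seen.add(entry)
            entries.append(entry)
    return ",".join(entries)


shared_peers = [
    # The peer from the docs command quoted above.
    ("971911193b09a17c347565d311a3cc4f6004156d", "peer.node.scrtlabs.com", 26656),
    # Hypothetical placeholder for a peer contributed by another validator.
    ("0000000000000000000000000000000000000000", "validator.example.com", 26656),
    # A duplicate submission, which is ignored.
    ("971911193b09a17c347565d311a3cc4f6004156d", "peer.node.scrtlabs.com", 26656),
]

print(build_persistent_peers(shared_peers))
```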

10 Likes

Unfortunately block_gas_limit cannot be changed by a governance proposal; it requires a hard fork. We might be able to do it this way:

  1. Everyone stops their node at height X.
  2. Edit genesis.json with the new block gas limit (a bash one-liner).
  3. Restart the node.
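Step 2 could look roughly like the sketch below. It is written in Python rather than as the bash one-liner mentioned above, and it assumes the usual Tendermint genesis layout in which the limit is the string field `consensus_params.block.max_gas`. The sample dict and the 5 million value are illustrative only.

```python
import json


def set_block_max_gas(genesis, new_max_gas):
    """Return a copy of a genesis dict with consensus_params.block.max_gas set.

    Assumes the Tendermint genesis layout, where max_gas is stored as a string.
    """
    updated = json.loads(json.dumps(genesis))  # deep copy via JSON round-trip
    updated["consensus_params"]["block"]["max_gas"] = str(new_max_gas)
    return updated


# Minimal stand-in for the relevant part of genesis.json (illustrative values).
sample_genesis = {
    "consensus_params": {
        "block": {"max_bytes": "10000000", "max_gas": "10000000"}
    }
}

patched = set_block_max_gas(sample_genesis, 5_000_000)
print(json.dumps(patched["consensus_params"]["block"]))
```

For a real file, the dict would be loaded with `json.load` and written back with `json.dump`. Every node would need to apply the same value, otherwise nodes would disagree on which blocks are valid.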

4 Likes

Could we simulate a lowering of the maximum gas per block by increasing the gas amount to 2x in the front-end of shade?

It would cost a bit more, but with a gas price of 0.0125 uSCRT it shouldn’t be too much of an issue.

Would this not be able to change it on the fly? Applications | Tendermint Core

Or is that a later version than what SN is using?

3 Likes

Yes, but this param is not changeable by governance.

1 Like

Ah ok. I was thrown off by the heading Consensus Parameters, and thought those would be changeable by governance. Thanks for the clarification!

2 Likes

I suspect that people would quickly share that the txs are using double the needed gas, and a lot of users would just manually halve it in the Keplr window. I really can’t guess what percentage would change it back to the gas it is using now, but we might not want to rely on something that the user can bypass.

2 Likes

Yeah that’s a good point, hopefully most would keep it at the default, at 2M gas that’s 0.025 SCRT. If that could help reduce the load on the network and increase its usability I’d hope a lot of users think that’s worth it.
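For anyone checking the numbers, the fee is simply the gas limit times the gas price, converted from uSCRT to SCRT (1 SCRT = 1,000,000 uSCRT). A minimal sketch, with a hypothetical helper name:

```python
from decimal import Decimal

USCRT_PER_SCRT = 1_000_000


def fee_in_scrt(gas_limit, gas_price_uscrt):
    """Fee in SCRT for a tx with the given gas limit and gas price (uSCRT/gas)."""
    # Decimal avoids binary floating-point rounding on prices like 0.0125.
    fee_uscrt = Decimal(gas_limit) * Decimal(gas_price_uscrt)
    return fee_uscrt / USCRT_PER_SCRT


print(fee_in_scrt(2_000_000, "0.0125"))  # doubled gas limit
print(fee_in_scrt(1_000_000, "0.0125"))  # half of that
```

At a gas price of 0.0125 uSCRT, doubling a 1M gas limit to 2M raises the fee from 0.0125 SCRT to 0.025 SCRT.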

Would halting this airdrop for a few days then reconfiguring our peers as a group be prudent then? Seems like the nodes with “good” peers are at an advantage.

Sorry I was wrong! This param can be changed with a vote:

➜  ~ secretcli q params subspace baseapp BlockParams
{"subspace":"baseapp","key":"BlockParams","value":"{\"max_bytes\":\"10000000\",\"max_gas\":\"10000000\"}"}

EDIT: However it would take 7 days to pass.
EDIT2: Added to the docs.
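Note that the `value` field in that output is itself a JSON-encoded string, so it has to be decoded twice. A small sketch of reading the current limits from the output shown above (the helper name is hypothetical):

```python
import json

# The query output from above, copied verbatim.
raw_output = r'{"subspace":"baseapp","key":"BlockParams","value":"{\"max_bytes\":\"10000000\",\"max_gas\":\"10000000\"}"}'


def parse_block_params(output):
    """Decode the params query output into a dict of integer limits."""
    outer = json.loads(output)
    # "value" holds a JSON document serialized as a string, so decode it again.
    inner = json.loads(outer["value"])
    return {name: int(value) for name, value in inner.items()}


params = parse_block_params(raw_output)
print(params["max_gas"])
```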

4 Likes

I actually think that there is still a significant number of validators that will not accept 0.0125 gas-prices, which brings up an interesting point. If the UI uses a gas-price low enough that only a certain number of proposers will accept it, that would actually space the txs out somewhat, because only those proposers will include those txs. If the UI can make it so that every gas button uses that same small gas-price, I don’t think the user can change it any other way. So we might be able to use gas-price to make sure the network isn’t getting pounded block after block, but instead gets some breather blocks without Shade txs.
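A rough way to estimate the effect: if proposers are selected in proportion to voting power (as in Tendermint's weighted round-robin), the share of blocks that can include a tx is about the share of voting power whose minimum gas price is at or below the tx's gas price. The validator set below is entirely hypothetical.

```python
from decimal import Decimal


def share_of_blocks_accepting(validators, tx_gas_price):
    """validators: list of (voting_power, min_gas_price as a string).

    Approximates the fraction of blocks whose proposer would include a tx
    paying tx_gas_price, assuming proposer frequency tracks voting power.
    """
    price = Decimal(tx_gas_price)
    total = sum(power for power, _ in validators)
    accepting = sum(power for power, minimum in validators
                    if Decimal(minimum) <= price)
    return accepting / total


# Hypothetical validator set: (voting power, minimum gas price in uSCRT).
validators = [(30, "0.0125"), (25, "0.25"), (20, "0.0125"),
              (15, "0.1"), (10, "0.25")]

print(share_of_blocks_accepting(validators, "0.0125"))
print(share_of_blocks_accepting(validators, "0.25"))
```

In this made-up set, txs priced at 0.0125 would only land in about half of the blocks, leaving the other half as breather blocks.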

1 Like

I think 6 out of the top 10 validators accept 0.0125 uSCRT according to the last poll from a few weeks ago. Rough guess I think around half of the blocks accept 0.0125 uSCRT. But that’s something that we can change quite quickly if need be.

Interesting alternative indeed, we could even combine both things to also try to reduce the computations required in the ‘shade’ blocks.

Oh cool! I’ll have to ask Sandy to confirm, but I think on pulsar at least, he was able to change the unbonding time sooner than 7 days if the yes vote weight was high enough. That might only be possible on testnet. Or do you think there might be some quorum that if the vote weight is a high enough percentage of the total stake weight, it lets it pass sooner?

1 Like

Is it possible to turn off the airdrop claiming and rewrite the contracts?

It’s coming up on 24 hours, and the network is still unusable for most.

Yeah, big performance improvement from restricting. Plus, more nodes are confirming blocks by sharing peers.

1 Like