Shockwave Delta Upgrade & Testnet - Due Diligence & Risks

Greetings community,

I want to broach a sensitive topic. Shockwave Delta has been on Pulsar-2 since 9/11/2022 and is planned for release on 9/21/2022, assuming the on-chain proposal passes. Additionally, during the set-up of the Shockwave testnet on Pulsar-2, the L1 went down.

Currently, developers on Secret Network have ~10 days to discover any breaking changes to smart contracts and/or front-ends on the pulsar-2 testnet.

As an example, Shade Protocol and Stashh both encountered a bug tied to CosmWasm 1.0 that changed how time is treated (and how backwards compatibility worked for 0.1 contracts). If this bug had not been spotted on testnet, Stashh auctions & ShadeBonds would have been fundamentally broken after the Shockwave Delta upgrade.
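To make this class of bug concrete, here is a minimal sketch. It assumes the general CosmWasm change in how block time is exposed (integer seconds in the 0.10 line, a nanosecond timestamp in 1.0); it is not the actual Stashh or ShadeBonds logic, and the function names and values are hypothetical.

```python
# Illustrative sketch only: hypothetical names, not real contract code.
# Assumption: an older contract expects block time in seconds, while the
# upgraded runtime hands it a value in nanoseconds.

NANOS_PER_SECOND = 1_000_000_000


def auction_is_open(block_time_seconds: int, ends_at_seconds: int) -> bool:
    """Old-style check: both arguments are assumed to be in seconds."""
    return block_time_seconds < ends_at_seconds


def normalize_to_seconds(block_time: int, unit: str) -> int:
    """Convert a block time to seconds, given the unit it was reported in."""
    if unit == "seconds":
        return block_time
    if unit == "nanoseconds":
        return block_time // NANOS_PER_SECOND
    raise ValueError(f"unknown time unit: {unit}")


# An auction that should stay open for one more hour.
now_seconds = 1_663_000_000
ends_at = now_seconds + 3600

# The same instant, reported in nanoseconds.
now_nanos = now_seconds * NANOS_PER_SECOND

# Correct units: the auction is still open.
open_with_seconds = auction_is_open(now_seconds, ends_at)

# Wrong units: the nanosecond value dwarfs the deadline, so the auction
# appears to have ended long ago.
open_with_nanos = auction_is_open(now_nanos, ends_at)

# Normalizing first restores the expected behavior.
open_after_fix = auction_is_open(
    normalize_to_seconds(now_nanos, "nanoseconds"), ends_at
)
```

Any time-gated logic (auction deadlines, bond maturity, vesting) would be affected the same way, which is why this kind of regression is hard to notice without running real contracts against the upgraded chain.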

What other potential bugs exist on the contract or secretjs level?

Additionally, due to changes in SecretJS, various front-end APIs may break entirely with the new changes (some due to the reliance on GripTape). This may be resolvable during the next 10 days, but it does represent a potential risk.

Observations: SLABs has done incredible work getting Shockwave Delta onto testnet - this upgrade is the beginning of empowering Privacy-as-a-Service and will bring SNIP-20s to the entire Cosmos. This upgrade is MASSIVE and extremely impressive.
However, there are potential risks of going straight to mainnet. If that is the consensus of the community (and the vote is already live) then that is what will happen. But in good conscience & in good faith I wanted to start this thread in order to bring visibility to some of the concerns surrounding testing.

Here is a list of recommendations from easiest to hardest:

  • For all upgrades moving forward, there should be both an alpha and beta testnet so that at any given moment there is always a testnet that reflects the current state of mainnet
  • Adequate time should be given for apps to transition their front-ends and for developers to stress test contracts and configs of the update (at least an entire month)
  • The mainnet upgrade should be delayed

I would advocate that we delay Shockwave Delta in order for there to be more thorough testing from the various applications to ensure that there is a smooth transition from apps that are based on CosmWasm 0.1 to 1.0.

Counter points to my recommendations:

  • As an L1 and as a set of dApps we don’t have the luxury of waiting. We simply exist in too competitive of a space, and the dApps should just deal with it.

Long live privacy, looking forward to your feedback and the discussion.

-Carter Woetzel

15 Likes

I think that this is good but not enough. For future upgrades, Labs needs a test suite that covers CosmWasm logic and basic secretJS queries to confirm exactly what will break and where, IMO.

8 Likes

For instance, the fact that the cosmwasm-v1 branch of scrtlabs/secret.js is failing to even run the written tests and hasn’t passed for 10 days is… concerning, as is the fact that… they failed… at setup???

3 Likes

I also share concerns about not enough time for testing. 4 days is the amount of time between the most recent changes to the binary going live on testnet, and the upgrade proposal going on chain… and that was for beta.3 which does not even include the latest refactors.

It seems there’s a great deal of pressure on/from SCRT Labs to push this upgrade and go live by next week, and I worry this may be clouding their best judgement.

I too am in favor of delaying the upgrade to allow for more thorough testing and review of the source code. SN would not recover from another exploit, and I would hope everyone agrees it is better to be safe than sorry.

3 Likes

I’d be in favor of delaying the upgrade - what say SCRT Labs?

1 Like

We are working on upgrading Griptape. The timeline is tight for sure.
I share the concern and think that caution is important here. People will not remember that the upgrade happened a week or two later than promised. They will remember something going wrong.

6 Likes

@assafmo stated on Telegram that the next available time that SCRT Labs would be available to coordinate an upgrade would be October 26th, so about 5 weeks later

2 Likes

We do have an extensive test suite, which includes regression tests for v0.10 contracts on v1.3 and v1.4 that run on HW & SW for every commit. This includes cross-running secret.js & IBC contracts. Granted, it missed the env.block.time bug, and that’s why we pushed to upgrade pulsar-2, to get a wider scale of testing and help from the community.

secret.js is a simple version bump from secretjs@0.17.5 to secretjs@0.17.7 or secretjs@beta to secretjs@alpha. API is the same, UIs don’t need to update any code.

Then you have to adjust the gas limits for contract execution. This shouldn’t take more than 20 minutes, and IMO it is a very sensible behavior to break as we’re trying to mitigate future DoS events like the ones we had in the past few weeks. Also, after every upgrade it is known that it takes a few days to return to full capacity, as infra operators upgrade all of their nodes.

This means double the resources, which seems over the top to me, since you can also run tests against LocalSecret, which has been fully operational for weeks now.

I would just comment that the Israeli holidays are from September 25 through October 22, with hardly any work days in between, and most of our team taking vacation days throughout October. This means that the next okay date for an upgrade is November 1, without any more testing from SCRT Labs.

Thanks, fixed. We cross-run these tests on the main repo for every commit, and they have been passing there since before the pulsar-2 upgrade.

These are small refactors that pass all integration and regression tests, and we specifically made sure the affected lines of code are covered in the tests.

Looks like you’re already good to go.

11 Likes

Agree - the schedule is tight.

I’ve been saying that for a while.

Of course everybody is free to use LocalSecret as @assafmo said, but it will for sure be more professional and less error-prone if there are multiple official networks to test with, especially for such large upgrades.

Does this mean nobody from SCRT Labs will be available during this period if something breaks post-upgrade?

1 Like

Thanks @assafmo for giving more context.

Building on that, I’ll add that LocalSecret 1.4.0 versions, which contain the bulk of the changes, have been publicly available for about a month now. Pulsar-2 has been running 1.4.0 for about 8 days now (since the 8th, with the fix for the env bug deployed on the 11th), which gives us a total of over a month of local dev environments and 2 weeks of full public beta before going live.

As we get closer to our target date, any changes made are only included after a risk assessment. Most of the latest changes are small, covered by tests, and add minimal risk.

I’m disappointed that the latest secretjs changes are being pointed to as a major culprit. I’ll give more context on this, just to give you an example of the considerations.
The change necessary was to the format of our protobuf messages for the compute (wasm) module. These changes are covered by a whole suite of tests, so we were able to verify easily that everything works as intended. The effect of this was a necessary change to the way secretjs parses responses from contracts. These changes are internal to secretjs, and do not affect developers or users at all. Without this fix, we could not support grpc-gateway messages, which are the drop-in replacement for the current LCD, which is deprecated in cosmos-sdk 0.46.x.
So basically, if we didn’t change it we wouldn’t be able to smoothly migrate LCD endpoints, which is critical for dapps and exchanges and will bite us in the ass if we want to upgrade the cosmos-sdk in the next upgrade. But wait, what happens if after all that the fix to secretjs doesn’t work properly for some edge-case? Well, it’s a client-side library - we can always patch it post-upgrade with the impact to dapp developers being a bump to the version deployed.

As for delays and timelines, beyond the difficult timing for SCRT Labs with holidays, there are also a lot of moving pieces in the grander scheme of things. Marketing plans, network and ecosystem coordination, and other puzzle pieces that are built months in advance around specific dates mean that any delays have ripple effects. Furthermore, this specific upgrade has new features that we want to get in the hands of developers, specifically around IBC and Secret Contracts. These will evolve as everyone gets more hands-on and we get feedback from developers. Delaying these features means delaying feedback, which slows down the iteration process and the time to get the network where we all want it to be.

I want to make it clear that Secret Network is and should be treated still as a start-up. By that, I mean that we’re still at a point where we should value innovation and speed at the cost of accepting more risk. There’s a reason that Secret Network is still the only fully open, decentralized, general-purpose privacy chain - if we waited for everything to be perfect we’d still be on testnet. If we were top 10 on CMC maybe we could rest on our laurels so-to-speak, but right now I think we should value setting aggressive timelines and accepting risk (especially as the risk we are talking about is to network availability, not user funds).

Lastly, regarding the testnet - after the previous upgrade, the feedback we got across the board from dapp developers was that upgrading the testnet to allow testing the upgrade hindered their development since it wasn’t representing current testnet. For that reason we made the decision to focus on releasing LocalSecret environments as early as possible to allow developers to test the upgrade, and to delay upgrading Pulsar to as late as possible to hinder current development as little as possible.

The best of both worlds is allowing for 2 testnets, but the cost of that is the maintenance overhead of keeping validators, APIs, nodes, and explorers running smoothly and docs updated. We can discuss that going forwards; however, given the fact that we only have the one testnet, I think it isn’t fair to claim on the one hand that dapp developers didn’t get enough time to test their apps on Pulsar, but on the other to highlight the part where there should always be a testnet that reflects the current state of mainnet - our compromise was providing an upgraded LocalSecret over a month before the upgrade was due, with iterative updates as we go.

12 Likes

The current timelines take into account the availability requirements that the entire process requires, including monitoring and support before, during and after the upgrade. Even outside of that, in any case we make sure that we’re available in case of any critical failures for whatever reason.

4 Likes

@Cashmaney and @assafmo First of all, I just want to say a huge thanks to you and your team for all the hard work to get us this close to the upgrade. We couldn’t be more excited for it, and I know the Shade team was one of the dapps pushing hardest for getting it here as quickly as we could when we discussed it early this year.

I want to push back a bit on the idea that LocalSecret is adequate for testing. As our Secret Network apps become more interconnected, it becomes extremely challenging to create a local chain that is anywhere close to the reality of mainnet. With testing for ShadeBonds, for example, we would have needed all the Sienna contracts deployed to have access to their LP tokens, liquidity, prices, etc., as well as Band Protocol contracts to have an accurate reflection of a price oracle. Of course some of these things can be mocked, but the more mocking you do, the higher the risk that some mistakes will be made.

I would love to see an alpha and beta testnet be created moving forward, even if this is only a temporary thing that is spun up right around the time of major network upgrades.

2 Likes

It’s easy for me to say just go forward with the upgrade, but I think people may have panicked a bit with the issues they’ve run into and instantly assumed things are worse than they may be.

Personally I think the timing of the upgrade is crucial if the next window is November 1st, so I think it’s important to be specific about the amount of work teams are facing (and to use all the time left) before pushing back on the date. People should accept risk as long as products aren’t failing catastrophically.

I think this is also key. If the bulk has been available for over a month then, unless there’s something incredibly wrong, these 2 weeks should be enough for people.

I don’t disagree that the complexity of a real network is hard to mock - even with a testnet we’ve already seen that sometimes your apps will behave differently on mainnet and on Pulsar.

On the flip side, this complexity that you described adds to the difficulty of maintaining those two separate testnets. It’s not only about cloning the network, but also about keeping them in sync (making sure everyone deploys all their apps and contracts to both) over time and maintaining all the services I mentioned above in working order. I’m not saying it can’t be done, but I think that doing this well and supporting it long term will require a clear owner, resources and commitment.

Maybe it’s possible to throw together a solution that clones Pulsar on demand somehow and sets up services automagically so that we don’t have to constantly maintain sync… need to give this a bit more thought

@assafmo @Cashmaney thanks for pushing a fix to the sjs test suite/giving us that nice green checkmark, definitely makes me a lot less nervous :) (that one particular commit from 10 days ago that said ‘tests are passing’ with the red x next to it wasn’t an ideal look lol ;p)

1 Like

It’s also good to know that there’s cross-repo GHA stuff going on, will look there in future

1 Like

Don’t have time to read the thread but my bottom line is I need Shade, Sienna, and Stashh to support the upgrade or would prefer we delay.

These are the chain’s biggest ‘clients’, so I want them to be happy about the work the chain is doing.

1 Like

Yeah in the main repo it’s under integration-tests.

1 Like

Thank you @Cashmaney @assafmo @AustinW @Avret for commenting on this thread - it was a very informative conversation.

Shade Protocol devs are officially in support of the mainnet upgrade. After feedback from @Cashmaney @assafmo we are confident in the protocol team’s ability to get this to the finish line.

While we believe there are risks, we trust that the dApps can all collectively get this done in time. Shade is officially pivoting the majority of our developers from other tasks to purely focus on testing pulsar-2 and preparing for Shockwave Delta.

At a minimum, for future upgrades I hope there will be tighter coordination surrounding testnets (alpha/beta) and as much of a run-up as possible. Looking forward to a successful launch :)

-Carter Woetzel

8 Likes

With the input given and our own pulsar testing, Stashh also supports the mainnet upgrade at the scheduled time

7 Likes