Tradeoffs Discussion: Upgrade Transparency

Hi Secret Community,

I am writing to start an open conversation about several tough tradeoffs in Secret Network’s security and privacy design.
The goal of these posts is to stimulate a community discussion, and to encourage actionable next steps for the community to take. Ideally, through these discussions the community could come up with ideas (and eventually - solutions) to certain difficult questions. Many of these don’t have a clear answer, so it’s important to gauge where community sentiment lies.

For this post I’ll just focus on the first issue:

Software upgrade transparency.

Right now, the master consensus seed is sealed using the “MRSIGNER” rule. There was a forum post about this, and more information can be found here: The Initialization Of Secret Network - Secret Network, but that discussion thread ended without a clear resolution. I have also written a blog post about this (here). The current version of Secret does use MRSIGNER, and so it does exhibit this backdoor risk. Essentially, MRSIGNER means that any enclave signed by the developers can access a sealed file containing the consensus seed. The developers could therefore, if they chose to, sign a malicious enclave that prints the seed in plaintext and run it on one of their own nodes. Especially in a world where we know third parties will exert pressure on developers, this is unwanted. Because this is an entirely offline attack, there would be no trace of it, and there is no way for developers to prove this hasn’t happened.

Even though Secret Network applies a voting process to approve software upgrades, this process is not enforced by the enclaves themselves. An alternative approach, which is already implemented in other privacy-focused networks such as Phala and Oasis, and is being planned by Obscuro, is called “upgrade transparency.” It means that old enclaves validate the on-chain voting process (as well as the developer signing key) before transferring the consensus seed to new enclaves.

The tradeoff is that this mechanism complicates the enclave: it is not trivial to implement, and it increases the risk of an implementation error “bricking” the network into an unrecoverable state. It’s thus a tough tradeoff between the risk of a privacy breach and the risk of total shutdown.

8 Likes

Potentially a middle ground with a multisig key for ‘break glass in case of emergency’ situations?

3 Likes

i.e. seal the key with an M-of-N multisig, while standard upgrades still go through the transparency process

I’ll let other team members voice their thoughts too, but it’s fair to say we all want to see this changed. The challenges, as Andrew rightfully points out, relate to complexity and risk. I’m happy to see this discussion in the clear because ideally we can hear the take of as many stakeholders in the network as possible.

The biggest concern is of course bricking the network in the case of an implementation bug. Bricking the network is worse than in any non-private network, because the private state in such a case would become irretrievable, which would also likely lead to a massive loss of funds. What are the chances of this happening? Is it 0.1%, 1%, or 10%? Hard to say.

Regardless, even if the implementation isn’t that complicated (and it likely is because Secret is already live), testing it properly to reduce the above risk is key. Also, everyone in the network needs to be okay with some level of uncertainty that the worst-case scenario could happen, since even though we can greatly reduce the risk with proper testing, we can’t eliminate it.

The other aspect of this relates to prioritization. I think it’s clear this needs to be fixed at some point. The questions are when and how. Focusing on it now means SCRT Labs (unless someone else can pick up the slack) is gonna put fairly significant dev time into it, while delaying other features. Since this happens behind the scenes and will not improve any discernible metric for users or devs, our community should take this into consideration as well when deciding on the right priority.

Finally, where we can probably be the most productive in this discussion is the “how” part. How do we actually pull such a switch while reducing the risk to the network and minimizing dev time? It may be that the answer is to come up with multiple milestones instead of a single one, for example:

  1. Continue to use MRSIGNER, but hold a ceremony that is live streamed and attended by several key community figures, in which a new MRSIGNER key is generated inside an enclave, then split and shared in an m-out-of-n manner as @Avret mentioned. The laptop that created the key can either delete it, or we can physically destroy the laptop for fun and giggles. This step is fairly simple, with the exception of migrating the enclave to a new MRSIGNER. We do not yet know how easy that would be.

  2. a. Like #1, but save some of the shares in a secret contract. Note that the contract can be developed to enforce specific voted-on code updates. This is a lot more work and riskier, but is also cool in the sense that we’re utilizing our chain’s strengths.

     b. Move to MRENCLAVE, which achieves something similar to (a), but the move might be more difficult than (a) and runs into the same brick chance.
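
To make the m-out-of-n split in milestone (1) concrete, here is a minimal sketch using Shamir secret sharing. This is an illustration only, not something SCRT Labs has proposed in this thread: the function names, the choice of field prime, and treating the key as an opaque integer are all assumptions. A real signing key may be larger than this field and would need to be chunked or shared over a bigger field.

```python
import secrets

# Assumption: a Mersenne prime comfortably larger than a 256-bit secret.
PRIME = 2**521 - 1


def split_secret(secret: int, m: int, n: int) -> list[tuple[int, int]]:
    """Split `secret` into n shares so that any m of them reconstruct it."""
    if not (1 <= m <= n):
        raise ValueError("need 1 <= m <= n")
    if not (0 <= secret < PRIME):
        raise ValueError("secret out of range for the chosen field")
    # Random polynomial of degree m-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(m - 1)]

    def evaluate(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % PRIME
        return acc

    # Share i is the point (i, f(i)); x = 0 is never handed out.
    return [(x, evaluate(x)) for x in range(1, n + 1)]


def recover_secret(shares: list[tuple[int, int]]) -> int:
    """Lagrange-interpolate the polynomial at x = 0 to get the secret back."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

For example, `split_secret(key, m=3, n=5)` gives five shares, and `recover_secret` on any three of them returns `key`. Note that reconstruction puts the whole key back in one place, so where and how shareholders recombine it would still need to be decided.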

TL;DR community should decide the priority and be okay with taking the associated risks and delays to other important network stuff. If we do this, we should probably start with something like (1) first before moving to more complicated solutions over time. (1) also means we need to decide who holds the shares of the MRSIGNER key.

5 Likes

I don’t think we should focus on it right now if it implies compromising progress in other areas. There is already significant trust placed on scrt labs as core developers and so I don’t think the existence of MRSIGNER increases the level of trust required significantly in practice. (It is nevertheless a big deal.)

Placing it lower in the priority queue for whenever there are fewer things to do can become a slippery slope rather quickly, so it’s important we all keep in mind what Andrew said about the risk of 3rd party pressure on scrt labs. That said, and given the risk of bricking things, it’s probably wisest to take longer rather than shorter. We might learn a thing or two as other projects go on with their implementations too.

1 Like

Regarding the tradeoffs in Secret Network’s security and privacy design: the current use of MRSIGNER poses a backdoor risk, while the proposed solution, which requires old enclaves to validate the on-chain voting process before transferring the consensus seed, increases complexity and risks “bricking” the network into an unrecoverable state. Given these risks, before I make a suggestion I would like to understand the features on the short- to medium-term roadmap for Secret L1 development more thoroughly. Guy mentioned that other features could be delayed depending on how this is prioritized. Off the cuff, I tend to lean towards not risking bricking the network and opting for a slower approach with interim solutions. However, if the short- to medium-term roadmap doesn’t include high-value features, it might make more sense to hunker down and get it done ASAP. Looking forward to hearing what is coming and where this goes.

4 Likes

Yep, seconding this. Better to know what’s coming before making that decision.

2 Likes

Accidentally posted this here:

Before mainnet we planned to use MRENCLAVE for everything, but then we realized that upgrades cannot happen in a trivial manner that way, and that implementing a seed handover mechanism can brick the network if not perfectly implemented.

I’d say that using MRENCLAVE is the ultimate goal here, but I’m not sure how urgent this is at this stage.

1 Like

My views are currently in line with Ian. If SCRT Labs are able to inform us on what they currently plan to work on, maybe the community can signal what they would prefer and SCRT Labs can take that feedback on board.

I do feel like this MRENCLAVE element should be started sooner rather than later though.

1 Like

Second mumuse.

Community is already placing significant trust in Slabs, and the decision to move away from MRSIGNER to mitigate backdoor risk is not worth a rushed effort to create a more ideal solution that may result in an irrevocably “bricked” network. Especially if this time is taken away from efforts that are currently being used to improve developer experience on the network (e.g. contract upgradability) and create new use cases like PaaS that will unlock new doors for Secret and keep us in the picture.

I like the first milestone approach that Guy suggested to reduce the risk of a backdoor (MRSIGNER + split to multisig) as an interim approach. I’d also like to see more research into how we can implement a failsafe backup for the “bricked” implementation scenario and how we can make the testing environment as close to the production environment as possible for when we do decide to take on such a complex and risky implementation.

Maybe I am naive in my thinking, but I don’t really understand how you could ever get into a completely irretrievably bricked network situation.

If a risky upgrade were done so that, at a proposed height, an official quicksync was created and all validators were strongly reminded to make backups of their sgx_secrets and .node folders, then even if the upgrade made every upgraded node’s chainstate unusable, you should still be able to wipe the node, revert to a pre-upgrade binary, and use the quicksync to restart un-upgraded. In that case the worst-case scenario is a network-wide rewind back to the quicksync height. While any major rollback of chainstate is still not ideal, it is far better than an irretrievable failure.

As an additional precaution, you could probably also ask enough validators to cover the VP/lite-client verification needed to mint new blocks to each take a node offline at the quicksync height and not touch that node during the later upgrade. Then, even if for some reason restoring the official fallback quicksync fails, you should probably be able to bring those “frozen” nodes back up with migrated validator keys, peer them to each other, and have them start creating new blocks. Maybe even change their configs to use a different p2p port during that process to make sure only the “frozen” nodes are talking to each other during that critical rewind time. I admit the logistics would be complicated, but again this is only a contingency for if a quicksync-based rollback fails, and it would all be work people would be willing to do if that is the only way to get the chain back up.

So unless I am wrong, I don’t think people should be weighing the decision with the thought that there is an irretrievable bricking possibility, but rather a complicated (and possibly arduous) rollback to an earlier chainstate.

5 Likes

I agree with oj. Might be good to hear what slabs has in the pipeline for upgrades, and then the community can give slabs feedback on our priorities. Slabs can then decide what it feels will be best for the network while taking into account community feedback.

With MRENCLAVE, the new enclave cannot decrypt files that were encrypted by the old enclave. In our case the files store the consensus seed, and by using MRSIGNER (the dev key) different enclaves can decrypt shared files.
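
A toy model may help show why the two sealing policies behave so differently across upgrades. This is not the real SGX key derivation; the HMAC construction, the `sealing_key` function, and the placeholder inputs are all assumptions made for illustration. The only point is which identity gets mixed into the key.

```python
import hashlib
import hmac


def sealing_key(device_secret: bytes, policy: str, mrenclave: bytes, mrsigner: bytes) -> bytes:
    """Toy stand-in for sealing-key derivation (not the real SGX algorithm)."""
    if policy == "MRSIGNER":
        identity = mrsigner   # shared by every enclave the developer key signs
    elif policy == "MRENCLAVE":
        identity = mrenclave  # unique to one exact enclave build
    else:
        raise ValueError(f"unknown policy: {policy}")
    return hmac.new(device_secret, policy.encode() + identity, hashlib.sha256).digest()


# Hypothetical placeholder values, for illustration only.
device = b"per-cpu-secret"
signer = hashlib.sha256(b"developer signing key").digest()
v1 = hashlib.sha256(b"enclave v1 code").digest()
v2 = hashlib.sha256(b"enclave v2 code").digest()

# MRSIGNER: v1 and v2 derive the same key, so v2 (or any other enclave
# signed with the same key) can unseal what v1 sealed.
same_under_mrsigner = (
    sealing_key(device, "MRSIGNER", v1, signer) == sealing_key(device, "MRSIGNER", v2, signer)
)

# MRENCLAVE: the keys differ, so v2 cannot unseal v1's files and the seed
# has to be passed along explicitly.
same_under_mrenclave = (
    sealing_key(device, "MRENCLAVE", v1, signer) == sealing_key(device, "MRENCLAVE", v2, signer)
)
```

In this model `same_under_mrsigner` is true and `same_under_mrenclave` is false, which is the whole tradeoff in miniature: easy upgrades and a developer backdoor on one side, no backdoor but a mandatory handover on the other.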

In order to move to MRENCLAVE encryption we also need to implement a handover mechanism for the old enclave to verify the new enclave and send the consensus seed to the new enclave. If this mechanism is not implemented perfectly we might not be able to verify the new enclave, send the consensus seed or even permanently delete the consensus seed (which we want the old enclave to do in case of a successful handover).

1 Like

This sounds extremely risky for each upgrade.
How about we just routinely rotate the consensus seed after, let’s say, 10,000 (or even fewer) blocks as a short-term security increase? Slabs already implemented seed rotations on upgrades, so it shouldn’t be that hard to do that routinely?

Wouldn’t that be a more practical option? In case the consensus seed gets leaked, just do an extra emergency rotation via a proposal.

The issue here is that SCRT Labs can always get the consensus seed using our developer key, which can be prevented by using MRENCLAVE for file sealing.

1 Like

Multisig is definitely a better solution than the current situation, but everything has its time.
I’m wondering if this could potentially be solved with some additional layer of encryption that would protect the consensus seed.

It is definitely good that this discussion has been started; the best network is not the one where problems are not talked about, but the one where problems are solved.

Thanks for the additional info @assafmo. To help the community assess the risks, can you please comment on how this scenario might impact or prevent the sort of rollback @baedrik suggested? Does it make it impossible, difficult to do, or not affect it? Is there a difference between the first upgrade that switches away from MRSIGNER and subsequent upgrades once MRENCLAVE has already been adopted?

2 Likes

It seems like the development and even deployment could be done before giving up the dev intervention authority:
Phase 1: (still MRSIGNER in the enclave) Develop a handoff mechanism that can validate that the new enclave is approved by the on-chain process.
Phase 1.1, 1.2, …: keep applying this for several updates to build confidence it works. At this point upgrades don’t even use MRSIGNER.
Phase 2: when ready, have the next upgrade switch from MRSIGNER to MRENCLAVE
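
One way to read the phases above, sketched under assumptions: during Phase 1 every upgrade attempts the new handoff first and only falls back to the MRSIGNER path if it fails, while keeping a record of outcomes. `UpgradeRunner` and its fields are hypothetical names, not part of any existing Secret Network code.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class UpgradeRunner:
    """Hypothetical driver for the phased plan; all names are illustrative."""

    handover: Callable[[], Optional[bytes]]  # returns the seed, or None on failure
    mrsigner_unseal: Callable[[], bytes]     # legacy path, available in Phase 1 only
    phase: int = 1
    history: list[bool] = field(default_factory=list)

    def obtain_seed(self) -> bytes:
        seed = self.handover()
        self.history.append(seed is not None)
        if seed is not None:
            return seed
        if self.phase == 1:
            # Phase 1: the handoff failed, but the developer-key path still works.
            return self.mrsigner_unseal()
        # Phase 2: there is no fallback, which is exactly the "brick" risk.
        raise RuntimeError("handover failed and no MRSIGNER fallback remains")

    def ready_for_phase_2(self, required_successes: int) -> bool:
        """True once the most recent `required_successes` upgrades all used the handoff."""
        if required_successes < 1:
            raise ValueError("required_successes must be at least 1")
        recent = self.history[-required_successes:]
        return len(recent) == required_successes and all(recent)
```

How many clean upgrades count as enough confidence before Phase 2 is a governance question, not something the code can answer. Until Phase 2, the MRSIGNER backdoor risk is still present.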

4 Likes

These are great points but a rollback may also come with other risks that are rather unbearable, such as exchanges delisting and damaging trust in the chain. Though these risks may be mitigated depending on the answers to these questions.

Can we use the native rollback feature to revert by only a few blocks in case the network bricks?

What are the possible situations that could cause the network to brick, such as during a network upgrade, or could it technically happen at any time due to a bug or other unexpected behavior?

questions for @guy @assafmo / slabs

I think it’s dangerous for us to just assume that this is truly only possible during an upgrade itself, or to accept this without a confirmed understanding of what a rollback might look like.

  1. The old enclave runs with MRENCLAVE sealing.
  2. Governance has proposed switching to a new enclave, and the SHA256 of the new enclave is attached to the vote.
  3. If the proposal passes, we will send the result (SHA256) and the state inclusion proof into the old enclave.
  4. The old enclave will somehow verify the SHA256 of the new enclave.
  5. During the handover, the old enclave will encrypt the consensus seed and send it to the new enclave.
  6. Finally, the old enclave will delete its copy of the consensus seed.

If there are bugs in steps 3, 4, or 5, we might get stuck with the old enclave and be unable to upgrade. This is a critical issue that needs to be addressed. Similarly, if step 5 fails but we mistakenly believe that it succeeded, the old enclave would still delete its copy in step 6, so both the old and new enclaves could lack the consensus seed, resulting in a “brick” situation.

To prevent vulnerabilities in the old enclave from being exploited to leak private data, step 6 is crucial.
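
The six steps listed earlier in this post can be sketched as a toy model to show where the ordering matters. Everything here is an assumption made for illustration: the `OldEnclave` class, the hash-based acknowledgement, and treating the governance proof as an already-verified hash. Real attestation and channel encryption are left out entirely.

```python
import hashlib
import hmac
from typing import Callable, Optional


class OldEnclave:
    """Toy model of steps 3-6; attestation and encryption are abstracted away."""

    def __init__(self, seed: bytes):
        self._seed: Optional[bytes] = seed

    def handover(
        self,
        approved_hash: bytes,               # step 3: hash from the verified proposal
        new_enclave_hash: bytes,            # step 4: measurement of the new enclave
        deliver: Callable[[bytes], bytes],  # step 5: sends the seed, returns an ack
    ) -> bool:
        if self._seed is None:
            raise RuntimeError("seed already handed over")
        # Step 4: refuse anything governance did not approve.
        if not hmac.compare_digest(approved_hash, new_enclave_hash):
            return False
        # Step 5: deliver, then require evidence the receiver really holds the seed.
        ack = deliver(self._seed)
        expected = hashlib.sha256(b"ack" + self._seed).digest()
        if not hmac.compare_digest(ack, expected):
            return False  # keep the seed so the handover can be retried
        # Step 6: delete only after a confirmed transfer.
        self._seed = None
        return True

    def has_seed(self) -> bool:
        return self._seed is not None


def good_receiver(seed: bytes) -> bytes:
    """A receiver that got the seed intact and acknowledges it."""
    return hashlib.sha256(b"ack" + seed).digest()


def broken_receiver(seed: bytes) -> bytes:
    """A receiver whose transfer failed; its ack does not match."""
    return b"\x00" * 32
```

With `broken_receiver`, `handover` returns `False` and the old enclave keeps the seed; only `good_receiver` with a matching hash triggers the deletion. An acknowledgement like this only shows the new enclave received the seed, not that it sealed it durably, so a real design would need a stronger confirmation before step 6.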

1 Like