Testing SGX before mainnet upgrades

There’s been a lively discussion in TG since the recent upgrade: some nodes were not able to restart, and others found their motherboards were no longer supported.

Folks want more hands-on, rigorous testing to avoid such issues in future. To this end, John, who runs the Consensus One validator, has sourced supported motherboards to run a sort of hardware lab, where he can test firmware upgrades and ensure the boards are still supported.

John works closely with Supermicro and others to get upgrades promptly; he just needs some early communication from SCRT Labs about which Intel attestations will be enforced in future.

This is a great first step, but for validators and developers to be sure an upgrade works in their environment, that their contract code still works, etc., they should ideally be able to perform the upgrade on testnet themselves.

While this is currently possible on the pulsar-2 testnet, its configuration is not a match for mainnet.

So as @toml of SCRT Labs said,

I think what might be a good improvement is to have a testnet with the exact same security configurations as the mainnet. Pulsar and Mainnet have different configurations, pulsar being more permissive and attesting to Intel’s dev api.

The way I see it, we have 3 options to achieve this:

  1. Make pulsar-2 restrictive. The big downside is the barrier for new devs to get going on testnet, and the interruption the migration could cause devs. The migration would mostly affect the testnet validators of course, and we’d prepare the hardware in advance, but there’s potential for at least some downtime, affecting devs.

  2. Launch a temporary testnet for short periods prior to each upgrade. The big downside here is that devs and validators cannot test their contracts on an ongoing basis, as they wish to. We would also have to coordinate with a growing number of stakeholders, which is tough to do in a short window.

  3. Launch a new and persistent testnet alongside pulsar-2, just as Ethereum had Rinkeby and Ropsten, where one could test PoW on Ropsten. The big downside here is the additional cost.

As I mentioned on the governance call, I’ll get this discussion going here to hopefully find a solution that best meets the community’s needs.

Please chime in with any other pros and cons or alternate options we should consider.

Some thoughts:

  • I think Pulsar is perfectly fine for dApp developers. Maybe I didn’t explain myself well - Pulsar and Mainnet are the same in terms of the contract runtime engine. They always have been and currently are. Having said that, I can imagine that this might change in the future (theoretically - no plans as of now), so we can revisit this point then.

  • IMO, we need to have consistent environments that let you:

    1. test HW with the current Mainnet configuration,
    2. test HW configuration with an upcoming TCB Recovery,
    3. test HW configuration with an upcoming Mainnet upgrade,
    4. test contracts with the current Mainnet version,
    5. and test contracts with an upcoming Mainnet upgrade.

    (1) can be tested directly on Mainnet. Pulsar can cover (3), and we can spin up a new testnet for (2) when needed. Pulsar can also satisfy (4) and (5), but Localsecret should be good enough for (5) as well, so you don’t have to wait for a Pulsar upgrade.

  • The upgrade situation where nodes could not register anymore was a bug and that’s on us, and we fixed it in the subsequent emergency upgrade.
    That being said - are node runners doing enough to test their setups? Did people try to register on Pulsar or look for the check-hw binaries to test with? I’m not aware of anyone asking us for any guidance. How can we improve awareness in this regard? What can SCRT Labs, Foundation and Agency do to improve this situation and be better prepared for upgrades in the future?
    We’re also exploring this internally, but I’m curious to hear some thoughts.

I think that one of the missing pieces in pulsar was restarting nodes post-upgrade, but I assume that’s not being tested

Thanks @toml , great points.
Perhaps pulsar is overlooked as an option before upgrading. Just because it only warns doesn’t mean we shouldn’t heed and clear the warnings ahead of upcoming mainnet upgrades. check-hw has been valuable too.

We’ve restarted and this isn’t an issue on pulsar

Interesting, is that because the salt code wasn’t deployed on pulsar?

There was salt, but adding the intra-block call counter to it was a last minute enhancement.

ahhhh i see, that’s good to know. what security model does that actually shift?

It uses a different salt for different contract calls even within the same tx, so writing the same KV twice in different calls within the same contract & tx will output a different ciphertext. On pulsar-2 currently, it uses the same salt for all storage writes in the same block, so writing the same KV in the same contract twice in a block would output the same ciphertext.

Concrete example:

Bob’s balance is 10 sSCRT.

In the same block, in this order:

  1. Bob sends 1 sSCRT to Alice
  2. Bob sends 1 sSCRT to Mallory
  3. Eve sends 1 sSCRT to Bob

Bob’s balance writes in that block:

  1. (Bob, 9)
  2. (Bob, 8)
  3. (Bob, 9)

On pulsar-2 writes 1 & 3 would output the same ciphertext. On secret-4 they would all be different.

So on pulsar-2 an attacker knows that after tx #3 Bob’s balance has returned to what it was after he sent tx #1 in that block. On mainnet the attacker cannot draw that conclusion.
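
To make the example above concrete, here is a minimal Python sketch of the two salting behaviours. It is only an illustration under loud assumptions: HMAC-SHA256 stands in for the real deterministic encryption, and the key, the salt formats, and the names `toy_encrypt`, `block_salt` and `call_salt` are hypothetical, not the actual Secret Network code.

```python
import hashlib
import hmac

KEY = b"toy-contract-key"  # hypothetical key, for illustration only


def toy_encrypt(salt: bytes, plaintext: bytes) -> str:
    """Deterministic stand-in for encryption (assumption: NOT the real scheme).

    The same (salt, plaintext) pair always yields the same output.
    """
    return hmac.new(KEY, salt + b"|" + plaintext, hashlib.sha256).hexdigest()


def block_salt(block_height: int) -> bytes:
    # pulsar-2 behaviour as described above: one salt for the whole block
    return f"block:{block_height}".encode()


def call_salt(block_height: int, call_counter: int) -> bytes:
    # secret-4 behaviour as described above: the salt also includes
    # an intra-block call counter
    return f"block:{block_height}:call:{call_counter}".encode()


# Bob's three balance writes in one block: 9, then 8, then 9 again
writes = [b"Bob=9", b"Bob=8", b"Bob=9"]
height = 100  # arbitrary block height

per_block = [toy_encrypt(block_salt(height), w) for w in writes]
per_call = [toy_encrypt(call_salt(height, i), w) for i, w in enumerate(writes)]

# With a per-block salt, writes 1 and 3 are indistinguishable from each other
print("per-block salt, write 1 == write 3:", per_block[0] == per_block[2])
# With a per-call salt, all three outputs differ
print("per-call salt, all distinct:", len(set(per_call)) == 3)
```

With the per-block salt an observer comparing outputs learns that write 3 restored the value of write 1, without ever learning the balance itself. With the call counter mixed in, that equality disappears.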

Ahhhh, i see, it was a CPA security problem.
Doesn’t using a cpa secure encryption scheme solve this?

It’s not a CPA issue

ciphertexts remaining the same if the same plaintext is encrypted twice is the classic CPA security issue?

The attacker doesn’t know Bob’s balance in that scenario. I guess CPA is a subset of that problem.

You can’t really have anything but deterministic encryption for the keys (in the key value store). How will you search?

And yes, det. encryption is not CPA-secure.
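
A rough sketch of why the keys need deterministic encryption for lookups to work. Everything here is an assumption for illustration: HMAC-SHA256 stands in for the encryption of the storage key, and `det_encrypt_key`, `rand_encrypt_key` and the `balance/bob` key name are hypothetical.

```python
import hashlib
import hmac
import os

KEY = b"toy-storage-key"  # hypothetical key, for illustration only


def det_encrypt_key(k: bytes) -> bytes:
    # Deterministic: the same storage key always maps to the same output
    return hmac.new(KEY, k, hashlib.sha256).digest()


def rand_encrypt_key(k: bytes) -> bytes:
    # Randomized: a fresh nonce makes every output different
    nonce = os.urandom(16)
    return nonce + hmac.new(KEY, nonce + k, hashlib.sha256).digest()


# Deterministic keys: recomputing the encrypted key finds the entry again
det_store = {det_encrypt_key(b"balance/bob"): b"<encrypted value>"}
found = det_encrypt_key(b"balance/bob") in det_store

# Randomized keys: recomputing gives a different output, so lookup fails
rand_store = {rand_encrypt_key(b"balance/bob"): b"<encrypted value>"}
found_rand = rand_encrypt_key(b"balance/bob") in rand_store

print("deterministic lookup finds entry:", found)
print("randomized lookup finds entry:", found_rand)
```

The price of that searchability is exactly the equality leak discussed above: anyone watching the store can tell when the same key is touched twice.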

Even if you randomize after every write (we discussed some ideas around that), this would still leak some information. Reads and writes should be made oblivious, or partially so (e.g., using decoys, bundling many state entries together, or ORAM) to really solve the problem.

This is an issue of deterministic encryption over the values, not the keys, as I understood it?

so it’s not an ORAM solved problem

Who cares about pulsar? I wasn’t following the whole discussion but values are very salty and thus random now

Yep, the problem is solved now. I was just confused why it existed in the first place, since AES-SIV with non-reused nonces is CPA-secure.