Serialization format of states

What format will be used for state serialization in Enigma Chain?

In Discovery, states are serialized to JSON before being encrypted.

Since encrypting and decrypting states is computationally heavy and SGX has a hard limit on memory use, I think the size of the serialized data and the efficiency of serialization/deserialization are important when choosing a state serialization format.
bincode or CBOR are better than JSON in that respect.
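
To make the size argument concrete, here is a minimal sketch comparing JSON with a fixed binary layout. The `state` fields and the `LAYOUT` string are assumptions made up for this example, and the binary layout is hand-rolled with Python's `struct` module; it is not bincode's or CBOR's actual wire format.

```python
import json
import struct

# Hypothetical contract state, used only for illustration.
state = {"balance": 1_000_000, "nonce": 42, "frozen": False}

# Self-descriptive encoding: field names travel with every value.
json_bytes = json.dumps(state, separators=(",", ":")).encode("utf-8")

# Schema-based encoding: little-endian u64, u32, bool in a fixed field order.
# LAYOUT is an assumption for this sketch, not bincode's wire format.
LAYOUT = "<QI?"
binary_bytes = struct.pack(
    LAYOUT, state["balance"], state["nonce"], state["frozen"]
)


def decode(buf: bytes) -> dict:
    """Rebuild the state from the fixed layout; names come from the schema."""
    balance, nonce, frozen = struct.unpack(LAYOUT, buf)
    return {"balance": balance, "nonce": nonce, "frozen": frozen}


print(len(json_bytes), len(binary_bytes))
```

The binary form is smaller because the field names and punctuation live in the schema rather than in every serialized value, which also means less data to encrypt inside the enclave.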

We haven’t decided yet, but you are right, this will improve network-wide gas usage. I believe CosmWasm is using JSON; it shouldn’t be too hard to replace.

Thanks @vitocchi. This is a really good point, but I’d say this is lower priority until we see the need arising from the live network.

Opened an issue to track this - https://github.com/enigmampc/EnigmaBlockchain/issues/72

Very interesting discussion in the CosmWasm TG about this too. If I may copy/paste Max of NEAR Protocol’s answer on why they built their own format, Borsh (Binary Object Representation Serializer for Hashing):

Maksym Zavershynskyi, [05 Mar 2020 at 23:04:54]:

First let me explain why we are not using self-descriptive serializers like JSON or protobuf. Self-descriptive serializers generally take more space and cost more resources to operate with. Disregard the benchmarks from capnproto that say they take 0ms, because in capnproto the serialization step is done when a value is written into a field of the structure, not when the structure is actually serialized.

  • We use JSON for RPC because that’s the most common format, but we are not using it for anything like storage format, contract API, or networking, because as I said it is very expensive. Specifically, some of the contracts that use JSON as the input format spent up to 80% of gas just on serialization/deserialization. Also, JSON cannot work with blobs directly and needs them base64-encoded, and it can only work with u53 integers.
  • We used to use protobufs for networking and data storage until we found out that they do not have a well-defined specification. Specifically, a single data structure can have multiple valid representations after serialization, which is very problematic for blockchain use cases. Usually with blockchains when you want to sign a data structure, you serialize it first, then compute the hash, and then sign. So if the representation is not unique, the hash is not unique either, which might cause some complex problems.
  • Capnproto unfortunately caused us some complex problems in the past. It sacrifices DevX in favor of performance.

Regarding non-self-descriptive serializers, we considered bincode and cbor.

  • Unfortunately, cbor is quite an old format and its spec is bulky, and we value simplicity quite a lot in blockchain.
  • Bincode was a great choice until we found out that it does not have a formal spec, which is worrisome because we don’t want some performance-optimization PR to accidentally change the format. Also, bincode does not produce a unique binary representation for the same Rust object, which is, again, problematic because of the hashing. Additionally, bincode is dependent on serde, which is a lot of code, and again, we want our contracts to have as little code as possible.
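
The uniqueness problem described in the list above (serialize, hash, then sign) can be sketched with JSON and Python's standard library. This is only an analogy under my own assumptions: it does not reproduce protobuf or bincode behavior, and the `a`/`b` objects and the `canonical` helper are hypothetical.

```python
import hashlib
import json

# Two logically identical objects that differ only in key insertion order.
a = {"to": "alice", "amount": 10}
b = {"amount": 10, "to": "alice"}


def digest(data: bytes) -> str:
    """Hash the serialized bytes, as one would before signing."""
    return hashlib.sha256(data).hexdigest()


# Naive serialization: output follows insertion order, so the bytes differ.
naive_a = digest(json.dumps(a).encode("utf-8"))
naive_b = digest(json.dumps(b).encode("utf-8"))


def canonical(obj) -> bytes:
    """Canonical form: sorted keys and fixed separators, one encoding per value."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


canon_a = digest(canonical(a))
canon_b = digest(canonical(b))

print(naive_a == naive_b, canon_a == canon_b)
```

Equal values hash differently under the naive encoding, so a signature made over one encoding would not verify against the other. A format with exactly one valid encoding per value removes that ambiguity.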

So we came up with Borsh, which is basically bincode with a well-defined schema, enforced uniqueness of binary representation, and no serde.
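
As a rough illustration of what such a schema-driven, deterministic encoding looks like, here is a hand-rolled sketch. It follows the Borsh rules as I understand them (little-endian fixed-width integers, strings as a u32 byte length followed by UTF-8 bytes, struct fields concatenated in declaration order). The `transfer` structure and the function names are hypothetical, and this is not the official Borsh library.

```python
import struct


def encode_u64(n: int) -> bytes:
    """Fixed-width little-endian unsigned 64-bit integer."""
    return struct.pack("<Q", n)


def encode_string(s: str) -> bytes:
    """u32 little-endian byte length, followed by the UTF-8 bytes."""
    raw = s.encode("utf-8")
    return struct.pack("<I", len(raw)) + raw


def encode_transfer(to: str, amount: int) -> bytes:
    """Hypothetical struct { to: string, amount: u64 }, fields in declared order."""
    return encode_string(to) + encode_u64(amount)


encoded = encode_transfer("alice", 10)
print(encoded.hex())
```

Because nothing in the output depends on map ordering, optional whitespace, or variable-width integer choices, the same value always yields the same bytes, which is the property needed for hashing and signing.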

Oh, and also Borsh protects from some attacks that other formats do not. Specifically, an exhaustion attack. E.g. someone can construct a binary object and pass it to your program, which would try to deserialize it and crash because it allocates too much memory or somesuch.
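
One common defense against this kind of input is to validate a declared length against the bytes actually available before allocating anything. The sketch below shows that general idea only; it is not taken from Borsh's implementation, and `MAX_LEN`, `read_bytes`, and the `malicious` payload are assumptions for the example.

```python
import struct

MAX_LEN = 1024  # hypothetical upper bound on a single field


def read_bytes(buf: bytes, offset: int = 0) -> tuple:
    """Read a u32-length-prefixed byte string, rejecting implausible lengths."""
    if len(buf) - offset < 4:
        raise ValueError("truncated length prefix")
    (declared,) = struct.unpack_from("<I", buf, offset)
    offset += 4
    remaining = len(buf) - offset
    # Check the claim before trusting it: never size an allocation from
    # attacker-controlled data without comparing it to the real input.
    if declared > MAX_LEN or declared > remaining:
        raise ValueError("declared length exceeds limit or input size")
    return buf[offset:offset + declared], offset + declared


# A 5-byte message that claims to contain about 4 GiB of payload.
malicious = struct.pack("<I", 0xFFFFFFFF) + b"x"

try:
    read_bytes(malicious)
    rejected = False
except ValueError:
    rejected = True

print(rejected)
```

A decoder that pre-allocates a buffer of the declared size would attempt a multi-gigabyte allocation here, which matters even more under the SGX memory limit mentioned at the top of the thread.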