Main Insights into R3’s Corda DLT Platform
Abstract. This paper summarizes the technical white paper on Corda published by Mike Hearn. A global ledger is the product of a decentralized database that requires only minimal trust between nodes. Corda, a decentralized global database, is a platform for developing decentralized applications.
The concept of a decentralized database emerged to overcome the shortcomings of shared and distributed databases. The novel features that Corda provides as a decentralized database platform include: new transaction types; parallel execution of transactions; direct peer-to-peer communication between nodes; multiple notaries that may employ different consensus algorithms; no global broadcast, with data shared on a need-to-know basis; bytecode-to-bytecode transpilation so that flows (explained in section 3) can be written as blocking code; data modelled as object graphs called states (see section 4); a relational database in each node that can be queried using SQL; a rich type system; bulk data import from other databases; and scheduling of events by states.
Corda provides a platform for writing applications called ‘CorDapps’ that extend the global database with new capabilities such as new data types, new inter-node protocol flows and smart contracts. As in Bitcoin or Ethereum, the platform must prevent double spending: in Corda, this means ensuring that a state is not consumed more than once.
2. Corda Network Overview
The components of a Corda network include nodes, a permissioning service that issues TLS certificates, a network map service, notary services and oracle services that attest to facts used in transactions.
A Corda network is semi-private: nodes joining the network must obtain an identity signed by the root authority. A node can choose any identity as long as it is unique. To support know-your-customer (KYC) checks and to distinguish between entities, Corda employs the standard PKIX infrastructure to connect public keys to identities, so the base identity of a node is an X.500 name.
At startup, nodes register with the network map service, which may be composed of multiple cooperating nodes. This service publishes the IP addresses of the nodes along with their identity certificates. Nodes may also advertise their nearest city for load balancing and network visualization purposes.
The message routing component of a node runs outside the firewall, which enables a node to connect to anyone in the network. Messages are written to disk and delivery is retried until the recipient’s node acknowledges them. The network uses the AMQP/1.0 protocol over TLS to secure messages, together with the Apache Artemis message broker, which provides many useful features. Constant connectivity between nodes is not assumed, as connections can be built or torn down as required.
All messages are encoded and carry a UUID in their AMQP header, which functions as a deduplication key. Message sessions are meant to persist across node restarts and network outages. A node generates a receipt on successfully receiving a message. This receipt acts as important evidence that can be used later in a dispute mediation process.
3. Flow Framework
Transaction data in Corda is not globally broadcast and is shared with parties on a need-to-know basis. Even simple use cases such as a cash payment could involve multi-step negotiations. Corda considers all the information generated during a transaction (including what is not put into the ledger) essential. All communication in Corda takes the form of small multi-party sub-protocols called flows. Business logic is expressed at a very high level with the help of APIs for sending and receiving object graphs, embedding sub-flows and tracking progress. Some of the components that make this possible are:
- A just-in-time state machine compiler that rewrites the code on the fly, so that the code appears to execute in a single blocking thread
- Transparent checkpointing when a flow is waiting for an input or message from another party
- Identity to IP address mapping to route messages to the right IP addresses of a given identity
- Sub-flows to automate tasks like notarising a transaction or atomically swapping the ownership of two assets
- Progress tracker which indicates which step a flow has reached, with sub-trackers for sub-flows
- Flow hospital where the node administrator may kill a faulty flow or provide it with a solution
Data visibility. The ResolveTransactions flow checks the validity of a transaction by performing a breadth-first search over the transaction graph, downloading any missing transactions and validating them. A transaction is not considered valid if any of its transitive dependencies is invalid. Transactions are communicated inside a flow, and that flow uses the resolution flow to fetch the necessary dependencies automatically from the correct party. Resolving the dependencies might require many round trips, which could make sending payments time consuming. While Corda tries to provide better privacy than other systems, in the absence of additional privacy measures it is still uncertain who may get to see the transaction data. These risks can be mitigated by:
- Small-subgraph transactions, for example where two peers only wish to keep their view of a deal synchronized, so that the transaction graph stays small
- Transaction privacy techniques, such as randomizing public keys so that transactions are difficult to link to an identity, or ‘tear-offs’
- State re-issuance, where a state backed by an issuer is exited from the ledger and re-issued, relying on the issuer to behave atomically even when the ledger is not forcing it
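The dependency resolution described under Data visibility can be sketched as a breadth-first walk. This is a minimal illustration, not Corda's ResolveTransactions implementation: the function name `resolve_transactions`, the callables `fetch_from_peer` and `verify`, and the dictionary representation of a transaction are all assumptions made for this example.

```python
from collections import deque


def resolve_transactions(root_id, local_store, fetch_from_peer, verify):
    """Breadth-first walk over the dependency graph of one transaction.

    local_store:     dict of tx id -> transaction already held and verified.
    fetch_from_peer: hypothetical callable that downloads a transaction by id.
    verify:          hypothetical callable returning True for a valid transaction.
    A transaction is modelled as a dict with an "inputs" list of dependency ids.
    """
    queue = deque([root_id])
    downloaded = {}
    while queue:
        tx_id = queue.popleft()
        if tx_id in local_store or tx_id in downloaded:
            continue  # already resolved, no need to fetch it again
        tx = fetch_from_peer(tx_id)
        downloaded[tx_id] = tx
        queue.extend(tx["inputs"])  # visit dependencies level by level

    # One invalid transitive dependency invalidates the whole subgraph.
    if not all(verify(tx) for tx in downloaded.values()):
        return False
    local_store.update(downloaded)
    return True
```

Each loop iteration that calls `fetch_from_peer` stands for a network round trip, which is why a long chain of dependencies makes resolution slow.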
4. Data Model
Transaction Structure. States are the atomic units of information in Corda. They are never changed: a state is either current (unconsumed) or consumed. A transaction accepts zero or more states as inputs and creates zero or more states as outputs. The main components of a transaction are: references to the input states (outputs of previous transactions); output states (each with its notary, associated contract and data); attachments (zip files containing code, data, certificates or supporting documents); commands, which specify the intent of the transaction and each carry a list of public keys; the set of required signatures; the type of transaction (normal or notary-changing); a timestamp; and textual summaries of the purpose of the transaction.
Figure 1: Example of a cash issue
In the diagram above, the transaction consists of no inputs and one output. The expanded cash state (top right) shows the details of the cash, the contract code and the legal prose. The command specifies the intention of the transaction. The verify() function makes sure that the public keys listed on the commands belong to those whose signatures would make this transaction valid.
Composite keys. Composite keys are trees whose leaves are regular public keys. The validity of a set of signatures can be determined by walking the tree bottom-up, summing the weights of the keys that have signed and comparing the total against the threshold required at each node.
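The bottom-up check described under Composite keys can be sketched as a small recursive function. This is an illustration under stated assumptions: the class names `Leaf` and `Node`, the function `is_fulfilled_by`, and the use of strings in place of real public keys are hypothetical, and each interior node is assumed to carry a threshold that the summed weights of its satisfied children must reach.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Leaf:
    public_key: str  # stand-in for a real public key


@dataclass(frozen=True)
class Node:
    threshold: int   # minimum total weight required at this node
    children: tuple  # tuple of (weight, child) pairs


def is_fulfilled_by(key, signers):
    """Return True if the set of signing keys satisfies the composite key."""
    if isinstance(key, Leaf):
        return key.public_key in signers
    # Sum the weights of the children that are themselves satisfied.
    total = sum(weight for weight, child in key.children
                if is_fulfilled_by(child, signers))
    return total >= key.threshold


# "alice alone, or bob and carol together"
bob_and_carol = Node(threshold=2, children=((1, Leaf("bob")), (1, Leaf("carol"))))
composite = Node(threshold=1, children=((1, Leaf("alice")), (1, bob_and_carol)))
```

With this structure, `is_fulfilled_by(composite, {"bob"})` is false, while either `{"alice"}` or `{"bob", "carol"}` satisfies the key.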
Timestamps. A timestamp specifies the window during which the transaction is believed to have occurred. Timestamps are expressed as windows because in a distributed system there is no true time, only a number of desynchronised clocks. The purpose of the timestamp is to communicate the position of the transaction on the timeline to the smart contract code so that it can apply contractual logic. A window may be open-ended to convey that the transaction occurred before or after a certain time. Timestamps are checked by the notary services: if a notary’s signature is present, the transaction is believed to have occurred within the window. Notaries synchronise themselves to the atomic clocks at the US Naval Observatory, and Corda uses the Java timeline.
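A notary-side window check of the kind described under Timestamps might look like the sketch below. The function name `within_window` and its parameters are hypothetical; a missing bound models an open-ended window, and clock tolerance is deliberately left out.

```python
from datetime import datetime, timedelta, timezone


def within_window(notary_time, not_before=None, not_after=None):
    """Return True if the notary's clock reading falls inside the window.

    A bound of None means the window is open-ended on that side.
    """
    if not_before is not None and notary_time < not_before:
        return False
    if not_after is not None and notary_time > not_after:
        return False
    return True


now = datetime(2017, 1, 1, 12, 0, tzinfo=timezone.utc)
# "The transaction occurred before 13:00": only the upper bound is set.
accepted = within_window(now, not_after=now + timedelta(hours=1))
```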
Attachments and Smart Contracts. Attachments are zip files and may contain data that supports the contract code. Attachments are available on-ledger and are reused by the parties over and over again. They provide data but are not responsible for authenticating it. Therefore, states contain constraint mechanisms which ensure that the contract checks the authenticity of the data, either by checking the hash of the data or by checking for a trusted third-party signature. Smart contracts are defined using JVM bytecode. A contract is simply a class that implements an interface exposing the verify() function, which checks the validity of the transaction. Embedding the JVM specification in the Corda specification gives developers the flexibility to choose the language they want. The Java standards offer a comprehensive type system for expressing business data such as calendars, decimal calculations and so on.
Dispute Resolution. Buggy and fraudulent transactions are handled by mutually agreeing to discard the entire transaction subgraph. The disadvantage of having no global visibility is that no central entity records who exactly has seen which transactions. Therefore, node activity logs need to be correlated to determine which entities have to agree to discard the transaction. Corda uses a tool that crawls the network to find all the parties affected by the transaction rollback. It signals the respective node administrators, who accept the ‘investigation request’ sent by the tool through the node explorer in order to discard the transaction. The ledger is then updated by adding new transactions that extend the transaction chain and correct the database to match reality. For this to work, the contract must be designed to allow some arbitrary changes. If one party refuses to cooperate, all the other involved parties agree to mark the relevant states as no longer consumed or spent.
Identity lookups. It is important to know who you are dealing with. States in Corda define fields of the party object type. The party object encapsulates an identity and a public key. The identities and public keys of parties are linked via X.509 certificates. When a state is deserialised from a transaction in its raw form, the identity field of the party object is set to null and only the public key is visible. When a transaction is deserialised together with the certificates that link the keys to identities, the identity field is set. This allows a single representation to be used both in the anonymised case (to validate the dependencies of a transaction) and in the identified case (when dealing directly with the other party).
Oracles. An oracle is a network service that signs a transaction if the statements it contains are true. Oracles make sure that everyone in the network can check the validity of a transaction and arrive at the same answer at any point in time. If the smart contract fetched the required information itself, nodes might not all reach the same conclusion about the validity of the transaction. The data that an oracle does not need to see while signing a transaction can be torn off. This is achieved by structuring the transaction as a Merkle hash tree in which the hash that is signed is the root. The resulting signature of the counterparty contains flag bits that indicate which parts of the tree (data) were present when signing. This approach ensures that there is only one place in the transaction where signatures can be found, which makes the system more efficient because signature checks are one of the slowest parts of a blockchain system.
Encumbrances. Sometimes the behaviours of multiple contracts can be composed. Consider an asset that is supposed to be frozen for a certain period of time. This requires two states: an asset state and a time lock state. Encumbrances allow a state to specify another state that must be present in any transaction that uses it. In this case, when the asset state is used, the time lock state, which triggers the execution of the time lock contract, must be used along with it.
Contract constraints. Contracts are usually zip files and use the JAR signing infrastructure to identify the source of the contract code. Requiring different combinations of signatures, or using trusted third parties to review the contract, reduces the risk of rogue developers publishing a bad contract. A contract constraint might make use of a composite key (explained earlier in this paper) and of signing algorithms such as SHA256withRSA and SHA256withECDSA.
Event scheduling. A state may request that flows be started at given times. To request scheduled events, a state implements the SchedulableState interface and returns a request from nextScheduledActivity. The state is queried when it is committed to the vault, and the scheduler ensures that the relevant flow is started at the requested time.
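The tear-off mechanism described under Oracles can be illustrated with a small Merkle tree. This is a sketch under stated assumptions, not Corda's transaction format: the helper names, the choice of SHA-256, the example component values and the rule of duplicating the last hash on odd-sized levels are all illustrative choices.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _next_level(level):
    if len(level) % 2:
        level = level + [level[-1]]  # pad odd levels by repeating the last hash
    return level, [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves):
    """Root hash over a non-empty list of leaf hashes."""
    level = list(leaves)
    while len(level) > 1:
        _, level = _next_level(level)
    return level[0]


def merkle_branch(leaves, index):
    """Sibling hashes needed to recompute the root from one leaf."""
    branch, level = [], list(leaves)
    while len(level) > 1:
        padded, parents = _next_level(level)
        sibling = index ^ 1
        branch.append((padded[sibling], sibling < index))  # (hash, sibling is on the left)
        level, index = parents, index // 2
    return branch


def root_from_branch(leaf, branch):
    acc = leaf
    for sibling, sibling_is_left in branch:
        acc = h(sibling + acc) if sibling_is_left else h(acc + sibling)
    return acc


# Hypothetical transaction components; the oracle only needs to see number 2.
components = [b"inputs", b"outputs", b"command: EUR/USD fix 1.10", b"attachments", b"timestamp"]
leaves = [h(c) for c in components]
root = merkle_root(leaves)
torn_off_view = (components[2], merkle_branch(leaves, 2))
```

An oracle that receives only `torn_off_view` can hash the visible component, recompute the root with `root_from_branch` and sign that root without ever seeing the other components.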
5. Common Financial Constructs
Assets. Corda has a notion of OwnableState, which has an owner field that is a composite key. From OwnableState, Corda derives FungibleAsset, which represents assets of measurable quantity in which similar units can be represented together in a single ledger state (for example, a single €10 note has the same value as two €5 notes). In the case of ‘fiat’ currencies, two ledger entries may not be entirely fungible because they merely represent a claim on an issuer. Corda supports these kinds of complexities. Amount<T> is a type that defines an integer quantity of some token. As this type does not support fractions, a quantity of national currency should be measured in pennies, with sub-penny amounts requiring some other type. A common token type is Issued<T>, which encapsulates what the asset is, who issued it and a reference field that points to an account number, the location of the asset and so on.
Obligations. Repayment in finance often takes the form of an IOU (an obligation of one institution to another), and netting is the process of combining a set of obligations and replacing it with an economically equivalent set. The final output is the amount of money that actually needs to be transferred. Corda uses an ObligationContract, which is a subclass of FungibleAsset, to model a nettable obligation. As netting calculations can get complicated, the ObligationContract in Corda provides methods to calculate bilateral nettings and to verify both bilateral and multilateral nettings.
Market Infrastructure. The Corda data model allows the ledger to be integrated with existing markets and exchanges. A sell order for an asset on the ledger can have a partially signed transaction associated with it. A partial signature allows the signed data to be changed in controlled ways and contains metadata describing which parts (inputs and outputs) of the transaction are covered. This feature allows the ledger to be integrated with the order books of markets and exchanges. Hence, trading and settlement become atomic, as the ownership of the assets on the ledger is synchronised with the view of market participants.
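The integer-quantity idea behind Amount<T> can be illustrated as follows. This Python class is only an analogy for the JVM type: the field names and the rule that only amounts of the same token can be added are assumptions of the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Amount:
    quantity: int  # count of the smallest unit, e.g. cents; never a fraction
    token: str     # stand-in for the token type parameter T

    def __post_init__(self):
        if not isinstance(self.quantity, int) or self.quantity < 0:
            raise ValueError("quantity must be a non-negative integer")

    def __add__(self, other):
        if self.token != other.token:
            raise ValueError("cannot add amounts of different tokens")
        return Amount(self.quantity + other.quantity, self.token)


ten_euro = Amount(1000, "EUR")                         # one 10 euro note, in cents
two_fives = Amount(500, "EUR") + Amount(500, "EUR")    # two 5 euro notes
```

Because quantities are integers of the smallest unit, `ten_euro == two_fives` holds exactly, with none of the rounding issues that floating point values would introduce.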
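Bilateral netting as described under Obligations reduces to simple arithmetic per pair of parties. The sketch below is not the ObligationContract API: the function name `net_bilateral` and the tuple layout are hypothetical, and all obligations are assumed to be in the same token.

```python
from collections import defaultdict


def net_bilateral(obligations):
    """Replace obligations with one economically equivalent payment per pair.

    obligations: iterable of (debtor, creditor, quantity) with integer quantities.
    Returns a sorted list of (debtor, creditor, quantity) with zero balances dropped.
    """
    balance = defaultdict(int)  # (a, b) with a < b -> net amount a owes b
    for debtor, creditor, quantity in obligations:
        a, b = sorted((debtor, creditor))
        balance[(a, b)] += quantity if debtor == a else -quantity

    netted = []
    for (a, b), amount in sorted(balance.items()):
        if amount > 0:
            netted.append((a, b, amount))
        elif amount < 0:
            netted.append((b, a, -amount))
    return netted


# A owes B 100 and 5, B owes A 70: only 35 actually needs to move from A to B.
result = net_bilateral([("A", "B", 100), ("B", "A", 70), ("A", "B", 5)])
```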
6. Notaries and consensus
Notaries. The notary services in Corda provide transaction ordering and timestamping. Notaries are identified by, and sign with, composite public keys. They may be composed of multiple mutually distrusting parties who use standard consensus algorithms such as BFT or Raft, depending on the scenario. A notary either accepts a transaction by returning a signature over it, or returns a rejection error. Notarisation is triggered by invoking the Finality flow on the transaction after all the signatures have been obtained. The transaction is stored in the database once the Finality flow returns successfully.
No proof-of-work. A Corda network is email-like, with stable nodes that have long-term identities which they can prove ownership of to others. The network entry process of Corda helps to block Sybil attacks. Thus, proof-of-work is discarded in Corda, as it has many drawbacks, including high energy consumption, concentration of mining power, identityless participants and no concept of finality.
Algorithmic agility. Corda is not restricted to one specific consensus algorithm. Raft may be used where good performance and low latency are important and the participants trust each other enough that malicious attacks or errors are an acceptable risk, while BFT-SMaRT could be used where existing legal or trust relationships are less robust. Additional hardware security features such as Intel SGX may be used to make the non-BFT algorithms more trustworthy. The advantages of supporting multiple notaries are: scalability (notaries run in parallel); jurisdictionally specific notaries that respect regulatory constraints on data propagation; a choice of availability and performance trade-offs; the ability for users to pick between validating and non-validating notaries; the ability for issuers to ‘self-notarise’ transactions; and the ability for networks that start separately to be merged later (explained below).
Validating and non-validating notaries. Validating notaries resolve and inspect the transactions they are asked to deconflict. In a basic and risk-prone setup with a single notary and hardly any privacy features, the notary could gain complete visibility into every transaction. This can be avoided by selecting the level of privacy or security a user wants for each state in a transaction. Non-validating notaries assume transaction validity. If a transaction was committed accidentally or contains incorrect data, such a notary can be asked to roll back the commit, but if the error is malicious, the affected states could become permanently corrupted.
Merging Networks. Companies are composed of different entities that cross-license, strike deals with each other, trade internally and so on. The challenges here include heterogeneous IT departments and dissimilar ways of developing software. These organizations could benefit a lot from synchronizing their internal entities before synchronizing with the wider world. When merging networks, the participating entities must trust each other’s notaries, and those notaries must never have signed double spends. In complicated cases, a standalone notary could be run against a hardware security module with audit logging enabled.
Guaranteed data distribution. Since Corda does not use global broadcast, there is no mechanism for everyone to view all transactions. However, an interested party might want to be notified about a change in ledger state. This can be accomplished by having the notaries let all interested parties find out about a transaction. In that case, the notaries would need to know who the parties are, which would result in an undesirable leak of privacy. This can be prevented by sending the certificates that link the random keys (see identity lookups in section 4) instead of the identity of the party. Once the transaction is committed, the notaries look up the identities behind the keys and send copies of the transaction to those that are resolved successfully.
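The uniqueness service that a notary provides can be pictured as a map from consumed state references to the transaction that consumed them. The sketch below is a single-process illustration: the class name `SimpleNotary`, the return values and the placeholder signature string are hypothetical, and the consensus algorithm that a real notary cluster would run is left out.

```python
class SimpleNotary:
    """Single-node stand-in for a notary's double-spend check."""

    def __init__(self):
        self._consumed = {}  # state reference -> id of the consuming transaction

    def notarise(self, tx_id, input_refs):
        # Collect every input that some other transaction has already consumed.
        conflicts = {ref: self._consumed[ref] for ref in input_refs
                     if ref in self._consumed and self._consumed[ref] != tx_id}
        if conflicts:
            return ("rejected", conflicts)
        for ref in input_refs:
            self._consumed[ref] = tx_id
        # A real notary would return a cryptographic signature over the transaction.
        return ("signed", "notary-signature-over-" + tx_id)


notary = SimpleNotary()
first = notary.notarise("tx1", [("tx0", 0)])
second = notary.notarise("tx2", [("tx0", 0)])  # tries to spend the same output
```

Here `first` is accepted, while `second` is rejected with a report that output 0 of `tx0` was already consumed by `tx1`.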
7. The vault
A vault is similar to a ‘wallet’ in a blockchain-based system: it stores data extracted from the ledger that is relevant to the node’s owner. It contains the private keys required to sign transactions, and it creates transactions that send value to a party by combining asset states and balancing the values with a change output. This process is called ‘coin selection’. To increase the parallelism of transaction creation, which significantly reduces the time needed to create and sign transactions, the vault implements splitting and merging of states. The vault manages ‘soft locks’, which prevent multiple transactions from using the same output simultaneously and thus prevent accidental double spends. If the entire cash balance of a user were contained in a single state, only one transaction could be created at a time, resulting in high operational overhead. Splitting one big state into multiple states allows many transactions to be created in parallel and also improves privacy.
Direct SQL access. The states in Corda are defined using a subset of the JVM bytecode language that includes annotations (like the JPA annotations defined in JSR 338) which describe how a class is mapped to a relational table schema, including the primary keys and the SQL types. When a flow submits a transaction to the vault, the vault finds the relevant states and the associated CorDapp plugin installed in the node, and feeds the states into an object relational mapper (ORM) which generates SQL UPDATE and INSERT statements. JOINs can be performed against a dedicated metadata table to eliminate consumed states from the dataset. This also allows the data to be queried at particular points in time. Nodes include an embedded database engine which, apart from the state data, also stores the node state and all communications. JPA annotations are independent of the database engine used, and users are free to connect to their database and issue SQL queries that employ any feature of their respective engine. The ORM might skip ledger-specific data that is irrelevant and may expand some fields (Amount, for example) into multiple columns. The vault mapping can also be customized to support other formats such as XML or JSON.
Key Randomization. The standard privacy technique of generating a fresh key pair for every new deal results in many keys being created. This problem is resolved by hierarchical deterministic key derivation, which derives private keys from a single pool of entropy (for example, 128 bits of protected random data). Public keys may also be derived deterministically without access to the underlying private key material. Hence, a component can hand fresh public keys to counterparties without being able to sign with those keys, which improves security.
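Coin selection with soft locks, as described above, can be sketched in a few lines. The function name `select_coins`, the smallest-first selection strategy and the data layout are assumptions of this illustration and not the vault's actual algorithm.

```python
def select_coins(unconsumed, soft_locked, target):
    """Pick unlocked states covering `target` and soft-lock them.

    unconsumed:  dict of state reference -> integer quantity.
    soft_locked: set of references reserved by in-flight transactions (mutated).
    Returns (chosen references, change quantity to send back to ourselves).
    """
    chosen, total = [], 0
    for ref, quantity in sorted(unconsumed.items(), key=lambda kv: kv[1]):
        if ref in soft_locked:
            continue  # another transaction being built already uses this state
        chosen.append(ref)
        total += quantity
        if total >= target:
            break
    if total < target:
        raise ValueError("insufficient unlocked funds")
    soft_locked.update(chosen)
    return chosen, total - target


states = {"s1": 500, "s2": 200, "s3": 1000}
locks = set()
payment_a = select_coins(states, locks, 600)  # uses s2 and s1, change of 100
payment_b = select_coins(states, locks, 900)  # can proceed in parallel using s3
```

Because the balance is split across three states, the two payments can be built at the same time. With a single state of 1700, the second call would fail until the first transaction released its lock.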
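The JOIN against a metadata table mentioned under Direct SQL access can be illustrated with an in-memory SQLite database. The table and column names below are invented for this example and do not reflect the schema that a Corda node generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cash_states (
        state_ref TEXT PRIMARY KEY, owner TEXT, quantity INTEGER, currency TEXT);
    CREATE TABLE vault_metadata (
        state_ref TEXT PRIMARY KEY, consumed INTEGER NOT NULL);
""")
conn.executemany("INSERT INTO cash_states VALUES (?, ?, ?, ?)", [
    ("tx0:0", "Alice", 900, "EUR"),  # spent earlier
    ("tx1:0", "Alice", 500, "EUR"),
    ("tx2:0", "Alice", 200, "EUR"),
    ("tx3:1", "Alice", 50, "USD"),
])
conn.executemany("INSERT INTO vault_metadata VALUES (?, ?)", [
    ("tx0:0", 1), ("tx1:0", 0), ("tx2:0", 0), ("tx3:1", 0),
])

# Join with the metadata table so that consumed states drop out of the balance.
balance = conn.execute("""
    SELECT c.currency, SUM(c.quantity)
    FROM cash_states c
    JOIN vault_metadata m ON m.state_ref = c.state_ref
    WHERE m.consumed = 0
    GROUP BY c.currency
    ORDER BY c.currency
""").fetchall()
```

Keeping consumed rows in the table, rather than deleting them, is what makes queries about earlier points in time possible.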
8. Domain Specific Languages
Clauses. Some of the features expected from production-quality asset contracts are: issuance and exit transactions, movement transactions for ownership reassignment, fungibility management and support for upgrading contracts. These features can have obscure edge cases (additional constraints) that would otherwise require developers to implement pieces of low-level logic. As a solution, Corda provides a library called clauses which implements reusable pieces of contract logic. Developers may also implement their own clauses, which can later be integrated with the library.
Combinator libraries. Corda builds on the approach described by Peyton Jones et al. in the paper ‘Composing contracts’, in which financial contracts are modelled with a small library of Haskell combinators. Corda’s version is a universal contract that builds on the language extension features of the Kotlin programming language. In a contract, a programmer can define arbitrary ‘actions’ along with constraints that specify when each action may be invoked. The contract may also contain a ‘zero’ token which indicates the termination of a deal. It also defines deal-specific data such as what is allowed, when it is allowed and how much is allowed.
Formally verifiable languages. It can be difficult to convince the participants in a large network to upgrade a contract. This is addressed by expressing smart contracts in a formally verifiable language, which guarantees the correctness of the implementation. An example is the Whiley language by Dr. David Pearce, which checks program-integrated proofs at compile time.
Projectional editing. Projectional editing can be used for smart contracts: the source code is edited with a structure-aware editor instead of being edited textually. Such a dedicated environment for constructing smart contract logic could be highly valued by users.
9. Secure signing devices
Background. Financial systems use client-side hardware such as CAP (Chip Authentication Program) readers to hold private keys. The signing key is held in a robust and cheap smart card, so the device can be replaced without replacing the key. The benefits of keeping the signing key on a personally controlled device are: no single point of failure if the node is hacked; clarity about who signed off an action, as the signatures prove which device was used for signing; and the option of adding more security by integrating fingerprint readers and other biometric authentication.
Confusion Attacks. A challenge in integrating secure signing devices into a ledger is the processing of transactions. In Bitcoin, this is resolved by converting a transaction into a human-readable message. However, this can lead to confusion attacks, in which an attacker swaps the address or the IBAN to which the payment is to be made. This attack is addressed by the BIP 70 protocol, which allows a certificate chain to be presented that links a target key with a stable, human-meaningful and verified identity. There are still additional challenges to be addressed, such as the existence of many different types of transactions, which could include new ‘types’ created after the device was manufactured. This would cause constant problems requiring frequent firmware updates.
Transaction summaries. To solve the above problem, a summaries field (a list of strings) is added to the transaction format. This field is managed by the smart contract, which generates an English message saying what the transaction is doing and checks that the message is present in the transaction. The user confirms the message by having the device extract it and print it on screen with YES/NO buttons to decide whether to sign. A challenge to this design arises when large amounts of data have to be sent to the device. This is resolved by using a tear-off mechanism which presents only the summaries and the Merkle branch connecting them to the root. All of this assumes that the contracts themselves are not malicious.
Identity substitution. Contracts can only access the public keys and not the identity information when a transaction chain is anonymised. This is resolved by providing the device, along with the transaction, with the X.509 certificate chains which link the public keys to long-term identity certificates. The device can verify these certificate chains and construct a mapping from index to human-readable name.
Multi-lingual support. Conventionally, the contract generates the human readable version of the contract in English. The transaction format could be defined to support messages in different languages but, care must be taken to ensure correct translation of messages.
10. Client RPC and reactive collections
Integration with the existing ecosystem is a challenge when deploying a distributed ledger. Programs that interact with the nodes need to be loosely coupled, authenticated, resistant to node outages and restarts, and tolerant of speed differences. Corda uses an RPC mechanism to meet these requirements. RPCs usually return a snapshot of some data structure together with a stream of changes, and the client library can reconstruct the snapshot plus the differences in a form that can be bound directly to a JavaFX user interface. This makes it straightforward to render data structures from the global ledger. Since RPC transport takes place via the node’s message queue broker, the framework automatically recovers from node restarts, IP address changes and other interruptions. Programs that need to run for a long time and survive restarts, upgrades and moves can request that observations (data) be sent to a persistent queue. Additional RPC processors can be attached to the broker to provide more processing capacity, and the broker automatically balances the load between them.
The main reasons for Corda to use RPC for communicating with the node instead of the typical REST+JSON approach are: a preference for binary protocols over textual protocols, which can lead to security issues; the infrastructure for building reliable apps that message queue brokers provide and plain HTTP does not; and the fact that the conventional ways of streaming results over REST are not ideal for the task.
11. Data distribution groups
Distribution of transactions takes place via app-provided flows which specify the peers to which the transactions should be sent. Sometimes the set of parties that should receive the data is not known ahead of time. For this reason, a data distribution group is created by generating a keypair and a self-signed certificate for it. Nodes are invited into the group using an invitation flow. Membership can be read-only (the node is given the public key) or read/write (the node is given the private key). A future improvement would be to provide each member of the group with its own private key, which would allow tracing who added transactions to the group. When a node is invited to join the group, it automatically accepts the membership if it ‘trusts’ the inviting node. If the node is already a member, the invitation is rejected. The accepting node also records which node invited it, which results in a two-way recorded relationship. Finally, all the transactions in the group are sent to the new node.
Every transaction undergoes a relevancy test when it is sent to the vault. The PropagateTransactionToGroup flow checks whether the transaction is already known and whether it is signed by the group in question. If it is new and correctly signed, it is sent to the vault. In this way, a transaction which is sent to the group propagates through the membership tree, ensuring that all members see it. The advantages of such a structure are: simplicity, as access is handled using existing flows and tools; privacy, as it is possible to join a group without the other members being aware of it; scalability (a group of four parties only imposes costs on those four); performance (groups can be created as fast as you can generate key pairs and invite other nodes); and responsibility (for every member of the group there is always a node that is responsible for sending it new data). On the other hand, the disadvantage is brittleness: when a node goes offline, the group is split and data backs up in the outbound queues of the parents and children of the offline node until it is online again.
New features can be added to strengthen the group, such as membership broadcasts, where a node with write access may choose to sign a membership announcement and propagate it through the tree. The group’s certificate defines whether it prefers privacy or availability. The network map defines the event horizon, which is the span of time that must elapse before an offline node is considered permanently gone and any node is invited to remove it. When a node is online, messages are propagated to everyone in the network. There can also be cases where a part of the group has become split and no one is aware of it. Such a situation is unlikely but possible. It should also be noted that it is not possible to remove members after they have been added. A remove announcement can be propagated, but nothing stops the nodes from ignoring it.
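Propagation through the membership tree can be sketched as follows. This is a simplified, single-process model, not the PropagateTransactionToGroup flow itself: the function name and data layout are hypothetical, and the check that the transaction is signed by the group key is omitted.

```python
def propagate_to_group(start, tx_id, neighbours, known):
    """Spread a transaction through the invitation tree of a group.

    neighbours: dict of node -> nodes it invited or was invited by.
    known:      dict of node -> set of transaction ids in its vault (mutated).
    Returns the number of nodes that newly stored the transaction.
    """
    newly_stored = 0
    pending = [start]
    while pending:
        node = pending.pop()
        if tx_id in known[node]:
            continue  # already known here, so stop: this also prevents loops
        known[node].add(tx_id)
        newly_stored += 1
        pending.extend(neighbours[node])  # forward along recorded relationships
    return newly_stored


# A invited B and C; C invited D. Relationships are recorded in both directions.
tree = {"A": ["B", "C"], "B": ["A"], "C": ["A", "D"], "D": ["C"]}
vaults = {name: set() for name in tree}
reached = propagate_to_group("D", "tx42", tree, vaults)
```

If `C` were removed from `tree` to model an offline node, a transaction introduced at `D` would never reach `A` or `B`, which is the brittleness noted above.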
12. Deterministic JVM
It is vital that all the nodes processing a transaction agree to its validity. Since transaction types are defined using JVM bytecode, the execution of the JVM bytecode must be fully deterministic. Non-determinism could be cause by- external inputs, random number generators, conflicting decisions to terminate a program, multi-threading, Object.hashCode(), different API implementations between nodes and so on.
A new type of JVM sandbox is constructed to ensure the efficiency of the contract verify functions in the face of infinite loops. It employs a bytecode static analysis and rewriting pass along with a JVM patch that controls the behaviour of hash code generation. The tasks performed by the bytecode analysis and rewrites are- deterministic termination of expensive bytecodes that consumes large amount of time and memory, prevents exception handlers from catching Throwable, Error or ThreadDeath, adjust constant pool references to relink the cod to a ‘shadow JDK’, sets strictfp flag on methods which require JVM to do floating point arithmetic, mostly forbids invokedynamic bytecode as a results of security problems, forbids native methods and forbids finalizers.
The cost instrumentation strategy counts the bytecodes that are expensive to execute and is designed so that, if the cost of verifying a transaction becomes very large, all nodes agree on exactly when to quit. A more sophisticated design would calculate bytecode costs ahead of time as far as possible. Another complexity arises from the need to constrain memory usage. To simplify the implementation, the sandbox imposes a quota on bytes allocated rather than on bytes retained. This is harsh on smart contracts that churn through large quantities of garbage. Therefore, a better strategy that integrates with the garbage collector is required in order to set quotas at a usefully generic level. Ultimately, it is not just the smart contract that is instrumented but all the code on which it transitively depends.
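The accounting idea described above can be sketched as follows. This is a minimal illustration and not Corda's sandbox code: the class CostAccountant, its methods and the budget values are all hypothetical, and the calls that a real bytecode rewriting pass would insert automatically are written here by hand.

```java
// Hypothetical sketch of deterministic cost accounting; not Corda's sandbox code.
final class CostAccountant {
    /** Thrown at the same point on every node once a budget is exhausted. */
    static final class BudgetExceededException extends RuntimeException {
        BudgetExceededException(String message) { super(message); }
    }

    private final long maxOps;
    private final long maxAllocatedBytes;
    private long ops;
    private long allocatedBytes;

    CostAccountant(long maxOps, long maxAllocatedBytes) {
        this.maxOps = maxOps;
        this.maxAllocatedBytes = maxAllocatedBytes;
    }

    /** Charged before an expensive instruction (for example a jump or an invoke). */
    void chargeOp() {
        if (++ops > maxOps) throw new BudgetExceededException("operation budget exceeded");
    }

    /** Charges bytes allocated, not bytes retained, so short-lived garbage still counts. */
    void chargeAllocation(long bytes) {
        allocatedBytes += bytes;
        if (allocatedBytes > maxAllocatedBytes) throw new BudgetExceededException("allocation budget exceeded");
    }

    long opsUsed() { return ops; }

    /** Stand-in for instrumented contract code: one charge per loop iteration. */
    static long sumTo(int n, CostAccountant accountant) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            accountant.chargeOp(); // a rewriting pass would insert this call
            total += i;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumTo(10, new CostAccountant(100, 1024)));
        try {
            sumTo(1_000, new CostAccountant(100, 1024));
        } catch (BudgetExceededException e) {
            System.out.println("terminated: " + e.getMessage());
        }
    }
}
```

Because the counters depend only on the instructions executed and not on wall-clock time, every node running the same code with the same budget stops at the same iteration.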
The Corda architecture allows for high levels of scalability. Some of the features that contribute to this are:
- Partial visibility: Nodes are only exposed to those transactions they’re involved in or are dependencies of the transactions they are involved in. Therefore, nodes never see most of the transaction graph and hence, they don’t have to process it
- Distributed node: A message queue broker sits at the center of a Corda node. Nodes are essentially made up of microservices that can be run independently on separate machines. Although a single flow cannot be parallelised, a node under heavy load typically has many flows running in parallel. As flows access the network through the broker and local state through a database connection, additional flow capacity can be added by integrating further flow workers
- Signatures outside the transaction: A Corda transaction identifier is the root of a Merkle tree calculated over the transaction's contents excluding the signatures. The disadvantage of this is that a signed and a partially signed transaction cannot be distinguished by their identifier, but it means that the signatures can be easily verified in parallel. Corda smart contracts are isolated from the underlying cryptography and cannot request signature checks themselves. They are run after signature verification takes place and do not execute if the signatures are absent. So, even when the smart contract for a transaction is not parallelisable, the signatures for a single transaction can still be checked concurrently
- Multiple notaries: Additional notary clusters can be brought in to increase scalability, but this is effective only if the transaction graph has an underlying exploitable structure (geographical biases, for example)
- Asset reissuance: Where the issuer of an asset is both trustworthy and online, the asset state can be exited and reissued back onto the ledger with a new reference field. This truncates the dependency graph of that asset, resulting in better privacy and scalability, at the risk of losing atomicity
- Non-validating notaries: The need to check the validity of a transaction before it is notarised is the main overhead of a non-BFT notary. A non-validating notary can be used in situations where raw throughput is more important than ledger integrity
The primary bottlenecks of Corda are expected to be notary clusters made up of mutually distrusting nodes, and flow checkpointing. Due to partial visibility, nodes can only check transaction graphs ‘just in time’, and being able to check 1000 transactions/second is not good enough when the transaction graph consists of many more transactions and the user expects the transactions to show up instantly. Future revisions are intended to address this problem by pre-pushing transactions to a node when the developer is sure that the node will request the data anyway.
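The ‘signatures outside the transaction’ point can be made concrete with a short sketch in which the identifier is a Merkle root over the transaction components only. Everything here is an assumption made for illustration: the class and method names are hypothetical, SHA-256 is assumed as the hash, an unpaired node is simply hashed with itself, and leaf and inner hashes are not domain-separated. Corda's actual tree construction may differ.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: a transaction id computed as a Merkle root that ignores signatures.
final class TransactionId {
    static byte[] bytes(String s) { return s.getBytes(StandardCharsets.UTF_8); }

    static byte[] sha256(byte[]... parts) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (byte[] p : parts) md.update(p);
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is required on every Java platform
        }
    }

    /** Merkle root over the components; an unpaired node is hashed with itself (assumed rule). */
    static byte[] merkleRoot(List<byte[]> components) {
        if (components.isEmpty()) throw new IllegalArgumentException("no components");
        List<byte[]> level = new ArrayList<>();
        for (byte[] c : components) level.add(sha256(c));
        while (level.size() > 1) {
            List<byte[]> next = new ArrayList<>();
            for (int i = 0; i < level.size(); i += 2) {
                byte[] left = level.get(i);
                byte[] right = (i + 1 < level.size()) ? level.get(i + 1) : left;
                next.add(sha256(left, right));
            }
            level = next;
        }
        return level.get(0);
    }

    /** The signatures parameter is deliberately unused: it never influences the identifier. */
    static byte[] idOf(List<byte[]> components, List<byte[]> signatures) {
        return merkleRoot(components);
    }

    public static void main(String[] args) {
        List<byte[]> components = Arrays.asList(bytes("input state ref"), bytes("output state"), bytes("command"));
        byte[] unsigned = idOf(components, new ArrayList<>());
        byte[] signed = idOf(components, Arrays.asList(bytes("sig-alice"), bytes("sig-bob")));
        // The same identifier results whether or not signatures are attached.
        System.out.println(Arrays.equals(unsigned, signed));
    }
}
```

Since each signature is made over this one identifier, the signatures can be checked independently of each other, for example on separate threads.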
Corda employs various privacy techniques:
- Partial data visibility: No global broadcast of transactions
- Transaction tear-offs: Transactions are structured as Merkle trees (see section 4), which enables ‘tearing off’ transaction components that the user wants to hide from a party that is signing the transaction
- Key randomisation: The vault generates random keys that are unlinkable to an identity without the respective linking certificate
- Graph pruning: Large transaction graphs that involve liquid assets can be ‘pruned’ by requesting the asset issuer to re-issue the asset onto the ledger with a new reference field. This prevents nodes from exploring the original dependency graph during verification
- Secure hardware: Secure hardware platforms allow computation to be performed in an undebuggable, tamper-proof execution environment, allow the software running inside that environment to derive encryption keys accessible only to that instance, and allow the software to remotely attest to a third party over the internet that it is indeed running in the secure state. This would make it possible for transaction dependencies to be transmitted encrypted under an enclave key, so that peers can have the dependencies verified by software they have themselves audited.
- Mix network: In a mix network a message is repeatedly encrypted in an onion-like fashion using keys owned by a small set of randomly selected nodes. Each layer in the onion contains the address of the next ‘hop’. Once the message is delivered to the first hop, that hop decrypts it to reveal the next encrypted layer and forwards it onwards. The return path operates in a similar fashion. Some nodes, such as notaries, can learn about transactions they are not directly involved in; adding a mix network to the Corda protocol would allow users to opt in to a privacy upgrade against this, at the cost of higher latencies and more exposure to failed network nodes.
- Zero knowledge proofs: The holy grail of privacy in decentralised database systems is the use of zero knowledge proofs to convince a peer that a transaction is valid without revealing the contents of the transaction to them. Although these techniques are not yet practical for the execution of general purpose smart contracts, Corda plans to migrate in future to zero knowledge succinct non-interactive arguments of knowledge (‘zkSNARKs’)
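The transaction tear-off technique listed above relies on a standard Merkle inclusion proof: one component is revealed together with the sibling hashes needed to recompute the root, while the other components stay hidden. The sketch below illustrates that idea only. The class and method names are hypothetical, SHA-256 and the rule of pairing an odd node with itself are assumptions, and Corda's actual tear-off format is not reproduced here.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a Merkle tear-off: reveal one component, hide the rest behind hashes.
final class TearOff {
    static byte[] bytes(String s) { return s.getBytes(StandardCharsets.UTF_8); }

    static byte[] sha256(byte[]... parts) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (byte[] p : parts) md.update(p);
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** All tree levels, leaves first; an unpaired node is hashed with itself (assumed rule). */
    static List<List<byte[]>> levels(List<byte[]> components) {
        if (components.isEmpty()) throw new IllegalArgumentException("no components");
        List<List<byte[]>> all = new ArrayList<>();
        List<byte[]> level = new ArrayList<>();
        for (byte[] c : components) level.add(sha256(c));
        all.add(level);
        while (level.size() > 1) {
            List<byte[]> next = new ArrayList<>();
            for (int i = 0; i < level.size(); i += 2) {
                byte[] right = (i + 1 < level.size()) ? level.get(i + 1) : level.get(i);
                next.add(sha256(level.get(i), right));
            }
            all.add(next);
            level = next;
        }
        return all;
    }

    static byte[] root(List<byte[]> components) {
        List<List<byte[]>> all = levels(components);
        return all.get(all.size() - 1).get(0);
    }

    /** Sibling hashes, from leaf level upwards, needed to recompute the root for one component. */
    static List<byte[]> proofFor(List<byte[]> components, int index) {
        List<List<byte[]>> all = levels(components);
        List<byte[]> proof = new ArrayList<>();
        int pos = index;
        for (int depth = 0; depth < all.size() - 1; depth++) {
            List<byte[]> level = all.get(depth);
            int sibling = pos ^ 1; // neighbour within the same pair
            proof.add(sibling < level.size() ? level.get(sibling) : level.get(pos));
            pos /= 2;
        }
        return proof;
    }

    /** Recomputes the root from the revealed component and the sibling hashes alone. */
    static boolean verify(byte[] revealed, int index, List<byte[]> proof, byte[] expectedRoot) {
        byte[] hash = sha256(revealed);
        int pos = index;
        for (byte[] sibling : proof) {
            hash = (pos % 2 == 0) ? sha256(hash, sibling) : sha256(sibling, hash);
            pos /= 2;
        }
        return Arrays.equals(hash, expectedRoot);
    }

    public static void main(String[] args) {
        List<byte[]> tx = Arrays.asList(bytes("inputs"), bytes("outputs"), bytes("command: rate fix"));
        byte[] id = root(tx);
        List<byte[]> proof = proofFor(tx, 2);
        // The signer sees only the command and two hashes, yet can tie it to the transaction id.
        System.out.println(verify(bytes("command: rate fix"), 2, proof, id));
    }
}
```

A party that receives only the revealed component and the proof learns nothing about the hidden components beyond their hashes, but can still be sure that what it signs belongs to the transaction with that identifier.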
This paper has summarised Corda, a decentralised database design for the financial sector. It includes smart contracts running on the JVM that provide access control and schema definition, which enables the distribution of data among mutually distrusting nodes. Its continuation-based persistence framework manages the flow of data across the network. Its identity management system supports security by enabling parties to know whom they are dealing with. It provides notary services that use various algorithms to achieve consensus. It further supports security through secure signing devices, secure enclaves and composite keys, and employs binary protocols for authorisation policies. Users can access the data in the ledger using simple SQL queries and can define the transactions they need in familiar programming languages. Ultimately, it integrates the global ledger with financial infrastructure such as high performance markets and netting services.
M. Hearn. Corda: A distributed ledger. November 29, 2016. Version 0.5.