2D Reed Solomon Encoded Merkle Tree Construction

Maham Imtiaz   |   

Nov 28, 2022

Nov 28, 2022

In our previous article, we briefly discussed the formation of the Merkle Tree using the Reed Solomon Code. In this article, we will look into the depth of how a 2k by 2k matrix is constructed in light clients.

Reed Solomon Erasure Coding

In order to form a Merkle tree using the reed Solomon coding, we first decide the size of our Galois Field, which is 16-bit. Size plays an important role in coding. It helps us to form the generator function, which then results in the parity-added data. For a detailed description of Reed Solomon's code, you can read our article here.

2D Reed Solomon Encoded Merkle Tree

The roots of Namespace Merkle Tree is divided into four quadrant where Q0 has the original data, Q1 has the parity data of rows of Q0, and Q2 has the parity data for a column of Q0 so the question that arises in our mind is how are we forming it in 2k by 2k dimension there is no explanation for it yet. So the answer is a simple trick of maths. Let's discuss it below.

alt=""
Graph of 2D coding

In the Q0 quadrant, we have our data divided into shares associated with namespace ID, whose data will be erasure-coded and committed to NameSpace Merkel Tree.  A share raw data is interpreted in different ways depending upon your namespace ID, which has the following raw data.

  • The first part of the share is the namespace ID
  • Next, we have a reserved share byte that is represented as *,, which represents the canonically serialized first request that starts in the share or 0 if there is none as the 1-byte big-endian unsigned integer.
  • Remaining, we have the requested data in which we have done some zero padding i.e. adding zero at the end of the data to achieve the desired length of the byte to complete the size of 256 bytes which can be shown below.
alt=""

Erasure Coding of Original Data

First of all, the data in Q0 is carefully divided into equal shares and is made sure that all shares lie in the set of 16-bit Galois fields (which is the length of the field we decide to perform Reed Solomon coding) which gives us the matrix as follows.

alt=""

Where Txs is the transaction data, and msg1 and msg2 stand for the messages which is the interaction between the block producer and transaction sender which is signed by the transaction sender. And then broken into parts so that they can be laid out in the matrix

For the above matrix, we decide on a generator polynomial that has alpha as the primitive roots and can also be written in the form of a matrix. The generator matrix helps us to add parity to the original data and also it helps to recover the loss from the parity.  The generator matrix looks like below.

alt=""

Which is obtained by obtaining a primitive polynomial. Now let us suppose that we have our following matrix as the original data in the GF(2^8) modulo 0x11D

alt=""

Now we create a matrix that can encode any matrix of size 4xn. Here we have n = 8 therefore we create a matrix of size 8x4, which can form a polynomial of size n and has n-1 roots hence our parity encoded matrix is as follows:

alt=""

Since we now know that the Vandermonde matrix is an alternative approach to the Lagrange interpolation method, it can interpolate the polynomials by expressing them in matrix form.

Also, we know that polynomials are unique, and therefore it is easy to generate polynomials using the determinant of the matrix or second approach is to form a generator matrix by taking the upper four rows since we have four rows in the original data therefore we are considering only four rows, which helps us yield the following matrix.

alt=""

Since we have obtained our extended matrix, we will now multiply our original data, and we will have the following result

alt=""

Recovering the Lost Data

Now let us assume that we have lost our data, let say row 0 and 4 from the extended matrix. Hence our encoding matrix (left) and original matrix (right)  becomes as follows

alt=""
alt=""

Since we know that A*A-1=I. Therefore we invert the encoded submatrix and multiply the inverse with encoded matrix to check that our answer is correct so we get

alt=""

As we have obtained an identity matrix therefore we are safe to say that we have a correct inverse matrix. Now in order to obtain our lost data we will multiply our inverse matrix with the extended data which will result in the restoration of the original matrix.

alt=""

In actual process what we do is that we only multiply the extended data with the sub inverse row of the generator matrix as shown below

alt=""

Once we have constructed the data matrix row we will also generate parity in the same way

alt=""

Now it comes to mind how we are extending our data horizontally so it's not that of moon maths it's basically we take another Vandermonde matrix and apply the transpose of the matrix this how we get our generator function. Once we have our generator matrix we then continue to the same work that is explained above.

The advantage of the Reed Solomon code is that the probability of an error remaining in the decoded data is much lower than the probability of an error if Reed Solomon is not used. But the approach of encoding data in matrix form is not efficient since the Vandermonde matrix fails on the data of large size therefore in order to have data more secure I believe Lagrange interpolation would be a better approach to the problem.

Written by

Researcher. Providing solutions to the ZK problem through the language of maths.

Similar Articles

January 9, 2021
Author: Zainab Hasan
January 28, 2021
Author: Zainab Hasan
March 15, 2021
Author: Zainab Hasan
1 2 3 16

Get notified on our latest Web3 researches and catch Xord at a glance.

    By checking this box , I agree to receive email communication from Xord.