In our previous article, we briefly discussed the formation of the Merkle Tree using the Reed Solomon Code. In this article, we will look into the depth of how a 2k by 2k matrix is constructed in light clients.
In order to form a Merkle tree using the reed Solomon coding, we first decide the size of our Galois Field, which is 16-bit. Size plays an important role in coding. It helps us to form the generator function, which then results in the parity-added data. For a detailed description of Reed Solomon's code, you can read our article here.
The roots of Namespace Merkle Tree is divided into four quadrant where Q0 has the original data, Q1 has the parity data of rows of Q0, and Q2 has the parity data for a column of Q0 so the question that arises in our mind is how are we forming it in 2k by 2k dimension there is no explanation for it yet. So the answer is a simple trick of maths. Let's discuss it below.
In the Q0 quadrant, we have our data divided into shares associated with namespace ID, whose data will be erasure-coded and committed to NameSpace Merkel Tree. A share raw data is interpreted in different ways depending upon your namespace ID, which has the following raw data.
First of all, the data in Q0 is carefully divided into equal shares and is made sure that all shares lie in the set of 16-bit Galois fields (which is the length of the field we decide to perform Reed Solomon coding) which gives us the matrix as follows.
Where Txs is the transaction data, and msg1 and msg2 stand for the messages which is the interaction between the block producer and transaction sender which is signed by the transaction sender. And then broken into parts so that they can be laid out in the matrix
For the above matrix, we decide on a generator polynomial that has alpha as the primitive roots and can also be written in the form of a matrix. The generator matrix helps us to add parity to the original data and also it helps to recover the loss from the parity. The generator matrix looks like below.
Which is obtained by obtaining a primitive polynomial. Now let us suppose that we have our following matrix as the original data in the GF(2^8) modulo 0x11D
Now we create a matrix that can encode any matrix of size 4xn. Here we have n = 8 therefore we create a matrix of size 8x4, which can form a polynomial of size n and has n-1 roots hence our parity encoded matrix is as follows:
Since we now know that the Vandermonde matrix is an alternative approach to the Lagrange interpolation method, it can interpolate the polynomials by expressing them in matrix form.
Also, we know that polynomials are unique, and therefore it is easy to generate polynomials using the determinant of the matrix or second approach is to form a generator matrix by taking the upper four rows since we have four rows in the original data therefore we are considering only four rows, which helps us yield the following matrix.
Since we have obtained our extended matrix, we will now multiply our original data, and we will have the following result
Now let us assume that we have lost our data, let say row 0 and 4 from the extended matrix. Hence our encoding matrix (left) and original matrix (right) becomes as follows
Since we know that A*A-1=I. Therefore we invert the encoded submatrix and multiply the inverse with encoded matrix to check that our answer is correct so we get
As we have obtained an identity matrix therefore we are safe to say that we have a correct inverse matrix. Now in order to obtain our lost data we will multiply our inverse matrix with the extended data which will result in the restoration of the original matrix.
In actual process what we do is that we only multiply the extended data with the sub inverse row of the generator matrix as shown below
Once we have constructed the data matrix row we will also generate parity in the same way
Now it comes to mind how we are extending our data horizontally so it's not that of moon maths it's basically we take another Vandermonde matrix and apply the transpose of the matrix this how we get our generator function. Once we have our generator matrix we then continue to the same work that is explained above.
The advantage of the Reed Solomon code is that the probability of an error remaining in the decoded data is much lower than the probability of an error if Reed Solomon is not used. But the approach of encoding data in matrix form is not efficient since the Vandermonde matrix fails on the data of large size therefore in order to have data more secure I believe Lagrange interpolation would be a better approach to the problem.