Building a bare-metal validator for Solana
Keeping hardware busy
The vision behind Solana is a system that scales with the hardware and bandwidth it’s supplied with. As Multicoin puts it in their investment thesis:
Solana’s guiding principle is that software shall not get in the way of hardware.
This is a powerful principle: Moore’s Law describes the exponential growth of computing power over time, and the bandwidth available to servers in data centers around the world is increasing steadily. As networking equipment matures and falls in price, adoption of faster links will grow over the coming years.
Solana’s aspiration is that once a supermajority of validators has access to a 10x faster network connection, system throughput can scale by a factor of 10 as well. It’s naturally going to require a lot of low-level performance engineering, but we like the compelling vision of scaling with available bandwidth and ever-increasing hardware performance.
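To make the scaling claim concrete, the sketch below computes the upper bound on transactions per second that a network link alone would permit. The 176-byte transaction size is an assumption chosen for illustration, not a figure from this post, and the function name is hypothetical. Real throughput will be lower because of protocol overhead and other bottlenecks.

```python
def max_tps(bandwidth_bits_per_s: float, tx_size_bytes: int = 176) -> float:
    """Upper bound on TPS if the network link were the only bottleneck.

    tx_size_bytes is an assumed average transaction size.
    """
    return bandwidth_bits_per_s / (tx_size_bytes * 8)


one_gbps = 1e9
ten_gbps = 10e9

print(f"1 Gbps:  {max_tps(one_gbps):,.0f} tps")
print(f"10 Gbps: {max_tps(ten_gbps):,.0f} tps")
print(f"scaling factor: {max_tps(ten_gbps) / max_tps(one_gbps):.1f}x")
```

Under this simple model the bound is linear in bandwidth, which is exactly the property the design aims to preserve in practice.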
We recently built a server that will serve as the basis for our Solana validator, and we have since received lots of questions about the actual build and chassis. This post discusses Solana’s hardware requirements and offers three hardware configurations based on the same server chassis.
The hardware requirements
GPUs
Solana’s novel approach of massively parallelizing transaction processing on GPUs makes a fast GPU mandatory. Right now, GPUs are used for signature verification, but in the future, transaction execution will happen on GPUs as well. One of Nvidia’s current flagship products, the RTX 2080 Ti, offers 4,352 CUDA cores and gives a hint of the massive potential to parallelize operations across those cores.
Storage
Given Solana’s throughput, the ledger might quickly grow to several TB. To handle that burden, the task of keeping the ledger available is offloaded to so-called Replicators. They form a Filecoin-like storage layer that is tightly coupled to the rest of the system. Solana validators can discard old ledger data to limit storage requirements, and we expect that about 500GB of storage will be sufficient for a validator. The challenge for our storage medium will be the access speed and bandwidth it provides. IOPS (input/output operations per second) is a performance measurement used to characterize computer storage devices. Modern NVMe-based SSDs offer incredible rates of more than 500,000 IOPS. Compare that to the meager ~200 IOPS an old-school HDD offers.
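The gap between those two IOPS figures is easier to appreciate as wall-clock time. The following sketch is a back-of-the-envelope calculation using the numbers quoted above; the workload of one million random reads and the function name are assumptions for illustration, and the model ignores queue depth, caching, and block size.

```python
def seconds_for_random_reads(num_reads: int, iops: float) -> float:
    """Idealized time to serve num_reads random reads at a fixed IOPS rate."""
    return num_reads / iops


READS = 1_000_000      # hypothetical workload size
HDD_IOPS = 200         # old-school HDD, figure from the text
NVME_IOPS = 500_000    # modern NVMe SSD, figure from the text

hdd_seconds = seconds_for_random_reads(READS, HDD_IOPS)
nvme_seconds = seconds_for_random_reads(READS, NVME_IOPS)

print(f"HDD:  {hdd_seconds / 60:.1f} minutes")
print(f"NVMe: {nvme_seconds:.1f} seconds")
print(f"NVMe is {NVME_IOPS // HDD_IOPS}x faster under this model")
```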
CPUs
Our mainboard offers two LGA 3647 sockets, which means we can install two physical CPUs. OS schedulers are pretty smart these days about thread migration between physical CPUs, and our tests have shown that the Solana client software has no problem dealing with two of them. During Tour de Sol, transaction execution will happen on CPUs, which will put more load on them. In the long run, load will move from CPUs to GPUs. Since the timeline for that to happen is speculative, we recommend solid CPUs that won’t bottleneck your system in the meantime.
RAM
Solana doesn’t have much use for excessive RAM right now, so we will go with a moderate amount. All CPUs we use in the following examples support hexa-channel DDR4 memory. Multi-channel memory increases the data transfer rate between the DRAM and the memory controller by adding more channels of communication between them. For that reason, choose more RAM modules with lower capacity rather than fewer modules with higher capacity. To make use of all six channels, we ideally need six RAM modules.
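The benefit of populating every channel can be estimated with the standard formula for theoretical peak DDR bandwidth: transfer rate times 8 bytes per transfer (a 64-bit channel) times the number of populated channels. The sketch below is an idealized calculation, assuming the modules run at their rated 2666 MT/s; the function name is hypothetical and sustained real-world bandwidth will be lower.

```python
def peak_bandwidth_gb_s(transfer_rate_mt_s: int, channels: int,
                        bus_width_bytes: int = 8) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return transfer_rate_mt_s * 1e6 * bus_width_bytes * channels / 1e9


# Same 48GB total capacity, different channel population.
six_small_modules = peak_bandwidth_gb_s(2666, channels=6)   # 6x 8GB
three_large_modules = peak_bandwidth_gb_s(2666, channels=3)  # 3x 16GB

print(f"6x 8GB:  {six_small_modules:.1f} GB/s peak")
print(f"3x 16GB: {three_large_modules:.1f} GB/s peak")
```

With capacity held constant, halving the number of populated channels halves the theoretical peak, which is why the module count matters more than the module size here.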
Going forward with 3 actual builds
The basis for our machine is this GPU-friendly chassis, which only takes up 1U in rack space:
The 1U size means that it’s a high-density machine with lots of hardware stuffed into a minimal amount of space. It might sound counter-intuitive, but the high density allows for better cooling. Nine powerful fans create a high-pressure airflow that keeps our machine at a pleasant temperature.
The “lower” case:
2x Intel Xeon Silver 4208 Processor - 8 Cores, 2.10GHz
6x 8GB 2666MHz DDR4 ECC RAM
1x RTX 2080 Ti, 11GB GDDR6, 4352 CUDA Cores
1x 512GB Samsung 970 PRO NVMe M.2 SSD
The middle case:
Improvements: 8 more CPU cores, one more GPU, and double the amount of NVMe storage.
2x Intel Xeon Silver 4214 Processor - 12 Cores, 2.20GHz
6x 8GB 2666MHz DDR4 ECC RAM
2x RTX 2080 Ti, 11GB GDDR6, 4352 CUDA Cores
1x 1TB Samsung 970 PRO NVMe M.2 SSD
The higher case:
Improvements: Intel Gold series with 4 more CPU cores, and one more GPU.
2x Intel Xeon Gold 5120 Processor - 14 Cores, 2.20GHz
6x 8GB 2666MHz DDR4 ECC RAM
3x RTX 2080 Ti, 11GB GDDR6, 4352 CUDA Cores
1x 1TB Samsung 970 PRO NVMe M.2 SSD
With a few modifications, this is basically our machine.
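For a side-by-side comparison, the three builds listed above can be captured as data and their totals derived. This is only a convenience sketch: the class and field names are hypothetical, the values are copied from the parts lists, and 1TB is represented as 1000GB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Build:
    name: str
    cpus: int
    cores_per_cpu: int
    ram_modules: int
    ram_module_gb: int
    gpus: int
    cuda_cores_per_gpu: int
    nvme_gb: int

    @property
    def total_cores(self) -> int:
        return self.cpus * self.cores_per_cpu

    @property
    def total_ram_gb(self) -> int:
        return self.ram_modules * self.ram_module_gb

    @property
    def total_cuda_cores(self) -> int:
        return self.gpus * self.cuda_cores_per_gpu


BUILDS = [
    Build("lower", 2, 8, 6, 8, 1, 4352, 512),
    Build("middle", 2, 12, 6, 8, 2, 4352, 1000),
    Build("higher", 2, 14, 6, 8, 3, 4352, 1000),
]

for b in BUILDS:
    print(f"{b.name:>6}: {b.total_cores} CPU cores, {b.total_ram_gb}GB RAM, "
          f"{b.total_cuda_cores} CUDA cores, {b.nvme_gb}GB NVMe")
```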
A few thoughts on redundancy and extensibility
Our mainboard offers excellent extensibility. Once transactions on the Solana network ramp up and one GPU is struggling to keep up, we can easily add another card. You might have noticed that the GPUs and SSDs are consumer-grade hardware. That saves us a ton of money, but it is certainly a tradeoff in terms of hardware reliability. If you are worried about the whole system depending on one consumer-grade SSD, have a look at this hardware RAID solution:
It needs a dedicated PCIe 3.0 x16 slot and can deliver up to 14,000 MB/s of transfer performance. We can put 4x 1TB Samsung 970 PRO NVMe M.2 SSDs in there and run them as RAID 10. This is a considerable boost in terms of IOPS and removes the risk of downtime if one of our SSDs malfunctions. Also, make sure that you order your machine with two redundant PSUs (power supply units).
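Two quick sanity checks on those numbers, as a rough sketch with hypothetical function names. RAID 10 mirrors every drive, so usable capacity is half the raw capacity, and a PCIe 3.0 lane carries 8 GT/s with 128b/130b encoding, which puts the quoted transfer rate just under the theoretical limit of an x16 slot.

```python
def raid10_usable_tb(drives: int, drive_tb: float) -> float:
    """Usable capacity of a RAID 10 array: half the raw capacity."""
    if drives < 4 or drives % 2 != 0:
        raise ValueError("RAID 10 needs an even number of drives, at least 4")
    return drives * drive_tb / 2


def pcie3_bandwidth_mb_s(lanes: int) -> float:
    """Theoretical one-direction PCIe 3.0 bandwidth in MB/s (1 MB = 1e6 bytes)."""
    bits_per_s_per_lane = 8e9 * (128 / 130)  # 8 GT/s, 128b/130b encoding
    return bits_per_s_per_lane / 8 * lanes / 1e6


print(f"4x 1TB in RAID 10: {raid10_usable_tb(4, 1.0):.0f}TB usable")
print(f"PCIe 3.0 x16 limit: {pcie3_bandwidth_mb_s(16):,.0f} MB/s")
```

Note that the array is guaranteed to survive one failed drive; whether it survives a second failure depends on which mirror pair the second drive belongs to.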
Approaching Mainnet
Given the strong focus on hardware, running a validator for Solana is particularly interesting. We are looking into ways to build infrastructure that serves the network in the best way possible. If you are looking forward to staking your Solana tokens one day, we’ve got your back. Solana is implementing a native delegation feature that allows token holders who do not want to operate a validator themselves to still participate in consensus. Stay tuned for more information as we approach mainnet, follow us on Twitter, and join our Telegram. We would love to chat.
— The Staking Facilities Team