This guide covers common issues you may encounter when running an Agave validator and how to resolve them.
Getting Help
There is a #validator-support Discord channel available to reach other validators and get help:
- Discord Server
  - #validator-support - General support channel for any validator-related queries
  - #testnet-announcements - Critical information relating to Testnet
Useful Resources
Common Startup Issues
Validator Won’t Start
Check System Tuning
Your validator may not start without proper system tuning. Verify these settings are applied:
# Check sysctl settings
sysctl net.core.rmem_max
sysctl net.core.wmem_max
sysctl vm.max_map_count
sysctl fs.nr_open
Expected values:
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
vm.max_map_count = 1000000
fs.nr_open = 1000000
If values are incorrect, reapply system tuning:
sudo sysctl -p /etc/sysctl.d/21-agave-validator.conf
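To check all four values in one pass, the sketch below compares each live tunable against the values listed above. It treats those values as minimums, which is an assumption on our part (larger values should not hurt). The function names `compare_min` and `check_sysctl` are hypothetical helpers, not part of Agave.

```shell
#!/usr/bin/env bash
# Sketch: compare kernel tunables with the expected values from this guide.
# Assumption: the expected values are treated as minimums.

# compare_min NAME ACTUAL EXPECTED
# Prints an OK/LOW line; returns 1 when ACTUAL is below EXPECTED.
compare_min() {
  local name=$1 actual=$2 expected=$3
  if [ "$actual" -ge "$expected" ]; then
    echo "OK   $name = $actual"
  else
    echo "LOW  $name = $actual (expected >= $expected)"
    return 1
  fi
}

# check_sysctl: reads live values with `sysctl -n` and checks each one.
# Returns non-zero if any value is missing or too low.
check_sysctl() {
  local status=0 name expected actual
  while read -r name expected; do
    if ! actual=$(sysctl -n "$name" 2>/dev/null); then
      echo "MISSING $name"
      status=1
      continue
    fi
    compare_min "$name" "$actual" "$expected" || status=1
  done <<'EOF'
net.core.rmem_max 134217728
net.core.wmem_max 134217728
vm.max_map_count 1000000
fs.nr_open 1000000
EOF
  return "$status"
}

# Usage: check_sysctl || echo "Reapply /etc/sysctl.d/21-agave-validator.conf"
```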
Check File Limits
Verify file descriptor limits:
ulimit -n
This should return 1000000. If it does not, check /etc/security/limits.d/90-solana-nofiles.conf, then log out and back in.
Check Disk Space
Verify sufficient disk space:
df -h /mnt/ledger /mnt/accounts
Ensure your ledger and accounts directories have at least 500GB free.
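For a scripted version of this check, the sketch below flags any path whose filesystem has less free space than a threshold. It defaults to the 500GB figure above. `df -BG` reports GiB and rounds up, so treat the result as approximate. `free_gb`, `space_ok`, `check_free_space` and the `MIN_GB` variable are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: flag ledger/accounts filesystems with less free space than MIN_GB.
# MIN_GB defaults to 500, the figure suggested in this guide. df -BG reports
# GiB and rounds up, so the numbers are approximate.

# free_gb PATH: prints space available on the filesystem holding PATH, in GiB
free_gb() {
  df -P -BG "$1" | awk 'NR == 2 { sub(/G$/, "", $4); print $4 }'
}

# space_ok AVAIL_GB MIN_GB: succeeds when AVAIL_GB >= MIN_GB
space_ok() {
  [ "$1" -ge "$2" ]
}

# check_free_space PATH...: prints one line per path, returns 1 if any is low
check_free_space() {
  local min_gb=${MIN_GB:-500} status=0 path avail
  for path in "$@"; do
    avail=$(free_gb "$path")
    if space_ok "$avail" "$min_gb"; then
      echo "OK   $path: ${avail}G free"
    else
      echo "LOW  $path: ${avail}G free (want >= ${min_gb}G)"
      status=1
    fi
  done
  return "$status"
}

# Usage: check_free_space /mnt/ledger /mnt/accounts
```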
Check Permissions
Verify the sol user has proper permissions:
ls -la /mnt/ledger
ls -la /mnt/accounts
ls -la ~/validator-keypair.json
Fix permissions if needed:
sudo chown -R sol:sol /mnt/ledger
sudo chown -R sol:sol /mnt/accounts
chmod 600 ~/validator-keypair.json
Invalid Genesis Hash
If you see genesis hash mismatch errors, your ledger may contain the genesis for a different cluster.
Delete the contents of your ledger directory and restart the validator to download the correct genesis:
# Stop validator
sudo systemctl stop sol
# Remove ledger data
rm -rf /mnt/ledger/*
# Start validator
sudo systemctl start sol
Failed to Bind Port
If you see “Address already in use” errors:
# Check what's using the port
sudo netstat -tunlp | grep ':8899'
# Kill the process if needed
sudo kill <PID>
Or change the port in your validator startup script:
--rpc-port 8900
--dynamic-port-range 9000-9020
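`netstat` comes from the legacy net-tools package and may not be installed. The sketch below does the same lookup with `ss` from iproute2 and extracts only the PIDs. `port_pids` and `extract_pids` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: find which processes hold a TCP or UDP port, using ss (iproute2).

# extract_pids: reads ss output on stdin, prints each distinct PID once
extract_pids() {
  grep -o 'pid=[0-9]*' | cut -d= -f2 | sort -u
}

# port_pids PORT: prints PIDs bound to PORT (sudo is needed to see
# processes owned by other users)
port_pids() {
  sudo ss -H -tulpn "sport = :$1" | extract_pids
}

# Usage: port_pids 8899
# Inspect each PID with `ps -o pid,cmd -p <PID>` before killing it.
```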
Catchup and Sync Issues
Validator Not Catching Up
If your validator is not catching up to the cluster:
Check if Validator is Running
ps aux | grep agave-validator
Check Network Connectivity
Verify you can reach the cluster:
If this hangs or returns no results, check your network connection and firewall rules.
Check catchup progress:
solana catchup <VALIDATOR_PUBKEY>
If the slot difference is increasing, your validator is falling further behind.
Verify CPU, memory, and disk are not maxed out:
Check for errors in the logs:
tail -f /home/sol/agave-validator.log | grep -i error
Try Downloading a Fresh Snapshot
If your validator is very far behind, download a fresh snapshot:
# Stop validator
sudo systemctl stop sol
# Option 1: remove the --no-snapshot-fetch flag from validator.sh
# Option 2: delete the ledger to force a fresh snapshot download
rm -rf /mnt/ledger/*
# Start validator
sudo systemctl start sol
Slow Catchup Speed
If your validator is catching up but slowly:
Check Network Bandwidth
Ensure you’re not saturating your network connection. Staked validators should have at least 2 Gbps, preferably 10 Gbps.
Check Disk I/O
iostat -x 1
Look at %util and await for your NVMe drives. If utilization sits consistently near 100% or await times are high, you may have a disk bottleneck.
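To pick out saturated drives without reading every column, the sketch below filters `iostat -dx` output for NVMe devices above a utilization threshold. It assumes `%util` is the last column, as in current sysstat releases. `busy_devices` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: report NVMe devices whose %util exceeds a threshold in
# `iostat -dx` output. Assumes %util is the last column.

# busy_devices MAX_UTIL: reads iostat output on stdin, prints "device util"
busy_devices() {
  awk -v max="$1" '$1 ~ /^nvme/ && $NF + 0 > max + 0 { print $1, $NF }'
}

# Usage: the first iostat report covers time since boot, so take two samples
# and read the second; LC_ALL=C keeps decimal points as "." for awk:
#   LC_ALL=C iostat -dx 5 2 | busy_devices 90
```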
Verify Using NVMe Drives
Ensure accounts and ledger are on separate NVMe drives:
lsblk
Falling Behind After Catchup
If your validator catches up but then falls behind again:
Check PoH Speed
Look for PoH (Proof of History) performance in logs:
grep "PoH" /home/sol/agave-validator.log | tail -20
If your PoH hash rate is below the cluster target:
# Set CPU to performance mode
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
# Check CPU frequency
lscpu | grep MHz
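The governor is set per CPU, and writes to sysfs do not survive a reboot, so it is easy to end up with some cores back on a power-saving governor. The sketch below lists any CPU that is not in performance mode. `non_performance_cpus` is a hypothetical helper. `CPU_SYSFS` is a hypothetical override for the sysfs root that exists only so the function can be tested.

```shell
#!/usr/bin/env bash
# Sketch: list CPUs whose cpufreq governor is not "performance".
# CPU_SYSFS (hypothetical) overrides the sysfs root, e.g. for testing.

non_performance_cpus() {
  local base=${CPU_SYSFS:-/sys/devices/system/cpu} f gov
  for f in "$base"/cpu*/cpufreq/scaling_governor; do
    [ -r "$f" ] || continue           # no cpufreq support, or glob unmatched
    gov=$(cat "$f")
    [ "$gov" = performance ] || echo "${f%/cpufreq/*}: $gov"
  done
}

# Usage: non_performance_cpus   # no output means every CPU is in performance mode
```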
Verify Release Build
If built from source, ensure you’re using a release build:
agave-validator --version
Rebuild with the release profile if needed:
cargo build --release
High Skip Rate
A high skip rate means that many of the blocks your validator should have produced during its leader slots were skipped by the cluster. Check your skip rate:
solana validators | grep <VALIDATOR_PUBKEY>
Look at the skip rate column. A consistently high skip rate (>5%) indicates issues.
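Skip rate is the share of your leader slots that did not end up with a block: skipped slots divided by assigned leader slots. `solana block-production` reports the per-validator leader slot and block counts. The sketch below only does the arithmetic and applies the 5% threshold mentioned above. `skip_rate` and `skip_rate_high` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Sketch: skip rate = (leader slots - blocks produced) / leader slots * 100

# skip_rate LEADER_SLOTS BLOCKS_PRODUCED: prints the skip rate in percent
skip_rate() {
  awk -v slots="$1" -v produced="$2" 'BEGIN {
    if (slots + 0 == 0) { print "0.00"; exit }
    printf "%.2f\n", (slots - produced) * 100 / slots
  }'
}

# skip_rate_high LEADER_SLOTS BLOCKS_PRODUCED [THRESHOLD]: succeeds when the
# rate exceeds THRESHOLD percent (default 5, the figure used above)
skip_rate_high() {
  awk -v r="$(skip_rate "$1" "$2")" -v t="${3:-5}" \
    'BEGIN { exit !(r + 0 > t + 0) }'
}

# Example: 4 skipped out of 100 leader slots
#   skip_rate 100 96   # prints 4.00
```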
Common Causes
- Network issues - Poor connectivity to other validators
- CPU performance - CPU not fast enough or throttled
- Disk I/O - Slow disk causing delays
- Memory pressure - Insufficient RAM causing swapping
- System clock drift - System clock out of sync
Solutions
Check the system clock:
timedatectl
Sync time if needed:
sudo systemctl restart systemd-timesyncd
Check for swap usage:
free -h
If the system is swapping, you need more RAM.
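For unattended checks, the sketch below reads swap usage straight from `/proc/meminfo` and asks systemd whether the clock is NTP-synchronized. It assumes a systemd version recent enough for `timedatectl show`. `swap_used_kb` and `clock_synced` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Sketch: quick memory-pressure and clock checks.

# swap_used_kb [MEMINFO]: prints swap in use (kB), from /proc/meminfo by default
swap_used_kb() {
  awk '/^SwapTotal:/ { t = $2 } /^SwapFree:/ { f = $2 } END { print t - f }' \
    "${1:-/proc/meminfo}"
}

# clock_synced: succeeds when systemd reports the clock as NTP-synchronized
# (assumes `timedatectl show` is available)
clock_synced() {
  [ "$(timedatectl show -p NTPSynchronized --value)" = yes ]
}

# Usage:
#   [ "$(swap_used_kb)" -gt 0 ] && echo "swap in use: $(swap_used_kb) kB"
#   clock_synced || echo "clock not synchronized"
```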
Not Producing Blocks During Leader Slots
If you’re missing your leader slots:
Verify You’re in Leader Schedule
solana leader-schedule | grep <VALIDATOR_PUBKEY>
Check Logs During Leader Slot
Look for errors during your leader slots:
grep "leader slot" /home/sol/agave-validator.log
Ensure Validator is Caught Up
solana catchup <VALIDATOR_PUBKEY>
You must be caught up to produce blocks.
Vote and Stake Issues
Not Voting
If your validator is not voting:
Check Vote Account Status
solana vote-account <VOTE_ACCOUNT_PUBKEY>
Verify it shows recent vote activity.
Check Identity Account Balance
solana balance ~/validator-keypair.json
If balance is too low, your validator cannot afford vote transaction fees and will stop voting.
Fund your identity account:
solana transfer <VALIDATOR_IDENTITY_PUBKEY> 10
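To catch a draining identity account before voting stops, the sketch below compares the `solana balance` output against a minimum. The guide does not specify a threshold, so `MIN_SOL` (default 1) is a hypothetical value; choose one that covers your vote fees. `balance_below` and `check_identity_balance` are hypothetical helpers. The sketch assumes `solana balance` prints the amount first, for example `12.5 SOL`.

```shell
#!/usr/bin/env bash
# Sketch: warn when the identity balance drops below a chosen minimum.
# MIN_SOL is a hypothetical threshold, not a recommended value.

# balance_below TEXT MIN_SOL: TEXT is `solana balance` output such as
# "12.5 SOL"; succeeds when the amount is below MIN_SOL
balance_below() {
  awk -v min="$2" '{ amt = $1 } END { exit !(amt + 0 < min + 0) }' <<<"$1"
}

# check_identity_balance: returns 1 when low, 2 if the CLI call fails
check_identity_balance() {
  local out
  out=$(solana balance ~/validator-keypair.json) || return 2
  if balance_below "$out" "${MIN_SOL:-1}"; then
    echo "Identity balance low: $out"
    return 1
  fi
  echo "Identity balance OK: $out"
}
```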
Verify Vote Account Configuration
Check that your validator is using the correct vote account:
ps aux | grep agave-validator | grep vote-account
Delinquent Validator
If your validator shows as delinquent:
solana validators --delinquent
This means your validator has fallen behind and is not voting.
Recovery Steps
Check that the validator service is running:
sudo systemctl status sol
Review recent log output:
tail -100 /home/sol/agave-validator.log
Verify Network Connectivity
solana gossip | grep <VALIDATOR_PUBKEY>
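If a restart is needed, ask the validator to exit cleanly (a systemd service configured to restart on exit will start it again):
agave-validator --ledger /mnt/ledger exit --max-delinquent-stake 40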
Stake Not Activating
If you’ve delegated stake but it’s not activating:
Verify Stake Account
solana stake-account <STAKE_ACCOUNT_PUBKEY>
Check the “Delegated Stake” and “Activating Stake” fields.
Check Epoch
Stake activates at epoch boundaries. Check the current epoch:
solana epoch-info
Stake will activate at the next epoch boundary and warm up over several epochs.
Verify Vote Account
Ensure stake is delegated to the correct vote account:
solana stake-account <STAKE_ACCOUNT_PUBKEY> | grep "Delegated Vote Account"
Disk and Storage Issues
Out of Disk Space
If you run out of disk space:
Clean Up Snapshots
Remove old snapshots if space is tight, but keep at least the most recent one for a faster restart:
du -sh /mnt/ledger/snapshots/*
# Delete every snapshot archive except the newest
ls -t /mnt/ledger/snapshots/snapshot-*.tar.zst | tail -n +2 | xargs -r rm --
Verify --limit-ledger-size
Ensure your validator startup script includes:
--limit-ledger-size
This automatically prunes old ledger data.
Corrupted Blockstore
If you see blockstore corruption errors:
grep -i corrupt /home/sol/agave-validator.log
Use WAL Recovery
Add to your validator startup script:
--wal-recovery-mode skip_any_corrupted_record
Last Resort: Clean Ledger
If corruption is severe:
sudo systemctl stop sol
rm -rf /mnt/ledger/*
sudo systemctl start sol
Blockstore Management
The validator blockstore RocksDB database can be inspected using the ldb tool, which is part of the RocksDB code base and is also available in the rocksdb-tools package.
List Column Families
ldb --db=/mnt/ledger/rocksdb/ list_column_families
Downgrade Issues
If a new column family has been introduced to the validator blockstore, a subsequent downgrade of the validator to a version that predates the new column family will cause the validator to fail while opening the blockstore during startup.
Please seek guidance on Discord before modifying the validator blockstore.
Drop Column Family
Only do this if you understand the implications:
ldb --db=/mnt/ledger/rocksdb drop_column_family <column_family_name>
Network Issues
Not Appearing in Gossip
If your validator doesn’t appear in gossip:
solana gossip | grep <VALIDATOR_PUBKEY>
Check Firewall
Verify that your firewall allows the P2P ports:
sudo ufw status
This should show ports 8000-8020 (or your custom range) allowed for TCP and UDP.
Check Entrypoints
Verify entrypoints are reachable:
telnet entrypoint.testnet.solana.com 8001
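If `telnet` is not installed, bash's built-in `/dev/tcp` redirection combined with `timeout` from coreutils gives a similar check. Like the telnet test, it only proves TCP reachability and says nothing about UDP. `tcp_reachable` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: TCP reachability check without telnet, using bash's /dev/tcp.

# tcp_reachable HOST PORT [SECONDS]: succeeds if a TCP connection opens in time
tcp_reachable() {
  timeout "${3:-5}" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Usage:
#   tcp_reachable entrypoint.testnet.solana.com 8001 && echo reachable
```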
Verify Public IP
Ensure your validator has a public IP:
This should match the IP shown when you eventually appear in gossip.
High Latency to Peers
If you have high network latency:
solana gossip | grep <VALIDATOR_PUBKEY>
Check the datacenter distribution. If you’re geographically isolated from other validators, you may experience higher latency.
Log Analysis Techniques
Finding Errors
# Recent errors
grep -i error /home/sol/agave-validator.log | tail -50
# Errors in log files modified within the last hour
find /home/sol -name "agave-validator.log*" -mmin -60 -exec grep -Hi error {} +
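Filtering by file modification time still returns every error ever written to a recently touched file. To limit output to the last hour by timestamp, the sketch below assumes each log line begins with a bracketed UTC RFC 3339 timestamp followed by the level, for example `[2024-05-01T12:00:00.123456789Z ERROR solana_core::replay_stage] ...`. Check your own log before relying on it. Timestamps in one format and time zone sort lexically, so a plain string comparison is enough. `errors_since` is a hypothetical helper, and the `date -d` syntax in the usage line is GNU-specific.

```shell
#!/usr/bin/env bash
# Sketch: print ERROR lines logged at or after CUTOFF (UTC, ISO 8601).
# Assumed line format: "[<RFC 3339 UTC timestamp> <LEVEL> <module>] message"

# errors_since CUTOFF [FILE...]: reads FILEs, or stdin if none are given
errors_since() {
  local cutoff=$1
  shift
  awk -v cutoff="$cutoff" '
    substr($1, 1, 1) == "[" && $2 == "ERROR" && substr($1, 2) >= cutoff
  ' "$@"
}

# Usage (GNU date): errors in the last hour
#   errors_since "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S)" \
#     /home/sol/agave-validator.log
```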
Finding Specific Events
# Leader slots
grep "My next leader slot" /home/sol/agave-validator.log
# Vote activity
grep "voted" /home/sol/agave-validator.log | tail -20
# Snapshot downloads
grep -i snapshot /home/sol/agave-validator.log | grep -i download
Analyzing Restarts
# Find all restarts
grep "Starting validator" /home/sol/agave-validator.log*
# Check version at each restart
grep -B1 "Starting validator with" /home/sol/agave-validator.log*
Recovery Procedures
Complete Fresh Start
If all else fails, start completely fresh:
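# Stop the validator
sudo systemctl stop sol
# Back up your startup script and identity keypair
cp /home/sol/bin/validator.sh /home/sol/bin/validator.sh.backup
cp /home/sol/validator-keypair.json /home/sol/validator-keypair.json.backup
# Remove all ledger and accounts data
rm -rf /mnt/ledger/*
rm -rf /mnt/accounts/*
# Start the validator; follow the log and, in a second terminal, catchup progress
sudo systemctl start sol
tail -f /home/sol/agave-validator.log
solana catchup <VALIDATOR_PUBKEY>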
If you experience critical issues:
- Check Discord #validator-support
- Review GitHub issues
- Check Status page for cluster-wide issues
Preventive Measures
Regular Health Checks
Schedule regular health checks:
# Create a health check script
cat > /home/sol/health-check.sh <<'EOF'
#!/bin/bash
echo "=== Validator Health Check ==="
IDENTITY="$(solana-keygen pubkey ~/validator-keypair.json)" || exit 1
echo "Validator Status:"
solana validators | grep "$IDENTITY"
echo ""
echo "Balance:"
solana balance ~/validator-keypair.json
echo ""
echo "Recent Votes:"
solana vote-account ~/vote-account-keypair.json | head -20
echo ""
echo "Disk Usage:"
df -h | grep /mnt
EOF
chmod +x /home/sol/health-check.sh
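The script above still needs to be scheduled. One option is a cron entry for the sol user; the sketch below adds it without creating duplicates. The 15-minute interval and the log path are arbitrary examples. Cron runs with a minimal PATH, so you may need to set PATH in the crontab or in the script so that `solana` and `solana-keygen` are found. `add_cron_line`, `install_health_cron` and `HEALTH_CRON` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: schedule /home/sol/health-check.sh via the sol user's crontab.
# Interval and log path are arbitrary examples.
HEALTH_CRON='*/15 * * * * /home/sol/health-check.sh >> /home/sol/health-check.log 2>&1'

# add_cron_line LINE: reads a crontab on stdin and prints it with LINE
# present exactly once (existing copies are dropped, then LINE is appended)
add_cron_line() {
  grep -vxF -- "$1" || true
  printf '%s\n' "$1"
}

# install_health_cron: run as the sol user
install_health_cron() {
  { crontab -l 2>/dev/null || true; } | add_cron_line "$HEALTH_CRON" | crontab -
}
```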
Set Up Monitoring
Always run agave-watchtower on a separate machine to monitor your validator.
Document Your Setup
Keep documentation of:
- Startup script configuration
- System tuning settings
- Network configuration
- Recovery procedures
- Contact information for support
This helps you and others troubleshoot issues faster.