This article aims to help developers get started with Solana and the Anchor framework by giving some insight into why things are so different on Solana than on other blockchains. You may have heard the phrase "chewing glass," which refers to the complexity of account validation on Solana. You may also have been told that data accounts on Solana are of a fixed size, and be wondering how to use flexible data structures such as vectors and hashmaps - after all, they grow with their data.
First of all, let's clarify why Solana chose this rather unique programming model that splits a program's state into many accounts, instead of going with a more conventional approach.
The main reason for this is speed - more precisely, transaction throughput.
Let us consider a conventional blockchain program implementing a token contract. In essence, you'll have a hash map that stores the user as the key and their balance as the value.
At any point in time there may be hundreds of users trying to send tokens to other people, but conventional programs will need to lock the entire program state when processing each transaction. You may know this from parallel programming and mutexes.
You simply cannot guarantee the correct execution of all operations should they run in parallel.
A transfer of tokens can be divided into two steps:
1. Add to the balance of the recipient
2. Subtract from the balance of the sender
Consider a naive parallel execution:
Bob, on thread A, sends all 100 of his tokens to Charlie, and thread A reads Bob's balance as 100. Meanwhile Alice, on thread B, sends 50 tokens to Bob, which raises Bob's balance to 150. Then Bob's transfer reaches its second step and, still working from the stale value of 100, writes Bob's balance back as 0. Alice's 50 tokens have simply vanished.
Clearly these operations cannot safely overlap and have to be processed in sequence. And since on most blockchains the entire state for a given token lives in one account, no parallel transfers are possible.
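To make the lost update concrete, here is a minimal sketch in plain Rust (no Solana crates involved). The `Ledger` type and the function names are made up for this illustration, Alice's starting balance of 50 is an assumption, and the interleaving of the two "threads" is written out by hand so the result is deterministic.

```rust
use std::collections::HashMap;

/// Toy ledger: the whole token state in one shared map (hypothetical type).
pub struct Ledger {
    pub balances: HashMap<&'static str, u64>,
}

impl Ledger {
    /// Starting balances; Alice's 50 is an assumption for the example.
    pub fn new() -> Self {
        let mut balances = HashMap::new();
        balances.insert("alice", 50);
        balances.insert("bob", 100);
        balances.insert("charlie", 0);
        Ledger { balances }
    }
}

/// Sum of all balances; a correct transfer must never change this.
pub fn total_supply(ledger: &Ledger) -> u64 {
    ledger.balances.values().sum()
}

/// Replays the unsafe interleaving by hand and returns Bob's final balance.
pub fn replay_naive_interleaving(ledger: &mut Ledger) -> u64 {
    // Thread A (Bob sends 100 to Charlie) reads Bob's balance up front.
    let bob_seen_by_a = ledger.balances["bob"];

    // Thread A, step 1: add to the balance of the recipient.
    *ledger.balances.get_mut("charlie").unwrap() += 100;

    // Thread B (Alice sends 50 to Bob) runs in between.
    *ledger.balances.get_mut("bob").unwrap() += 50;
    *ledger.balances.get_mut("alice").unwrap() -= 50;

    // Thread A, step 2: subtract from the sender, using the stale read.
    ledger.balances.insert("bob", bob_seen_by_a - 100);

    ledger.balances["bob"]
}
```

Comparing `total_supply` before and after the replay shows the damage: 50 tokens are gone, even though each step looks harmless on its own.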
Solana works around this limitation by creating individual data accounts. Bob's balance is in his token account for the relevant token type and Alice has an account of her own.
It is of course still required to lock these individual accounts for writing during a transaction. But now the lock will only affect individual users’ accounts - not all holders of a certain token. There could be hundreds of thousands of accounts for a token and chances are that most token transactions in any given block can be executed in parallel on Solana as they do not all write to the same accounts.
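The idea of locking per account rather than per token can be sketched with ordinary Rust threads. This is only an analogy: on Solana the runtime schedules transactions and a program never takes locks itself. The `Account` alias and both function names are hypothetical.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// One lock per account instead of one lock for the whole token (toy model).
pub type Account = Arc<Mutex<u64>>;

/// Moves `amount` from `from` to `to`, locking only these two accounts.
/// Assumes `from` and `to` are different accounts: locking the same mutex
/// twice would deadlock. Two opposing transfers (A to B and B to A) running
/// at once could also deadlock; a real scheduler has to prevent that.
pub fn transfer_locked(from: &Account, to: &Account, amount: u64) -> Result<(), &'static str> {
    let mut from_balance = from.lock().unwrap();
    if *from_balance < amount {
        return Err("insufficient funds");
    }
    let mut to_balance = to.lock().unwrap();
    *from_balance -= amount;
    *to_balance += amount;
    Ok(())
}

/// Runs two transfers that touch disjoint accounts on separate threads and
/// returns the final balances in the order bob, charlie, alice, dave.
pub fn run_disjoint_transfers() -> Vec<u64> {
    let accounts: Vec<Account> = [100u64, 0, 50, 0]
        .iter()
        .map(|&balance| Arc::new(Mutex::new(balance)))
        .collect();

    let (bob, charlie) = (Arc::clone(&accounts[0]), Arc::clone(&accounts[1]));
    let (alice, dave) = (Arc::clone(&accounts[2]), Arc::clone(&accounts[3]));

    // Neither thread ever waits for the other: they share no account.
    let first = thread::spawn(move || transfer_locked(&bob, &charlie, 100));
    let second = thread::spawn(move || transfer_locked(&alice, &dave, 50));
    first.join().unwrap().unwrap();
    second.join().unwrap().unwrap();

    accounts.iter().map(|account| *account.lock().unwrap()).collect()
}
```

Because the two transfers share no account, the outcome is the same regardless of which thread runs first.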
This way, Solana is able to run much faster than other chains but it also puts a burden on developers who now need to make sure that all the data accounts provided for the transaction belong to the correct program and contain the data that is expected.
Each transaction on Solana has to indicate which accounts will be used and whether they are being read from or written to. Should a transaction try to write to a read-only account, the transaction will fail. This transaction data is, of course, supplied by users and can essentially be pretty much anything they want.
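A rough sketch of what declaring account access up front buys you: given two access lists, it is easy to tell whether the transactions may run in parallel, and a write to a read-only account can be rejected. The types and functions below are hypothetical and model the idea only, not Solana's actual transaction format; the account names (including "mint") are placeholders.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Access {
    ReadOnly,
    Writable,
}

/// Toy transaction: just the list of accounts it declares (hypothetical type).
pub struct Transaction {
    pub accounts: HashMap<&'static str, Access>,
}

/// Convenience constructor for the example.
pub fn tx(entries: &[(&'static str, Access)]) -> Transaction {
    Transaction {
        accounts: entries.iter().copied().collect(),
    }
}

/// Two transactions conflict if they share an account and at least one of
/// them declared it writable. Shared read-only accounts are harmless.
pub fn conflicts(a: &Transaction, b: &Transaction) -> bool {
    a.accounts.iter().any(|(key, access_a)| match b.accounts.get(key) {
        Some(access_b) => *access_a == Access::Writable || *access_b == Access::Writable,
        None => false,
    })
}

/// Rejects a write unless the account was declared writable.
pub fn try_write(tx: &Transaction, key: &str) -> Result<(), String> {
    match tx.accounts.get(key) {
        Some(Access::Writable) => Ok(()),
        Some(Access::ReadOnly) => Err(format!("{} was declared read-only", key)),
        None => Err(format!("{} was not declared", key)),
    }
}
```

In this model a transfer from Bob to Charlie and one from Alice to Dave do not conflict even though both read "mint", while a transfer from Alice to Bob conflicts with the first one because both write to Bob's account.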
Verifying these accounts is therefore a crucial step for developers.
Before we get into the validation of these accounts, let us first look at how a hashmap data structure is split into these individual accounts on Solana.
The Program Derived Address (PDA)
A Solana PDA belongs to the program it is created by. Only the program owning it can modify its contents. This PDA account will have a public key - just like any other blockchain account, but unlike other accounts, it does not have a private key that can sign for it.
For Alice, this means that her USDC token account is owned by the token program, which will modify it only if she submits a matching transaction to the program and authorizes the action by providing the correct signature. Without such a signature check, anyone would be able to transfer everyone else's tokens at will. Always remember to check that the correct authority has signed a transaction.
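The authority check could look roughly like the following in a toy model. All types here are hypothetical; the `is_signer` flag stands in for the signature verification that the Solana runtime performs before a program runs.

```rust
pub type Address = [u8; 32];

#[derive(Debug, PartialEq)]
pub enum TransferError {
    MissingSignature,
    WrongAuthority,
    InsufficientFunds,
}

/// Toy token account: who may spend from it, and how much it holds.
pub struct TokenAccount {
    pub authority: Address,
    pub balance: u64,
}

/// The account passed in as authority, plus whether it signed the transaction.
pub struct Authority {
    pub key: Address,
    pub is_signer: bool,
}

/// Transfers only if the authority signed AND is the one stored on `from`.
pub fn transfer_with_authority(
    from: &mut TokenAccount,
    to: &mut TokenAccount,
    authority: &Authority,
    amount: u64,
) -> Result<(), TransferError> {
    if !authority.is_signer {
        return Err(TransferError::MissingSignature);
    }
    if authority.key != from.authority {
        return Err(TransferError::WrongAuthority);
    }
    if from.balance < amount {
        return Err(TransferError::InsufficientFunds);
    }
    from.balance -= amount;
    to.balance += amount;
    Ok(())
}
```

Note that both checks are needed: a valid signature from the wrong key is just as useless as the right key without a signature.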
Now, you can also see why accounts on Solana are a fixed size. Each token account has a known size, as it only holds information about the owner, their balance, and a few other details.
If you need to store many things, you'll be creating PDAs. This could be user associated data such as token balances, whitelists, and other things.
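What do you need to create a PDA? In essence, a PDA's address is derived from seed data: the seeds are concatenated and hashed together with the program's ID and a so-called bump seed, which is adjusted until the resulting address is one that has no private key.
While you can use any data for the seeds, you should probably shy away from using free-form user input, as this could create the opportunity for a seed collision.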
Consider the following seeds:
"user_account", [name]
"user_account_config", [name]
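As you can see, these could easily become the same seed if the user is free to choose the name variable. Because the seeds are simply concatenated, "user_account" followed by the name "_configbob" produces exactly the same bytes as "user_account_config" followed by the name "bob".
A user who picks a name starting with "_config" could therefore cause issues for a program that derives its addresses this way. It is more common to see seeds based on a user account key, which in essence yields a user specific account.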
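The collision is easy to reproduce by comparing the concatenated seed bytes. The sketch below leaves out the hashing step entirely, on the reasoning that identical input bytes give an identical address anyway; all function names are hypothetical.

```rust
/// Joins the seeds without separators, which is how the bytes reach the hash.
pub fn concat_seeds(seeds: &[&[u8]]) -> Vec<u8> {
    seeds.concat()
}

/// Seeds for a user account, keyed by a user-chosen name (risky).
pub fn user_account_seeds(name: &str) -> Vec<u8> {
    concat_seeds(&[b"user_account", name.as_bytes()])
}

/// Seeds for a user config account, keyed by a user-chosen name (risky).
pub fn user_config_seeds(name: &str) -> Vec<u8> {
    concat_seeds(&[b"user_account_config", name.as_bytes()])
}

/// Safer variant: key the account by the user's fixed-length public key.
pub fn keyed_account_seeds(user_key: &[u8; 32]) -> Vec<u8> {
    concat_seeds(&[b"user_account", &user_key[..]])
}

/// Safer variant of the config seeds, also keyed by the public key.
pub fn keyed_config_seeds(user_key: &[u8; 32]) -> Vec<u8> {
    concat_seeds(&[b"user_account_config", &user_key[..]])
}
```

With name-based seeds, `user_account_seeds("_configbob")` equals `user_config_seeds("bob")`. With a 32-byte key as the variable part, the two seed lists always differ in total length, so they cannot produce the same bytes.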
Let us consider the example of a potentially very long list, say a whitelist for acceptable NFTs.
To make this work, you may store the number of current items already in the list in some state account and then create the individual PDAs as necessary.
As already said, PDA addresses are generated from seeds, and you may choose to use "WHITELIST_NFT", [Index] as the seeds. Index would be the offset for your list item (0 ... n).
Each account (item in the list), let's call it a whitelisted collection account, would only hold one address: the whitelisted collection.
Adding a new item to the whitelist comes down to creating a new whitelisted collection account with the next offset as its seed and storing the address of the collection in this account.
Any transaction that has to check against this whitelist can take a whitelisted collection account and check whether the address stored in this account is as expected. It falls to the sender of the transaction to provide the correct whitelisted collection account.
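Here is a small model of such a whitelist, as a sketch under a few assumptions: a `HashMap` stands in for the accounts stored on chain, the seed bytes are used directly as the map key instead of a real derived address, and the index is encoded as 8 little-endian bytes. The type and method names are hypothetical.

```rust
use std::collections::HashMap;

pub type Address = [u8; 32];

/// Seed bytes for the whitelist item at `index`: "WHITELIST_NFT" + index.
pub fn whitelist_seeds(index: u64) -> Vec<u8> {
    let mut seeds = b"WHITELIST_NFT".to_vec();
    seeds.extend_from_slice(&index.to_le_bytes());
    seeds
}

/// Toy program state (hypothetical type).
pub struct WhitelistProgram {
    /// Lives in the state account: how many items the list currently has.
    pub item_count: u64,
    /// Stand-in for the individual whitelisted collection accounts.
    pub accounts: HashMap<Vec<u8>, Address>,
}

impl WhitelistProgram {
    pub fn new() -> Self {
        WhitelistProgram { item_count: 0, accounts: HashMap::new() }
    }

    /// Creates the next whitelisted collection account and returns its index.
    pub fn add_collection(&mut self, collection: Address) -> u64 {
        let index = self.item_count;
        self.accounts.insert(whitelist_seeds(index), collection);
        self.item_count += 1;
        index
    }

    /// The caller says which item to look at; we check what is stored there.
    pub fn is_whitelisted(&self, index: u64, collection: &Address) -> bool {
        self.accounts.get(&whitelist_seeds(index)) == Some(collection)
    }
}
```

The check never scans the list. The sender points at one specific account, and the program only has to confirm that this account holds the expected address.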
As we already discussed earlier, this account has to be verified. Most importantly, it has to be an account owned by our program and must be of the correct type.
Without any checks, a user could create an account with any collection address in it that they choose. This is why we need an ownership check. If we check the account owner is our program, it becomes impossible for a user to create such an account all by themselves.
However, it is possible that our program allows users to create other PDAs. Maybe we have a delegator account where users can freely choose an address that can then execute instructions on their behalf. This account may only hold an address, just like our whitelist. Content wise, such an account will be indistinguishable from our whitelisted collection account.
A user could then create this delegator account and set whatever address they like and pass this account as a whitelisted collection account into our program. Being a legit PDA, the ownership test would of course succeed.
This is, however, not a problem when using the Anchor framework. Anchor prefixes each account's data with a discriminator (by default 8 bytes derived from the name of the account type) and checks it whenever an account is passed to the program. If you were to develop without Anchor, you would have to implement your own account type discriminator checks.
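If you had to do these checks by hand, they might look something like the sketch below. It is a toy model, not Anchor's implementation: the two discriminator constants are arbitrary placeholders (Anchor derives its own values), and the account layout of "8-byte tag followed by one 32-byte address" is an assumption made for this example.

```rust
pub type Address = [u8; 32];

pub const DISCRIMINATOR_LEN: usize = 8;
// Placeholder tags for the example; NOT the values Anchor would generate.
pub const WHITELISTED_COLLECTION: [u8; 8] = *b"whitelst";
pub const DELEGATOR: [u8; 8] = *b"delegatr";

/// An account as the program receives it: an owner and raw bytes.
pub struct RawAccount {
    pub owner: Address,
    pub data: Vec<u8>,
}

#[derive(Debug, PartialEq)]
pub enum AccountError {
    WrongOwner,
    TooShort,
    WrongType,
}

/// Builds account data as tag + address (layout assumed for this example).
pub fn encode(tag: [u8; 8], address: &Address) -> Vec<u8> {
    let mut data = tag.to_vec();
    data.extend_from_slice(address);
    data
}

/// Returns the stored collection only if owner and account type both match.
pub fn read_whitelisted_collection(
    account: &RawAccount,
    program_id: &Address,
) -> Result<Address, AccountError> {
    // Ownership check: users cannot create accounts owned by our program.
    if &account.owner != program_id {
        return Err(AccountError::WrongOwner);
    }
    if account.data.len() < DISCRIMINATOR_LEN + 32 {
        return Err(AccountError::TooShort);
    }
    // Type check: rejects other account types with an identical layout.
    let (tag, body) = account.data.split_at(DISCRIMINATOR_LEN);
    if tag != &WHITELISTED_COLLECTION[..] {
        return Err(AccountError::WrongType);
    }
    let mut collection = [0u8; 32];
    collection.copy_from_slice(&body[..32]);
    Ok(collection)
}
```

A delegator account passes the ownership check but fails the type check, which is exactly the case the ownership test alone would miss.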
Also, if you base your verification of an account A on data within account B, remember to verify account B as well. Otherwise you may again end up basing your trust on data that users may be able to set up to be whatever they like.
Last but not least, remember that two accounts passed into the same instruction could be one and the same account. Sometimes the only consequence is that a transfer from A to A does not make sense, but it is easy to imagine use cases where rewards are issued for certain actions that depend on an asset actually being moved. In such cases, check that the accounts differ.
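A guard against this can be as small as one comparison. The function below is a made-up example; the reward rule (1% of the moved amount) is purely an assumption for illustration.

```rust
pub type Address = [u8; 32];

/// Pays a reward for moving an asset, but only if it really changes hands.
/// The 1% reward is a hypothetical rule for this example.
pub fn reward_for_move(from: &Address, to: &Address, amount: u64) -> Result<u64, &'static str> {
    if from == to {
        return Err("source and destination are the same account");
    }
    Ok(amount / 100)
}
```

Without the comparison, a user could pass their own account twice and collect rewards without ever giving up the asset.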
Great! Now you are hopefully equipped with the knowledge to understand what's going on under the hood and have a general idea of how to lay out your Solana contracts. With these insights, the next step could be to look at Anchor coding tutorials and get started writing some code :)