We receive many enquiries from users requesting information about the performance of our server API and how it compares to other standard implementations. We decided to set up a laboratory test to provide some basic statistics and advice to users, answering the following questions:
- What is the maximum throughput I can expect for a single connection?
- How does throughput scale with increasing connections?
- What resources are required to support this maximum throughput?
Our test environment consisted of a 1U rack server running with the following specification:
Operating System: Ubuntu 12.04
Processor: Intel(R) Core(TM) i5-3570 CPU @ 3.40GHz
HDD: 2TB, SATA 6Gb/s NCQ
Network: 1 Gbps Interface
Test 1 - Throughput Requirements
The first test we set up was designed to determine the maximum throughput of the server over a single network interface, establish the CPU resources required to support that throughput, and see how it scaled with multiple connections. Since performance is heavily dependent on the cipher used in SSH connections, we decided to test two different ciphers, AES and Arcfour. We chose these because AES is the default cipher for our own implementation and many others, whilst Arcfour is well known to be an efficient, fast cipher and should demonstrate the maximum performance of the API. We do not recommend using Arcfour in production.
We created a client script that allowed us to set the preferred cipher and initiate as many clients transferring a 500MB file as we required. We then recorded the throughput achieved by each execution of the script, increasing the number of connections with each iteration of the test.
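A minimal sketch of such a client driver, assuming scp is used for the transfers (the hostname, file path, and cipher names here are illustrative placeholders, not the actual test harness):

```python
import shlex

def build_transfer_commands(cipher: str, connections: int,
                            host: str = "testserver",
                            src: str = "/tmp/test-500MB.bin") -> list[str]:
    """Build one scp command line per concurrent connection.

    cipher: an OpenSSH cipher name, e.g. "aes128-ctr" or "arcfour".
    In the real test each command would be launched in parallel and its
    throughput recorded; here we only construct the command lines.
    """
    return [
        f"scp -c {shlex.quote(cipher)} {shlex.quote(src)} "
        f"{shlex.quote(host)}:/dev/null"
        for _ in range(connections)
    ]

if __name__ == "__main__":
    # Iteration with four concurrent AES connections:
    for cmd in build_transfer_commands("aes128-ctr", 4):
        print(cmd)
```

Each iteration of the test would simply rerun the driver with a higher connection count.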
The server was configured to use only a single transfer thread, which restricted it to a single CPU core. Our results would therefore show how much throughput a single transfer thread could handle.
Test 1 - The Results
The table below provides the test results, and the graph gives a more readable view.
As expected, Arcfour was the better-performing cipher, with a single-connection throughput of 97.2 MB/s, but this comes with a compromise on security.
The AES cipher provided a single connection throughput of 53.3 MB/s.
As the connections scaled, we saw an even share of throughput across the connections; the server (which, remember, is currently restricted to a single thread/CPU core) maintained a consistent aggregate throughput across the different iterations.
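This even distribution can be expressed as simple arithmetic: with the single-threaded server sustaining a roughly fixed aggregate rate, each connection's share is approximately the aggregate divided by the connection count. A quick sketch, using the single-connection AES figure above as the aggregate:

```python
def per_connection_throughput(aggregate_mb_s: float, connections: int) -> float:
    """Expected per-connection rate when a fixed aggregate throughput
    is shared evenly across concurrent connections."""
    if connections < 1:
        raise ValueError("need at least one connection")
    return aggregate_mb_s / connections

# 53.3 MB/s aggregate (the AES single-connection result) shared evenly:
print(per_connection_throughput(53.3, 1))  # 53.3
print(per_connection_throughput(53.3, 4))  # 13.325
```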
What does this data tell us?
My interpretation of the data is that a 2-core server with two transfer threads could handle the maximum throughput of a 1 Gbps network interface. If we want to scale the server to handle more load, then we need one 1 Gbps network interface for every two transfer threads / CPU cores.
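That rule of thumb can be turned into a rough sizing calculation. The function below is a sketch under the stated assumptions (AES cipher, 1 Gbps interfaces, two cores/threads per interface), not a guarantee for other hardware:

```python
import math

def resources_for_load(target_gbps: float) -> dict[str, int]:
    """Rough sizing from the rule of thumb above: each 1 Gbps NIC
    needs two transfer threads, one per CPU core."""
    nics = math.ceil(target_gbps / 1.0)  # one 1 Gbps interface per Gbps of load
    cores = 2 * nics                     # two cores per NIC
    return {"nics": nics, "cores": cores, "transfer_threads": cores}

print(resources_for_load(1.0))  # {'nics': 1, 'cores': 2, 'transfer_threads': 2}
print(resources_for_load(2.5))  # {'nics': 3, 'cores': 6, 'transfer_threads': 6}
```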
Arcfour with SHA1 - Not recommended for External Use / Internal Only
AES/128/CTR with SHA1 - Recommended for External Use
Test 2 - Comparison against OpenSSH Server
The next test we set up, using the same client script and process, ran against both our own server and a native OpenSSH server. This provides a benchmark of our performance against the most widely deployed SSH server.
This test repeats the same process as Test 1; however, we placed no restriction on our own server and configured four transfer threads to match the number of cores available to OpenSSH. The transfers were still restricted to the single 1 Gbps network interface, and we used the AES cipher.
Test 2 - Results
The table below provides the test results, and the graph gives a more readable view.
Both servers scaled well, with our own server achieving performance comparable to OpenSSH.
Our tests have demonstrated that the performance of the Maverick SSHD server is comparable to that of a native OpenSSH server under our laboratory conditions. We have also established a formula for ensuring a server can maximise its resources as it scales: one 1 Gbps NIC for every two CPU cores, with our own server configured with one transfer thread per core. Anything less and performance will almost certainly be compromised.