System stress and load testing
System stress testing is run after integration testing to shake out bugs before they become critical field problems. The main objective is to "burn in" the system in the lab environment. If the system is subjected to conditions that are harsher than those in the field, there is a good chance that all the showstopper bugs will be caught before the system is deployed in the field.
System stress testing can be divided into the following steps:
Feature Interference Tests
The end-to-end behavior of individual features is generally tested fairly well during integration testing. However, interactions between features are generally left out at that stage. Feature interference testing fills this hole: it involves testing each feature offered by the system in the presence of every other feature.
The best way to develop feature interference tests is to produce a feature interference matrix. The feature interference matrix is discussed below.
Feature Interference Matrix
The feature interference matrix is produced by crossing each feature offered by the system with every other feature. Draw a table with the list of all the system features as both the first horizontal row and the first vertical column. Each box in the table then corresponds to the cross of two features, determined by its row and its column (a box on the diagonal pairs a feature with a second instance of itself). A test should be developed for each such box. The resulting table is the feature interference test matrix.
The feature interference matrix can be easily populated by considering the row and column features for each box. A test case is identified by assuming that the row feature starts first and the column feature starts while the row feature is still in progress. Here is an example of a feature interference matrix that should clarify how the matrix is populated. The matrix has been produced by taking a sub-list of the features offered by a switching system.
Features | Originating Subscriber Call | Terminating Subscriber Call | XEN Processor Failure | CAS Processor Failure | Subscriber Port Failure | Operator Commands |
Originating Subscriber Call | Test 1-1: Verify handling of two originating calls from a single PBX. | Test 1-2: Verify that a terminating call is rejected if an originating call has been set up for the subscriber. | Test 1-3: Verify system behavior when XEN processor fails after an originating call has been set up. | Test 1-4: Verify system behavior when CAS processor fails after an originating call has been set up. | Test 1-5: Verify handling of an originating call when the subscriber port fails. | Test 1-6: Verify handling of an originating call when operator puts the subscriber port out-of-service. |
Terminating Subscriber Call | Test 2-1: Verify that an originating call is rejected if a terminating call has been set up for the subscriber. | Test 2-2: Verify handling of two terminating calls to a single PBX. | Test 2-3: Verify system behavior when XEN processor fails after a terminating call has been set up. | Test 2-4: Verify system behavior when CAS processor fails after a terminating call has been set up. | Test 2-5: Verify handling of a terminating call when the subscriber port fails. | Test 2-6: Verify handling of a terminating call when operator puts the subscriber port out-of-service. |
XEN Processor Failure | Test 3-1: Verify that an originating call cannot be set up when XEN processor for the subscriber port fails. | Test 3-2: Verify that a terminating call cannot be set up when XEN processor for the subscriber port fails. | Test 3-3: Verify handling of calls when multiple XEN processors fail at the same time. | Test 3-4: Verify system behavior when CAS processor fails when a XEN processor has already failed. | Test 3-5: Verify system behavior when a subscriber port fails when a XEN processor has failed. The system should detect the port failure when the XEN processor recovers. | Test 3-6: Verify that operator commands for a failed XEN processor are rejected by the OMC. |
CAS Processor Failure | Test 4-1: Verify that an originating call can be set up when a CAS processor has failed. | Test 4-2: Verify that a terminating call can be set up when a CAS processor has failed. | Test 4-3: Verify system behavior when XEN processor fails when a CAS processor has already failed. | Test 4-4: Verify that no calls can be supported when a CAS processor fails when another CAS processor has already failed. | Test 4-5: Verify system behavior when a subscriber port fails when a CAS processor has failed. | Test 4-6: Verify that operator commands for a failed CAS processor are rejected by the OMC. |
Subscriber Port Failure | Test 5-1: Verify that call setups are rejected on a failed port. | Test 5-2: Verify that call termination is rejected on a failed port. | Test 5-3: Reboot XEN processor and verify that the number of failed ports for the XEN before and after the reboot is the same. | Test 5-4: Reboot CAS processor and verify that the number of failed ports in the system before and after the reboot is the same. | Test 5-5: Verify that simultaneous failure of two ports in the same XEN is handled correctly. | Test 5-6: Verify that a failed port can be put out-of-service by the operator. |
Operator Commands | Test 6-1: Verify that a subscriber will not get dial tone and the originating call will fail if operator has put the subscriber port out-of-service. | Test 6-2: Verify that a terminating call will fail if operator has put the subscriber port out-of-service. | Test 6-3: Reboot XEN when an operator command for subscriber port on the XEN is in progress. Verify that the command failure is reported with the correct reason. | Test 6-4: Verify the clearing of all calls when one CAS processor fails when the other one is already put out-of-service by the operator. | Test 6-5: Verify that a subscriber port failure is handled even when an operator command is in progress for the same port. | Test 6-6: Verify that the system is able to handle simultaneous commands from two different operators for the same entity. The commands should be executed one after the other. |
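The crossing step that produces the matrix is mechanical, so the list of boxes can be generated rather than drawn by hand. The following is a minimal Python sketch, assuming the six features from the example above; the function name and the dictionary layout are illustrative choices, not part of any existing tool. It only enumerates the boxes. Deciding what each test should verify still requires knowledge of the system.

```python
from itertools import product

# Feature list taken from the example matrix above.
FEATURES = [
    "Originating Subscriber Call",
    "Terminating Subscriber Call",
    "XEN Processor Failure",
    "CAS Processor Failure",
    "Subscriber Port Failure",
    "Operator Commands",
]

def build_interference_matrix(features):
    """Return {test_id: (row_feature, column_feature)} for every box.

    The row feature is the one that starts first; the column feature
    is started while the row feature is still in progress.
    """
    matrix = {}
    numbered = list(enumerate(features, start=1))
    for (r, row), (c, col) in product(numbered, repeat=2):
        matrix[f"Test {r}-{c}"] = (row, col)
    return matrix

matrix = build_interference_matrix(FEATURES)
for test_id, (row, col) in matrix.items():
    print(f"{test_id}: start '{row}', then trigger '{col}' while it is in progress")
```

For N features this yields N x N boxes, so the six features above give 36 tests, numbered the same way as in the example matrix.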
Interference Test Procedures
Once you have developed the feature interference matrix, define detailed test procedures from the matrix. The test procedures can be divided into two categories:
- Simple tests that initiate a row feature followed by a column feature.
- Load tests involving multiple instances of row and column features. These are discussed in the next section.
Interference Load Tests
Interference load tests are identified from the feature interference matrix. The idea is to run load for different features simultaneously. Here load does not mean only subscriber traffic. Load could also mean repeatedly executing operator commands via a script or rebooting boards periodically.
Interference load tests are best explained by examples from the above matrix:
- Run subscriber load (originating to terminating calls) and operator command load overnight.
- Run subscriber load with periodic CAS and XEN processor failures.
- Run subscriber load and periodically inject faults in subscriber ports.
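Combinations like the ones listed above can be driven from a single script that runs several kinds of load at the same time. The following is a minimal Python sketch, assuming each kind of load can be expressed as a function that is called periodically. The three action functions are hypothetical placeholders; a real test would replace them with code that drives the system under test.

```python
import threading
import time

def run_load(name, action, period_s, stop_event, counts):
    """Repeat one kind of load until stop_event is set."""
    while not stop_event.is_set():
        action()
        counts[name] += 1           # each thread updates only its own key
        stop_event.wait(period_s)   # sleep, but wake up early on stop

def run_interference_load(loads, duration_s):
    """Run all loads simultaneously for duration_s seconds.

    loads maps a load name to an (action, period_s) pair.
    Returns how many times each action was executed.
    """
    stop_event = threading.Event()
    counts = {name: 0 for name in loads}
    threads = [
        threading.Thread(target=run_load,
                         args=(name, action, period_s, stop_event, counts))
        for name, (action, period_s) in loads.items()
    ]
    for t in threads:
        t.start()
    time.sleep(duration_s)
    stop_event.set()
    for t in threads:
        t.join()
    return counts

# Hypothetical placeholders for the real load drivers.
def place_call():
    pass

def run_operator_command():
    pass

def fail_xen_processor():
    pass

counts = run_interference_load(
    {
        "subscriber calls": (place_call, 0.01),
        "operator commands": (run_operator_command, 0.05),
        "XEN failures": (fail_xen_processor, 0.2),
    },
    duration_s=0.5,
)
print(counts)
```

The short periods and the half-second duration are only there to keep the example quick to run. An overnight run would use a much longer duration and realistic periods for each kind of load.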
Stress Load Tests
Stress load tests are the final step of system stress testing. Here the system is subjected to field-like conditions. In fact, the conditions for these tests are harsher than what the system would have to handle after deployment.
Stress Load Testing Guidelines
- Overload the system. During stress load conditions, the system should be subjected to harsher conditions than the field environment. By doing this you can make sure that the system will run stably for extended periods of time. Thus a weekend stress test might give you confidence that the system would survive a month of regular system operation.
- Load test the system with field-type traffic mix. Run a traffic mix that is close to the expected load in the field conditions. Field traffic mix data can often be obtained from published studies and papers on the subject.
- Load test the system with traffic that is varying with time. When the system is deployed in the field, it will be subjected to huge fluctuations in traffic. Simulate such fluctuations in the lab. Keep in mind that there might be bugs in the system which show up only with fluctuating traffic. Most systems have bugs in handling of high load as well as low load conditions.
- Load test the system with events that have random inter-arrival time. Run load such that the inter-arrival time distribution is random. This will exercise code paths that you would never think of testing explicitly. To make the random tests reproducible, seed the random number generator with a known value before the load test. This way you can recreate the exact test conditions by feeding in the same random number seed.
- Load test the system with events that have random service time. Do not run load with a fixed call/session duration. Use random session durations during your load tests. For best results you should use a load generator that models Poisson arrivals (exponentially distributed inter-arrival times) and exponentially distributed service times.
- Load test everything. Do not restrict your load testing to just subscriber load. Load testing fault conditions and operator commands will make sure that these features do not display memory leaks and other slow build-up faults.
- Measure load performance. Load runs are not just for verifying the stability of the system. Always measure and plot the performance of the system during the entire load test. The best way to do this is to use tools that plot graphs showing system performance.
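Several of these guidelines (time-varying traffic, random inter-arrival times, random service times and a reproducible seed) can be combined in one load schedule generator. The following is a minimal Python sketch under stated assumptions: the function names, the traffic profile and the numeric values are hypothetical, and drawing each gap from the rate in effect at the current time is only an approximation of a time-varying Poisson process.

```python
import random

def generate_call_schedule(seed, duration_s, rate_at, mean_hold_s):
    """Return a list of (start_time_s, hold_time_s) tuples.

    rate_at(t) gives the target arrival rate in calls per second at time t.
    Inter-arrival gaps and hold times are exponentially distributed.
    """
    rng = random.Random(seed)  # private generator: the seed fully determines the run
    schedule = []
    t = 0.0
    while True:
        t += rng.expovariate(rate_at(t))          # random inter-arrival time
        if t >= duration_s:
            break
        hold = rng.expovariate(1.0 / mean_hold_s)  # random service time
        schedule.append((t, hold))
    return schedule

def fluctuating_rate(t):
    """Hypothetical profile: alternate 60 s of low load and 60 s of high load."""
    return 20.0 if int(t // 60) % 2 else 2.0

schedule = generate_call_schedule(seed=1234, duration_s=600,
                                  rate_at=fluctuating_rate, mean_hold_s=90.0)

# Feeding in the same seed recreates exactly the same test conditions.
replay = generate_call_schedule(seed=1234, duration_s=600,
                                rate_at=fluctuating_rate, mean_hold_s=90.0)
print(len(schedule), schedule == replay)
```

Logging the seed together with the test results is what makes a failed run repeatable: the same seed, duration and traffic profile produce the same sequence of call start times and durations.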