With Agile & DevOps becoming increasingly predominant in software development, early performance analysis & performance prediction have become the norm for business-critical & high-traffic applications. Performance modelling & prediction analytics built on historical statistics gathered across several layers & SDLC phases can help analyze what-if scenarios & make quick performance judgments without actually testing the system. This paper presents the key aspects to consider when building a Predictive Performance Analytics solution, along with various Analytical, Regression & Simulation modelling techniques.
What is Performance Prediction?
Performance prediction is the process of forecasting system performance using mathematical models and/or statistical historical measurements. Predictive Performance Analysis can only FORECAST; it cannot ascertain what will happen in the future, because it is probabilistic in nature.
A performance model uses a specific set of building blocks to predict how performance will vary under different what-if scenarios, such as varied load conditions, changes in workload and changes in server capacity. The model inputs are usually expressed as mathematical quantities such as number of users, arrival rate, response time, throughput and resource utilization.
Building a Performance Analytics Solution
A robust performance analytics solution should comprise system performance data collected throughout the phases of the software development life cycle, which can be correlated using several modelling & prediction techniques to forecast system performance at future load levels.
It should support different types of data collection & storage, together with intelligence built on sound modelling & forecasting algorithms, to provide realistic forecasts of system performance for what-if scenarios.
A predictive model becomes a prescriptive model when it is combined with the intelligence to prescribe an action for the business to act upon and a feedback system that tracks the outcome produced by that action. For example, a predictive model can predict the peak throughput of the application under test, whereas a prescriptive model can predict & recommend (or alert the business about) the need to reduce the residence time of a specific layer/method, or to upgrade the specific hardware resource with a high service demand, in order to meet the performance SLAs, with clear data points about the expected performance improvement.
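To make the predictive-versus-prescriptive distinction concrete, here is a minimal Python sketch of the prescriptive step. It assumes per-request service demands for each resource are already known (measured in a test or produced by a model) and that a forecast throughput is available; it projects each resource's utilization as X * Dk and flags any resource above a utilization ceiling. The resource names, demand values, the 70% ceiling and the `prescribe` function are all hypothetical, and a real solution would also feed the observed outcome of each action back into the model.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    resource: str
    projected_utilization: float
    # Largest per-request service demand (seconds) that would keep the
    # resource at or below the utilization ceiling at the target load.
    required_demand_s: float


def prescribe(service_demands, target_throughput, max_utilization=0.7):
    """Flag resources whose projected utilization exceeds a ceiling.

    service_demands: resource name -> service demand D_k (s per request).
    target_throughput: forecast system throughput X (requests per second).
    max_utilization: hypothetical SLA-derived utilization ceiling.
    """
    recommendations = []
    for resource, demand in service_demands.items():
        # Projected utilization U_k = X * D_k (Utilization Law combined
        # with the Forced Flow Law).
        projected = target_throughput * demand
        if projected > max_utilization:
            recommendations.append(
                Recommendation(resource, projected, max_utilization / target_throughput)
            )
    # Worst offender first, so the likely bottleneck is reported at the top.
    return sorted(recommendations, key=lambda r: r.projected_utilization, reverse=True)


if __name__ == "__main__":
    demands = {"app_cpu": 0.008, "db_cpu": 0.012, "db_disk": 0.004}  # hypothetical
    for rec in prescribe(demands, target_throughput=80.0):
        print(f"{rec.resource}: projected U={rec.projected_utilization:.0%}, "
              f"reduce demand to <= {rec.required_demand_s * 1000:.2f} ms or add capacity")
```

With these hypothetical numbers only `db_cpu` is flagged, at a projected utilization of 96%.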
If the required data is visualized as the layers of an onion, prediction accuracy increases as more data layers are used to build the analytics solution. Some of the major data layers to consider are:
- performance modelling results & actual performance test results from a controlled test environment,
- network performance simulation results & device-side performance metrics (for mobile applications),
- capacity differences between the test & production environments,
- production infrastructure monitoring statistics,
- website end-user traffic patterns,
- web (browser) performance statistics.
To start with, at least the production monitoring statistics & website user traffic statistics layers are essential, because they support forecasts from historical data analysis using regression techniques.
Figure: Data layers in a Performance Analytics Solution
Several open source & commercial tools are generally used to perform testing & analysis at each of the data layers above. The key challenge lies in building the intelligence to parse the results produced by a variety of tools, and in providing a tool-agnostic reporting structure that can interpret the results captured by all of them.
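One way to provide a tool-agnostic reporting structure is to give each tool its own small parser that maps its native output onto a common record, and to build every report from that record only. The Python sketch below assumes two hypothetical tools: "tool A" exporting CSV with `label` and `elapsed_ms` columns, and "tool B" exporting a JSON array. These formats and all function names are illustrative assumptions, not the export format of any specific product.

```python
import csv
import io
import json
from dataclasses import dataclass
from statistics import mean


@dataclass
class Sample:
    """One response-time measurement in the common, tool-agnostic schema."""
    transaction: str
    response_time_s: float
    source_tool: str


def parse_tool_a_csv(text):
    # Hypothetical "tool A": CSV export with 'label' and 'elapsed_ms' columns.
    return [
        Sample(row["label"], float(row["elapsed_ms"]) / 1000.0, "tool_a")
        for row in csv.DictReader(io.StringIO(text))
    ]


def parse_tool_b_json(text):
    # Hypothetical "tool B": JSON array of {"transaction", "response_time_s"} objects.
    return [
        Sample(item["transaction"], float(item["response_time_s"]), "tool_b")
        for item in json.loads(text)
    ]


# Registry of parsers: supporting a new tool means adding one entry here.
PARSERS = {"tool_a": parse_tool_a_csv, "tool_b": parse_tool_b_json}


def normalize(tool, text):
    """Convert one tool's raw output into common-schema samples."""
    if tool not in PARSERS:
        raise ValueError(f"no parser registered for {tool!r}")
    return PARSERS[tool](text)


def summarize(samples):
    """Per-transaction report built only from the common schema."""
    by_txn = {}
    for s in samples:
        by_txn.setdefault(s.transaction, []).append(s.response_time_s)
    return {txn: {"count": len(v), "mean_s": mean(v)} for txn, v in by_txn.items()}
```

Because reporting depends only on `Sample`, results from different tools can be merged into one report without the report knowing which tool produced them.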
Once the data described above is collected, the performance analytics solution is ready to be integrated with performance forecasting solutions (analytical & statistical modelling tools) that add the intelligence needed for performance prediction analysis. When a large amount of historical data is available, regression techniques tend to provide better results; otherwise, a combination of analytical & statistical techniques is likely to yield more accurate results.
The predictive performance analytics solution uses several additional data layers to prescribe recommended next steps based on the predicted results.
Figure: Journey towards a Performance Analytics Solution
Performance Modelling / Forecasting Techniques
Performance modelling & forecasting help to understand how an application will perform under load, by estimating the application's performance at an early stage, or in the live production environment whenever there are changes in workload, hardware elements, etc.
Modelling techniques are broadly classified as:
- Analytical (Queuing Network, QN) Modelling,
- Statistical (Regression) Modelling,
- Simulation Modelling.
Generally, analytical models are easier to use & less expensive than simulation solutions. To use either analytical or statistical modelling techniques, performance must first be measured under a specific range of load conditions.
Analytical (QN) Modelling
Queuing theory is the mathematical study of waiting lines, or queues. It is used in various fields such as telecommunications, networking and traffic management.
For the abstract system shown in the figure below, jobs arrive from an input source and each job brings a demand for service to the queue. When a job completes its service, it departs the queue.
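Suppose we observe such a system and record T, the length of the observation period; A, the number of request arrivals; and C, the number of request completions. If we also record B, the length of time the system was busy, we can derive the arrival rate λ = A/T, the throughput X = C/T, the utilization U = B/T and the mean service time S = B/C.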
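From these measurements, the following operational laws can be derived:
- Utilization Law: U = X * S
- Little's Law: N = X * R, where N is the average number of jobs in the system and R is the average response time
- Response Time Law: R = (N / X) - Z, where N is the number of interactive users and Z is their average think time
- Forced Flow Law: Xk = Vk * X, where Xk is the throughput of device k and Vk is the average number of visits a request makes to device k
- Throughput Bound Law: X <= 1 / Dmax, where Dk = Vk * Sk is the total service demand of a request at device k and Dmax is the largest Dk across all devices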
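As a worked illustration of the operational laws, this Python sketch derives device-level metrics from a single observation period. The measurements are hypothetical: a 60 s observation with 1,200 completed requests, 50 interactive users with a 2 s think time, and per-device busy times and completion counts for a CPU and a disk. The per-device inputs (busy time Bk and completions Ck) and the names `DeviceMeasurement` and `operational_analysis` are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class DeviceMeasurement:
    busy_time_s: float   # B_k: time device k was busy during the observation period
    completions: int     # C_k: completions observed at device k


def operational_analysis(T, C, devices, users, think_time_s):
    """Apply the operational laws to one observation period.

    T: observation period (s); C: system-level request completions;
    devices: name -> DeviceMeasurement; users: interactive users N;
    think_time_s: average think time Z (s).
    """
    X = C / T                                    # system throughput
    per_device = {}
    for name, m in devices.items():
        S = m.busy_time_s / m.completions        # mean service time per visit S_k
        V = m.completions / C                    # visits per request V_k
        per_device[name] = {
            "U": (m.completions / T) * S,        # Utilization Law: U_k = X_k * S_k (= B_k / T)
            "X": V * X,                          # Forced Flow Law: X_k = V_k * X
            "S": S,
            "V": V,
            "D": V * S,                          # service demand D_k = V_k * S_k
        }
    D_max = max(d["D"] for d in per_device.values())
    R = users / X - think_time_s                 # Response Time Law: R = N / X - Z
    return {
        "X": X,
        "devices": per_device,
        "X_bound": 1.0 / D_max,                  # Throughput Bound Law: X <= 1 / D_max
        "R": R,
        "N_in_system": X * R,                    # Little's Law: N = X * R
    }


if __name__ == "__main__":
    result = operational_analysis(
        T=60.0,
        C=1200,
        devices={"cpu": DeviceMeasurement(30.0, 2400), "disk": DeviceMeasurement(36.0, 3600)},
        users=50,
        think_time_s=2.0,
    )
    print(result)
```

Here throughput is 20 requests/s, the disk (Dk = 0.03 s) is the bottleneck, and the bound on throughput is therefore about 33.3 requests/s.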
Using these relationships, different types of techniques can be employed to build QN-based models.
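Mean Value Analysis (MVA) is one well-known technique for solving closed QN models built on these laws. The following is a minimal sketch of exact, single-class MVA, assuming load-independent queuing devices whose service demands Dk are known (for example, from operational analysis of a test run) and a fixed think time Z. It predicts throughput and response time for every user population from 1 to N, which is the kind of what-if curve a predictive solution needs. The function name and demand values are hypothetical.

```python
def mean_value_analysis(demands, think_time, max_users):
    """Exact MVA for a single-class closed queuing network.

    demands: service demand D_k (s per request) at each load-independent
        queuing device.
    think_time: average think time Z (s).
    Returns a list of (users, throughput X, response time R) tuples.
    """
    queue_lengths = [0.0] * len(demands)  # Q_k with zero users in the system
    results = []
    for n in range(1, max_users + 1):
        # An arriving request sees the mean queue left by n - 1 users.
        residence = [d * (1.0 + q) for d, q in zip(demands, queue_lengths)]
        response_time = sum(residence)
        # Response Time Law rearranged: X = N / (R + Z).
        throughput = n / (response_time + think_time)
        # Little's Law applied to each device: Q_k = X * R_k.
        queue_lengths = [throughput * r for r in residence]
        results.append((n, throughput, response_time))
    return results


if __name__ == "__main__":
    # Hypothetical demands: CPU 0.025 s, disk 0.03 s; think time 2 s.
    for users, x, r in mean_value_analysis([0.025, 0.03], 2.0, 200)[::40]:
        print(f"N={users:3d}  X={x:6.2f} req/s  R={r:6.3f} s")
```

With these demands the predicted throughput levels off just below 1/0.03 ≈ 33.3 requests/s, which is the limit given by the Throughput Bound Law, while response time keeps growing with the user count.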
Statistical (Regression) Modelling
Regression is the study of relationships among variables; it is used to predict or estimate the value of one variable from the known values of other variables related to it.
Linear regression tries to find a linear relationship between two variables, for example between the number of users and CPU utilization. Its general form is the straight-line equation y = mx + c, where m is the slope and c is the y-intercept of the line; m and c are known as the regression coefficients.
The regression coefficients can be estimated using various techniques, of which the least squares method is the most popular.
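The least squares estimates have a closed form: m = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and c = ȳ - m·x̄. A minimal Python sketch, using hypothetical users-versus-CPU measurements:

```python
def least_squares_fit(xs, ys):
    """Return (m, c) for y = m*x + c that minimise the sum of squared errors."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    m = sxy / sxx            # slope
    c = y_mean - m * x_mean  # y-intercept
    return m, c


if __name__ == "__main__":
    users = [10, 20, 30, 40]          # hypothetical load levels
    cpu_percent = [15, 20, 25, 30]    # hypothetical measured CPU utilization
    m, c = least_squares_fit(users, cpu_percent)
    print(f"CPU% = {m:.2f} * users + {c:.2f}; at 100 users: {m * 100 + c:.1f}%")
```

Since Python 3.10, `statistics.linear_regression(x, y)` returns the same slope and intercept. Forecasts far beyond the measured load range should be treated with caution, because utilization cannot exceed 100% and response times typically grow non-linearly near saturation.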
Time Series Forecasting Models
A time series is a sequence of observations of a variable measured over successive periods of time. The measurements may be taken every hour, day, week or month, or at any other regular interval of time. Three popular forecasting techniques are:
- Moving Average
The moving average method uses the average of the most recent k data values in the time series as the forecast for the next period. Every time a new observation becomes available, the oldest observation in the average is replaced by the newest one and a new average is computed, so the forecast changes as new observations become available.
- Weighted Moving Average
This method is similar to the moving average, except that a different weight is selected for each data value and the forecast is the weighted average of the most recent k values. Typically, the weights sum to 1 and the most recent observation receives the largest weight.
- Exponential Smoothing
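Exponential smoothing is a special case of the weighted moving average in which only one weight is selected: the weight for the most recent observation. This weight, called the smoothing constant α (0 <= α <= 1), determines how much weight to assign to the actual value, and the forecast is Ft+1 = α * Yt + (1 - α) * Ft. The weights of older observations are therefore computed automatically and decrease the further they lie in the past.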
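The three techniques can be sketched in a few lines of Python. The series values are hypothetical (for example, hourly peak throughput), and initializing exponential smoothing with the first observation (F2 = Y1) is one common convention, assumed here.

```python
def moving_average_forecast(series, k):
    """Forecast for the next period = mean of the most recent k values."""
    if not 1 <= k <= len(series):
        raise ValueError("k must be between 1 and len(series)")
    return sum(series[-k:]) / k


def weighted_moving_average_forecast(series, weights):
    """Weighted mean of the most recent len(weights) values.

    weights are listed oldest first; dividing by their sum means they
    need not add up to exactly 1.
    """
    k = len(weights)
    if not 1 <= k <= len(series):
        raise ValueError("number of weights must be between 1 and len(series)")
    window = series[-k:]
    return sum(w * y for w, y in zip(weights, window)) / sum(weights)


def exponential_smoothing_forecast(series, alpha):
    """F(t+1) = alpha * Y(t) + (1 - alpha) * F(t), starting from F(2) = Y(1)."""
    if not series:
        raise ValueError("series must not be empty")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    forecast = series[0]
    for y in series[1:]:
        forecast = alpha * y + (1 - alpha) * forecast
    return forecast


if __name__ == "__main__":
    throughput = [10, 12, 14, 16]  # hypothetical hourly observations
    print(moving_average_forecast(throughput, k=3))
    print(weighted_moving_average_forecast(throughput, weights=[0.2, 0.3, 0.5]))
    print(exponential_smoothing_forecast(throughput, alpha=0.5))
```

A larger α makes exponential smoothing react faster to recent changes, while a smaller α smooths out short-lived fluctuations.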
Simulation Modelling
A simulation model is a mathematical model that calculates the impact of uncertain inputs and decisions without the cost and time investment of building and testing the actual system. It is used to understand under what load conditions the system could fail and what loads it can withstand.
Discrete event simulation (DES) is one of the most popular simulation modelling techniques. It models a system whose state changes only at discrete points in time. Tools such as SimPy, a Python library, are popularly used for discrete event simulation.
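To show the event-driven mechanics without depending on any particular tool's API, the sketch below implements discrete event simulation with only the Python standard library (it does not use SimPy). It models a hypothetical single-server FIFO queue: the simulation clock jumps from one event (arrival or departure) to the next, and the system state changes only at those instants. Inter-arrival and service times are supplied as functions, so the same code could be driven by measured distributions; the function name and parameters are illustrative.

```python
import heapq
import random
from collections import deque


def simulate_single_server(num_jobs, interarrival, service):
    """Event-driven simulation of a single-server FIFO queue.

    interarrival / service: zero-argument callables returning the next
    inter-arrival time and service time in seconds.
    """
    events = []      # min-heap of (time, sequence, kind, job_id)
    sequence = 0     # tie-breaker so events at equal times keep insertion order

    def schedule(time, kind, job):
        nonlocal sequence
        heapq.heappush(events, (time, sequence, kind, job))
        sequence += 1

    arrival_times, waiting = {}, deque()
    busy, busy_since, busy_time = False, 0.0, 0.0
    total_response, completed, now = 0.0, 0, 0.0

    schedule(interarrival(), "arrival", 0)
    next_job = 1
    while events and completed < num_jobs:
        now, _, kind, job = heapq.heappop(events)   # advance clock to next event
        if kind == "arrival":
            arrival_times[job] = now
            if next_job < num_jobs:                  # schedule the following arrival
                schedule(now + interarrival(), "arrival", next_job)
                next_job += 1
            if busy:
                waiting.append(job)                  # server busy: join the queue
            else:
                busy, busy_since = True, now
                schedule(now + service(), "departure", job)
        else:  # departure
            total_response += now - arrival_times.pop(job)
            completed += 1
            if waiting:                              # start serving the next job
                schedule(now + service(), "departure", waiting.popleft())
            else:                                    # busy period ends
                busy = False
                busy_time += now - busy_since

    return {
        "mean_response_s": total_response / completed,
        "utilization": busy_time / now,
        "throughput": completed / now,
    }


if __name__ == "__main__":
    rng = random.Random(42)
    # M/M/1: arrival rate 0.5/s, service rate 1/s; theory gives R = 1 / (1 - 0.5) = 2 s.
    print(simulate_single_server(50_000,
                                 lambda: rng.expovariate(0.5),
                                 lambda: rng.expovariate(1.0)))
```

For the M/M/1 case, queuing theory gives a mean response time of 2 s, so the simulated value should land close to it; validating against a closed-form case like this is a useful check before simulating scenarios that have no analytical answer.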
The business objectives & the typical what-if scenarios to be evaluated by the predictive performance analytics solution determine which data layers to consider when building such a solution. Performance prediction should not be considered an alternative to, or a replacement for, performance testing. Rather, it is a quick & intelligent approach to get direction in the early life cycle phases, or to get a feel for system performance at projected load levels, so that quick judgments can be made on scalability & infrastructure investments.