EC2 Spot Fleet = 5x Faster Regression + 90% Lower Cost
Most companies adopt QA automation for speed, accuracy, reusability, transparency and long-term cost efficiency. We started automating for the same reasons. We chose Selenium, the most popular and powerful tool on the market and one familiar to every automation engineer. It lets test developers build fully custom UI automation frameworks and take advantage of the Page Object design pattern. Tests built with this pattern usually contain less code and are much easier to maintain.
Why we chose the Page Object design pattern and Selenium is a topic for another blog; this one addresses the problems we faced while running large regression test suites.
The problem to fix:
When we started automating our regression tests, our main goal was to cover the entire workflow so that our application had good test coverage.
This resulted in a huge, time-consuming test suite of ~600 UI tests. Executing these tests on our limited resources became the biggest bottleneck.
Because of the flow involved in our application, these suites were run neither in headless mode nor in parallel. Our setup had around 5 physical desktops running these tests, yet the execution time was as high as ~4 hrs. This impacted our release cycle.
We could only target one release per day. Our automation team had to come up with something that would cut the regression time as much as possible.
Our First Solution which did not Fare Well:
Adding more physical machines (beyond the 5 we had) wasn’t an option, as maintenance and cost would become a concern down the line.
We then tried Amazon WorkSpaces (a Desktop-as-a-Service solution), using around 8-10 workspaces. This halved the execution time (to ~2 hrs), and there was no maintenance cost because the workspaces were configured to run only while the regression jobs were running and went offline when not in use. However, this approach had its own limitation.
Doing multiple releases a day was still a challenge: these instances were charged based on usage, and a test duration of ~2 hrs was still too high.
Then came the AWS Spot Fleet:
Since our primary goal was to reduce the test execution time from 4 hrs to somewhere between 45 mins and 1 hr, we needed our jobs to run on multiple instances at low cost. With a little research, we came across ec2-spot-jenkins-plugin. But what is Amazon EC2 Spot Fleet, and how is this plugin helpful?
In short, Spot Fleet provides the ability to create fleets that combine EC2 On-Demand instances and Spot instances with a single API call. This works best for batch workloads, which was our use case.
Features of this plugin:
- Fast and Easy Access to capacity.
- Optimized cost.
- Allows custom scaling based on the application’s need.
- EC2 Spot Fleet or Auto Scaling Group as Jenkins Workers.
- Auto resubmission of Jobs on failure due to spot interruption.
- No delay in scale-up strategy.
- Allows a custom EC2 API endpoint.
The plugin helps scale up multiple instances and maintains the target capacity as Spot prices change, keeping the fleet within the stated price range.
As a bonus, investing in this paid off during the COVID-19 outbreak, which required us to work remotely. This solution entirely removed the need to maintain the physical desktops, which required frequent restarts, patching, powering and network cabling.
Spot Instances are available at a discount of up to 90% compared to On-Demand pricing.
We use t3.medium Windows instances at $0.0309 per hour, for 117 USD per month.
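As a rough sanity check on these figures, the arithmetic can be sketched in Python. Two things here are assumptions rather than facts from our billing data: that the 117 USD is the total monthly Spot bill accrued entirely at the quoted hourly rate, and that a month averages 730 hours.

```python
# Back-of-the-envelope Spot cost arithmetic using the figures quoted above.
SPOT_RATE_USD_PER_HOUR = 0.0309   # t3.medium Windows Spot price from the post
MONTHLY_BILL_USD = 117.0          # ASSUMPTION: total monthly Spot bill
HOURS_PER_MONTH = 730             # ASSUMPTION: average hours in a month


def always_on_monthly_cost(rate_per_hour: float, hours: float = HOURS_PER_MONTH) -> float:
    """Cost of keeping a single instance running for the whole month."""
    return rate_per_hour * hours


def instance_hours_for_budget(budget_usd: float, rate_per_hour: float) -> float:
    """Number of instance-hours a given budget buys at a flat hourly rate."""
    return budget_usd / rate_per_hour


if __name__ == "__main__":
    one_instance = always_on_monthly_cost(SPOT_RATE_USD_PER_HOUR)
    hours = instance_hours_for_budget(MONTHLY_BILL_USD, SPOT_RATE_USD_PER_HOUR)
    print(f"One always-on instance: ${one_instance:.2f} per month")
    print(f"Instance-hours covered by the monthly bill: {hours:.0f}")
```

Under these assumptions, a single always-on instance would cost about 22.56 USD a month, and 117 USD buys roughly 3,786 instance-hours. Because the fleet scales down when idle, the bill tracks hours actually used rather than the number of instances configured.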
Figure: Build time trend of jobs after moving to Spot Fleet.
How to Configure it:
When launching the Spot Fleet, make sure that you:
- Check “Maintain target capacity”.
- Specify an SSH key that Jenkins will use later.
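If you prefer to create the fleet through the API rather than the console, the two settings above map onto a Spot Fleet request roughly as follows. This is a minimal sketch using boto3's `request_spot_fleet`, not the exact configuration we run: the role ARN, AMI ID and key name are hypothetical placeholders, and a real request would usually also set subnets, security groups and a price cap.

```python
# Sketch of a Spot Fleet request that covers the two settings above.
# All identifiers (role ARN, AMI ID, key name) are hypothetical placeholders.
from typing import Any, Dict


def build_spot_fleet_config(
    iam_fleet_role_arn: str,
    image_id: str,
    key_name: str,
    target_capacity: int,
    instance_type: str = "t3.medium",
) -> Dict[str, Any]:
    """Build the SpotFleetRequestConfig dict for EC2's RequestSpotFleet API."""
    return {
        "IamFleetRole": iam_fleet_role_arn,
        "TargetCapacity": target_capacity,
        # "maintain" is the API counterpart of "Maintain target capacity":
        # the fleet replaces interrupted Spot instances automatically.
        "Type": "maintain",
        "LaunchSpecifications": [
            {
                "ImageId": image_id,
                "InstanceType": instance_type,
                # Key pair that Jenkins will later use to reach the nodes.
                "KeyName": key_name,
            }
        ],
    }


def request_fleet(config: Dict[str, Any]) -> str:
    """Submit the request. Needs boto3 and AWS credentials; not called below."""
    import boto3  # imported lazily so the builder works without boto3 installed

    ec2 = boto3.client("ec2")
    response = ec2.request_spot_fleet(SpotFleetRequestConfig=config)
    return response["SpotFleetRequestId"]


if __name__ == "__main__":
    config = build_spot_fleet_config(
        iam_fleet_role_arn="arn:aws:iam::123456789012:role/example-spot-fleet-role",
        image_id="ami-0123456789abcdef0",
        key_name="example-jenkins-key",
        target_capacity=1,
    )
    print(config["Type"], config["LaunchSpecifications"][0]["KeyName"])
```

Keeping the config builder separate from the API call makes it easy to review or unit-test the request before anything is launched.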
Set up Jenkins by adding the newly launched EC2 fleet:
- Go to Manage Jenkins > Plugin Manager
- Install EC2 Fleet Jenkins Plugin
- Go to Manage Jenkins > Configure System
- Click Add a new cloud and select Amazon EC2 Fleet
- Configure credentials and specify EC2 Spot Fleet or Auto Scaling Group which you want to use.
The scaling limits can be defined in your cloud settings under “Maximum Cluster Size”. Jenkins will attempt to scale the fleet up when tasks are waiting in the build queue and scale down idle nodes according to the “Max Idle Minutes Before Scale down” setting.
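To make the scaling behaviour concrete, here is a simplified model of it in Python. It is not the plugin's source code: the function names are made up for illustration, and the "one new node per queued task" rule is an assumption that ignores details such as executors per node.

```python
# Simplified model of the scaling behaviour described above.
# NOT the plugin's actual logic; names and rules are illustrative assumptions.
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    idle_minutes: float  # 0 means the node is currently running a build


def desired_capacity(current: int, queued_tasks: int, max_cluster_size: int) -> int:
    """Add one node per queued task, capped at the maximum cluster size."""
    return min(current + queued_tasks, max_cluster_size)


def nodes_to_terminate(nodes: List[Node], max_idle_minutes: float) -> List[str]:
    """Names of nodes that have been idle for at least the allowed time."""
    return [n.name for n in nodes if n.idle_minutes >= max_idle_minutes]


if __name__ == "__main__":
    # 40 nodes running, 10 builds queued, cluster capped at 45 nodes.
    print(desired_capacity(current=40, queued_tasks=10, max_cluster_size=45))
    fleet = [Node("node-a", 0), Node("node-b", 12), Node("node-c", 3)]
    print(nodes_to_terminate(fleet, max_idle_minutes=5))
```

The cap is what keeps a sudden burst of queued jobs from launching more instances than you have budgeted for, while the idle timeout decides how quickly you stop paying once the queue drains.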
We created dependency graphs to identify independent test suites that could run on different instances in parallel.
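One way to derive independent groups from a dependency graph is to compute its connected components: suites that share no dependency path can safely run on different instances. The sketch below illustrates the idea and is not the tooling we actually used; the suite names and the round-robin assignment are hypothetical.

```python
# Sketch: group test suites that depend on each other, then spread the
# independent groups across instances. Suite names are hypothetical.
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def independent_groups(
    suites: Iterable[str], deps: Iterable[Tuple[str, str]]
) -> List[List[str]]:
    """Return the connected components of the dependency graph, each sorted."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for a, b in deps:
        # Treat dependencies as undirected: either direction ties suites together.
        graph[a].add(b)
        graph[b].add(a)

    seen: Set[str] = set()
    groups: List[List[str]] = []
    for suite in sorted(suites):
        if suite in seen:
            continue
        stack, component = [suite], []
        seen.add(suite)
        while stack:  # iterative depth-first traversal of one component
            node = stack.pop()
            component.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        groups.append(sorted(component))
    return groups


def assign_round_robin(groups: List[List[str]], instances: int) -> List[List[str]]:
    """Distribute whole groups across instances so dependent suites stay together."""
    buckets: List[List[str]] = [[] for _ in range(instances)]
    for i, group in enumerate(groups):
        buckets[i % instances].extend(group)
    return buckets


if __name__ == "__main__":
    suites = ["login", "checkout", "cart", "reports", "search"]
    deps = [("login", "cart"), ("cart", "checkout")]
    groups = independent_groups(suites, deps)
    print(groups)
    print(assign_round_robin(groups, instances=2))
```

This only decides which suites must stay together; the execution order inside a group still has to follow the direction of the dependencies, and a real scheduler would likely balance groups by expected run time rather than round-robin.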
We now have around 45 instances launched through the fleet, and the regression time has dropped drastically from ~4 hrs to ~45 minutes, with a little over a minute needed to instantiate and launch these instances.
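For reference, the speed-up behind the "5x" in the title follows directly from the approximate run times reported in this post. The snippet below only restates that arithmetic; the durations are the rounded figures quoted above, not precise measurements.

```python
# Speed-up arithmetic from the approximate run times in this post (minutes).
PHYSICAL_DESKTOPS_MIN = 240   # ~4 hrs on around 5 physical desktops
WORKSPACES_MIN = 120          # ~2 hrs on 8-10 workspaces
SPOT_FLEET_MIN = 45           # ~45 minutes on around 45 Spot instances


def speedup(before_min: float, after_min: float) -> float:
    """How many times faster the new run is compared to the old one."""
    return before_min / after_min


if __name__ == "__main__":
    print(f"Workspaces vs desktops: {speedup(PHYSICAL_DESKTOPS_MIN, WORKSPACES_MIN):.1f}x")
    print(f"Spot Fleet vs desktops: {speedup(PHYSICAL_DESKTOPS_MIN, SPOT_FLEET_MIN):.1f}x")
```

That works out to about 2x for the workspaces and about 5.3x for the Spot Fleet.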
Got any other alternatives that worked well for you? Let us know in the comments.