In the days leading up to the holidays, we received the news that our industrial PhD candidate, Adha Hrusto, has had one of her articles accepted for publication at the prestigious International Conference on Software Engineering (ICSE). This conference is the 46th in its series and will take place in Lisbon from April 14th to 20th this year.
The article is titled "Autonomous Monitors for Detecting Failures Early and Reporting Interpretable Alerts in Cloud Operations" and focuses on detecting potential issues early and making them easier to manage with AI tools. We are currently updating the article based on feedback from the reviewers, and we will be in Lisbon to present it and its results. This work has been carried out as part of WASP, the largest research initiative in Swedish history.
WASP, the Wallenberg AI, Autonomous Systems and Software Program, is a groundbreaking research initiative in Sweden. Established as one of the largest ever research efforts in the country, WASP focuses on advancing knowledge and innovation in the fields of artificial intelligence (AI), autonomous systems, and software development. Through collaboration between academia, industry, and society, WASP aims to address critical challenges, foster cutting-edge research, and contribute to the development of intelligent and autonomous technologies. This ambitious program plays a pivotal role in shaping the future of AI and autonomous systems, with a commitment to excellence in research and the practical application of findings to benefit both Sweden and the global community.
The WASP community gathers every year at the WASP Winter Conference, its biggest internal event, which attracts more than 500 participants. This year, Adha had the opportunity to present her research related to the paper accepted at ICSE.
Detecting failures early in cloud-based software systems is highly significant, as it can reduce operational costs, enhance service reliability, and improve user experience. Many existing approaches rely on anomaly detection in metrics or on a blend of metric and log features. However, such approaches tend to be very complex and hard to explain, and consequently non-trivial to implement and evaluate in industrial contexts. In collaboration with a case company and their cloud-based system in the domain of PIM (Product Information Management), we propose and implement autonomous monitors for proactive monitoring across multiple services of a distributed software architecture, fused with anomaly detection in performance metrics and log analysis using GPT-3. We demonstrated that operations engineers tend to be more efficient when they have access to interpretable alert notifications, based on detected anomalies, that contain information about implications and potential root causes. Additionally, the proposed autonomous monitors turned out to be beneficial for the timely identification and revision of potential issues before they propagate and cause severe consequences.
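To give a feel for the general idea, here is a minimal Python sketch of metric anomaly detection that produces a human-readable alert. It is an illustration under our own assumptions, not the implementation from the paper: the trailing-window z-score rule, the function names `detect_anomalies` and `build_alert`, the service name `pim-import`, and the sample data are all hypothetical. The GPT-3 log analysis step is left out; the sketch simply attaches recent log lines to the alert.

```python
from statistics import mean, stdev


def detect_anomalies(values, window=5, threshold=3.0):
    """Return indices of values that deviate from the mean of the
    preceding `window` values by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # A perfectly flat history gives no scale for a z-score; skip it.
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


def build_alert(service, metric, index, value, recent_logs):
    """Format a plain-text alert that pairs the anomalous value with log context."""
    lines = [
        f"ALERT: anomaly in {metric} for service {service}",
        f"  sample #{index}: observed value {value}",
        "  recent log lines:",
    ]
    lines.extend(f"    {log}" for log in recent_logs)
    return "\n".join(lines)


# Hypothetical response times in milliseconds, with one obvious spike.
response_times_ms = [100, 102, 98, 101, 99, 100, 450, 101]
# Hypothetical log lines collected around the same time.
logs = ["WARN connection pool exhausted", "ERROR request timed out"]

for idx in detect_anomalies(response_times_ms):
    print(build_alert("pim-import", "response_time_ms", idx,
                      response_times_ms[idx], logs))
```

In a real setup, the log lines would be summarized by a language model into likely implications and root causes before being included in the alert, which is the part that makes the notification interpretable for operations engineers.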
Great job, Adha! We are proud of you and of the work you do to contribute to the field of software quality assurance, and we look forward to the conference in April!