Journal Article
EPJ Web of Conferences, vol. 214, pp. 06007, 2019
Authors
Malachi Schram, Nathan Tallent, Ryan Friese, Alok Singh, Ilkay Altintas, A. Forti, L. Betev, M. Litmaath, O. Smirnova, P. Hristov
Abstract
In this research, we investigated two approaches to detect job anomalies and/or contention in large-scale computing efforts:
1. Preemptive job scheduling using binomial classification long short-term memory networks
2. Forecasting intra-node computing loads from the active jobs and additional job(s)
For approach 1, we achieved a 14% improvement in computational resource utilization and an overall classification accuracy of 85% on real tasks executed in a High Energy Physics computing workflow. In this paper, we present the preliminary results for the second approach.