Given the trend of supercomputers accumulating much of their compute power in GPU accelerators composed of thousands of cores and operating in streaming mode, global synchronization points become a bottleneck, severely confining the performance of applications. In consequence, asynchronous methods breaking up the bulk-synchronous programming model are becoming increasingly attractive. In this paper, we study a GPU-focused asynchronous version of the Restricted Additive Schwarz (RAS) method that employs preconditioned Krylov subspace methods as subdomain solvers. We analyze the method for various parameters such as local solver tolerance and iteration counts. Leveraging the multi-GPU architecture on Summit, we show that these two-stage methods are more memory and time efficient than asynchronous RAS using direct solvers. We also demonstrate the superiority over synchronous counterparts, and present results using one-sided CUDA-aware MPI on up to 36 NVIDIA V100 GPUs.