Dark Posted April 24, 2021

DLSS is one of NVIDIA's spearheads against AMD: games that support it can reach higher frame rates at output resolutions that would not be achievable without the technique. This is what has made NVIDIA's RTX GPUs the current leaders in the GPU market, but NVIDIA DLSS has a catch, and we will tell you what it is.

If we have to talk about NVIDIA's two spearheads for its GeForce RTX cards, they are clearly Ray Tracing and DLSS. The first is no longer an advantage now that AMD has implemented it in RDNA 2, but the second is still a differentiating element that gives NVIDIA a big edge. Even so, not everything is what it seems at first glance.

DLSS on RTX depends on Tensor Cores

[Screenshots: NVIDIA DLSS captured in NSight]

The first thing to consider is how the algorithms commonly grouped under the name DLSS take advantage of the GPU hardware, and there is no better way to do that than to analyze how the GPU behaves while rendering a frame with DLSS active and without it. The two screenshots above come from NVIDIA's NSight tool, which measures the utilization of each part of the GPU over time. To interpret the graphs, keep in mind that the vertical axis corresponds to the level of utilization of that part of the GPU and the horizontal axis to the time in which the frame is rendered.

The difference between the two NSight captures is that one shows the utilization of each part of the GPU with DLSS enabled and the other without it. So what stands out? If we look closely at the capture with DLSS, the graph corresponding to the Tensor Cores is flat except at the very end, which is when these units kick in. DLSS is nothing more than a super-resolution algorithm: it takes an image at a given input resolution and outputs a higher-resolution version of the same image. That is why the Tensor Cores are activated last, since they need the GPU to render the image first.

DLSS takes up to 3 milliseconds of the time available to render a frame, regardless of how fast the game is running. If, for example, we want to apply DLSS in a game running at 60 Hz, the GPU will have to resolve each frame in (1000 ms / 60) - 3 ms, that is, in about 13.6 ms. In return we obtain a higher frame rate at the output resolution than we would get by rendering that resolution natively on the GPU.

DLSS operation example

Suppose we have a scene that we want to render at 4K. For this we have an unspecified GeForce RTX that reaches 25 frames per second at that resolution, so it renders each frame in 40 ms; we also know that the same GPU reaches 50 frames per second, or 20 ms per frame, at 1080p. Our hypothetical GeForce RTX takes about 2.5 ms to scale from 1080p to 4K, so if we activate DLSS to obtain a 4K image from a 1080p one, each frame with DLSS will take 22.5 ms. With this we can render the scene at about 44 frames per second, which is higher than the 25 frames per second obtained by rendering at native resolution.

On the other hand, if the GPU is going to take more than 3 milliseconds to make the resolution jump, DLSS will not activate, since that is the time limit NVIDIA sets on its RTX GPUs for applying the DLSS algorithms.
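To make the arithmetic concrete, here is a minimal sketch of the frame-time calculation from the example above, using the article's hypothetical numbers (40 ms per frame at native 4K, 20 ms at 1080p, 2.5 ms for the upscale, and NVIDIA's 3 ms budget). These are illustrative figures, not measurements.

```python
# Hypothetical frame-time arithmetic for the DLSS example above. Not measured data.

DLSS_BUDGET_MS = 3.0  # time limit NVIDIA allows for the upscale step on RTX GPUs

def fps_with_dlss(low_res_frame_ms, upscale_ms):
    """Return the resulting frame rate, or None if the upscale exceeds the budget."""
    if upscale_ms > DLSS_BUDGET_MS:
        return None  # DLSS would not activate on this (hypothetical) GPU
    return 1000.0 / (low_res_frame_ms + upscale_ms)

native_4k_fps = 1000.0 / 40.0            # 25 fps rendering 4K natively
dlss_4k_fps = fps_with_dlss(20.0, 2.5)   # ~44 fps rendering 1080p and upscaling to 4K

print(f"Native 4K: {native_4k_fps:.1f} fps, DLSS from 1080p: {dlss_4k_fps:.1f} fps")
```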
That same limit is what leaves lower-end GPUs restricted in the resolutions at which they can run DLSS.

DLSS benefits from high-speed Tensor Cores

The Tensor Cores are essential for running DLSS; without them it would not be possible to execute it at the speed it reaches on NVIDIA's RTX cards. The algorithm used to increase the resolution is what we call a convolutional neural network, whose composition we are not going to go into in this article. Suffice it to say that it relies on a large number of matrix multiplications, and tensor units are ideal for computing with numerical matrices, since they are the type of unit that executes these operations fastest.

In the case of a movie, today's decoders generate the source image in the image buffer several times faster than the rate at which it is displayed on screen, so there is more time to scale it and therefore much less computing power is needed. In a video game, on the other hand, the next image is not already stored on any medium; it has to be generated by the GPU, which cuts the time the scaler has to do its work.
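To illustrate why matrix-multiply hardware matters here, below is a minimal sketch of a single convolution layer rewritten as one large matrix multiplication, which is the kind of operation tensor units accelerate. The layer sizes, filter counts, and the im2col approach are illustrative assumptions, not taken from NVIDIA's actual DLSS network.

```python
import numpy as np

def conv2d_as_matmul(image, kernels):
    """image: (H, W, C_in), kernels: (K, K, C_in, C_out) -> (H-K+1, W-K+1, C_out)."""
    H, W, C_in = image.shape
    K, _, _, C_out = kernels.shape
    out_h, out_w = H - K + 1, W - K + 1

    # im2col: unfold every KxKxC_in patch of the image into one row of a big matrix.
    patches = np.empty((out_h * out_w, K * K * C_in))
    for y in range(out_h):
        for x in range(out_w):
            patches[y * out_w + x] = image[y:y + K, x:x + K, :].ravel()

    # The whole layer then collapses into a single matrix multiplication.
    weights = kernels.reshape(K * K * C_in, C_out)
    return (patches @ weights).reshape(out_h, out_w, C_out)

# Toy usage: a small RGB tile filtered by 8 hypothetical 3x3 kernels.
tile = np.random.rand(64, 64, 3)
filters = np.random.rand(3, 3, 3, 8)
print(conv2d_as_matmul(tile, filters).shape)  # (62, 62, 8)
```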