CedricXu

Computer science student / photography enthusiast

A Frustrating Graphics Card Driver Installation

Introduction

Installing graphics card drivers is actually very simple.

This post records a frustrating graphics card driver installation that dragged on for more than a month (in the end, the card itself turned out to be broken), along with a few lessons I learned along the way.

Background

Continuing from the previous episode: after building a computer for 1500 yuan, I urgently needed a graphics card so that I could use my server for my introductory deep learning course. I ordered a P104-100 from the "seafood market" (a second-hand marketplace), and so began my nightmarish journey of installing graphics card drivers.

Driver Installation Process

First Attempt

As soon as the card arrived, I tried installing the driver from a PPA. First, add the apt repository.

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update

Then use the ubuntu-drivers command to list the available drivers and see which one is recommended.

❯ ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:01.1/0000:10:00.0 ==
modalias : pci:v000010DEd00001B87sv000010DEsd00001237bc03sc02i00
vendor   : NVIDIA Corporation
model    : GP104 [P104-100]
driver   : nvidia-driver-470-server - distro non-free
driver   : nvidia-driver-390 - distro non-free
driver   : nvidia-driver-545 - third-party non-free
driver   : nvidia-driver-525 - distro non-free
driver   : nvidia-driver-535-server - distro non-free
driver   : nvidia-driver-470 - distro non-free
driver   : nvidia-driver-525-server - distro non-free
driver   : nvidia-driver-450-server - distro non-free
driver   : nvidia-driver-535 - distro non-free recommended
driver   : nvidia-driver-418-server - distro non-free
driver   : xserver-xorg-video-nouveau - distro free builtin

The output marks nvidia-driver-535 as recommended, so install it directly.

sudo apt install nvidia-driver-535
sudo reboot

Or use

sudo ubuntu-drivers autoinstall

to achieve the same effect.

At this point the driver installation should be complete and the card ready to use (as it turned out, with a working card that really is all it takes). However, running nvidia-smi gave me an error.

❯ nvidia-smi
No devices were found

Some searching suggested that the motherboard settings might be the problem, so I went into the BIOS, set the CPU's integrated graphics as the default display adapter, and confirmed that the Above 4G Decoding option was enabled. None of that solved the problem, so I began to suspect that the driver installed this way was incompatible with the mining card.
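Before changing BIOS settings, it can help to separate "the system cannot see the card" from "the driver cannot initialize the card". The following is a minimal sketch rather than the steps taken in this post. The helper name modalias_to_pci_id is made up for illustration; it turns the modalias string printed by ubuntu-drivers devices into the vendor:device ID that the -d filter of lspci accepts. The script then checks whether an nvidia kernel module is loaded.

```shell
#!/bin/sh
# Hypothetical helper: convert a PCI modalias string (as printed by
# `ubuntu-drivers devices`) into the vendor:device form used by `lspci -d`.
modalias_to_pci_id() {
    printf '%s\n' "$1" |
        sed -n 's/^pci:v0000\([0-9A-Fa-f]\{4\}\)d0000\([0-9A-Fa-f]\{4\}\).*/\1:\2/p' |
        tr 'A-F' 'a-f'
}

# The modalias below is copied from the ubuntu-drivers output above.
id=$(modalias_to_pci_id 'pci:v000010DEd00001B87sv000010DEsd00001237bc03sc02i00')
echo "PCI ID: $id"    # -> PCI ID: 10de:1b87

# Is the card enumerated on the PCI bus? -k also shows the kernel driver in use.
# Skipped if lspci is not installed.
if command -v lspci >/dev/null 2>&1; then
    lspci -nn -k -d "$id" || true
fi

# Is an nvidia kernel module loaded?
if [ -r /proc/modules ]; then
    grep -E '^nvidia' /proc/modules || echo "nvidia module not loaded"
fi
```

If lspci lists the card and the nvidia module is loaded, yet nvidia-smi still reports No devices were found, the fault is more likely in driver initialization or in the card itself than in PCI enumeration.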

Second Attempt

I uninstalled the previous driver and downloaded the .run installer for the GTX 1080 from NVIDIA's official website. Installing this version is more involved: you have to disable the open-source Nouveau driver and stop the graphical interface first, otherwise the system boots to a black screen. This driver did not work either, and nvidia-smi still reported No devices were found.
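For reference, disabling Nouveau before running a .run installer is commonly done with a modprobe blacklist file. This is a sketch under assumptions, not a record of the exact commands used here: the function name write_nouveau_blacklist and the file name blacklist-nouveau.conf are arbitrary choices, and the commented steps assume a Debian/Ubuntu system with systemd.

```shell
#!/bin/sh
# Hypothetical helper: write a modprobe config that keeps nouveau from loading.
# $1 is the target file.
write_nouveau_blacklist() {
    cat > "$1" <<'EOF'
blacklist nouveau
options nouveau modeset=0
EOF
}

# Demo: write to a temporary file and show what would be installed.
tmp=$(mktemp)
write_nouveau_blacklist "$tmp"
cat "$tmp"
rm -f "$tmp"

# On a real system the steps would look roughly like this (run as root):
#   write_nouveau_blacklist /etc/modprobe.d/blacklist-nouveau.conf
#   update-initramfs -u                    # rebuild the initramfs (Debian/Ubuntu)
#   reboot
#   systemctl isolate multi-user.target    # leave the graphical session
#   sh ./NVIDIA-Linux-x86_64-<version>.run # <version> is a placeholder
```

Writing to a temporary file first makes it easy to inspect the config before anything under /etc is touched.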

Third Attempt

I installed CUDA directly and selected the option to install the bundled graphics driver along with it. The advantage is that CUDA and the driver are installed together, which saves time and effort, and there is no need to disable the open-source driver or stop the graphical interface, so the process is simpler. However, this method didn't work either.

Fourth Attempt

I also tried several ways of installing the driver on Manjaro and on Windows, but each ended in errors of one kind or another.

Analyzing the Problem

I didn't immediately suspect the card itself, because the unscrupulous seller on the seafood market had promised that it had been stress-tested before shipping.

I asked for help on the NVIDIA community forum, but nobody replied.

I combed through the answers to similar questions on the NVIDIA community forum and tried the fixes suggested there, but none of them solved my problem. I then went through the driver installation log carefully and found a line reading "RmInitAdapter failed!". After comparing it with similar reports online, I realized that the card itself might be faulty. So I ordered a new graphics card from an online retailer, installed CUDA together with its bundled driver as before, and it worked!
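The NVIDIA kernel module also reports RmInitAdapter failures in the kernel log, so the same check can be done without the installer log. The sketch below is illustrative only: has_rminit_failure is a made-up helper that reads log text on standard input and succeeds if the message is present. Reading dmesg may require root on some systems.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the log text on stdin contains an
# RmInitAdapter failure message from the NVIDIA kernel module.
has_rminit_failure() {
    grep -q 'RmInitAdapter failed'
}

# Scan the kernel log if dmesg is available and readable.
if command -v dmesg >/dev/null 2>&1; then
    if dmesg 2>/dev/null | has_rminit_failure; then
        echo "RmInitAdapter failure found: consider testing the card in another machine"
    else
        echo "no RmInitAdapter failure in the readable kernel log"
    fi
fi
```

The same function works on a saved log file, for example has_rminit_failure < some-log-file, where the file name is whatever log you want to inspect.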

Finally, the wonderful output of nvidia-smi appeared.

Conclusion

At this point, the new graphics card works normally. The lesson I want to pass on: sellers on the seafood market obviously cannot be trusted blindly when it comes to hardware, and when something goes wrong, a hardware fault deserves suspicion early. Otherwise you can waste precious time chasing other causes, only to discover in the end that the hardware was faulty all along. Also, don't disassemble a graphics card before you have confirmed that it works. I opened mine to check the thermal paste as soon as I received it, which broke the warranty sticker. When the card turned out to be faulty, I could no longer return it and could only resell it as a faulty card.
