Thanks for reaching out to us. There could be several reasons why you're experiencing this issue. Here are a few troubleshooting steps you can follow:
- Check the VM size: Not all VM sizes in Azure include a GPU. Make sure that you're using a size that does. NC6s v3 comes with an NVIDIA Tesla V100, so this shouldn't be the issue.
- Check the CUDA version: It's possible that the CUDA toolkit installed on your DSVM is not compatible with the GPU or the NVIDIA driver on your VM. You can check the installed CUDA toolkit version with the command
nvcc --version
The CUDA version it reports must be supported by the installed NVIDIA driver. Running nvidia-smi shows the driver version for comparison.
- Reinstall the NVIDIA driver: You mentioned that you have tried reinstalling the CUDA drivers. You can try reinstalling the NVIDIA drivers as well. Here's how:
  - Uninstall the current driver:
    sudo apt-get remove --purge 'nvidia-*'
    (the quotes stop the shell from expanding the pattern before apt-get sees it)
  - Update the package lists:
    sudo apt-get update
  - Install the NVIDIA driver:
    sudo apt-get install nvidia-driver-xxx
    (replace xxx with the version you want)
- Check the NVIDIA kernel module: Sometimes the NVIDIA kernel module is not loaded correctly, which can cause issues. You can check whether it is loaded with the command
lsmod | grep nvidia
If the command prints nothing, the module is not loaded. You can load it with the command
sudo modprobe nvidia
- Check for any system updates: Sometimes, system updates can cause issues with the NVIDIA drivers. Make sure your system is up to date.
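If it helps, the read-only checks above can be run in one go. The following is only a rough sketch, not an official diagnostic: the function names have and check_gpu_stack are made up for illustration, and it assumes a bash shell on the VM. It does not install, remove, or load anything.

```shell
#!/usr/bin/env bash
# Diagnostic sketch: read-only checks only, nothing is changed on the system.
# The helper names (have, check_gpu_stack) are hypothetical.

# Succeeds if the given command is available on PATH.
have() { command -v "$1" >/dev/null 2>&1; }

check_gpu_stack() {
  # Driver: nvidia-smi ships with the NVIDIA driver package.
  if have nvidia-smi; then
    nvidia-smi --query-gpu=name,driver_version --format=csv,noheader \
      || echo "nvidia-smi: installed, but it could not talk to the driver"
  else
    echo "nvidia-smi: not found (the NVIDIA driver is probably not installed)"
  fi

  # CUDA toolkit: nvcc reports the toolkit version, not the driver version.
  if have nvcc; then
    nvcc --version
  else
    echo "nvcc: not found (the CUDA toolkit is not on PATH)"
  fi

  # Kernel module: same check as lsmod | grep nvidia.
  if lsmod 2>/dev/null | grep -q '^nvidia'; then
    echo "nvidia kernel module: loaded"
  else
    echo "nvidia kernel module: not loaded (try: sudo modprobe nvidia)"
  fi
}

check_gpu_stack
```

If you post the output of this script in your reply, it should make it easier to see which layer (driver, toolkit, or kernel module) is failing.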
If none of these steps work, then you might want to consider starting from a clean, non-DSVM image and installing the drivers yourself. Make sure to follow the official NVIDIA installation guides to ensure that the drivers are installed correctly.
I hope this helps.
Regards,
Yutong