After discussion, we decided to focus on manipulating K-steps (the depth of the diffusion). The OpenVPI version mentions that empirical values of K-steps are usually between 300 and 400, while the original DiffSinger paper chose 54. We plan to train a series of models with different K-steps to see whether a pattern emerges.
To find the best K-steps, we plan to adopt the KL-divergence shortcut mentioned in the paper, or else figure out a dynamic way to determine K.
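As we understand it, that shortcut compares the diffused ground-truth mel with the diffused mel from the simple (auxiliary) decoder and picks the earliest step at which the two become hard to tell apart. Below is a minimal sketch of the idea, assuming a linear beta schedule and a hand-picked KL threshold; find_k_boundary and its parameters are our own placeholders, not part of either repo:
import torch

def find_k_boundary(gt_mel, aux_mel, total_steps=1000, threshold=0.01):
    # gt_mel / aux_mel: [frames, bins] mel-spectrograms on the same scale
    betas = torch.linspace(1e-4, 0.06, total_steps)     # assumed linear schedule
    alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)  # \bar{alpha}_t
    for t in range(total_steps):
        a_bar = alphas_cumprod[t]
        # q(x_t | x_0) = N(sqrt(a_bar) * x_0, (1 - a_bar) I), so the KL between the
        # two diffused Gaussians reduces to a scaled squared difference of means
        kl = (a_bar * (gt_mel - aux_mel) ** 2 / (2.0 * (1.0 - a_bar))).mean()
        if kl < threshold:
            return t  # earliest step where the two are "close enough"
    return total_steps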
We also plan to work on inference evaluation, since the original paper used a subjective method (asking several people to give scores). Some benchmarks we are thinking about (a sketch of the pitch metric follows the list):
- note
- noise
- duration
- pitch
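For pitch, one objective option is an F0 RMSE between a reference recording and the synthesized output. A minimal sketch using librosa's pYIN tracker; the file paths and sample rate here are placeholders:
import numpy as np
import librosa

def pitch_rmse(ref_path, syn_path, sr=44100):
    # load reference and synthesized audio at the same sample rate
    ref, _ = librosa.load(ref_path, sr=sr)
    syn, _ = librosa.load(syn_path, sr=sr)
    # extract F0 curves with pYIN; unvoiced frames come back as NaN
    f0_ref, _, _ = librosa.pyin(ref, fmin=librosa.note_to_hz('C2'), fmax=librosa.note_to_hz('C7'), sr=sr)
    f0_syn, _, _ = librosa.pyin(syn, fmin=librosa.note_to_hz('C2'), fmax=librosa.note_to_hz('C7'), sr=sr)
    # align lengths and compare only frames that are voiced in both curves
    n = min(len(f0_ref), len(f0_syn))
    f0_ref, f0_syn = f0_ref[:n], f0_syn[:n]
    voiced = ~np.isnan(f0_ref) & ~np.isnan(f0_syn)
    return float(np.sqrt(np.mean((f0_ref[voiced] - f0_syn[voiced]) ** 2)))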
We started deploying the OpenVPI version of DiffSinger on Google Cloud, and this detailed guide (in Chinese) helped us a lot; it covers many of the project's configuration options.
DiffSinger provides a nice TensorBoardX integration; you can monitor graphs and audio files on the dashboard.
# start tensorboard on the VM; it serves on port 6006 by default
tensorboard --logdir ~/DiffSinger
# run this on your local machine
# forward the VM's port 6006 to port 8080 on your local machine
gcloud compute ssh your-vm --zone your-zone --project your-project -- -L 8080:localhost:6006 -N -f
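# then open http://localhost:8080 in your local browser to see the dashboard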
During setup, I had no idea why training was not using the GPU at all, so I reinstalled CUDA to fix it. Here are some useful commands.
# check that CUDA is installed and that the GPU is working
nvcc --version
nvidia-smi
A quick Python check to verify that CUDA is working:
python
import torch
print(torch.cuda.is_available())
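If that prints False, a slightly more detailed check helps tell a driver problem apart from a CPU-only torch build (these are all standard torch attributes):
import torch

print(torch.__version__)          # torch build version (CUDA pip wheels often carry a +cuXXX suffix)
print(torch.version.cuda)         # CUDA version torch was built against (None on CPU-only builds)
print(torch.cuda.device_count())  # number of GPUs visible to torch
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))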
For some reason, after installing the requirements, Python could not find torchaudio. If you run into the same issue, try installing the torch packages separately.
# install the environment
conda create -n openvpi python=3.8
conda activate openvpi
pip install torch torchvision torchaudio
pip install -r requirements.txt
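# quick sanity check that all three torch packages import cleanly
python -c "import torch, torchvision, torchaudio; print(torch.__version__, torchvision.__version__, torchaudio.__version__)"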
# binarize the dataset before training; if you update the config file, redo this step
python scripts/binarize.py --config path/to/config_acoustic.yaml
# training
python scripts/train.py --config path/to/config_acoustic.yaml --exp_name your/model/dir --reset
# inference
python scripts/infer.py acoustic samples/your-DS-file.ds --exp your/model --out your-output-wav-name