support multiple GPU training for XTTS #3391

Merged: 1 commit merged into coqui-ai:dev on Dec 12, 2023
Conversation

aaron-lii
Contributor

Set this parameter in recipes/ljspeech/xtts_v2/train_gpt_xtts.py:

OPTIMIZER_WD_ONLY_ON_WEIGHTS = False  # for multi-gpu training please make it False

Now we can run multi-GPU training with the DDP back-end like this:

$ CUDA_VISIBLE_DEVICES="0, 1" python -m trainer.distribute --script recipes/ljspeech/xtts_v2/train_gpt_xtts.py
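For reference, a minimal sketch of how that flag feeds into the recipe config; this assumes the recipe-style GPTTrainerConfig import from this repository, with the remaining settings elided:

# Sketch only: assumes GPTTrainerConfig exposes optimizer_wd_only_on_weights,
# as set in recipes/ljspeech/xtts_v2/train_gpt_xtts.py; check your TTS version.
from TTS.tts.layers.xtts.trainer.gpt_trainer import GPTTrainerConfig

OPTIMIZER_WD_ONLY_ON_WEIGHTS = False  # must be False for multi-GPU (DDP) training

config = GPTTrainerConfig(
    optimizer="AdamW",
    optimizer_wd_only_on_weights=OPTIMIZER_WD_ONLY_ON_WEIGHTS,
    # ... remaining recipe settings (model args, audio config, datasets, etc.)
)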

I ran an experiment with 2 GPUs. Everything seems fine.

I saw a TODO here, but I'm not very familiar with this function, so I'm not sure whether it causes any problems compared to single-GPU training:

def get_optimizer(self) -> List:
    """Initiate and return the optimizer based on the config parameters."""
    # ToDo: deal with multi GPU training
    if self.config.optimizer_wd_only_on_weights:
        ...  # (rest of the function omitted)
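For context, "weight decay only on weights" typically means splitting parameters into decay and no-decay groups before building AdamW. A rough sketch of that pattern (an illustration of what the TODO concerns, not the trainer's actual implementation):

import torch

def build_optimizer(model: torch.nn.Module, lr: float = 5e-6, weight_decay: float = 1e-2):
    """Apply weight decay only to weight matrices, not to biases or norm/embedding params.

    Illustrative only; the grouping logic in the Coqui trainer may differ.
    """
    decay, no_decay = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        # Heuristic: 1-D tensors (biases, LayerNorm gains, etc.) get no weight decay.
        if param.ndim <= 1 or name.endswith(".bias"):
            no_decay.append(param)
        else:
            decay.append(param)
    param_groups = [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]
    return torch.optim.AdamW(param_groups, lr=lr)

Under DDP each rank builds its own optimizer from its model replica; since this grouping path apparently does not handle that yet (hence the TODO), the flag has to stay False for multi-GPU runs.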

@erogol merged commit 934b87b into coqui-ai:dev on Dec 12, 2023
53 checks passed
@aaron-lii deleted the multi-gpu branch on January 4, 2024
@ukemamaster commented Jan 9, 2024

Hi, @erogol @aaron-lii

I am trying to fine-tune the xtts-v2 model using multi-GPU training. The training goes as expected, but GPU memory gradually increases, which eventually crashes the training process after a few epochs. I observed that if I reduce the number of test sentences in:

"test_sentences": [
    {
        "text": "Hello, this is test.",
        "speaker_wav": [
            "test_speaker.wav"
        ],
        "language": "en"
    }
]

the memory increase becomes slower, i.e., the trainer runs for more steps before crashing.
Is there a way to avoid that?
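For reference, the mitigation described above amounts to passing fewer (or no) test sentences to the training config. A minimal sketch, assuming the recipe-style GPTTrainerConfig (adapt to your own setup):

# Sketch only: disabling test-sentence synthesis during evaluation.
from TTS.tts.layers.xtts.trainer.gpt_trainer import GPTTrainerConfig

config = GPTTrainerConfig(
    test_sentences=[],  # or keep just one short sentence
    # ... remaining recipe settings ...
)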

@NikitaKononov
[quotes @ukemamaster's comment above in full]

Hello, did you find a way to deal with that?
