support multiple GPU training for XTTS #3391

Merged: 1 commit merged into coqui-ai:dev on Dec 12, 2023
Conversation

aaron-lii
Contributor

Set this parameter in recipes/ljspeech/xtts_v2/train_gpt_xtts.py:

OPTIMIZER_WD_ONLY_ON_WEIGHTS = False  # for multi-gpu training please make it False

Now we can run multi-GPU training with the DDP back-end like this:

$ CUDA_VISIBLE_DEVICES="0, 1" python -m trainer.distribute --script recipes/ljspeech/xtts_v2/train_gpt_xtts.py
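For reference, a minimal sketch of how that flag feeds into the recipe config; this assumes the recipe-style GPTTrainerConfig import from this repository, with the remaining settings elided:

# Sketch only: assumes GPTTrainerConfig exposes optimizer_wd_only_on_weights,
# as set in recipes/ljspeech/xtts_v2/train_gpt_xtts.py; check your TTS version.
from TTS.tts.layers.xtts.trainer.gpt_trainer import GPTTrainerConfig

OPTIMIZER_WD_ONLY_ON_WEIGHTS = False  # must be False for multi-GPU (DDP) training

config = GPTTrainerConfig(
    optimizer="AdamW",
    optimizer_wd_only_on_weights=OPTIMIZER_WD_ONLY_ON_WEIGHTS,
    # ... remaining recipe settings (model args, audio config, datasets, etc.)
)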

I ran an experiment with 2 GPUs. Everything seems fine.

I saw a TODO here, but I'm not very familiar with this function, so I'm not sure whether it causes any problems compared to single-GPU training:

def get_optimizer(self) -> List:
    """Initiate and return the optimizer based on the config parameters."""
    # ToDo: deal with multi GPU training
    if self.config.optimizer_wd_only_on_weights:
        ...  # (rest of the function omitted)
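For context, "weight decay only on weights" typically means splitting parameters into decay and no-decay groups before building AdamW. A rough sketch of that pattern (an illustration of what the TODO concerns, not the trainer's actual implementation):

import torch

def build_optimizer(model: torch.nn.Module, lr: float = 5e-6, weight_decay: float = 1e-2):
    """Apply weight decay only to weight matrices, not to biases or norm/embedding params.

    Illustrative only; the grouping logic in the Coqui trainer may differ.
    """
    decay, no_decay = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        # Heuristic: 1-D tensors (biases, LayerNorm gains, etc.) get no weight decay.
        if param.ndim <= 1 or name.endswith(".bias"):
            no_decay.append(param)
        else:
            decay.append(param)
    param_groups = [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]
    return torch.optim.AdamW(param_groups, lr=lr)

Under DDP each rank builds its own optimizer from its model replica; since this grouping path apparently does not handle that yet (hence the TODO), the flag has to stay False for multi-GPU runs.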

@erogol merged commit 934b87b into coqui-ai:dev on Dec 12, 2023
53 checks passed
@aaron-lii deleted the multi-gpu branch on January 4, 2024
@ukemamaster commented Jan 9, 2024

Hi, @erogol @aaron-lii

I am trying to fine-tune the xtts-v2 model using multi-GPU training. The training goes as expected, but GPU memory gradually increases, which eventually crashes the training process after a few epochs. I observed that if I reduce the number of test sentences in:

"test_sentences": [
    {
        "text": "Hello, this is test.",
        "speaker_wav": [
            "test_speaker.wav"
        ],
        "language": "en"
    }
]

the memory increase becomes slower, i.e., the trainer runs for more steps before crashing.
Is there a way to avoid that?
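For reference, the mitigation described above amounts to passing fewer (or no) test sentences to the training config. A minimal sketch, assuming the recipe-style GPTTrainerConfig (adapt to your own setup):

# Sketch only: disabling test-sentence synthesis during evaluation.
from TTS.tts.layers.xtts.trainer.gpt_trainer import GPTTrainerConfig

config = GPTTrainerConfig(
    test_sentences=[],  # or keep just one short sentence
    # ... remaining recipe settings ...
)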

@NikitaKononov
[quotes @ukemamaster's comment above in full]

Hello, did you find a way to deal with that?
