0.5.0

@R3gm released this 18 May 13:56
· 11 commits to main since this release
ebea8b1

What's Changed

v0.5.0 by @R3gm in #45

  • Added an Overlap Reduction option
  • OpenAI API key integration for transcription, translation, and TTS
  • More output types: subtitles by speaker, separate audio, and video with subtitles only
  • Access to better-performing Whisper models for speech transcription from the Hugging Face Whisper page. Copy the repository ID and paste it into the 'Whisper ASR model' field in 'Advanced Settings'; for example, kotoba-tech/kotoba-whisper-v1.1 for Japanese transcription
  • Support for ASS subtitles and batch processing with subtitles
  • Vocal enhancement before transcription
  • Added CPU mode: launch with app_rvc.py --cpu_mode
  • TTS now supports up to 12 speakers
  • OpenVoiceV2 integration for voice imitation
  • PDF to videobook (displays images from the PDF)
  • GUI translations for Persian and Afrikaans
  • New Language Support:
    • Complete support: Estonian, Macedonian, Malay, Swahili, Afrikaans, Bosnian, Latin, Myanmar Burmese, Norwegian, Traditional Chinese, Assamese, Basque, Hausa, Haitian Creole, Armenian, Lao, Malagasy, Mongolian, Maltese, Punjabi, Pashto, Slovenian, Shona, Somali, Tajik, Turkmen, Tatar, Uzbek, and Yoruba
    • Supported without transcription: Aymara, Bambara, Cebuano, Chichewa, Divehi, Dogri, Ewe, Guarani, Iloko, Kinyarwanda, Krio, Kurdish, Kirghiz, Ganda, Maithili, Oriya, Oromo, Quechua, Samoan, Tigrinya, Tsonga, Akan, and Uighur
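
Two items in the list above involve values that are typed by hand: the Hugging Face repository ID pasted into 'Whisper ASR model', and the --cpu_mode flag for app_rvc.py. The following is a minimal sketch of how a wrapper script might check and assemble them. The helper names `looks_like_repo_id` and `build_launch_command` are hypothetical and not part of this project, and the "owner/name" pattern is a loose assumption rather than the Hub's exact validation rule.

```python
import re
import sys

# Hypothetical helpers for illustration only; they are not part of this project.

# Loose pattern for a repository ID shaped like "owner/name" (an assumption,
# not the exact rule the Hugging Face Hub applies).
_REPO_ID = re.compile(r"^[A-Za-z0-9][\w.-]*/[A-Za-z0-9][\w.-]*$")


def looks_like_repo_id(text: str) -> bool:
    """Return True if text has the 'owner/name' shape of a repository ID."""
    return bool(_REPO_ID.match(text.strip()))


def build_launch_command(cpu_mode: bool = False) -> list[str]:
    """Build the argument list for launching app_rvc.py."""
    command = [sys.executable, "app_rvc.py"]
    if cpu_mode:
        command.append("--cpu_mode")  # flag named in these release notes
    return command


if __name__ == "__main__":
    # A full URL is rejected; only the bare repository ID should be pasted.
    print(looks_like_repo_id("kotoba-tech/kotoba-whisper-v1.1"))
    print(build_launch_command(cpu_mode=True)[1:])
```

The argument list returned by `build_launch_command` could be passed to `subprocess.run` from the repository root, which is equivalent to typing the command in a terminal.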

Full Changelog: 0.4.0...0.5.0