a ¬º”h<ã@s&ddlZddlmZddlmZmZmZmZmZm Z ddl Z ddl mZddlm Z ddlmZddlmZmZddlmZmZd d lmZd dlmZgZdZGd d„dejƒZGdd„dejƒZGdd„de jj ej!ƒZ"Gdd„de jj ej!ƒZ#Gdd„dƒZ$Gdd„dƒZ%eGdd„dƒƒZ&eGdd„dƒƒZ'Gdd„dƒZ(eGdd „d e'e&e$eƒƒZ)eGd!d"„d"e'e&e%eƒƒZ*eGd#d$„d$e(e&e$eƒƒZ+eGd%d&„d&e(e&e%eƒƒZ,e+d'ej-d(d)d*Z.d+e._/e,d,ej-d-d)d*Z0d.e0_/e)d/ej-d(d)d0e 1¡d1Z2d2e2_/e*d3ej-d-d)d0e 1¡d1Z3d4e3_/dS)5éN)Ú dataclass)ÚAnyÚDictÚListÚOptionalÚTupleÚUnion)ÚTensor)Úload_state_dict_from_url)Úmu_law_decoding)Ú Tacotron2ÚWaveRNN)Ú GriffinLimÚInverseMelScaleé)Úutils)ÚTacotron2TTSBundlez.https://download.pytorch.org/torchaudio/modelscsLeZdZ‡fdd„Zedd„ƒZeeeefe e e fdœdd„Z‡ZS)Ú_EnglishCharProcessorcs.tƒ ¡t ¡|_dd„t|jƒDƒ|_dS)NcSsi|]\}}||“qS©r)Ú.0ÚiÚsrrúL/var/www/auris/lib/python3.9/site-packages/torchaudio/pipelines/_tts/impl.pyÚ óz2_EnglishCharProcessor.__init__..)ÚsuperÚ__init__rZ _get_charsÚ_tokensÚ enumerateÚ_mapping©Úself©Ú __class__rrrs z_EnglishCharProcessor.__init__cCs|jS©N©rr rrrÚtokenssz_EnglishCharProcessor.tokens©ÚtextsÚreturncs,t|tƒr|g}‡fdd„|Dƒ}t |¡S)Ncs"g|]}‡fdd„| ¡Dƒ‘qS)cs g|]}|ˆjvrˆj|‘qSr©r)rÚcr rrÚ &rz=_EnglishCharProcessor.__call__...)Úlower)rÚtr rrr,&rz2_EnglishCharProcessor.__call__..)Ú isinstanceÚstrrÚ _to_tensor)r!r(Úindicesrr rÚ__call__#s z_EnglishCharProcessor.__call__© Ú__name__Ú __module__Ú__qualname__rÚpropertyr&rr0rrr r3Ú __classcell__rrr"rrs rcsReZdZddœ‡fdd„ Zedd„ƒZeeeefe e e fdœdd „Z‡ZS) Ú_EnglishPhoneProcessorN©Ú dl_kwargscsDtƒ ¡t ¡|_dd„t|jƒDƒ|_tjd|d|_d|_ dS)NcSsi|]\}}||“qSrr)rrÚprrrr.rz3_EnglishPhoneProcessor.__init__..zen_us_cmudict_forward.ptr;z(\[[A-Z]+?\]|[_!'(),.:;? -])) rrrZ_get_phonesrrrZ_load_phonemizerÚ_phonemizerÚ_pattern©r!r<r"rrr+s z_EnglishPhoneProcessor.__init__cCs|jSr$r%r rrrr&2sz_EnglishPhoneProcessor.tokensr'csbt|tƒr|g}g}ˆj|ddD]4}dd„t ˆj|¡Dƒ}| ‡fdd„|Dƒ¡q"t |¡S)NÚen_us)ÚlangcSsg|]}t dd|¡‘qS)z[\[\]]Ú)ÚreÚsub)rÚrrrrr,=rz3_EnglishPhoneProcessor.__call__..csg|]}ˆj|‘qSrr*)rr=r rrr,>r) r/r0r>rDÚfindallr?Úappendrr1)r!r(r2ZphonesÚretrr rr36s z_EnglishPhoneProcessor.__call__r4rrr"rr:*s r:cs@eZdZd eeedœ‡fdd„ Zedd„ƒZddd „Z ‡Z S)Ú_WaveRNNVocoderéœÿÿÿ)ÚmodelÚmin_level_dbcs tƒ ¡d|_||_||_dS)Né"V)rrÚ_sample_rateÚ_modelÚ _min_level_db)r!rLrMr"rrrHs z_WaveRNNVocoder.__init__cCs|jSr$©rOr rrrÚsample_rateNsz_WaveRNNVocoder.sample_rateNcCsŽt |¡}dt tj|dd¡}|jdurL|j||j}tj|ddd}|j ||¡\}}t ||jj ¡}t ||jjƒ}| d¡}||fS)Négñhãˆµøä>)Úminrr)rUÚmax) ÚtorchÚexpÚlog10ÚclamprQrPZinferrZ_unnormalize_waveformZn_bitsrZ n_classesZsqueeze)r!Úmel_specÚlengthsZwaveformrrrÚforwardRs z_WaveRNNVocoder.forward)rK)N)r5r6r7r rÚfloatrr8rSr]r9rrr"rrJGs rJcs2eZdZ‡fdd„Zedd„ƒZddd„Z‡ZS) Ú_GriffinLimVocoderc s@tƒ ¡d|_tdd|jddddd|_tdd d dd|_dS)NrNiéPgg@¿@Zslaney)Zn_stftZn_melsrSZf_minZf_maxZ mel_scaleZnormiré)Zn_fftÚpowerZ hop_lengthZ win_length)rrrOrrSÚ_inv_melrÚ_griffin_limr r"rrr`s" ù üz_GriffinLimVocoder.__init__cCs|jSr$rRr rrrrSssz_GriffinLimVocoder.sample_rateNcCsFt |¡}| ¡ ¡ d¡}| |¡}| ¡ d¡}| |¡}||fS)NTF)rWrXÚcloneÚdetachZrequires_grad_rcrd)r!r[r\ÚspecZ waveformsrrrr]ws z_GriffinLimVocoder.forward)N)r5r6r7rr8rSr]r9rrr"rr__s r_c@seZdZejdœdd„ZdS)Ú _CharMixin©r)cCstƒSr$)rr rrrÚget_text_processor†sz_CharMixin.get_text_processorN©r5r6r7rÚ TextProcessorrjrrrrrh…srhc@s"eZdZddœejdœdd„ZdS)Ú_PhoneMixinNr;ricCs t|dS©Nr;)r:r@rrrrj‹sz_PhoneMixin.get_text_processorrkrrrrrmŠsrmc@s:eZdZUeed<eeefed<ddœedœdd„ZdS)Ú_Tacotron2MixinÚ_tacotron2_pathÚ_tacotron2_paramsNr;ricCsVtfi|j¤Ž}t›d|j›}|dur,in|}t|fi|¤Ž}| |¡| ¡|S©Nú/)rrqÚ _BASE_URLrpr Úload_state_dictÚeval©r!r<rLÚurlZ state_dictrrrÚ get_tacotron2”s z_Tacotron2Mixin.get_tacotron2) r5r6r7r0Ú__annotations__rrrryrrrrros roc@sJeZdZUeeed<eeeefed<ddœdd„Zddœdd„Z dS) Ú _WaveRNNMixinÚ _wavernn_pathÚ_wavernn_paramsNr;cCs|j|d}t|ƒSrn)Ú_get_wavernnrJ)r!r<ZwavernnrrrÚget_vocoder£sz_WaveRNNMixin.get_vocodercCsVtfi|j¤Ž}t›d|j›}|dur,in|}t|fi|¤Ž}| |¡| ¡|Srr)r r}rtr|r rurvrwrrrr~§s z_WaveRNNMixin._get_wavernn) r5r6r7rr0rzrrrr~rrrrr{žs r{c@seZdZdd„ZdS)Ú_GriffinLimMixincKstƒSr$)r_)r!Ú_rrrr²sz_GriffinLimMixin.get_vocoderN)r5r6r7rrrrrr€±sr€c@seZdZdS)Ú_Tacotron2WaveRNNCharBundleN©r5r6r7rrrrr‚»sr‚c@seZdZdS)Ú_Tacotron2WaveRNNPhoneBundleNrƒrrrrr„Àsr„c@seZdZdS)Ú_Tacotron2GriffinLimCharBundleNrƒrrrrr…Åsr…c@seZdZdS)Ú_Tacotron2GriffinLimPhoneBundleNrƒrrrrr†Êsr†z5tacotron2_english_characters_1500_epochs_ljspeech.pthé&)Z n_symbols)rprqaþCharacter-based TTS pipeline with :py:class:`~torchaudio.models.Tacotron2` trained on *LJSpeech* :cite:`ljspeech17` for 1,500 epochs, and :py:class:`~torchaudio.transforms.GriffinLim` as vocoder. The text processor encodes the input texts character-by-character. You can find the training script `here `__. The default parameters were used. Please refer to :func:`torchaudio.pipelines.Tacotron2TTSBundle` for the usage. Example - "Hello world! T T S stands for Text to Speech!" .. image:: https://download.pytorch.org/torchaudio/doc-assets/TACOTRON2_GRIFFINLIM_CHAR_LJSPEECH.png :alt: Spectrogram generated by Tacotron2 .. raw:: html

Example - "The examination and testimony of the experts enabled the Commission to conclude that five shots may have been fired," .. image:: https://download.pytorch.org/torchaudio/doc-assets/TACOTRON2_GRIFFINLIM_CHAR_LJSPEECH_v2.png :alt: Spectrogram generated by Tacotron2 .. raw:: html

z3tacotron2_english_phonemes_1500_epochs_ljspeech.pthé`aèPhoneme-based TTS pipeline with :py:class:`~torchaudio.models.Tacotron2` trained on *LJSpeech* :cite:`ljspeech17` for 1,500 epochs and :py:class:`~torchaudio.transforms.GriffinLim` as vocoder. The text processor encodes the input texts based on phoneme. It uses `DeepPhonemizer `__ to convert graphemes to phonemes. The model (*en_us_cmudict_forward*) was trained on `CMUDict `__. You can find the training script `here `__. The text processor is set to the *"english_phonemes"*. Please refer to :func:`torchaudio.pipelines.Tacotron2TTSBundle` for the usage. Example - "Hello world! T T S stands for Text to Speech!" .. image:: https://download.pytorch.org/torchaudio/doc-assets/TACOTRON2_GRIFFINLIM_PHONE_LJSPEECH.png :alt: Spectrogram generated by Tacotron2 .. raw:: html

Example - "The examination and testimony of the experts enabled the Commission to conclude that five shots may have been fired," .. image:: https://download.pytorch.org/torchaudio/doc-assets/TACOTRON2_GRIFFINLIM_PHONE_LJSPEECH_v2.png :alt: Spectrogram generated by Tacotron2 .. raw:: html

z=tacotron2_english_characters_1500_epochs_wavernn_ljspeech.pthz%wavernn_10k_epochs_8bits_ljspeech.pth)rprqr|r}aCharacter-based TTS pipeline with :py:class:`~torchaudio.models.Tacotron2` trained on *LJSpeech* :cite:`ljspeech17` for 1,500 epochs and :py:class:`~torchaudio.models.WaveRNN` vocoder trained on 8 bits depth waveform of *LJSpeech* :cite:`ljspeech17` for 10,000 epochs. The text processor encodes the input texts character-by-character. You can find the training script `here `__. The following parameters were used; ``win_length=1100``, ``hop_length=275``, ``n_fft=2048``, ``mel_fmin=40``, and ``mel_fmax=11025``. You can find the training script `here `__. Please refer to :func:`torchaudio.pipelines.Tacotron2TTSBundle` for the usage. Example - "Hello world! T T S stands for Text to Speech!" .. image:: https://download.pytorch.org/torchaudio/doc-assets/TACOTRON2_WAVERNN_CHAR_LJSPEECH.png :alt: Spectrogram generated by Tacotron2 .. raw:: html

Example - "The examination and testimony of the experts enabled the Commission to conclude that five shots may have been fired," .. image:: https://download.pytorch.org/torchaudio/doc-assets/TACOTRON2_WAVERNN_CHAR_LJSPEECH_v2.png :alt: Spectrogram generated by Tacotron2 .. raw:: html

z;tacotron2_english_phonemes_1500_epochs_wavernn_ljspeech.pthaPhoneme-based TTS pipeline with :py:class:`~torchaudio.models.Tacotron2` trained on *LJSpeech* :cite:`ljspeech17` for 1,500 epochs, and :py:class:`~torchaudio.models.WaveRNN` vocoder trained on 8 bits depth waveform of *LJSpeech* :cite:`ljspeech17` for 10,000 epochs. The text processor encodes the input texts based on phoneme. It uses `DeepPhonemizer `__ to convert graphemes to phonemes. The model (*en_us_cmudict_forward*) was trained on `CMUDict `__. You can find the training script for Tacotron2 `here `__. The following parameters were used; ``win_length=1100``, ``hop_length=275``, ``n_fft=2048``, ``mel_fmin=40``, and ``mel_fmax=11025``. You can find the training script for WaveRNN `here `__. Please refer to :func:`torchaudio.pipelines.Tacotron2TTSBundle` for the usage. Example - "Hello world! T T S stands for Text to Speech!" .. image:: https://download.pytorch.org/torchaudio/doc-assets/TACOTRON2_WAVERNN_PHONE_LJSPEECH.png :alt: Spectrogram generated by Tacotron2 .. raw:: html

Example - "The examination and testimony of the experts enabled the Commission to conclude that five shots may have been fired," .. image:: https://download.pytorch.org/torchaudio/doc-assets/TACOTRON2_WAVERNN_PHONE_LJSPEECH_v2.png :alt: Spectrogram generated by Tacotron2 .. raw:: html

)4rDZdataclassesrÚtypingrrrrrrrWr Ztorchaudio._internalr Ztorchaudio.functionalrZtorchaudio.modelsrr Ztorchaudio.transformsrrrCrZ interfacerÚ__all__rtrlrr:ÚnnÚModuleZVocoderrJr_rhrmror{r€r‚r„r…r†Z_get_taco_paramsZ"TACOTRON2_GRIFFINLIM_CHAR_LJSPEECHÚ__doc__Z#TACOTRON2_GRIFFINLIM_PHONE_LJSPEECHZ_get_wrnn_paramsZTACOTRON2_WAVERNN_CHAR_LJSPEECHZ TACOTRON2_WAVERNN_PHONE_LJSPEECHrrrrÚsn & þ# þ( ü% ü