my opinion is my own

Turning English text into an mp3 with Azure Text to Speech and Python

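I wanted something that reads sentences aloud while I study English, so I looked into it. For English, Azure sounds the most fluent to me among the Text to Speech services from AWS, Azure, and GCP.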

Article I used as a reference

Changes from the above

import datetime
import xml.etree.ElementTree as ElementTree

import requests

text = input('Enter English Text : ')
subscription_key = 'xxxxxxxxxx'  # Put your API key here

# Timestamp the output file name so each run writes a new mp3
dt = datetime.datetime.now().strftime('%Y%m%d%H%M%S')
output = 'azure-text-to-speech_' + dt + '.mp3'

# The region in the endpoint URLs (eastus here) has to match the region of your Speech resource
fetch_token_url = 'https://eastus.api.cognitive.microsoft.com/sts/v1.0/issuetoken'
headers = {
    'Ocp-Apim-Subscription-Key': subscription_key
}
response = requests.post(fetch_token_url, headers=headers)
response.raise_for_status()  # stop here if the key or region was rejected
access_token = response.text  # short-lived bearer token; avoid printing it

constructed_url = 'https://eastus.tts.speech.microsoft.com/cognitiveservices/v1'

headers = {
    'Authorization': 'Bearer ' + access_token,
    'Content-Type': 'application/ssml+xml',
    'X-Microsoft-OutputFormat': 'audio-16khz-128kbitrate-mono-mp3',
}

xml_body = ElementTree.Element('speak', version='1.0')
xml_body.set('{http://www.w3.org/XML/1998/namespace}lang', 'en-US')
voice = ElementTree.SubElement(xml_body, 'voice')
voice.set('{http://www.w3.org/XML/1998/namespace}lang', 'en-US')
voice.set('name', 'Microsoft Server Speech Text to Speech Voice (en-US, JennyNeural)')
prosody = ElementTree.SubElement(voice, 'prosody')
prosody.set('pitch', 'medium')  # e.g. 'high' for a higher voice
prosody.set('rate', 'medium')  # e.g. 'fast' for faster speech
prosody.text = text
body = ElementTree.tostring(xml_body)

response = requests.post(constructed_url, headers=headers, data=body)
if response.status_code == 200:
    with open(output, 'wb') as audio:
        audio.write(response.content)
        print("\nStatus code: " + str(response.status_code) + "\nYour TTS is ready for playback.\n")
else:
    print("\nStatus code: " + str(response.status_code) + "\nSomething went wrong. Check your subscription key and headers.\n")
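
The SSML-building part of the script above can be pulled out into a small function, which makes it easy to check what is actually sent to the API without calling it. This is only a rough sketch: `build_ssml`, `DEFAULT_VOICE`, `XML_LANG`, and the parameter names are my own hypothetical names and are not part of the original script or of any Azure SDK. The default voice, pitch, and rate are the same values used in the script.

```python
import xml.etree.ElementTree as ElementTree

# Same voice name as in the script above
DEFAULT_VOICE = 'Microsoft Server Speech Text to Speech Voice (en-US, JennyNeural)'
# Qualified name that ElementTree serializes as the xml:lang attribute
XML_LANG = '{http://www.w3.org/XML/1998/namespace}lang'


def build_ssml(text, voice_name=DEFAULT_VOICE, lang='en-US',
               pitch='medium', rate='medium'):
    """Return the SSML request body as bytes (hypothetical helper)."""
    speak = ElementTree.Element('speak', version='1.0')
    speak.set(XML_LANG, lang)
    voice = ElementTree.SubElement(speak, 'voice')
    voice.set(XML_LANG, lang)
    voice.set('name', voice_name)
    prosody = ElementTree.SubElement(voice, 'prosody')
    prosody.set('pitch', pitch)
    prosody.set('rate', rate)
    # ElementTree escapes characters such as & and < when serializing
    prosody.text = text
    return ElementTree.tostring(speak)


# Print the SSML for a sample sentence to see what would be posted
print(build_ssml('Tom & Jerry are studying English.', rate='fast').decode('utf-8'))
```

The returned bytes can be passed as `data=` to `requests.post` in the same way as `body` in the script. Building the XML with ElementTree rather than string concatenation means pasted text containing `&` or `<` does not break the request.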
---

#Python #Azure #English