Transcription and translation transforms

VOD transcription and VOD translation transforms are created with the transforms endpoint of the MK.IO API. Both use the #MediaKind.AIPipelinePreset value for the @odata.type attribute.

Note: VOD transcription and VOD translation are only available for MP4 content.

VOD transcription

Configuration parameters

The following table lists the configuration parameters for VOD transcription:

Parameter | Description
--- | ---
@odata.type | Must be #MediaKind.AIPipelinePreset.
pipeline.name | Must be Predefined_ACSVodTranscription.
language | Language spoken in the audio to transcribe, for example en-US.
phrases | Words or phrases expected in the audio. Listing them makes the recognizer more likely to pick up these terms.
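
In the request body, these parameters become a list of name/value objects under pipeline.arguments.VodTranscription, as in the transform example for this section. A minimal bash sketch that builds that list from shell values follows. The function names are hypothetical helpers, not part of the MK.IO API, and the escaping assumes the values contain no control characters such as newlines.

```shell
#!/usr/bin/env bash
# Sketch: build the "VodTranscription" argument list for the
# Predefined_ACSVodTranscription pipeline. Helper names are hypothetical.
set -euo pipefail

# Escape backslashes and double quotes for use inside a JSON string literal.
# Assumption: values contain no control characters (newlines, tabs, ...).
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Print a JSON array of strings built from the function arguments.
json_string_array() {
  local out="[" sep="" item
  for item in "$@"; do
    out+="${sep}\"$(json_escape "$item")\""
    sep=","
  done
  printf '%s]' "$out"
}

# build_transcription_args LANGUAGE [PHRASE...]
# Prints the name/value list expected under "VodTranscription".
build_transcription_args() {
  local language=$1
  shift
  printf '[{"name":"language","value":"%s"},{"name":"phrases","value":%s}]\n' \
    "$(json_escape "$language")" "$(json_string_array "$@")"
}

# prints: [{"name":"language","value":"en-US"},{"name":"phrases","value":["Ficus lyrata","Monstera deliciosa"]}]
build_transcription_args "en-US" "Ficus lyrata" "Monstera deliciosa"
```

Each phrase is passed as a separate shell argument, so phrases containing spaces stay intact.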

Transform example

The example below shows how to configure a Transform that transcribes the audio track using AI models. It specifies en-US as the transcription language and uses a custom phrase list to improve recognition accuracy for domain-specific terms.

Once the transform is in place, it can be used to create a job on a given VOD asset.

curl --request PUT \
     --url https://api.mk.io/api/v1/projects/<project_name>/media/transforms/<transform_name> \
     --header 'accept: application/json' \
     --header 'content-type: application/json' \
     --header 'Authorization: Bearer <bearer_token>' \
     --data '
{
    "properties": {
        "description": "Transcription en-US",
        "outputs": [
            {
                "preset": {
                    "@odata.type": "#MediaKind.AIPipelinePreset",
                    "pipeline": {
                        "name": "Predefined_ACSVodTranscription",
                        "arguments": {
                            "VodTranscription": [
                                {
                                    "name": "language",
                                    "value": "en-US"
                                },
                                {
                                    "name": "phrases",
                                    "value": [
                                        "Cyperus papyrus",
                                        "Heliotropium indicum",
                                        "Zamioculcas zamiifolia",
                                        "Monstera deliciosa",
                                        "Alocasia odora",
                                        "Tillandsia cyanea",
                                        "Drosera capensis",
                                        "Euphorbia tirucalli",
                                        "Ficus lyrata",
                                        "Calathea orbifolia"
                                    ]
                                }
                            ]
                        }
                    }
                }
            }
        ]
    }
}
'
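
The inline --data '...' form breaks if a phrase contains an apostrophe, because the apostrophe ends the single-quoted shell string. A minimal sketch that avoids this keeps the body in a file and sends it with curl's --data-binary @file, which sends the file unchanged. The URL and headers are copied from the example; the function names and the MKIO_TOKEN environment variable are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: create or update a transform from a JSON body stored in a file.
# transform_url, put_transform and MKIO_TOKEN are hypothetical names.
set -euo pipefail

# Build the transforms URL used in the examples.
transform_url() {
  local project=$1 transform=$2
  printf 'https://api.mk.io/api/v1/projects/%s/media/transforms/%s\n' "$project" "$transform"
}

# put_transform PROJECT TRANSFORM BODY_FILE
# --fail makes curl exit non-zero on HTTP errors (status 400 and above);
# --data-binary sends the file without stripping newlines.
put_transform() {
  local project=$1 transform=$2 body_file=$3
  curl --silent --show-error --fail \
       --request PUT \
       --url "$(transform_url "$project" "$transform")" \
       --header 'accept: application/json' \
       --header 'content-type: application/json' \
       --header "Authorization: Bearer ${MKIO_TOKEN:?set MKIO_TOKEN}" \
       --data-binary @"$body_file"
}
```

For example, after saving the JSON body above as transcription.json and exporting MKIO_TOKEN, a call such as put_transform <project_name> <transform_name> transcription.json sends the same request as the inline example.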

VOD translation

Configuration parameters

Parameter | Description
--- | ---
@odata.type | Must be #MediaKind.AIPipelinePreset.
pipeline.name | Must be Predefined_ACSVodTranslation.
language | Language spoken in the audio to transcribe, for example en-US.
targetLanguages | Languages into which the transcription is translated, for example fr-FR.
phrases | Words or phrases expected in the audio. Listing them makes the recognizer more likely to pick up these terms.
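
Because targetLanguages and phrases are both JSON arrays, a hand-edited body is easy to get wrong. The following minimal bash sketch prints a complete transform body with the same structure as the translation example, including the VodTranscription argument key that the example uses. The function names are hypothetical, and the escaping handles only backslashes and double quotes.

```shell
#!/usr/bin/env bash
# Sketch: print a transform body for the Predefined_ACSVodTranslation pipeline,
# mirroring the structure of the documented example. Helper names are hypothetical.
set -euo pipefail

# Escape backslashes and double quotes for a JSON string literal
# (assumes values contain no control characters such as newlines).
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Print a JSON array of strings built from the function arguments.
json_string_array() {
  local out="[" sep="" item
  for item in "$@"; do
    out+="${sep}\"$(json_escape "$item")\""
    sep=","
  done
  printf '%s]' "$out"
}

# build_translation_body LANGUAGE "TARGET1 TARGET2 ..." [PHRASE...]
# Targets are one space-separated word (at least one); phrases may contain spaces.
build_translation_body() {
  local language=$1 targets=$2
  shift 2
  local -a target_list
  read -r -a target_list <<< "$targets"
  local lang_json targets_json phrases_json
  lang_json=$(json_escape "$language")
  targets_json=$(json_string_array "${target_list[@]}")
  phrases_json=$(json_string_array "$@")
  # Same nesting as the example: properties > outputs > preset > pipeline > arguments.
  local fmt='{"properties":{"description":"Transcription %s, translation %s",'
  fmt+='"outputs":[{"preset":{"@odata.type":"#MediaKind.AIPipelinePreset",'
  fmt+='"pipeline":{"name":"Predefined_ACSVodTranslation","arguments":{"VodTranscription":['
  fmt+='{"name":"language","value":"%s"},'
  fmt+='{"name":"targetLanguages","value":%s},'
  fmt+='{"name":"phrases","value":%s}]}}}}]}}\n'
  printf "$fmt" "$lang_json" "$(json_escape "$targets")" "$lang_json" "$targets_json" "$phrases_json"
}

build_translation_body "en-US" "pt-PT fr-FR es-ES" "Ficus lyrata" "Monstera deliciosa"
```

Redirecting the output to a file (for example > translation.json) produces a body that can be sent with curl --data-binary @translation.json instead of the inline --data string.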

Transform example

The example below demonstrates how to configure a Transform that uses AI models to transcribe and translate the audio track. It transcribes the audio in en-US and translates the output into pt-PT, fr-FR, and es-ES. A custom phrase list is also included to improve recognition accuracy for domain-specific terms.

Once the transform is in place, it can be used to create a job on a given VOD asset.

curl --request PUT \
     --url https://api.mk.io/api/v1/projects/<project_name>/media/transforms/<transform_name> \
     --header 'accept: application/json' \
     --header 'content-type: application/json' \
     --header 'Authorization: Bearer <bearer_token>' \
     --data '
{
    "properties": {
        "description": "Transcription en-US, translation fr-FR pt-PT es-ES",
        "outputs": [
            {
                "preset": {
                    "@odata.type": "#MediaKind.AIPipelinePreset",
                    "pipeline": {
                        "name": "Predefined_ACSVodTranslation",
                        "arguments": {
                            "VodTranscription": [
                                {
                                    "name": "language",
                                    "value": "en-US"
                                },
                                {
                                    "name": "targetLanguages",
                                    "value": [
                                        "pt-PT",
                                        "fr-FR",
                                        "es-ES"
                                    ]
                                },
                                {
                                    "name": "phrases",
                                    "value": [
                                        "Cyperus papyrus",
                                        "Heliotropium indicum",
                                        "Zamioculcas zamiifolia",
                                        "Monstera deliciosa",
                                        "Alocasia odora",
                                        "Tillandsia cyanea",
                                        "Drosera capensis",
                                        "Euphorbia tirucalli",
                                        "Ficus lyrata",
                                        "Calathea orbifolia"
                                    ]
                                }
                            ]
                        }
                    }
                }
            }
        ]
    }
}
'

Track insertion

Once a VOD transcription or VOD translation job is complete, track-insertion Transforms can be used to insert the generated VTT files into a previously encoded asset, which is specified as the job output.