# Manually mixing Track Recording files

> [!WARNING]
>
> Twilio recommends [Compositions](/docs/video/api/compositions-resource) to mix all track recordings from a single Room, as it is a refined product that solves the various edge cases of mixing recordings with different start or offset times. Only use the approach described here if you have a business reason for not using Compositions. The transcoding tools described below are not Twilio products and as such we will not provide support for errors or problems that may arise when developers try to compose Recordings with the following procedure.

When you record a Twilio Video Room, you will most likely end up with multiple individual track recordings once the Room has completed. By default, when you record a Room, Twilio records all participants' audio and video tracks and creates an individual recording for each one of those tracks. This provides the flexibility to compose them into a single video output and gives you control over which tracks will be displayed in the final composition.

> [!NOTE]
>
> If you don't want to automatically record the audio and video tracks for all participants in a Room, you can use [Recording Rules](/docs/video/api/recording-rules) to specify the tracks you want to record. You can also [configure your Room default settings in the Twilio Console](https://www.twilio.com/console/video/configure), where you can set the default Room type and turn recording on or off.

Twilio offers [Compositions](/docs/video/api/compositions-resource), a service that creates playable files from Recordings and automatically takes into account the Recordings' timing variations. However, if your use case does not fit the Compositions product, you can manually mix recordings into a single playable file.

If you choose not to use Compositions, there are several factors you will need to consider when manually mixing the recordings together into a single output. In particular, you will need to take into account each recording's `start_time` and `offset` values, as these might differ for each participant and cause synchronization issues when mixed.

In this tutorial, you'll learn how to synchronize two participants' track recordings, each with different `start_time` and `offset` values. The output will be a single video in either `webm` or `mp4` format, with the participants' videos side-by-side in a 2x1 grid.

## Tutorial requirements

* `ffmpeg` for mixing Recordings into a single file. [Download ffmpeg](https://ffmpeg.org/download.html). Learn more about [ffmpeg](https://ffmpeg.org/ffmpeg.html).
  * To create output files in `mp4` format, compile a version of `ffmpeg` that includes the `libfdk_aac` audio codec. Learn more about [Advanced Audio Coding (AAC)](https://trac.ffmpeg.org/wiki/Encode/AAC).
* `ffprobe` to gather information about each Recording's start time. See the [official ffprobe documentation](https://ffmpeg.org/ffprobe.html).
* The SID of a Twilio Video Room with two participants where both participants' audio and video tracks were recorded. Learn more about [creating recordings](/docs/video/tutorials/understanding-video-recordings-and-compositions#working-with-video-recordings).

## Background

After you have recorded a Room, you might want to merge all of the recorded tracks into a single playable file so you can review the full contents of the Room. If you merge recorded tracks without considering their different `start_time` or `offset` values, the output will not be synchronized.

There are several reasons tracks from the same Room might have different `start_time` and `offset` values, such as:

* Participants entering the Room at different points in time. You can see this when there are different start times in the recordings' metadata.
* A Room crashing in the middle of a call. You can see this with the `offset` in the recordings' metadata. The offset is the time in milliseconds elapsed between an arbitrary point in time, common to all Rooms, and the moment when the source Room of this track started.

The example for this tutorial will be a scenario in which you want to mix Recordings from a Room with two participants. In this scenario:

* Alice and Bob were both participants in the same Room.
* Alice joined the Room when it started, but Bob entered roughly 20 seconds after it started.
* Both Alice and Bob have different `offset` values.
* The video and audio tracks for both Bob and Alice were recorded, so there are four recordings for the Room once the Room has completed:
  * Alice's audio
  * Alice's video
  * Bob's audio
  * Bob's video

**Mixing both Alice's and Bob's tracks together without taking into account the different `start_time` and `offset` values will result in a media file with synchronization issues, where Alice and Bob's tracks are not playing at the proper times.**

The output file this tutorial produces will mix the two video and two audio tracks and ensure they are correctly synchronized. The video tracks will be placed side by side in a 2x1 grid layout, with a resolution of `1024x768`.

## 1. Find the Recording SIDs for the Room

First, you will need to find the SID for each of the recordings you would like to mix. You can do this [via the REST API](/docs/video/api/recordings-resource#get-instance). Below is the API call to retrieve the recording SIDs. (Note that you should pass the Room SID in the `GroupingSid` argument as an array with a single item.) You will need these recording SIDs in the next step.

> [!NOTE]
>
> The JSON sample response after the code samples below shows what these API calls return. In this example, you should retrieve the `sid` for each of the recordings.

Retrieve a list of all Recordings for a Room

```js
// Download the helper library from https://www.twilio.com/docs/node/install
const twilio = require("twilio"); // Or, for ESM: import twilio from "twilio";

// Find your Account SID and Auth Token at twilio.com/console
// and set the environment variables. See http://twil.io/secure
const accountSid = process.env.TWILIO_ACCOUNT_SID;
const authToken = process.env.TWILIO_AUTH_TOKEN;
const client = twilio(accountSid, authToken);

async function listRecording() {
  const recordings = await client.video.v1.recordings.list({
    groupingSid: ["RMXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"],
    limit: 20,
  });

  recordings.forEach((r) => console.log(r.sid));
}

listRecording();
```

```python
# Download the helper library from https://www.twilio.com/docs/python/install
import os
from twilio.rest import Client

# Find your Account SID and Auth Token at twilio.com/console
# and set the environment variables. See http://twil.io/secure
account_sid = os.environ["TWILIO_ACCOUNT_SID"]
auth_token = os.environ["TWILIO_AUTH_TOKEN"]
client = Client(account_sid, auth_token)

recordings = client.video.v1.recordings.list(
    grouping_sid=["RMXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"], limit=20
)

for record in recordings:
    print(record.sid)
```

```csharp
// Install the C# / .NET helper library from twilio.com/docs/csharp/install

using System;
using Twilio;
using Twilio.Rest.Video.V1;
using System.Threading.Tasks;
using System.Collections.Generic;

class Program {
    public static async Task Main(string[] args) {
        // Find your Account SID and Auth Token at twilio.com/console
        // and set the environment variables. See http://twil.io/secure
        string accountSid = Environment.GetEnvironmentVariable("TWILIO_ACCOUNT_SID");
        string authToken = Environment.GetEnvironmentVariable("TWILIO_AUTH_TOKEN");

        TwilioClient.Init(accountSid, authToken);

        var recordings = await RecordingResource.ReadAsync(
            groupingSid: new List<string> { "RMXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" }, limit: 20);

        foreach (var record in recordings) {
            Console.WriteLine(record.Sid);
        }
    }
}
```

```java
// Install the Java helper library from twilio.com/docs/java/install

import java.util.Arrays;
import com.twilio.Twilio;
import com.twilio.rest.video.v1.Recording;
import com.twilio.base.ResourceSet;

public class Example {
    // Find your Account SID and Auth Token at twilio.com/console
    // and set the environment variables. See http://twil.io/secure
    public static final String ACCOUNT_SID = System.getenv("TWILIO_ACCOUNT_SID");
    public static final String AUTH_TOKEN = System.getenv("TWILIO_AUTH_TOKEN");

    public static void main(String[] args) {
        Twilio.init(ACCOUNT_SID, AUTH_TOKEN);
        ResourceSet<Recording> recordings =
            Recording.reader().setGroupingSid(Arrays.asList("RMXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX")).limit(20).read();

        for (Recording record : recordings) {
            System.out.println(record.getSid());
        }
    }
}
```

```go
// Download the helper library from https://www.twilio.com/docs/go/install
package main

import (
	"fmt"
	"github.com/twilio/twilio-go"
	video "github.com/twilio/twilio-go/rest/video/v1"
	"os"
)

func main() {
	// Find your Account SID and Auth Token at twilio.com/console
	// and set the environment variables. See http://twil.io/secure
	// Make sure TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN exist in your environment
	client := twilio.NewRestClient()

	params := &video.ListRecordingParams{}
	params.SetGroupingSid([]string{
		"RMXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
	})
	params.SetLimit(20)

	resp, err := client.VideoV1.ListRecording(params)
	if err != nil {
		fmt.Println(err.Error())
		os.Exit(1)
	} else {
		for record := range resp {
			if resp[record].Sid != nil {
				fmt.Println(*resp[record].Sid)
			} else {
				fmt.Println(resp[record].Sid)
			}
		}
	}
}
```

```php
<?php

// Update the path below to your autoload.php,
// see https://getcomposer.org/doc/01-basic-usage.md
require_once "/path/to/vendor/autoload.php";

use Twilio\Rest\Client;

// Find your Account SID and Auth Token at twilio.com/console
// and set the environment variables. See http://twil.io/secure
$sid = getenv("TWILIO_ACCOUNT_SID");
$token = getenv("TWILIO_AUTH_TOKEN");
$twilio = new Client($sid, $token);

$recordings = $twilio->video->v1->recordings->read(
    ["groupingSid" => ["RMXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"]],
    20
);

foreach ($recordings as $record) {
    print $record->sid;
}
```

```ruby
# Download the helper library from https://www.twilio.com/docs/ruby/install
require 'twilio-ruby'

# Find your Account SID and Auth Token at twilio.com/console
# and set the environment variables. See http://twil.io/secure
account_sid = ENV['TWILIO_ACCOUNT_SID']
auth_token = ENV['TWILIO_AUTH_TOKEN']
@client = Twilio::REST::Client.new(account_sid, auth_token)

recordings = @client
             .video
             .v1
             .recordings
             .list(
               grouping_sid: [
                 'RMXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'
               ],
               limit: 20
             )

recordings.each do |record|
  puts record.sid
end
```

```bash
# Install the twilio-cli from https://twil.io/cli

twilio api:video:v1:recordings:list \
   --grouping-sid RMXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
```

```bash
curl -X GET "https://video.twilio.com/v1/Recordings?GroupingSid=RMXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX&PageSize=20" \
-u $TWILIO_ACCOUNT_SID:$TWILIO_AUTH_TOKEN
```

```json
{
  "recordings": [
    {
      "account_sid": "ACaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
      "status": "completed",
      "date_created": "2015-07-30T20:00:00Z",
      "sid": "RTaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
      "source_sid": "MTaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
      "size": 23,
      "type": "audio",
      "duration": 10,
      "container_format": "mka",
      "codec": "opus",
      "track_name": "A name",
      "offset": 10,
      "status_callback": "https://mycallbackurl.com",
      "status_callback_method": "POST",
      "grouping_sids": {
        "room_sid": "RMaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
        "participant_sid": "PAaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
      },
      "media_external_location": "https://my-super-duper-bucket.s3.amazonaws.com/my/path/",
      "url": "https://video.twilio.com/v1/Recordings/RTaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
      "links": {
        "media": "https://video.twilio.com/v1/Recordings/RTaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/Media"
      }
    }
  ],
  "meta": {
    "page": 0,
    "page_size": 50,
    "first_page_url": "https://video.twilio.com/v1/Recordings?Status=completed&DateCreatedAfter=2017-01-01T00%3A00%3A01Z&DateCreatedBefore=2017-12-31T23%3A59%3A59Z&SourceSid=MTaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa&MediaType=audio&GroupingSid=RMaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa&PageSize=50&Page=0",
    "previous_page_url": null,
    "url": "https://video.twilio.com/v1/Recordings?Status=completed&DateCreatedAfter=2017-01-01T00%3A00%3A01Z&DateCreatedBefore=2017-12-31T23%3A59%3A59Z&SourceSid=MTaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa&MediaType=audio&GroupingSid=RMaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa&PageSize=50&Page=0",
    "next_page_url": null,
    "key": "recordings"
  }
}
```

## 2. Retrieve the offset for each recording

The next step is to extract the offset value for each of the four recordings via its metadata. You can do this [using the REST API](/docs/video/api/recordings-resource#get-by-sid).

Keep track of these offsets, as you will need them in a later step.

Retrieve the offset for each track's Recording

```js
// Download the helper library from https://www.twilio.com/docs/node/install
const twilio = require("twilio"); // Or, for ESM: import twilio from "twilio";

// Find your Account SID and Auth Token at twilio.com/console
// and set the environment variables. See http://twil.io/secure
const accountSid = process.env.TWILIO_ACCOUNT_SID;
const authToken = process.env.TWILIO_AUTH_TOKEN;
const client = twilio(accountSid, authToken);

async function fetchRecording() {
  const recording = await client.video.v1
    .recordings("RTXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX")
    .fetch();

  console.log(recording.offset);
}

fetchRecording();
```

```python
# Download the helper library from https://www.twilio.com/docs/python/install
import os
from twilio.rest import Client

# Find your Account SID and Auth Token at twilio.com/console
# and set the environment variables. See http://twil.io/secure
account_sid = os.environ["TWILIO_ACCOUNT_SID"]
auth_token = os.environ["TWILIO_AUTH_TOKEN"]
client = Client(account_sid, auth_token)

recording = client.video.v1.recordings(
    "RTXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
).fetch()

print(recording.offset)
```

```csharp
// Install the C# / .NET helper library from twilio.com/docs/csharp/install

using System;
using Twilio;
using Twilio.Rest.Video.V1;
using System.Threading.Tasks;

class Program {
    public static async Task Main(string[] args) {
        // Find your Account SID and Auth Token at twilio.com/console
        // and set the environment variables. See http://twil.io/secure
        string accountSid = Environment.GetEnvironmentVariable("TWILIO_ACCOUNT_SID");
        string authToken = Environment.GetEnvironmentVariable("TWILIO_AUTH_TOKEN");

        TwilioClient.Init(accountSid, authToken);

        var recording =
            await RecordingResource.FetchAsync(pathSid: "RTXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX");

        Console.WriteLine(recording.Offset);
    }
}
```

```java
// Install the Java helper library from twilio.com/docs/java/install

import com.twilio.Twilio;
import com.twilio.rest.video.v1.Recording;

public class Example {
    // Find your Account SID and Auth Token at twilio.com/console
    // and set the environment variables. See http://twil.io/secure
    public static final String ACCOUNT_SID = System.getenv("TWILIO_ACCOUNT_SID");
    public static final String AUTH_TOKEN = System.getenv("TWILIO_AUTH_TOKEN");

    public static void main(String[] args) {
        Twilio.init(ACCOUNT_SID, AUTH_TOKEN);
        Recording recording = Recording.fetcher("RTXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX").fetch();

        System.out.println(recording.getOffset());
    }
}
```

```go
// Download the helper library from https://www.twilio.com/docs/go/install
package main

import (
	"fmt"
	"github.com/twilio/twilio-go"
	"os"
)

func main() {
	// Find your Account SID and Auth Token at twilio.com/console
	// and set the environment variables. See http://twil.io/secure
	// Make sure TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN exist in your environment
	client := twilio.NewRestClient()

	resp, err := client.VideoV1.FetchRecording("RTXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX")
	if err != nil {
		fmt.Println(err.Error())
		os.Exit(1)
	} else {
		if resp.Offset != nil {
			fmt.Println(*resp.Offset)
		} else {
			fmt.Println(resp.Offset)
		}
	}
}
```

```php
<?php

// Update the path below to your autoload.php,
// see https://getcomposer.org/doc/01-basic-usage.md
require_once "/path/to/vendor/autoload.php";

use Twilio\Rest\Client;

// Find your Account SID and Auth Token at twilio.com/console
// and set the environment variables. See http://twil.io/secure
$sid = getenv("TWILIO_ACCOUNT_SID");
$token = getenv("TWILIO_AUTH_TOKEN");
$twilio = new Client($sid, $token);

$recording = $twilio->video->v1
    ->recordings("RTXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX")
    ->fetch();

print $recording->offset;
```

```ruby
# Download the helper library from https://www.twilio.com/docs/ruby/install
require 'twilio-ruby'

# Find your Account SID and Auth Token at twilio.com/console
# and set the environment variables. See http://twil.io/secure
account_sid = ENV['TWILIO_ACCOUNT_SID']
auth_token = ENV['TWILIO_AUTH_TOKEN']
@client = Twilio::REST::Client.new(account_sid, auth_token)

recording = @client
            .video
            .v1
            .recordings('RTXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX')
            .fetch

puts recording.offset
```

```bash
# Install the twilio-cli from https://twil.io/cli

twilio api:video:v1:recordings:fetch \
   --sid RTXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
```

```bash
curl -X GET "https://video.twilio.com/v1/Recordings/RTXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" \
-u $TWILIO_ACCOUNT_SID:$TWILIO_AUTH_TOKEN
```

```json
{
  "account_sid": "ACaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
  "status": "processing",
  "date_created": "2015-07-30T20:00:00Z",
  "sid": "RTXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "source_sid": "MTaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
  "size": 0,
  "url": "https://video.twilio.com/v1/Recordings/RTaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
  "type": "audio",
  "duration": 0,
  "container_format": "mka",
  "codec": "opus",
  "track_name": "A name",
  "offset": 10,
  "status_callback": "https://mycallbackurl.com",
  "status_callback_method": "POST",
  "grouping_sids": {
    "room_sid": "RMaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
  },
  "media_external_location": "https://my-super-duper-bucket.s3.amazonaws.com/my/path/",
  "links": {
    "media": "https://video.twilio.com/v1/Recordings/RTaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/Media"
  }
}
```

## 3. Download each recording

Next, you should download each recording. You can do this [via the REST API](/docs/video/api/recordings-resource#get-media-by-sid). The following curl command retrieves the URL that you can use to download the media content of a Recording.

```bash
curl 'https://video.twilio.com/v1/Recordings/RTXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/Media' \
  -u $TWILIO_ACCOUNT_SID:$TWILIO_AUTH_TOKEN
```

You will get back a JSON response that contains a `redirect_to` URL, similar to the response below. Go to this URL to download the recording file.

```json
{"redirect_to": "https://com-twilio-us1-video-recording..."}
```

The audio files you download will be in `.mka` format and the video files will be in `.mkv` format.
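If you are downloading several recordings, this step can be scripted. Below is a minimal Python sketch using only the standard library; the helper names `media_url` and `download_recording` are illustrative, not part of the Twilio helper library. It requests the Media resource with basic auth and relies on the client following the redirect to the signed storage URL:

```python
import base64
import os
import urllib.request

VIDEO_API = "https://video.twilio.com/v1/Recordings"


def media_url(recording_sid):
    # Build the Media endpoint URL for a Recording SID
    return f"{VIDEO_API}/{recording_sid}/Media"


def download_recording(recording_sid, dest_path):
    # Illustrative helper: fetches the Media resource and saves the
    # raw .mka/.mkv file that the redirect points to.
    account_sid = os.environ["TWILIO_ACCOUNT_SID"]
    auth_token = os.environ["TWILIO_AUTH_TOKEN"]
    credentials = base64.b64encode(
        f"{account_sid}:{auth_token}".encode()
    ).decode()
    request = urllib.request.Request(
        media_url(recording_sid),
        headers={"Authorization": f"Basic {credentials}"},
    )
    # urllib follows the redirect to the signed storage URL automatically
    with urllib.request.urlopen(request) as response, open(dest_path, "wb") as f:
        f.write(response.read())
```

You would call `download_recording` once per Recording SID, for example saving Alice's audio track as `alice.mka`.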

## 4. Find the start time of the Recordings with ffprobe

At this point, you should have four recordings downloaded on your machine, as well as the offset values for each of these recordings.

This next step uses `ffprobe` to retrieve the `start_time` for each recording. You will need to perform this step on each recording.

For example, you can get the `start_time` of Alice's audio track with the following `ffprobe` command:

```bash
ffprobe -show_entries format=start_time alice.mka
```

The output will look similar to the following, and it will include the `start_time`:

```bash
Input #0, matroska,webm, from 'alice.mka':
Metadata:
    encoder         : GStreamer matroskamux version 1.8.1.1
    creation_time   : 2017-06-30T09:03:44.000000Z
Duration: 00:13:09.36, start: 1.564000, bitrate: 48 kb/s
    Stream #0:0(eng): Audio: opus, 48000 Hz, stereo, fltp (default)
    Metadata:
        title           : Audio
start_time=1.564000
```
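When scripting this step, it is easier to ask `ffprobe` to print only the `start_time` value and convert it to milliseconds yourself. Below is a minimal Python sketch; the helper names are illustrative, and it assumes `ffprobe` is available on your `PATH`:

```python
import subprocess


def probe_start_time(path):
    # Print only the start_time value (in seconds) with no wrapper text
    result = subprocess.run(
        ["ffprobe", "-v", "error",
         "-show_entries", "format=start_time",
         "-of", "default=noprint_wrappers=1:nokey=1", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def to_milliseconds(start_time):
    # ffprobe reports seconds as a string, e.g. "1.564000"
    return round(float(start_time) * 1000)

# to_milliseconds(probe_start_time("alice.mka")) would return 1564
# for the sample output shown above
```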

After retrieving both the `offset` from the Recording's metadata and the `start_time` from running `ffprobe` on the Recording media, you can create a table like the one below. (The `creation_time` also appears in the `ffprobe` output above; it is included here only to show that it is not the correct value to use when mixing tracks. It is not needed in any of the following steps and is removed from the table going forward.)

A table of each recording's offset and start time, along with its creation time.

| Track Name | offset (in ms) | start\_time (in ms) | creation\_time              |
| ---------- | -------------- | ------------------- | --------------------------- |
| alice.mka  | 163481005731   | 1564                | 2017-06-30T09:03:44.000000Z |
| alice.mkv  | 163481005731   | 1584                | 2017-06-30T09:03:44.000000Z |
| bob.mka    | 163481005732   | 20789               | 2017-06-30T09:04:03.000000Z |
| bob.mkv    | 163481005732   | 20814               | 2017-06-30T09:04:03.000000Z |

A participant's audio and video tracks are not guaranteed to have the same `start_time` and `offset`; they can differ, for example, after a Room recovery. You can also see the approximately 20 seconds that Alice was in the Room before Bob reflected in the `start_time` of each participant's recordings.

It is important to use `start_time` as the reference and not `creation_time`. The recording's `creation_time` is the time that the user joined the call, while `start_time` refers to when the first sample of data was received for the recording. Additionally, `creation_time` does not have millisecond precision, which could lead to synchronization issues.

## 5. Set the reference and relative offsets

Next, you will need to calculate the relative offset of each track so that the tracks will be synchronized. To calculate the relative offset:

1. Add the `offset` to the `start_time`. In the sample table below, we store this value in the `Addition` column.
2. Take the lowest `Addition` value of all tracks. In the sample, that's `alice.mka`, with an `Addition` value of 163481007295. Copy this value into every row of the `Reference Value` column, as you will need to reference it in the next step.
3. Subtract the `Reference Value` from the `Addition` value for each recording to create the `relative_offset` in milliseconds. You will need the `relative_offset` value when mixing the tracks together.

The following table shows the current values for our tracks, in which `alice.mka` is the reference value with 163481007295.

A table of each recording's offset, start time, and calculated fields for determining the relative offset.

| Track Name | offset (in ms) | start\_time (in ms) | Addition     | Reference Value | relative\_offset (in ms) |
| ---------- | -------------- | ------------------- | ------------ | --------------- | ------------------------ |
| alice.mka  | 163481005731   | 1564                | 163481007295 | 163481007295    | 0                        |
| alice.mkv  | 163481005731   | 1584                | 163481007315 | 163481007295    | 20                       |
| bob.mka    | 163481005732   | 20789               | 163481026521 | 163481007295    | 19226                    |
| bob.mkv    | 163481005732   | 20814               | 163481026546 | 163481007295    | 19251                    |
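The arithmetic above is straightforward to script. Below is a minimal, pure-Python sketch using the values from the table (the function name `relative_offsets` is illustrative):

```python
def relative_offsets(tracks):
    # tracks maps a track name to (offset_ms, start_time_ms)
    additions = {name: offset + start for name, (offset, start) in tracks.items()}
    # The lowest Addition value is the reference value
    reference = min(additions.values())
    return {name: addition - reference for name, addition in additions.items()}


tracks = {
    "alice.mka": (163481005731, 1564),
    "alice.mkv": (163481005731, 1584),
    "bob.mka": (163481005732, 20789),
    "bob.mkv": (163481005732, 20814),
}

print(relative_offsets(tracks))
# → {'alice.mka': 0, 'alice.mkv': 20, 'bob.mka': 19226, 'bob.mkv': 19251}
```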

## 6. Mix the tracks with ffmpeg

The final step is to mix all the tracks in a single file. The command will:

* Keep video and audio tracks in synchronization
* Enable you to change the output video resolution
* Pad the video tracks to keep the aspect ratio of the original videos

### webm format

Below is the complete command to obtain the mixed file in `webm` format with a 1024x768 (width x height) resolution. It's a long command! You can see an explanation for each section below.

```bash
ffmpeg -i alice.mkv -i bob.mkv -i alice.mka -i bob.mka \
    -filter_complex " \
    [0]scale=512:-2,pad=512:768:(ow-iw)/2:(oh-ih)/2[vs0], \
    color=black:size=512x768:duration=0.020[b0], \
    [b0][vs0]concat[r0c0]; \
    [1]scale=512:-2,pad=512:768:(ow-iw)/2:(oh-ih)/2[vs1], \
    color=black:size=512x768:duration=19.251[b1], \
    [b1][vs1]concat[r0c1]; \
    [r0c0][r0c1]hstack=inputs=2[video]; \
    [2]aresample=async=1[a0]; \
    [3]aresample=async=1,adelay=19226.0|19226.0[a1]; \
    [a0][a1]amix=inputs=2[audio]" \
    -map '[video]' \
    -map '[audio]' \
    -acodec libopus \
    -vcodec libvpx \
    output.webm
```

#### 1. Input files

```bash
ffmpeg -i alice.mkv -i bob.mkv -i alice.mka -i bob.mka \
```

In the first line of this command, you specify the input files, which are the four recordings.

#### 2. Apply scales for video and delay to video and audio

The following section breaks down each line of the filter operation.

```bash
-filter_complex <script>
```

a. This will perform the filter operation specified in the following string.

```bash
"[0]scale=<half of width>:-2,pad=<half of width>:<height>:(ow-iw)/2:(oh-ih)/2[vs0]
```

b. This section selects the first input file (here, Alice's video) and scales it to half the width of the desired resolution (512) while maintaining the original aspect ratio. Additionally, it pads the scaled video (`pad`) and tags it `[vs0]`.

```bash
color=black:size=<half of width>x<height>:duration=<relative offset in seconds>[b0],\
```

c. The next step is to generate black frames for the duration of the track's `relative_offset` (which you calculated in step 5), in seconds. This delays the track to keep it in sync with the other recordings.

```bash
[b0][vs0]concat[r0c0];\
```

d. This step concatenates the black stream `[b0]` with the padded stream `[vs0]` and tags the result as `[r0c0]`, so the black frames play before the padded video frames begin.

```bash
[1]scale=<half of width>:-2,pad=<half of width>:<height>:(ow-iw)/2:(oh-ih)/2[vs1],\
```

e. This step is the same as step **b**, repeated for the second input file (Bob's video). The output of this line is tagged as `[vs1]`.

```bash
color=black:size=<half of width>x<height>:duration=<relative offset in seconds>[b1],
```

f. This step is the same as step **c**, except the `duration` should be set to the `relative_offset`, in seconds, that you calculated for the second participant's video recording. In this example, it's 19.251. The output of this line is tagged as `[b1]`.

```bash
[b1][vs1]concat[r0c1]
```

g. This is the same as step **d**. It concatenates the black stream `[b1]` with the padded stream `[vs1]`, and tags it as `[r0c1]`.

```bash
[r0c0][r0c1]hstack=inputs=2[video]
```

h. This line configures the filter that performs the horizontal video stacking (creating the 2x1 video grid). In this example there are two video tracks, so the filter takes `inputs=2`; the stacked output is tagged as `[video]`.

```bash
[2]aresample=async=1[a0];\
```

i. This line resamples the first audio input track (Alice's audio, which was the input at index `[2]` in the input list). The `aresample` filter with `async=1` fills and trims the audio track as needed (see more information in the [resampler docs](https://ffmpeg.org/ffmpeg-resampler.html#Resampler-Options)). The resampled audio is tagged as `[a0]`.

```bash
[3]aresample=async=1,adelay=19226.0|19226.0[a1];\
```

j. This line similarly resamples the second audio input track, which in this example is Bob's audio. Here, the relative offset was 19226 ms. `adelay` specifies the audio delay for both left and right channels in milliseconds. The resampled and delayed audio is tagged as `[a1]`.

```bash
[a0][a1]amix=inputs=2[audio]" \
```

k. This configures the filter that performs the audio mixing. In this sample there are two audio tracks, so the filter takes `inputs=2`; the mixed output is tagged as `[audio]`. This is the final line of the filter script.

#### 3. Output definition

Below are the commands used to produce the output:

```bash
-map '[video]'
```

a. This selects the stream tagged `[video]` to be used in the output.

```bash
-map '[audio]'
```

b. This selects the stream tagged `[audio]` to be used in the output.

```bash
-acodec libopus
```

c. The audio codec to use. For `mp4` use `libfdk_aac`. (See the note in [Requirements](#tutorial-requirements) about compiling a version of `ffmpeg` with `libfdk_aac` if you want to create an `mp4` output file.)

```bash
-vcodec libvpx
```

d. The video codec to use. For `mp4` use `libx264`.

```bash
output.webm
```

e. The output file name

### mp4 output

The following command would produce an output file in `mp4` format. The command follows the same format as the `webm` command above, with a few alterations:

* The audio codec for the output is `libfdk_aac` and the video codec is `libx264`.
* There is an added `-vsync 2 \` line immediately following the `-map '[audio]'` line. `-vsync 2` selects variable frame rate output, which prevents frames from being duplicated or given identical timestamps when encoding with `libx264`.
* The final output file is called `output.mp4`.

```bash
ffmpeg -i alice.mkv -i bob.mkv -i alice.mka -i bob.mka \
    -filter_complex "\
    [0]scale=512:-2,pad=512:768:(ow-iw)/2:(oh-ih)/2[vs0],\
    color=black:size=512x768:duration=0.020[b0],\
    [b0][vs0]concat[r0c0];\
    [1]scale=512:-2,pad=512:768:(ow-iw)/2:(oh-ih)/2[vs1],\
    color=black:size=512x768:duration=19.251[b1],\
    [b1][vs1]concat[r0c1];\
    [r0c0][r0c1]hstack=inputs=2[video];\
    [2]aresample=async=1[a0];\
    [3]aresample=async=1,adelay=19226.0|19226.0[a1];\
    [a0][a1]amix=inputs=2[audio]" \
    -map '[video]' \
    -map '[audio]' \
    -vsync 2 \
    -acodec libfdk_aac \
    -vcodec libx264 \
    output.mp4
```
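If you mix recordings regularly, the filter graph can be generated from the relative offsets instead of written by hand. Below is a simplified Python sketch under the same assumptions as this tutorial (video inputs stacked horizontally, all video files listed before all audio files in the `-i` input order); the function name `build_filter` is illustrative, and the output has not been validated against every ffmpeg version:

```python
def build_filter(video_offsets_ms, audio_offsets_ms, width=1024, height=768):
    # video_offsets_ms / audio_offsets_ms: relative offsets in milliseconds,
    # ordered to match the ffmpeg -i input order (all videos, then all audios)
    n_videos = len(video_offsets_ms)
    half = width // n_videos  # each video gets an equal share of the width
    parts = []
    for i, off in enumerate(video_offsets_ms):
        parts.append(
            f"[{i}]scale={half}:-2,pad={half}:{height}:(ow-iw)/2:(oh-ih)/2[vs{i}],"
            f"color=black:size={half}x{height}:duration={off / 1000:.3f}[b{i}],"
            f"[b{i}][vs{i}]concat[r0c{i}]"
        )
    video_inputs = "".join(f"[r0c{i}]" for i in range(n_videos))
    parts.append(f"{video_inputs}hstack=inputs={n_videos}[video]")
    for j, off in enumerate(audio_offsets_ms):
        # adelay takes the delay in ms for the left and right channels
        delay = f",adelay={off}|{off}" if off else ""
        parts.append(f"[{j + n_videos}]aresample=async=1{delay}[a{j}]")
    audio_inputs = "".join(f"[a{j}]" for j in range(len(audio_offsets_ms)))
    parts.append(f"{audio_inputs}amix=inputs={len(audio_offsets_ms)}[audio]")
    return ";".join(parts)


# Offsets from step 5: alice.mkv=20, bob.mkv=19251, alice.mka=0, bob.mka=19226
print(build_filter([20, 19251], [0, 19226]))
```

For the tutorial's offsets this reproduces the filter script used above (`duration=0.020` and `duration=19.251` for the video tracks, `adelay=19226|19226` for Bob's audio).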

## Additional resources to personalize the composed media

There are many situations where you might want to know the start, end, or duration of a track. For example, to concatenate black frames after a video track ends, you would need to know the `start` and `end` of the media track. You can find these values with `ffprobe`.

The examples below demonstrate how to use `ffprobe` to find the start time, end time, and duration of a track, using the example video track `alice.mkv`.

### Find the start\_time in milliseconds

```bash
ffprobe -i alice.mkv -show_frames 2>/dev/null | head -n 30 | grep -w pkt_dts | grep -Eo '[0-9]+'
```

This command outputs the start time of `alice.mkv`, which is 1564 ms.

### Find the end\_time in milliseconds

```bash
ffprobe -i alice.mkv -show_frames 2>/dev/null | tail -n 30 | grep -w pkt_dts | grep -Eo '[0-9]+'
```

This command outputs the end time of `alice.mkv`, which is 142242 ms.

### Duration in milliseconds of video track

The duration of the track is the difference between the `end_time` (142242 ms) and `start_time` (1564 ms), which results in a duration of 140678 ms.
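As a quick sanity check, the subtraction for this example works out as:

```python
start_time_ms = 1564    # first pkt_dts reported by ffprobe
end_time_ms = 142242    # last pkt_dts reported by ffprobe

print(end_time_ms - start_time_ms)  # → 140678
```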
