Getting your head around SIP and RTP traffic flows is a little daunting at first, but it's actually not all that complicated once you understand the purpose of the protocols.
As its name implies, the Session Initiation Protocol (SIP) is used to initiate a session between two endpoints. SIP does not carry any voice or video data itself - it merely allows two endpoints to set up a connection to transfer that traffic between each other via the Real-time Transport Protocol (RTP).
SIP messages can be, and usually are, routed through one or more SIP proxy servers before reaching their destination. This is very similar to how email is transmitted, in that multiple email servers are usually involved in the delivery process, each forwarding the message in its original form. Each email server adds a Received header to the message to track the route the message has taken. In the same way, SIP uses a Via header to track the SIP proxies that the message has passed through to get to its destination.
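To make the Via mechanism concrete, here is a minimal Python sketch of two proxies each adding their own Via header as they forward a request. The proxy host names, the client address, and the helper functions are made up for illustration, and real Via headers carry extra parameters (such as branch) that are left out here.

```python
def forward(message_lines, proxy_host):
    """Return a copy of the message with a new Via header on top, as a proxy would add."""
    start_line, headers = message_lines[0], message_lines[1:]
    via = f"Via: SIP/2.0/UDP {proxy_host}:5060"
    return [start_line, via] + headers


def route(message_lines):
    """List the host:port of every Via header, most recent hop first."""
    return [line.split()[2] for line in message_lines if line.startswith("Via:")]


# Hypothetical INVITE as it leaves the client (simplified).
invite = [
    "INVITE sip:401@asterisk.lithnet.local SIP/2.0",
    "Via: SIP/2.0/UDP 192.0.2.10:5060",
    "To: <sip:401@asterisk.lithnet.local>",
]

hop1 = forward(invite, "proxy1.example.com")
hop2 = forward(hop1, "proxy2.example.com")
print(route(hop2))
```

Reading the list from the bottom up gives the path the request has taken so far, starting at the initiating client.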
SIP uses a very similar message format to HTTP. Both are human-readable, and they use similar (if not the same) status codes. For example, both HTTP and SIP use 408 to signal a timeout, 404 for 'not found', etc. Using Wireshark, you can capture SIP packets and read their contents.
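As a small illustration of how alike the two formats are, the Python sketch below parses an HTTP and a SIP response status line with exactly the same code. The function name is hypothetical, and the sample lines are typical examples rather than captured traffic.

```python
def parse_status_line(line):
    """Split a response status line into (version, numeric code, reason phrase)."""
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason


samples = (
    "HTTP/1.1 404 Not Found",
    "SIP/2.0 404 Not Found",
    "SIP/2.0 408 Request Timeout",
)
for line in samples:
    print(parse_status_line(line))
```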
Here is a breakdown of the structure of a SIP packet.
1. This shows the source and destination IP addresses of the SIP packet. Note this information will change as the packet passes between SIP proxy servers.
2. Transport protocol and port. In this case, this is a SIP/UDP packet being sent to port 5060 (the standard SIP port).
3. This is the SIP request line, which tells us what type of SIP message this is. This particular packet is a SIP INVITE request for extension 401 @ asterisk.lithnet.local.
4. The Via header contains a list of all SIP proxy servers that this packet has passed through, including the initiating client.
5. The To header specifies the SIP packet's destination.
6. The From header specifies who sent the SIP packet.
7. This particular packet is a SIP/SDP packet, meaning it contains a Session Description Protocol (SDP) message with the information the remote client needs to open an RTP session for this call.
8. The IP address of the SIP client that created this packet.
9. The IP address the destination SIP client should contact to open an RTP session. It also specifies the IP address version (IPv4 or IPv6).
10. The key pieces of information in this header are audio, 33438, and RTP/AVP. The audio component signifies that this is an audio call, 33438 is the port at the IP address given in (9) to which the remote client should send its RTP traffic, and RTP/AVP specifies that the Real-time Transport Protocol will be used for the session. The numbers at the end of this header represent the different codecs that this client supports. The SIP client at the other end must support at least one of these codecs in order to make a successful connection.
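Items 8 to 10 above correspond to the o=, c= and m= lines of the SDP body. The Python sketch below pulls those fields out of an SDP message. The sample body is invented for illustration (only the port 33438 and the RTP/AVP profile come from the packet described above), and the parser is deliberately minimal: it assumes a single media line and ignores everything else.

```python
def parse_sdp(body):
    """Extract the origin address, RTP address, port and codec list from an SDP body."""
    info = {}
    for line in body.splitlines():
        key, _, value = line.partition("=")
        fields = value.split()
        if key == "o":
            # o=<user> <session id> <version> <net type> <addr type> <address>
            info["origin_address"] = fields[5]
        elif key == "c":
            # c=<net type> <addr type> <address>
            info["ip_version"], info["rtp_address"] = fields[1], fields[2]
        elif key == "m":
            # m=<media> <port> <transport> <payload types...>
            info["media"] = fields[0]
            info["rtp_port"] = int(fields[1])
            info["transport"] = fields[2]
            info["payload_types"] = [int(f) for f in fields[3:]]
    return info


# Hypothetical SDP body; addresses and session values are placeholders.
sample_sdp = (
    "v=0\r\n"
    "o=- 1234 1234 IN IP4 192.168.1.50\r\n"
    "s=call\r\n"
    "c=IN IP4 192.168.1.50\r\n"
    "t=0 0\r\n"
    "m=audio 33438 RTP/AVP 0 8\r\n"
)

print(parse_sdp(sample_sdp))
```

In this sample, payload types 0 and 8 are the static RTP payload numbers for the G.711 codecs PCMU and PCMA.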
Unlike SIP, which listens on port 5060 (usually UDP, but can be TCP), RTP uses a dynamic port range (and is almost always carried over UDP), commonly 10000-20000. This range can usually be customized on the client to suit differing firewall configurations.
Now, while SIP traffic passes from one server to the next to get to its destination, RTP sessions are set up directly between SIP clients (there is an exception to this rule, which I will explain shortly).
Here is an easy way to think of this. I want to call Bob on the phone, but I don't know Bob's number. I do however have his email address. So I send Bob an email telling him to call me on my phone number. The email passes through several servers and eventually arrives at Bob's inbox. Bob reads the email containing my phone number, picks up the phone, and calls me. We can then begin our audio conversation with each other. The email was used to help us set up a phone conversation, and after that it was no longer needed. Our phone call did not have to pass through the servers my email passed through to get to him, because they are two separate systems. The email in this example is analogous to a SIP packet, the phone call is our RTP session.
Now SIP is a good protocol, but things kind of break down when NAT gets involved. SIP packets themselves tend to move about without too much trouble (generally), as they 'hop' from one server to another. RTP sessions are somewhat more troublesome. Either both clients need to be aware they are behind a NAT, replace their local IP addresses with their public IPs in their Session Description messages and open the appropriate firewall ports, or something has to modify the SIP packets en route.
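A quick way to spot this problem in a capture is to check whether the address a client advertises in its Session Description is one that the other side cannot route to. Here is a rough Python sketch using the standard ipaddress module. The function name and sample bodies are hypothetical, and note that is_private covers more than just the usual home and office NAT ranges (loopback and link-local addresses, for example).

```python
import ipaddress


def advertises_private_address(sdp_body):
    """Return True if the first c= line of the SDP body holds a non-public address."""
    for line in sdp_body.splitlines():
        if line.startswith("c="):
            # c=<net type> <addr type> <address>
            address = line.split()[2]
            return ipaddress.ip_address(address).is_private
    return False  # no connection line found


behind_nat = "v=0\r\nc=IN IP4 192.168.1.50\r\nm=audio 33438 RTP/AVP 0 8\r\n"
public = "v=0\r\nc=IN IP4 8.8.8.8\r\nm=audio 33438 RTP/AVP 0 8\r\n"

print(advertises_private_address(behind_nat))
print(advertises_private_address(public))
```

If this reports a private address on a call that crosses the internet, the remote party will most likely be sending its audio to an address it cannot reach.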
This is where the exception to the rule that I mentioned comes into play. Products known as Back-to-Back User Agents, one of the most well-known being Asterisk, can actually proxy RTP traffic.
Asterisk can modify SIP packets to direct the caller and destination to establish an RTP session with itself, rather than with each other. This is useful in situations where two SIP clients may not have direct access to each other, most commonly, when one or both of the SIP clients are behind a NAT.
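Conceptually, the change amounts to swapping the address and port in the Session Description for ones that belong to the server. The Python sketch below shows that idea only. It is not how Asterisk is implemented, and the function name, relay address and relay port are all made up for illustration.

```python
def anchor_media(sdp_body, relay_address, relay_port):
    """Return the SDP body with the c= address and m= port pointed at a media relay."""
    out = []
    for line in sdp_body.splitlines():
        if line.startswith("c="):
            # c=<net type> <addr type> <address>: keep the types, swap the address.
            nettype, addrtype, _ = line[2:].split()
            line = f"c={nettype} {addrtype} {relay_address}"
        elif line.startswith("m="):
            # m=<media> <port> <transport> <payload types...>: swap the port only.
            fields = line[2:].split()
            fields[1] = str(relay_port)
            line = "m=" + " ".join(fields)
        out.append(line)
    return "\r\n".join(out) + "\r\n"


# Hypothetical SDP offered by a client sitting behind a NAT.
original = "v=0\r\nc=IN IP4 192.168.1.50\r\nm=audio 33438 RTP/AVP 0 8\r\n"

print(anchor_media(original, "10.0.0.1", 12000))
```

Both parties receive a description that points at the server, so each one sends its audio there and the server relays it to the other side.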
It is important to note that Asterisk only proxies RTP traffic when it has to, and when configured to do so. If both clients are on the same local network segment, Asterisk doesn't need to play a part in the RTP session, and it will proxy only the SIP traffic.
In summary, when troubleshooting packet captures, pay close attention to:
1. The ports and IP addresses specified in the SIP message headers (To, From, Via). Determine where the packet came from, where it thinks it needs to go, and the route it has taken to get to where you found it.
2. The ports and IP addresses specified in the Session Description Protocol (SDP) portion of the SIP message. Ensure that the remote party will be able to connect to both the IP address and the port specified.