A walk-through of a TCP handshake

tcpdump is a great tool for really making sense of what's going on "under the hood" in your network communications — I've been called on more than once to troubleshoot an issue that required me to dig down into the wire-protocol layer that tcpdump exposes. There's actually a more modern graphical tool called Wireshark that exposes the same data, while adding some graphical niceties, but since the output is equivalent and it's easier to show tcpdump output in a blog post like this one, I'll stick with tcpdump output here. In this post, I'll capture the tcpdump output of a TCP handshake and walk through each byte of it and what each means and what it's for.

Of course, the first thing I need to do, before I even launch a browser, is to start up tcpdump in listening mode. By default, tcpdump spits out everything that passes through your network card, which is quite a lot. To start out with, then, it's worth narrowing down exactly what we're interested in: TCP traffic on port 443. tcpdump includes an option to filter the results using an expression language: tcp port 443 is the filter that I'll use here. Also by default, tcpdump only summarizes each packet, under the assumption that you're mostly interested in TCP behavior. In this case, I want to see everything, so I pass in the -x option, which instructs it to output the contents of every packet in hexadecimal. Because it reads raw traffic from the network interface, tcpdump must run as root on Unix (including Mac OS X) platforms. (If you're on Windows, there's an equivalent program called WinDump that accepts the same parameters.)

sh-3.2# tcpdump -x tcp port 443
tcpdump: data link type PKTAP
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on pktap, link-type PKTAP (Apple DLT_PKTAP), capture size 262144 bytes

tcpdump waits until it sees any network traffic. When I open up, for example, the Amazon home page, there's a flurry of activity. The first three packets that are exchanged are, as required, the TCP 3-way handshake, which I step through below.

14:39:43.084497 IP localhost.54626 > server-54-192-87-96.lax3.r.cloudfront.net.https: Flags [S], seq 4183262244, win 65535, options 
		[mss 1460,nop,wscale 5,nop,nop,TS val 541058676 ecr 0,sackOK,eol], length 0
	0x0000:  XXXX XXXX XXXX YYYY YYYY YYYY 0800 4500
	0x0010:  0040 94ae 4000 4006 ed7e 0a64 2007 36c0
	0x0020:  5760 d562 01bb f957 8424 0000 0000 b002
	0x0030:  ffff 2324 0000 0204 05b4 0103 0305 0101
	0x0040:  080a 203f e674 0000 0000 0402 0000

Figure 1: TCP SYN packet

The first packet that my browser exchanges with Amazon is a TCP "synchronize" (SYN) packet, shown in figure 1. The first line of the tcpdump output above is a summary of the packet (which is all you see if you don't ask for a full hex dump), followed by the full 78-byte frame. tcpdump doesn't offer much help in interpreting the hex dump (that's what the summary line is for, after all), but if you're at least somewhat familiar with TCP/IP, you know that this is an Ethernet header, followed by an IP header, followed by a TCP header. Just to be on the safe side, I've masked out my source and destination MAC addresses with YYYY YYYY YYYY and XXXX XXXX XXXX, respectively (you'll see actual numbers if you try this yourself). These are followed by the two-byte next protocol indicator, which is 0x0800, or the Internet Protocol (IP). The IP header, per section 3.1 of the IP specification, starts with a four-bit version number, 4, and a four-bit header length, 5. The header length is counted in 32-bit "words", so this packet includes a 20-byte IP header, whose contents are:
	0x0000:                                     4500
	0x0010:  0040 94ae 4000 4006 ed7e 0a64 2007 36c0
	0x0020:  5760
(Notice that the 20-byte length count includes the leading byte that declares the length in the first place.) The next byte is the "type of service" byte, which is 00 here: this byte is rarely, if ever, used. This is followed by the two-byte total length of the packet, 0x0040 (64 decimal); this includes the IP header itself, but not the 14-byte Ethernet header. 0x94ae is the fragment identifier (used in reassembling fragmented packets). The fragment identifier turns out to be superfluous in this case, because the second bit of the next byte, 0x40, is the "do not fragment" bit. The remaining five bits of that byte and the next byte are the fragment offset: 0, since this is a full packet which doesn't permit fragmentation anyway. Next is the time to live, 0x40 (64); each hop is responsible for decrementing this value and discarding the packet when the value reaches 0, to prevent packets from looping forever. The next byte, 0x06, is the protocol indicator of the next header, in this case TCP. The next two bytes, 0xed7e, are the header checksum. This is defined tersely in the RFC as "the 16 bit one's complement of the one's complement sum of all 16 bit words in the header." This would suggest that you could compute it with a routine similar to listing 1, below:

unsigned short headers[] = {...}; // headers go here

unsigned short checksum = 0x0000;
for (int i = 0; i < ((headers[0] & 0x0F00) >> 7); i++)	{
 checksum += headers[i];
}
headers[5] = ~checksum;

Listing 1: almost, but not quite, IP header checksum routine

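This doesn't quite work, though, because a 16-bit accumulator discards overflow. The sum has to be accumulated in a wider type (declare checksum as unsigned int rather than unsigned short, and zero the checksum field headers[5] before summing), and any carry in the high-order half has to be added back into the low-order half before inverting, as in listing 2:

// checksum is now an unsigned int; fold the carries back in until none remain
while (checksum >> 16)
 checksum = (checksum & 0xFFFF) + (checksum >> 16);
headers[5] = (unsigned short) ~checksum;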

Listing 2: accounting properly for overflow

This process has the benefit that the receiver can check the checksum quickly by performing the same routine and verifying that the result is 0, as expected.
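To make this concrete, here is a minimal sketch that runs the routine over the IP header bytes from figure 1. The names syn_ip_header and ip_checksum are my own, chosen for illustration, and the function assumes an even header length (which is always the case for an IP header).

```c
#include <stddef.h>
#include <stdint.h>

/* The 20-byte IP header from figure 1, with the checksum field
 * (bytes 10 and 11, 0xed7e on the wire) zeroed for recomputation. */
const uint8_t syn_ip_header[20] = {
    0x45, 0x00, 0x00, 0x40, 0x94, 0xae, 0x40, 0x00, 0x40, 0x06,
    0x00, 0x00, 0x0a, 0x64, 0x20, 0x07, 0x36, 0xc0, 0x57, 0x60
};

/* One's complement of the one's complement sum of all 16-bit words.
 * Hypothetical helper; len is assumed to be even. */
uint16_t ip_checksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;

    /* Add up the header as big-endian 16-bit words. */
    for (size_t i = 0; i + 1 < len; i += 2) {
        sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);
    }
    /* Fold any carries out of the low 16 bits back in. */
    while (sum >> 16) {
        sum = (sum & 0xFFFF) + (sum >> 16);
    }
    return (uint16_t)~sum;
}
```

Calling ip_checksum(syn_ip_header, 20) should reproduce the 0xed7e seen on the wire, and running the same function over the header with 0xed7e put back in place should return 0, which is the receiver's check.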

Finally, the last 8 bytes are the source address and destination address. My IP address (at least as far as my localhost is concerned) is 10.100.32.7, and the destination address (Amazon's web server) is 54.192.87.96. Notice in figure 1 that this is actually embedded in the text host name given out by the CDN — in this case Amazon's own CloudFront CDN.

IP headers are permitted to include quite a few optional values, but none of the packets in this exchange include them; they're all "bare" 20-byte IP headers.
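The field-by-field reading above can be sketched in code. This is only an illustration under my own naming (struct ip_fields and parse_ip_header are hypothetical, not from any library), and it assumes a bare 20-byte header with no IP options.

```c
#include <stdint.h>
#include <stdio.h>

/* The 20-byte IP header from figure 1, exactly as captured. */
const uint8_t fig1_ip_header[20] = {
    0x45, 0x00, 0x00, 0x40, 0x94, 0xae, 0x40, 0x00, 0x40, 0x06,
    0xed, 0x7e, 0x0a, 0x64, 0x20, 0x07, 0x36, 0xc0, 0x57, 0x60
};

/* Hypothetical container for the decoded fields. */
struct ip_fields {
    unsigned version;
    unsigned header_len;    /* in bytes, not 32-bit words */
    unsigned total_len;     /* IP header plus everything after it */
    unsigned id;
    int      dont_fragment;
    unsigned frag_offset;
    unsigned ttl;
    unsigned protocol;      /* 6 means TCP */
    unsigned checksum;
    char     src[16];       /* dotted-quad text */
    char     dst[16];
};

void parse_ip_header(const uint8_t *p, struct ip_fields *out)
{
    out->version       = p[0] >> 4;
    out->header_len    = (p[0] & 0x0Fu) * 4;
    out->total_len     = (unsigned)((p[2] << 8) | p[3]);
    out->id            = (unsigned)((p[4] << 8) | p[5]);
    out->dont_fragment = (p[6] & 0x40) != 0;
    out->frag_offset   = (unsigned)(((p[6] & 0x1F) << 8) | p[7]);
    out->ttl           = p[8];
    out->protocol      = p[9];
    out->checksum      = (unsigned)((p[10] << 8) | p[11]);
    snprintf(out->src, sizeof out->src, "%d.%d.%d.%d",
             p[12], p[13], p[14], p[15]);
    snprintf(out->dst, sizeof out->dst, "%d.%d.%d.%d",
             p[16], p[17], p[18], p[19]);
}
```

Run against fig1_ip_header, this should yield version 4, a 20-byte header, a total length of 64, a time to live of 64, protocol 6, and the addresses 10.100.32.7 and 54.192.87.96.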

Moving on to the TCP header, as specified by RFC 793, which starts at byte 34 of figure 1, you see that the first two bytes are the "source port" of 54626 (0xd562). The next two bytes are the destination port 443 (0x01bb). Notice in the tcpdump summary line that the source port is shown numerically as 54626, but 443, the destination, is annotated with the "friendlier" name https. Next up are the 4-byte sequence number and the 4-byte acknowledgment number (8 hex digits each). The sequence number is f957 8424. TCP is a "sliding window" protocol, so each connection starts with a random sequence number, which is then advanced by the number of payload bytes (not including headers) in each packet sent; the SYN flag itself also counts as one byte for this purpose. So the next packet sent by this socket should carry (and, if you glance down a bit, does carry) sequence number f957 8425, 1 more than this one. The 4-byte acknowledgment number is 0: since the other side hasn't sent anything, there's nothing to acknowledge.

The next byte is 0xb0; the first four bits of this are the header length, in 32-bit "words", as in the IP header. Here that's 0xb, or 11 words, which makes for a 44-byte TCP header.

The following byte, 0x02, is the "flags" byte, of which only one flag is set in this case: the next-to-last bit, indicating that this is a "synchronize" (SYN) packet; in other words, it starts a new socket connection. 0xffff is the window size, indicating that the receiver of this packet can respond with up to 65,535 bytes of unacknowledged data at a time, but no more (but see the options, below). As with the IP header, the TCP header has its own checksum, which follows the window size; in this case, 0x2324. The TCP checksum calculation is slightly more complex than the IP header's, because it covers the whole TCP segment and also incorporates some elements from the IP header.
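The elements borrowed from the IP header form what RFC 793 calls a "pseudo-header": the source and destination addresses, the protocol number and the TCP length. Here is a rough sketch of the calculation applied to the SYN packet in figure 1. The function and array names are hypothetical, picked just for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Source and destination IP addresses from figure 1. */
const uint8_t syn_src_ip[4] = { 0x0a, 0x64, 0x20, 0x07 };
const uint8_t syn_dst_ip[4] = { 0x36, 0xc0, 0x57, 0x60 };

/* The 44-byte TCP header from figure 1, with the checksum field
 * (bytes 16 and 17, 0x2324 on the wire) zeroed for recomputation. */
const uint8_t syn_tcp_segment[44] = {
    0xd5, 0x62, 0x01, 0xbb, 0xf9, 0x57, 0x84, 0x24,
    0x00, 0x00, 0x00, 0x00, 0xb0, 0x02, 0xff, 0xff,
    0x00, 0x00, 0x00, 0x00, 0x02, 0x04, 0x05, 0xb4,
    0x01, 0x03, 0x03, 0x05, 0x01, 0x01, 0x08, 0x0a,
    0x20, 0x3f, 0xe6, 0x74, 0x00, 0x00, 0x00, 0x00,
    0x04, 0x02, 0x00, 0x00
};

/* Add big-endian 16-bit words to a running 32-bit sum. */
static uint32_t sum_words(const uint8_t *p, size_t len, uint32_t sum)
{
    size_t i;
    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((p[i] << 8) | p[i + 1]);
    if (i < len)                      /* odd trailing byte: pad with zero */
        sum += (uint32_t)(p[i] << 8);
    return sum;
}

/* Checksum over the pseudo-header plus the TCP header and data. */
uint16_t tcp_checksum(const uint8_t src[4], const uint8_t dst[4],
                      const uint8_t *seg, size_t len)
{
    uint32_t sum = 0;
    sum = sum_words(src, 4, sum);
    sum = sum_words(dst, 4, sum);
    sum += 6;                         /* zero byte + protocol number (TCP) */
    sum += (uint32_t)len;             /* TCP length: header plus data */
    sum = sum_words(seg, len, sum);
    while (sum >> 16)                 /* fold carries back in */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}
```

For this packet the result should match the 0x2324 in the capture. On many systems the checksum of outgoing packets is filled in by the network card, so a capture taken on the sending machine can show a value that doesn't verify; that doesn't appear to be the case here.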

The unused (and mostly unspecified) "urgent pointer" that follows is 0, as it is for effectively all TCP traffic.
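As a sketch of how the fixed 20-byte portion of the TCP header decodes, here is a small parser run against the bytes from figure 1. The struct and function names are my own inventions for illustration; tcp_flags_string only imitates the letters that tcpdump prints in its summary line.

```c
#include <stddef.h>
#include <stdint.h>

/* First 20 bytes of the TCP header from figure 1. */
const uint8_t fig1_tcp_header[20] = {
    0xd5, 0x62, 0x01, 0xbb, 0xf9, 0x57, 0x84, 0x24,
    0x00, 0x00, 0x00, 0x00, 0xb0, 0x02, 0xff, 0xff,
    0x23, 0x24, 0x00, 0x00
};

/* Hypothetical container for the decoded fields. */
struct tcp_fields {
    unsigned src_port, dst_port;
    uint32_t seq, ack;
    unsigned header_len;   /* in bytes, options included */
    unsigned flags;
    unsigned window, checksum, urgent;
};

static unsigned be16(const uint8_t *p)
{
    return (unsigned)((p[0] << 8) | p[1]);
}

static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

void parse_tcp_header(const uint8_t *p, struct tcp_fields *out)
{
    out->src_port   = be16(p);
    out->dst_port   = be16(p + 2);
    out->seq        = be32(p + 4);
    out->ack        = be32(p + 8);
    out->header_len = (unsigned)(p[12] >> 4) * 4;
    out->flags      = p[13] & 0x3F;   /* the six RFC 793 flag bits */
    out->window     = be16(p + 14);
    out->checksum   = be16(p + 16);
    out->urgent     = be16(p + 18);
}

/* Render flags roughly the way tcpdump does: "S", "S.", "." and so on.
 * out must have room for at least 7 characters. */
void tcp_flags_string(unsigned flags, char out[8])
{
    static const struct { unsigned bit; char letter; } table[] = {
        { 0x01, 'F' }, { 0x02, 'S' }, { 0x04, 'R' },
        { 0x08, 'P' }, { 0x10, '.' }, { 0x20, 'U' }
    };
    size_t n = 0;
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (flags & table[i].bit)
            out[n++] = table[i].letter;
    out[n] = '\0';
}
```

For the SYN packet this gives source port 54626, destination port 443, sequence number 4183262244 and a 44-byte header, and the flag byte 0x02 renders as "S", matching the summary line.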

That's the end of the standard TCP header. Like IP, TCP allows for variable options to be appended to its header, but unlike IP, these are actually pretty common in TCP. You know from the header length that this is a 44-byte header, and the standard TCP header consumes exactly 20 bytes, so there are 24 bytes left over. Remember, though, that the TCP header length is given in 4-byte words, so it isn't safe to assume that all of the remaining data in the header are options; some of it may be padding. The options themselves encode enough information to process each one in turn even if you don't know ahead of time where to stop. Each option is a tag/length/value triple: the tag is the option specifier, the length is how long the option is, including the tag and length bytes (limiting a TCP option to at most 253 bytes of value, which is probably a good thing, considering that these are prepended to data packets!), and the value varies depending on the tag. The TCP specification does permit the length/value to be omitted for tags which don't require data; as it turns out, there are only two of these, which I'll cover below.

The first tag is 0x02, which is specified in RFC 793 as the maximum segment size option. It's followed by a length byte of 0x04 (covering the tag, the length byte itself and two bytes of value) and then the maximum TCP segment size of 0x05b4, or 1460 bytes. This instructs the receiver that, although the client can buffer up to 65,535 bytes of data at a time, it can only accept 1460 of them in any one segment, so the data in each packet must be no larger than this, not counting the TCP/IP headers.

This is followed by the "no-op" value 0x01. This is one of the two defined options that doesn't require (or allow) a length. This is typically used, as it is in this case, to align the next option on a word boundary.

Next in the options list is option 0x03. You can scour RFC 793 for an option with a tag 0x03, but you won't find one: this option was actually defined about a decade after TCP was, in RFC 1323. The "window scale" option is a 3-byte option and thus has a single byte of value: in this case 0x05. When TCP was first defined in 1981, an unacknowledged window of 65,535 bytes seemed like a lot; it was unlikely that the networks of the time would be able to deliver that much data before the application could consume and acknowledge it. However, within about a decade, this small window size was resulting in performance problems due to underutilization of the network. Rather than changing the TCP header specification, this option tells the receiver to left shift the window size specification 5 times (that is, multiply it by 32). This would imply that the client is advertising a buffer of 65535 * 32 = 2,097,120 bytes, but at this point, it can't make any assumptions that the receiver will understand the window scaling option, so to be on the safe side, it starts out by advertising as large a window as the TCP specification permits.
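The scaling arithmetic is just a left shift. As a small sketch (scaled_window is a made-up name, and the cap of 14 reflects the maximum shift count that RFC 1323 allows):

```c
#include <stdint.h>

/* Effective window: the 16-bit window field shifted left by the scale
 * factor that the advertising side announced in its SYN. */
uint32_t scaled_window(uint16_t window_field, unsigned shift)
{
    if (shift > 14)        /* RFC 1323 limits the shift count to 14 */
        shift = 14;
    return (uint32_t)window_field << shift;
}
```

With the values discussed above, scaled_window(0xffff, 5) works out to 2,097,120, while a shift of 0 leaves the plain 65,535-byte window of the original specification.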

After two no-op bytes, the next option, 0x08, of length 0x0a (10 bytes), is the timestamp value. This is also part of the RFC 1323 high-throughput specification. By affixing a timestamp to each packet, the TCP implementation can get a better measure of what sort of network delay is in place and do a better job of only retransmitting packets when it's certain that they have actually been dropped by the underlying network.

The next option byte is 0x04, TCP selective acknowledgments (SACK) permitted. Unlike the window scale and timestamp options, this one is specified in RFC 2018 rather than RFC 1323. The following length byte is 2: this indicates that there is no value, since the SACK-permitted option doesn't need one. The first version of TCP could only acknowledge data cumulatively, so if a packet were lost, the sender had to assume that every packet after it had been lost, too (remember that packets follow a sequential numbering scheme). Selective acknowledgments permit the receiver to acknowledge that packets 1 and 3 were received, but not 2, hence limiting the number of packets which need to be retransmitted. Practically speaking, all TCP implementations permit this, but it is still required to advertise support for it with this option.

There are two bytes remaining, both 0's. Per the specification, the next byte should be parsed as an option tag and it is: in this case the "end-of-list" tag, which is the other tag that doesn't require (or allow) a length byte. The final byte is just padding. Both are necessary here because the header length is counted in 4-byte words, and without them the TCP header would end "too soon", two bytes short of a word boundary.
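Putting the tag/length/value rules together, a parser for the options area might look something like the following sketch. The names are hypothetical, and it only recognizes the four options that appear in this packet; anything else is skipped by its length byte.

```c
#include <stddef.h>
#include <stdint.h>

/* The 24 option bytes that follow the fixed TCP header in figure 1. */
const uint8_t syn_tcp_options[24] = {
    0x02, 0x04, 0x05, 0xb4,                   /* maximum segment size */
    0x01,                                     /* no-op */
    0x03, 0x03, 0x05,                         /* window scale */
    0x01, 0x01,                               /* two no-ops */
    0x08, 0x0a, 0x20, 0x3f, 0xe6, 0x74,       /* timestamp value... */
    0x00, 0x00, 0x00, 0x00,                   /* ...and echo reply */
    0x04, 0x02,                               /* SACK permitted */
    0x00, 0x00                                /* end of list, padding */
};

/* Hypothetical container for the options this sketch understands. */
struct tcp_options {
    int has_mss;        unsigned mss;
    int has_wscale;     unsigned wscale;
    int sack_permitted;
    int has_timestamp;  uint32_t ts_val, ts_ecr;
};

static uint32_t opt_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns 0 on success, -1 if the option list is malformed. */
int parse_tcp_options(const uint8_t *opt, size_t len, struct tcp_options *out)
{
    size_t i = 0;
    *out = (struct tcp_options){0};

    while (i < len) {
        uint8_t kind = opt[i];
        if (kind == 0)                    /* end of list: stop here */
            break;
        if (kind == 1) {                  /* no-op: a single byte */
            i++;
            continue;
        }
        if (i + 1 >= len)                 /* no room for a length byte */
            return -1;
        uint8_t olen = opt[i + 1];        /* counts tag and length too */
        if (olen < 2 || i + olen > len)
            return -1;
        const uint8_t *v = opt + i + 2;   /* start of the value */
        switch (kind) {
        case 2:
            if (olen != 4) return -1;
            out->has_mss = 1;
            out->mss = (unsigned)((v[0] << 8) | v[1]);
            break;
        case 3:
            if (olen != 3) return -1;
            out->has_wscale = 1;
            out->wscale = v[0];
            break;
        case 4:
            if (olen != 2) return -1;
            out->sack_permitted = 1;
            break;
        case 8:
            if (olen != 10) return -1;
            out->has_timestamp = 1;
            out->ts_val = opt_be32(v);
            out->ts_ecr = opt_be32(v + 4);
            break;
        default:                          /* unknown: skip by length */
            break;
        }
        i += olen;
    }
    return 0;
}
```

Fed the 24 bytes above, this should report an MSS of 1460, a window scale of 5, SACK permitted and a timestamp value of 541058676 with an echo reply of 0, which is what tcpdump shows in its summary line.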

It's helpful to see the whole packet "unrolled" and each piece labelled as below:
Starting offset  Contents              Meaning
0x0000           XXXXXXXXXXXX          Destination MAC address
0x0006           YYYYYYYYYYYY          Source MAC address
0x000c           0800                  Next protocol type (IP)
0x000e           45                    IP version and IP header length / 4
0x000f           00                    Type of service
0x0010           0040                  Total packet length
0x0012           94ae                  Fragment Identifier
0x0014           4000                  Fragmentation options
0x0016           40                    Time to live
0x0017           06                    Next protocol (TCP)
0x0018           ed7e                  Header checksum
0x001a           0a642007              Source IP address
0x001e           36c05760              Destination IP address
0x0022           d562                  Source port
0x0024           01bb                  Destination port
0x0026           f9578424              Starting sequence number
0x002a           00000000              Acknowledging sequence number
0x002e           b0                    TCP header length / 4
0x002f           02                    TCP flags
0x0030           ffff                  Window size
0x0032           2324                  TCP header checksum
0x0034           0000                  Urgent pointer
0x0036           020405b4              Maximum segment size
0x003a           01                    No-op
0x003b           030305                Window scale
0x003e           01                    No-op
0x003f           01                    No-op
0x0040           080a203fe67400000000  Time stamp
0x004a           0402                  TCP Sack permitted
0x004c           00                    End of options
0x004d           00                    Padding

Per the TCP handshake protocol, the server is now responsible for acknowledging the SYN packet, which it does with the next packet:

14:39:43.118141 IP server-54-192-87-96.lax3.r.cloudfront.net.https > localhost.54626: Flags [S.], seq 3050391779, ack 4183262245, win 28960, options 
		[mss 1460,sackOK,TS val 1898598028 ecr 541058676,nop,wscale 8], length 0
	0x0000:  YYYY YYYY YYYY XXXX XXXX XXXX 0800 4500
	0x0010:  003c 0000 4000 f106 d130 36c0 5760 0a64
	0x0020:  2007 01bb d562 b5d1 48e3 f957 8425 a012
	0x0030:  7120 0489 0000 0204 05b4 0402 080a 712a
	0x0040:  4e8c 203f e674 0103 0308

You can see a lot of similarities between this packet and the previous one. In fact, it's easier just to consider the differences. First of all, byte 17, 0x3c, is the length of this packet: four bytes shorter than the last one. This packet's ID field is 0, and its time to live of 0xf1 (241) is a bit higher (it's probably safe to assume that it was set by Amazon to 255 and was decremented by exactly one at each of 14 routers that it passed through to make it back to me). The header checksum is different, of course, and finally, the source and destination IP addresses are swapped; this makes sense because now Amazon is replying to me.

Similarly, the TCP header starts with an inverted pair of source and destination ports. The sequence number here is b5d1 48e3; both sides maintain independent streams of sequence numbers. Now, though, the acknowledgment number is f957 8425: one more than the sequence number of the original SYN packet. TCP requires that each side acknowledge the next byte; in other words, the Amazon server is telling me that it expects to receive sequence number f957 8425 next. If my computer weren't expecting to send that sequence number, that would be an indicator to the TCP infrastructure that a packet had been either lost or duplicated. The length of this header is 40 bytes instead of 44 as in the previous packet, and there's an extra flag set, the "ACK" flag. From this point on, the ACK flag is set on every packet, but this will be the last one in this interchange which also has the SYN flag set, because the synchronization is now considered complete: both sides have agreed on a starting sequence number and can thus recognize and recover from lost or duplicated packets. Interestingly, Amazon advertises a smaller window size of 0x7120 (28,960 bytes). The options are the same, although the server supplies its own values, such as a window scale of 8 rather than 5; Amazon is essentially agreeing to the options that my browser proposed. It's interesting to see, though, that they're presented in a different order, and that they're aligned precisely at the end of the header so that the 0x0 "end-of-list" tag is not present (or needed).
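The relationship between a sequence number and the acknowledgment it should provoke can be sketched as follows. The function names are hypothetical; the one assumption worth stating is that SYN and FIN each occupy one position in the sequence space, as RFC 793 specifies, and that all arithmetic wraps around modulo 2^32.

```c
#include <stdint.h>

#define TCP_FLAG_FIN 0x01u
#define TCP_FLAG_SYN 0x02u

/* The acknowledgment number the peer should send back for a segment:
 * its sequence number plus the sequence space it consumes. */
uint32_t expected_ack(uint32_t seq, uint32_t payload_len, unsigned flags)
{
    uint32_t next = seq + payload_len;   /* unsigned, so it wraps at 2^32 */
    if (flags & TCP_FLAG_SYN)
        next++;
    if (flags & TCP_FLAG_FIN)
        next++;
    return next;
}

/* Offset of a sequence number from the start of its stream, which is
 * how tcpdump displays sequence numbers unless -S is given. */
uint32_t relative_seq(uint32_t absolute, uint32_t initial)
{
    return absolute - initial;
}
```

For my SYN, expected_ack(0xf9578424, 0, TCP_FLAG_SYN) gives 0xf9578425, which is exactly the acknowledgment number in the server's reply.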

The TCP handshake is still not quite finished, though. My browser still has to acknowledge the acknowledgment, which it does with its next packet:

14:39:43.118213 IP localhost.54626 > server-54-192-87-96.lax3.r.cloudfront.net.https: Flags [.], ack 1, win 4117, 
		options [nop,nop,TS val 541058709 ecr 1898598028], length 0
	0x0000:  XXXX XXXX XXXX YYYY YYYY YYYY 0800 4500
	0x0010:  0034 eda4 4000 4006 9494 0a64 2007 36c0
	0x0020:  5760 d562 01bb f957 8425 b5d1 48e4 8010
	0x0030:  1015 9440 0000 0101 080a 203f e695 712a
	0x0040:  4e8c

This one is almost identical to the first packet, since the source and destination are the same. The two main differences are that the SYN flag is no longer set and that the acknowledgment number, b5d1 48e4, is included: one more than the server's starting sequence number. The window size is now 0x1015 (4117), scaled by 32 to 131,744, since both sides have now established that they support window scaling. Finally, there are fewer options in this case; apart from two no-op bytes, the only option provided is the timestamp option (which must be present on every packet if TCP timestamps are being used).

At this point, the TCP handshake is done. Both sides are ready to send each other data, but as of now, any intermediate router (and it looks like there are at least 14 of them) can see and log any data that's exchanged. Since the requested protocol was https, indicated by the choice of destination port 443, though, the client (my browser) knows to now begin an SSL handshake before transmitting even a single byte of HTTP data. In my next post, I'll walk through the (considerably more involved) SSL handshake that also occurs before the browser and the server can begin transferring actual HTTP messages.

My Book

I'm the author of the book "Implementing SSL/TLS Using Cryptography and PKI". Like the title says, this is a from-the-ground-up examination of the SSL protocol that provides security, integrity and privacy to most application-level internet protocols, most notably HTTP. I include the source code to a complete working SSL implementation, including the most popular cryptographic algorithms (DES, 3DES, RC4, AES, RSA, DSA, Diffie-Hellman, HMAC, MD5, SHA-1, SHA-256, and ECC), and show how they all fit together to provide transport-layer security.

Joshua Davies
