Troubleshooting With tcpdump
Today I ran into a problem with a program I'm writing to access the AUR (Arch User Repository) via the JSON interface. The program uses libcurl and ultimately will allow the user to search the AUR and download PKGBUILDs. I am still working on querying and parsing results via the JSON interface, and I hit a bug that is obvious in retrospect but that I didn't see right away.
The program was working fine for downloading simple websites like Google:
$ ./download http://google.com
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>
However, it failed when querying the AUR:
$ ./download http://aur.archlinux.org/rpc.php?type=info&arg=tetris
{"type":"error","results":"No request type\/data specified."}
The problem should have been clear to me at this point, and it probably is to many of you, but I didn't see it. My troubleshooting was led astray by the curl documentation for the -G/--get option:
-G/--get
When used, this option will make all data specified with -d/--data or --data-binary to be used in a HTTP GET request instead of the POST request that otherwise would be used. The data will be appended to the URL with a '?' separator.
The request worked fine using curl on the command line like so:
$ curl http://aur.archlinux.org/rpc.php --get -d type=info -d arg=tetris
{"type":"info","results":{"ID":"22474","Name":"tetris","Version":"0.27178-1","CategoryID":"6","Description":"A 2-D clone of Tetris","LocationID":"2","URL":"http:\/\/hackage.haskell.org\/cgi-bin\/hackage-scripts\/package\/tetris","URLPath":"\/packages\/tetris\/tetris.tar.gz","License":"custom:BSD3","NumVotes":"0","OutOfDate":"0"}}
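To make explicit what -G does with the two -d arguments here: curl joins the data pieces with & and appends them to the URL after a ?. The sketch below mimics that in plain shell. build_url is a hypothetical helper used only for illustration, not part of curl, and it makes no attempt at URL encoding.

```shell
# build_url is a hypothetical helper (not part of curl): it joins key=value
# pairs with '&' and appends them to the base URL after a '?'.
# No URL encoding is attempted.
build_url() {
    base=$1
    shift
    query=""
    for pair in "$@"; do
        # Put '&' between pairs, but not before the first one.
        query="${query:+$query&}$pair"
    done
    printf '%s\n' "$base${query:+?$query}"
}

# Keep the result in double quotes wherever it is used so the shell
# leaves the '&' alone.
url=$(build_url http://aur.archlinux.org/rpc.php type=info arg=tetris)
printf '%s\n' "$url"
```

The printed URL is the same one I had been typing by hand on the command line of my own program.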
I thought I needed to find the equivalent of this in libcurl. I looked through the documentation, read the mailing list, and tried several different things, all to no avail. I finally decided to use tcpdump to capture the transaction, once using the curl command line tool and once using my tool. I specifically wanted to see the actual request sent to the remote server and whether it differed between the two cases.
I obtained two separate packet traces by running the following command and then running either curl from the command line or my program.
# tcpdump port 80 -w capture.log
I then read each dump file back with tcpdump, using the -X option to print the data portion of each packet in hex and ASCII.
$ tcpdump -Xr capture.log
By default, tcpdump captures only the first 68 bytes from each packet, but it turns out this was enough to discover the problem. If it hadn't been, I could have used -s to grab more data.
Below is the interesting portion of the dump for each operation: the packet that carries the request to the AUR server. First, the curl operation:
$ tcpdump -Xr curl.log
09:34:43.013749 IP 192.168.1.150.47272 > gerolde.archlinux.org.www: Flags [P.], ack 1, win 92, options [nop,nop,TS val 19900965 ecr 3363876227], length 175
0x0000: 4500 00e3 21ad 4000 4006 3e45 c0a8 0196 E...!.@.@.>E....
0x0010: 42d3 d511 b8a8 0050 a1ef 8ba6 c551 9fd7 B......P.....Q..
0x0020: 8018 005c f36a 0000 0101 080a 012f aa25 ...\.j......./.%
0x0030: c880 ad83 4745 5420 2f72 7063 2e70 6870 ....GET./rpc.php
0x0040: 3f74 7970 653d 696e 666f 2661 ?type=info&a
You can clearly see the GET request with correct syntax.
Now let's take a look at the dump from my program using libcurl:
$ tcpdump -Xr download.log
09:35:43.681417 IP 192.168.1.150.47274 > gerolde.archlinux.org.www: Flags [P.], ack 1, win 92, options [nop,nop,TS val 19919165 ecr 3363894437], length 73
0x0000: 4500 007d e356 4000 4006 7d01 c0a8 0196 E..}.V@.@.}.....
0x0010: 42d3 d511 b8aa 0050 da82 1274 fd68 053a B......P...t.h.:
0x0020: 8018 005c 8b39 0000 0101 080a 012f f13d ...\.9......./.=
0x0030: c880 f4a5 4745 5420 2f72 7063 2e70 6870 ....GET./rpc.php
0x0040: 3f74 7970 653d 696e 666f 2048 ?type=info.H
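It is pretty clear that there is a problem with the GET request. It ends at type=info: the & and the arg=tetris that should follow are missing, and the next bytes (" H") are already the start of the HTTP version string.
All of a sudden I realized the problem: & is special to the shell. Left unquoted, it ends the command and runs it in the background, so my program only ever received the URL up to type=info, and the shell treated arg=tetris as a separate variable assignment. Backslash-escaping the ampersand or enclosing the URL in single or double quotes each solved the problem. All in all, I should have recognized the problem sooner, but, thanks to tcpdump, I figured it out eventually.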
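The effect is easy to reproduce without touching the network. In this small sketch, show_args is a hypothetical stand-in for my download program that simply prints the arguments it was given. It assumes a Bourne-style shell such as sh or bash, where an unmatched ? in a word is left as-is.

```shell
# show_args is a hypothetical stand-in for ./download: it prints each
# argument it receives on its own line.
show_args() {
    printf 'arg: %s\n' "$@"
}

# Unquoted: '&' ends the command and runs it in the background, then
# "arg=tetris" is executed as a separate variable assignment.
show_args http://aur.archlinux.org/rpc.php?type=info&arg=tetris
wait    # let the background job finish before continuing
printf 'shell variable arg: %s\n' "$arg"

# Quoted or escaped: the whole URL arrives as a single argument.
show_args 'http://aur.archlinux.org/rpc.php?type=info&arg=tetris'
show_args "http://aur.archlinux.org/rpc.php?type=info&arg=tetris"
show_args http://aur.archlinux.org/rpc.php?type=info\&arg=tetris
```

The first call only sees the URL up to type=info, and the shell ends up with a variable named arg set to tetris. The last three calls all receive the complete URL.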