Server-side upload time tracking

I wanted to see if we could roughly log how long users are spending waiting for learner data uploads. The more accurate way to do this is on the client side. However I wanted to try it on the server side so it could be applied in many cases without needing instrumented clients that send back data to the server.

I looked around for a while to see if this has been documented anywhere, but I didn’t find anything. So I decided to try something and test it to see if it would work.

The Conclusion

Yes it is possible.

At least the ‘%t’ option when added to the request headers is the time which the request is started. This time is before most of the POST data is uploaded, so it can be used to get an estimate of upload times. This estimate seems very good with my testing, but it should be verified in the real world of computers in schools before relying on it for something important.

The Test

The idea for this test came from Aaron Unger.

In summary it was tested with a simple Rack app running on a EC2 server that was identical to the servers we use to run our portals. Then on the client side I used curl and Charles (the personal proxy) to send it a large chunk of data and record the timing.

The server was running Apache 2.2.22 and it was configured with a Passenger web app. I won’t go into that setup here. Additionally I added this to the Apache configuration:

RequestHeader set X-Queue-Start "%t"
Then in the web app folder I added this config.ru file:

run lambda { |env|
  start_time = Time.now
  if env['rack.input']
    env['rack.input'].read
  end
  sleep 5
  [200, {"Content-Type" => "text/plain"},
    ["Apache Start Time: #{env['HTTP_X_QUEUE_START']}\n" +
     "Start Time: #{start_time}\n" +
     "End Time: #{Time.now}\n"]]
}

Then on my local machine. I ran Charles the personal proxy. This starts a proxy on port 8888.

I made a large random data file with:

head -c 2000000 random_data

Then I sent that off to the server with curl:

% time curl -x localhost:8888 --data-urlencode something@random_data http://testserver-on-aws
Apache Start Time: t=1359399773413862
Start Time: 2013-01-28 19:02:55 +0000
End Time: 2013-01-28 19:03:00 +0000
.
real	0m8.229s
...

Converting the time stamp shows the apache start time is 3 seconds before the start time. The simple server always waits for 5 seconds so together this makes up the 8 seconds reported. Bingo!

I wasn’t convinced that the 3 seconds was actually the upload time. I thought perhaps it was some apache processing time that happened after the upload. So I used the throttle option in Charles to slow down the upload. Doing this gave the expected result: the apache start time was even earlier than before. And subtracting the end time from the apache start time was very close to the total request time reported on the command line.

Notes

This server side approach does not cover all the time that user is waiting for an upload to complete. I would guess there will be cases when it isn’t accurate. For example some proxy or other network device might delay POST requests in someway and in that case this approach would not record that time.