Friday, January 2, 2015

Mini-HTTP browser/server, how the server determines the end of a request

I'm writing a mini-HTTP browser and server in Ruby for the Odin Project:

http://www.theodinproject.com/ruby-programming/ruby-on-the-web

The browser's requests are structured as prescribed on this page:

http://www.jmarshall.com/easy/http/#structure

I've been getting tripped up on how to have the server read all the data that it needs from its socket (the incoming data connection, over which the browser sends its request). The header part of the request ends with a blank line (two CRLFs in a row; "\r\n\r\n" in Ruby). I originally had the server read the socket within a loop, using #gets, until it hit one such blank line in the case of a GET request, or two in the case of a POST request.
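To make the structure concrete, here's a rough sketch of what the two kinds of raw request look like as Ruby strings. The paths and header values are made up for illustration; the part that matters is where the "\r\n" sequences fall.

```ruby
# Illustrative values only; the paths and headers are made up.
body = "home=Cosby&favorite+flavor=flies"

# A GET request: request line, headers, then a blank line. No body.
get_request = [
  "GET /index.html HTTP/1.0",
  "From: someone@example.com",
  "",
  ""
].join("\r\n")

# A POST request: same shape, but the body follows the blank line
# and nothing (no CRLF) comes after the body.
post_request = [
  "POST /path/script.cgi HTTP/1.0",
  "Content-Type: application/x-www-form-urlencoded",
  "Content-Length: #{body.bytesize}",
  "",
  body
].join("\r\n")

puts get_request.inspect
puts post_request.inspect
```

Note that the POST request ends with the last byte of the body, not with a newline.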

Problem: according to http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html , POST requests aren't supposed to have a CRLF following the last line of the transmission, after the request body.

The page says: "Certain buggy HTTP/1.0 client implementations generate extra CRLF's after a POST request. To restate what is explicitly forbidden by the BNF, an HTTP/1.1 client MUST NOT preface or follow a request with an extra CRLF."

Given that, I tried to update the loop according to what another student did: have it read the socket until the first blank line ("\r\n\r\n") in order to get the header portion of the request, which is good enough for a GET request.

That code looks like this:

client = server.accept
header = ""
while line = client.gets
  header += line
  break if header =~ /\r\n\r\n$/
end


Then, in the case of a POST request, start reading again (we're reading lines using gets and haven't closed the connection, so we can pick up where we left off) and read until...

Oh.

So if there isn't a CRLF after the request body, I'm not sure how to determine that the request transmission is over.

I was stumped on this for a while, until finding this Q/A on StackOverflow:
http://stackoverflow.com/questions/4824451/detect-end-of-http-request-body

Of course! It's so simple. The browser sends the size of the request body, in bytes, in the Content-Length header. So the server just needs to parse that size from the header and keep reading until it has received exactly that many bytes of body.
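Here's a minimal sketch of pulling that number out of the header text. The helper name content_length is hypothetical (it isn't from my repo), and it assumes the header has already been read into a single string with "\r\n" line endings.

```ruby
# Hypothetical helper: returns the Content-Length value as an Integer,
# or 0 when the header text contains no such line.
def content_length(header)
  header.each_line do |line|
    name, value = line.split(":", 2)
    next unless value # lines without a colon, e.g. the request line
    return value.strip.to_i if name.strip.downcase == "content-length"
  end
  0
end

sample = "POST /submit HTTP/1.0\r\n" \
         "Content-Type: application/x-www-form-urlencoded\r\n" \
         "Content-Length: 32\r\n" \
         "\r\n"
puts content_length(sample)
```

Looking the header up by name, rather than by its position in the list of header lines, keeps the server working if the browser ever sends its headers in a different order.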

So, I tried making a loop, using #gets as above, to concatenate lines to the variable "body" until its size equalled the size specified in the header lines. Something like:

# parse out the size, in bytes, of the request body from the header
body_size = header_lines[-2].split(" ")[1].to_i
body = ""
while line = client.gets
  body += line
  break if body_size == body.size
end 


Heartbreak and anguish!

Here's why: #gets keeps reading until it sees a newline (or until the other side closes the connection). Since there's no \r\n\r\n, or even a \n, at the end of the request body, and the browser is holding the connection open while it waits for a response, the final call to #gets never returns. The server can't do anything past "line = client.gets". It gets totally stuck.
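The hang is easy to reproduce without a browser. This is only a sketch: it uses IO.pipe as a stand-in for the socket (an assumption for the demo, not what the server actually uses) and a timeout so that the script doesn't get stuck forever.

```ruby
require "timeout"

body = "home=Cosby&favorite+flavor=flies" # no trailing newline

# Attempt 1: #gets waits for a newline that never arrives.
# The write end stays open, like a browser waiting for its response.
reader, writer = IO.pipe
writer.write(body)
writer.flush
gets_result =
  begin
    Timeout.timeout(0.5) { reader.gets }
  rescue Timeout::Error
    :timed_out
  end

# Attempt 2 (on a fresh pipe): #read(n) returns once n bytes have arrived.
reader2, writer2 = IO.pipe
writer2.write(body)
writer2.flush
read_result = Timeout.timeout(0.5) { reader2.read(body.bytesize) }

puts gets_result.inspect
puts read_result.inspect
```

The second attempt uses its own pipe because the interrupted #gets may already have consumed the bytes from the first one.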

Solution!

Don't use #gets for the body! Use #read, which lets you specify the exact number of bytes you want to read. That suits our needs perfectly, since we happen to have that very number.

Here's how the code will look:

# parse out the size, in bytes, of the request body from the header
# (this assumes header_lines[-2] is the "Content-Length: <size>" line)
body_size = header_lines[-2].split(" ")[1].to_i
# and read exactly that many bytes out of the socket
body = client.read(body_size)

And with that, my mini-browser and mini-server are able to issue and handle GET and POST requests.
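Putting the two halves together, the whole read might look something like the sketch below. It isn't a copy of the code in my repo: the method name read_request is hypothetical, and it's fed a StringIO here instead of the object returned by server.accept, so that it can be run without a network connection.

```ruby
require "stringio"

# Hypothetical helper: reads one request from any IO-like object
# (a TCPSocket in the real server, a StringIO in this demo).
def read_request(io)
  # Header: read line by line until the blank line that ends it.
  header = String.new
  while (line = io.gets)
    header << line
    break if line == "\r\n"
  end

  # Body: read exactly Content-Length bytes, if the header gives one.
  body_size = header[/^content-length:\s*(\d+)/i, 1].to_i
  body = body_size > 0 ? io.read(body_size) : ""

  [header, body]
end

raw = "POST /submit HTTP/1.0\r\nContent-Length: 11\r\n\r\nhello=world"
header, body = read_request(StringIO.new(raw))
puts header.inspect
puts body.inspect
```

A GET request has no Content-Length line, so body_size comes out as 0 and the body is just an empty string.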

Here's my code: https://github.com/johnwquarles/simple-server-and-client

