JohnQ's Dev Dive: Mini-HTTP browser/server, how the server determines the end of a request

I'm writing a mini-HTTP browser and server in Ruby for the Odin Project:

http://www.theodinproject.com/ruby-programming/ruby-on-the-web

The browser's requests are structured as prescribed on this page:

http://www.jmarshall.com/easy/http/#structure

I've been getting tripped up on how to have the server read all the data that it needs from its socket (the incoming data connection to which the browser sends its request). The header part of the request ends with a blank line, that is, a CRLF on a line by itself, so in Ruby the headers end with "\r\n\r\n". I originally had the server read the socket within a loop, using #gets, until it hit one such blank line in the case of a GET request, or two in the case of a POST request.

Problem: according to http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html , POST requests aren't supposed to have a CRLF following the last line of the transmission, after the request body.

The page says: "Certain buggy HTTP/1.0 client implementations generate extra CRLF's after a POST request. To restate what is explicitly forbidden by the BNF, an HTTP/1.1 client MUST NOT preface or follow a request with an extra CRLF."

Given that, I tried to update the loop according to what another student did: have it read the socket until the first blank line ("\r\n\r\n") in order to get the header portion of the request, which is good enough for a GET request.

That code looks like this:


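client = server.accept
header = ""
# read line by line until the blank line that ends the headers
while line = client.gets
  header << line
  # \z anchors at the very end of the string ($ only anchors at the end of a line)
  break if header =~ /\r\n\r\n\z/
end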

Then, in the case of a POST request, start reading again (we're reading lines using gets and haven't closed the connection, so we can pick up where we left off) and read until...

Oh.

So if there isn't a CRLF after the request body, I'm not sure how to determine that the request transmission is over.

I was stumped on this for a while, until finding this Q/A on StackOverflow:
http://stackoverflow.com/questions/4824451/detect-end-of-http-request-body

Of course! It's so simple. The browser states the size of the request body, in bytes, in the Content-Length header. So, the server just needs to parse that size from the header, and keep reading until it has received exactly that many bytes of body.
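To make that concrete, here's a minimal sketch of pulling the size out of the raw header text. The helper name content_length and the sample request are made up for illustration, and it assumes the header string ends with the blank line and carries at most one Content-Length line.

```ruby
# Hypothetical helper (not from my server): find the Content-Length value
# in a raw header string, or return 0 if the header is absent.
def content_length(header)
  header.each_line("\r\n") do |line|
    name, value = line.chomp("\r\n").split(":", 2)
    # the request line and the blank line have no "Name: value" shape
    next unless value && name.strip.casecmp?("Content-Length")
    return value.strip.to_i
  end
  0 # assumption: no Content-Length means no body
end

# Made-up sample request headers, ending with the blank line
sample = "POST /form HTTP/1.0\r\n" \
         "Content-Type: application/x-www-form-urlencoded\r\n" \
         "Content-Length: 9\r\n" \
         "\r\n"

puts content_length(sample)
```

Looking the header up by name, rather than by its position in the list of lines, keeps working if the browser sends its headers in a different order.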

So, I tried making a loop, using #gets as above, to concatenate lines to the variable "body" until its size equalled the size specified in the header lines. Something like:

# parse out the size, in bytes, of the request body from the header
body_size = header_lines[-2].split(" ")[1].to_i
body = ""
while line = client.gets
  body += line
  break if body_size == body.size
end

Heartbreak and anguish!

Here's why: #gets reads until it sees a newline (or until the other end closes the connection). The request body doesn't end with \r\n\r\n, or even a \n, and the browser keeps the connection open while it waits for the response. So the last call to #gets never returns, and the server can't do anything past "line = client.gets". It gets totally stuck.
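The hang is easy to reproduce outside the server. The sketch below is an assumption-laden demo, not my server code: it uses UNIXSocket.pair (so it needs a Unix-like system) as a stand-in for the browser/server connection, a made-up body "name=odin", and a half-second timeout so that the script gives up instead of waiting forever.

```ruby
require "socket"
require "timeout"

# A connected pair of sockets stands in for the browser and the server.
browser, server_side = UNIXSocket.pair

# The "browser" sends a body with no trailing newline and keeps the socket open.
browser.write("name=odin")

gets_blocked = false
begin
  # gets keeps waiting for a "\n" that never arrives, so the timeout fires
  Timeout.timeout(0.5) { server_side.gets }
rescue Timeout::Error
  gets_blocked = true
end

puts "gets blocked: #{gets_blocked}"
```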

Solution!

Don't use #gets! Use #read, with which you can specify the exact number of bytes you want to read. Which suits our needs perfectly, since we happen to have that very number.

Here's how the code will look:

# parse out the size, in bytes, of the request body from the header
body_size = header_lines[-2].split(" ")[1].to_i

# and read exactly that many bytes out of the socket
body = client.read(body_size)

And with that, my mini-browser and mini-server are able to issue and handle GET and POST requests.
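For reference, the two reads can be combined into one helper. This is a minimal sketch rather than the exact code in my repository: read_request is a made-up name, the sample request is invented, and a StringIO stands in for the socket so the snippet runs on its own. It assumes the request carries at most one Content-Length header.

```ruby
require "stringio"

# Hypothetical helper: read one request from any IO-like object
# (a TCPSocket in the real server, a StringIO here).
def read_request(io)
  header = +""
  # Headers: read line by line until the blank line ("\r\n\r\n")
  while (line = io.gets)
    header << line
    break if header.end_with?("\r\n\r\n")
  end

  # Body: read exactly Content-Length bytes; nil.to_i is 0 when the header is absent
  length = header[/^Content-Length:\s*(\d+)/i, 1].to_i
  body = length > 0 ? io.read(length) : ""

  [header, body]
end

# Made-up POST request; note there is no newline after the body
raw = "POST /thanks HTTP/1.0\r\nContent-Length: 9\r\n\r\nname=odin"
header, body = read_request(StringIO.new(raw))
puts body
```

A GET request takes the same path: the header loop stops at the blank line, no Content-Length is found, and the body comes back empty without another read on the socket.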

Here's my code: https://github.com/johnwquarles/simple-server-and-client

Friday, January 2, 2015
