Naked JavaScript

Taking the DOM off JavaScript 
Going evented with Node.js

One of the most exciting presentations to come out of JSConf.eu was Ryan Dahl's presentation of his incredible Node.js project. Ironically, we had just covered it at last week's NOVALANG session. I decided it would make a great article for Naked JavaScript, providing a nice introduction to Node.js and to the broader topic of CommonJS as well. By the end of this article you will have a functional Node.js installation and will have built several interesting applications. Enjoy!

What is Node.js

Node.js is an evented I/O framework built on top of Google's V8 JavaScript engine. Its goal is to provide an incredibly powerful I/O system through which you can build highly efficient and scalable applications without any knowledge of "advanced topics" such as threading, processes, etc. It does this by using an event-based programming model similar to Python's Twisted framework or Ruby's EventMachine. In the event-based model, you register what should happen when a specific event occurs; the registered function is commonly referred to as a callback. You do not worry about the capturing, execution, or closing of the event. This is distinct from threaded programming, which requires the developer to identify the "event", create the thread, execute the processing, and clean up the thread, all of which is complex and littered with hard-to-debug issues. The easiest way to describe the event model, especially to people coming from browser-based JavaScript, is that it is exactly how you program interactivity in the browser. Take the following example, which shows evented programming using AJAX requests in Dojo, jQuery, and Prototype.

What is going on in each of these AJAX requests is that we describe a function for what to do upon a successful GET request to the specific URL, in this case "/dragons", and that function takes whatever data comes back. In standard procedural programming, one would think that the process would wait for the request to be made and then continue processing the rest of the program. In event-based programming, the function is identified and stored until the specified event occurs (in this case, a successful response to the GET request). The processing of the program continues on; it does not stop or "block" waiting for the rest of the operation, in this case making the GET request and receiving its response, to finish. Once the request and response have completed, the event is triggered and the stored function is called with the data passed into it. I am focusing on this because it is vastly different from the normal mode of programming, so it's critical to get it right for both Node.js and general event-based programming.
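To make the ordering concrete, here is a minimal sketch that can be run with a current version of Node.js. The fakeGet function is a hypothetical stand-in for a library's AJAX call (it is not part of Dojo, jQuery, or Prototype), and it simulates the network round trip with a short timer.

```javascript
// Hypothetical stand-in for an AJAX GET: it returns immediately and
// invokes the callback later, once the simulated response "arrives".
function fakeGet(url, onSuccess) {
  setTimeout(function () {
    onSuccess('data for ' + url);
  }, 10);
}

var log = [];

log.push('before request');
fakeGet('/dragons', function (data) {
  // Stored now, called only when the event fires.
  log.push('callback received: ' + data);
});
log.push('after request'); // runs right away, without waiting for the response

// Print the recorded order once the simulated response has had time to arrive.
setTimeout(function () {
  console.log(log.join('\n'));
}, 50);
```

The printed log lists "after request" ahead of the callback's entry, because the program carried on instead of blocking on the request.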

Installing Node.js

At the time of this writing, Node.js is installable only from source, mainly because it is a constantly evolving project. To get started, open up a terminal in our previously established javascript directory and issue the following commands. They will check out the current Node.js branch from GitHub, configure, make both V8 and Node.js, and install it to /usr/local/bin. This should work perfectly fine on Mac OS X and Linux; Windows is not currently supported by Node.js.

Once this completes, it will install two Node.js-specific executables, node and node-repl. The first one, node, is used to execute Node.js files, like the ones we will create during this article. The other, node-repl, is a Read-Eval-Print-Loop which will allow you to quickly try out bits of code in Node.js without creating a file. This can be a lot of fun to start with if you just want to verify your installation. You can run any standard JavaScript code in node-repl directly.

Building HTTP

Event-based programming is fundamental to proper JavaScript programming and to Node.js programming, and it allows for elegant programs that handle massively concurrent systems with little code, memory, and processing power. The best example of this is the "build a web server" example on the homepage of the Node.js website, re-posted below. In this non-trivial example, we set up a highly scalable web server bound to port 8000 that simply serves up a "hello world" web page after 2 seconds have passed (have to make it feel real, after all).

The code is pure JavaScript so it should be relatively easy to understand. Briefly though, the application is doing the following:

  1. Include the standard libraries system and HTTP and assign them to contextually appropriate variable names.
  2. Create an HTTP server listening on port 8000 and attach the following function to handle any incoming calls.
  3. The function (req, res) {...} body is where our web action happens, which in this case just sets the response headers, sends the body "Hello World", and finishes the request. It does this only after 2 seconds (2000 milliseconds) have elapsed.
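
The three steps above can be sketched as follows. This is a hedged, modern equivalent rather than the original listing: it assumes a current Node.js release, where the response is written with res.writeHead and res.end, and the createHelloServer name and its delayMs parameter are made up for illustration. The system library from step 1 is left out because the sketch prints nothing.

```javascript
var http = require('http');

// Build (but do not start) a server whose handler answers every request
// with "Hello World" after delayMs milliseconds.
// createHelloServer and delayMs are illustrative names, not Node.js API.
function createHelloServer(delayMs) {
  return http.createServer(function (req, res) {
    // This function is registered now and only runs when a request arrives.
    setTimeout(function () {
      res.writeHead(200, { 'Content-Type': 'text/plain' });
      res.end('Hello World'); // sends the body and finishes the response
    }, delayMs);
  });
}
```

Calling createHelloServer(2000).listen(8000) would give the two-second delay and port described in the steps.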

The interesting thing of note here is the way in which the server processing is handled. The function(req, res) {...} that we create here is actually registered to an event (in this case an incoming HTTP request on port 8000). When the event happens, the code is then executed with the provided parameters of request object and response object. This makes for a great little network server because when nothing is happening (i.e. no traffic) then nothing is happening (i.e. no processing). The code after the registration runs immediately, since the request handling is done not at the point of interpretation but when the event is triggered.

Nom, Nom, Nom

Now that we have created a server, let's create a brand new client to consume the data from that server (or any other). Take a look at the following code and see if it makes sense to you. There are some tricky components to this, so take your time.

Forgoing the parts we have already covered, notice the first function we create, called read. It takes a parameter, callback, which we execute and pass the data to once we have obtained it from the HTTP service. Within read, we create an HTTP client with the http.createClient(port, domain); syntax, which just sets up the necessary structure for the connection. Then we assign a GET request of "/index" to the connection, which returns a request object. This does not actually make the request yet. It waits until it is given the finish command, so that you can assign HTTP headers or do other processing first. To the request.finish() function we pass an anonymous function that processes the response object, and in here things get a little crazy.

Within the anonymous function (starting at line 6) we create an empty string called responseBody and then set the response encoding to UTF-8. We then attach a listener for the event "body" with another anonymous function that appends chunks of data to our responseBody string. This is the way to pull information from the service, and it is done this way to facilitate chunked data delivery: you are actually pulling each chunk of data off the wire and into your string. This is great for large responses because you can start processing the data while it is still downloading. After one or many "body" events, a "complete" event is fired, which indicates that the HTTP response has completed. In the "complete" event's anonymous function we simply execute the callback parameter function and pass it the data. For this example, the callback function only outputs to the console, so nothing special, but in just a small number of lines you have created a very powerful HTTP-consuming application.
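
For readers on a current Node.js release, the same read function can be sketched with today's http module. This is an approximation under stated assumptions, not the original listing: http.get sends the request immediately (there is no separate finish step), the chunk event is named "data" and the completion event "end", and the error-first callback signature is a choice made here for illustration.

```javascript
var http = require('http');

// read() fetches a resource and hands the complete body to
// callback(error, body). The error-first signature is an assumption of
// this sketch; the function described above passes only the data.
function read(options, callback) {
  var request = http.get(options, function (response) {
    var responseBody = '';
    response.setEncoding('utf8');
    // Fires once per chunk pulled off the wire (the "body" event above).
    response.on('data', function (chunk) {
      responseBody += chunk;
    });
    // Fires once the whole response has arrived (the "complete" event above).
    response.on('end', function () {
      callback(null, responseBody);
    });
  });
  // Connection-level failures are reported through the same callback.
  request.on('error', callback);
}
```

A call such as read({ host: 'localhost', port: 8000, path: '/index' }, function (err, body) { console.log(err || body); }) would print the body served by a local server on port 8000.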

Twitter Client

With what we have done so far, we can actually create a full and meaningful application consuming an external service instead of our own hello world. The following code is just a simple modification of the localhost-calling HTTP client we just wrote, but this time it calls Twitter's search API to find any tweets about JSConf. This will automatically query the API every minute, pull the newest items, and display them in bottom-posting order (newest at the bottom). You can just leave this terminal window open and watch all the tweets fly by while using very little processor time and memory space: 0.1% CPU and 30MB on my laptop. Perfect for netbooks and other battery-constrained devices, and best of all, it's only JavaScript!
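
Leaving the Twitter-specific details aside, the polling pattern described here can be sketched as follows. Everything named in it is hypothetical: fetchItems stands in for whatever function performs the search request, and the items are assumed to carry a numeric, increasing id.

```javascript
// Keep only items newer than the last one shown, oldest first, so the
// newest item ends up at the bottom of the terminal.
function selectNew(items, lastSeenId) {
  return items
    .filter(function (item) { return item.id > lastSeenId; })
    .sort(function (a, b) { return a.id - b.id; });
}

// Poll repeatedly. fetchItems(callback) is a hypothetical function that
// calls callback(items) with the latest results; show(item) displays one.
// Returns a function that stops the polling.
function startPolling(fetchItems, intervalMs, show) {
  var lastSeenId = 0;
  var timer = null;
  var stopped = false;

  function tick() {
    fetchItems(function (items) {
      if (stopped) { return; }
      selectNew(items, lastSeenId).forEach(function (item) {
        show(item);
        lastSeenId = item.id;
      });
      // Schedule the next query only after this one has been processed.
      timer = setTimeout(tick, intervalMs);
    });
  }

  tick();
  return function stop() {
    stopped = true;
    clearTimeout(timer);
  };
}
```

Passing 60000 as intervalMs gives one query per minute. Because the next query is scheduled from inside the callback, a slow response delays the following poll instead of overlapping with it.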

Other Frivolities

While showing off the HTTP capabilities of Node.js is incredibly sexy, and they are most likely the very future of web application development, it can do so much more. Take for instance this bit of code from the Node.js API documentation, which opens a raw TCP socket on port 7000 of the loopback interface. What is incredibly striking about this is that it is so fundamentally similar to the aforementioned HTTP client and HTTP server we created. In the same fashion as client-side JavaScript libraries, Node.js callbacks are uniform regardless of which event is being tracked. You do not worry about opening the TCP socket, threading, mutexes, or any of the complexity that would be an initial requirement in any other language.
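
A comparable TCP server can be sketched with the net module of a current Node.js release. This is an assumption-laden stand-in for the documentation example, not a copy of it: the createEchoServer name is made up, and the behavior (greet, then echo back whatever arrives) is chosen here for illustration.

```javascript
var net = require('net');

// Build (but do not start) a TCP server that greets each client and then
// echoes back whatever the client sends.
// createEchoServer is an illustrative name, not Node.js API.
function createEchoServer() {
  return net.createServer(function (socket) {
    // Runs once per incoming connection.
    socket.setEncoding('utf8');
    socket.write('hello\r\n');
    // Same callback style as the HTTP examples: register a function per event.
    socket.on('data', function (data) {
      socket.write(data);
    });
  });
}
```

Calling createEchoServer().listen(7000, 'localhost') would bind it to port 7000 on the loopback interface.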

You can have fun with this code using a simple telnet command of:

telnet localhost 7000

You can also apply the evented model to standard system execution, as shown in the following code segment, which executes an "ls" directory listing command and attaches a callback that will be executed when the system command returns. The Node.js execution does not stop or block waiting for the directory listing to occur; instead it continues to execute the next commands.
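
On a current Node.js release, this can be sketched with exec from the child_process module. The sketch assumes a Unix-like system where ls is available, and the order array exists only to make the non-blocking behavior visible.

```javascript
var exec = require('child_process').exec;

var order = [];

// Start "ls" and register a callback for when the command has finished.
exec('ls', function (error, stdout, stderr) {
  order.push('callback');
  if (error) {
    console.error('ls failed: ' + error.message);
    return;
  }
  console.log(stdout); // the directory listing
});

// This line runs immediately, before the listing is available.
order.push('after exec');
```

By the time the callback runs, order already contains 'after exec', which shows that the program did not wait for the command to finish.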

Conclusion

Node.js is a revolutionary technology built on top of another powerful revolutionary technology, V8. It is gathering a lot of attention within the technology community, mainly driven by Ryan's riveting presentation at JSConf.eu. What we have covered just scratches the surface of the power in this incredible platform, and you owe it to yourself to try your own hand at coding in Node.js. You might find that you actually enjoy CommonJS programming more than client-side JavaScript!

More Power From The People

For interesting libraries on top of Node.js, check out the libraries page on the Node.js GitHub wiki, available at https://www.wiki.github.com/ry/node. There are some amazing projects out there that will allow you to combine the power of Node.js with other cutting-edge technologies.

Comments (14)

Nov 16, 2009
Chris Williams said...
I updated the post to add a link to the various libraries already available for Node.js. There are some amazing ones.
Nov 17, 2009
frgtn said...
Great article, thanks! Node.js is really an awesome platform, hope it gets the community support it needs to hit mainstream.

A note though: the twitter example code in the article seems to be missing request headers param on line 6 in connection.get() call. Header "Host: search.twitter.com" is required by HTTP 1.1 and Node.js doesn't seem to fill it by default (checked with wireshark).

Nov 17, 2009
"At the time of this writing, Node.js is only installabled by source".

If you're on a Mac and you're using homebrew (https://www.github.com/mxcl/homebrew), you can install simply with "brew install node".

Nov 18, 2009
Sasa Ebach said...
I am wondering if it would be possible to implement a crawler with this. I am thinking about a simple ThreadPool of x threads which go out and crawl/index a predefined list of urls without following links. As far as I can see this is not what node.js is intended for. At least the examples are not really geared towards something like that.
Nov 18, 2009
Christoph Dorn said...
I had to provide the following as the second argument to "connection.get" to make the twitter client work:

{"host": "search.twitter.com", "User-Agent": "NodeJS HTTP Client"}

Nov 18, 2009
Chris Williams said...
I have updated the Gist to include the header arguments, it works in some cases for some people without them, and others no such luck. Easier just to add them, should work for everyone now.

Thanks for the comments!

Nov 21, 2009
@Sasa: Implementing a crawler should be very easy & efficient using node.js. Just have a look at the http client documentation.

The only problem you may hit is the limitations of JavaScripts regex engine (no negative lookbehind for example). In that case you could try:

https://www.xregexp.com/

I have not yet tested if it works with node.js, but the author is a genius, so its worth a try : ).

Nov 24, 2009
Sasa Ebach said...
@Felix: I had a look into the documentation of the http client and this looks very promising.

Although I am having a little trouble figuring out how to "cap" the parallel processing at 10 "threads" any given moment. At the moment I have a list of 10s of thousands of URIs and suspect that if I simply push all of those into threads (events) that the machine will explode ;-)

At the moment I am thinking that I can just use setTimeout, although I think, that this is not really clean.

At this point I have the option to do this in Ruby (not known for its strength in parallel processing) and Clojure (good for parallel processing on one machine) or Erlang (good of distributed parallel processing).

I really do like about node.js that I don't have to learn another language and I really like the arguments from the author (events vs. threads). It seems like it is so obvious, but still almost nobody does think that way.

I will have to dive deeper.

Nov 24, 2009
@Sasa: You need two arrays. One is the queue for the urls you still one to parse, the other is your "thread" pool (I'll call them workers). Then you need a function that does the following:

function doWork() {
  while (workers.length < 10 && urls.length > 0) {
    spawnWorker();
  }
}

The 'spawnWorker' function will pop one array from the url queue and process it. Once its done, you call the 'doWork' again which will spawn more workers as necessary.

Let me know if you need any more help.

Nov 24, 2009
Sasa Ebach said...
@Felix

Thanks! This is invaluable advice for me. Gruß nach Berlin ;-)

Nov 25, 2009
Glad you were able to decipher that comment. Seems like my brain tried to mix a few words & stuff up to make it more challenging : )
Dec 17, 2009
aoes said...
setTimeout(gettweets, 10000);

After processing a request, your twitter client will wait ten seconds, not a minute, before sending a new one.

Jan 26, 2010
Varria Studios said...
Thank you this code was a life saver for me.
Feb 05, 2010
zipizap said...
Thanks to you introduction and simple instructions to install node.js I was able to create a domestic-evented-program, to monitor new files uploaded into your company FTP server, and automatically process the uploaded file, returning a new file with results to be downloaded by the user.

Of interest, I've seen that node.js is capable of processing ~350-450 simultaneous files before crashing due to "Too many files open", which by far outperforms the FTP-server capability of uploading files :)

node.js is very efficient and fast for server-related tasks ;)

The program is on github: https://www.github.com/zipizap/newFileNotifier.js

Cheers, zipizap
