How to use PhantomJS with Node.js

PhantomJS is a headless WebKit browser that is scriptable through a JavaScript API. It is multiplatform and available on the major operating systems: Windows, Mac OS X, Linux, and other Unices. It has fast, native support for various web standards: DOM handling, CSS selectors, JSON, Canvas, and SVG. Because PhantomJS fully renders pages under the hood, the results can be exported as images. It is easy to set up, which makes it a useful approach for projects that need to generate many browser screenshots (if you only want to create screenshots, we recommend reading this article instead).

In this article, you will learn how to use PhantomJS with Node.js, either through a module or by driving it yourself with JavaScript.

Requirements

You will need PhantomJS (installed or as a standalone distribution) accessible from the PATH (learn how to add a variable to the PATH in Windows here). If it isn't available in the PATH, you can specify the path to the PhantomJS executable in the configuration later.

You can download PhantomJS for every platform (Windows, Linux, macOS, etc.) from the download area of the official website here.

Note

On most platforms there is no installation process: you get a .zip file with two folders, examples and bin (which contains the PhantomJS executable).
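A quick way to confirm that Node.js can find PhantomJS is to ask the binary for its version. The following is a minimal sketch rather than a required step: the helper name getPhantomVersion is hypothetical, and it assumes the executable responds to the --version flag, as PhantomJS does.

```javascript
var spawnSync = require('child_process').spawnSync;

/**
 * Hypothetical helper: returns the version string printed by the given
 * executable, or null if it cannot be started (e.g. it is not in the PATH).
 */
function getPhantomVersion(executable) {
    var result = spawnSync(executable || 'phantomjs', ['--version'], { encoding: 'utf8' });

    // result.error is set when the process could not be spawned (ENOENT, etc.)
    if (result.error || result.status !== 0) {
        return null;
    }

    return result.stdout.trim();
}

var version = getPhantomVersion();
console.log(version ? 'PhantomJS ' + version + ' found' : 'PhantomJS not found in the PATH');
```

If the check fails, pass the full path of the executable to the helper instead of relying on the PATH.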

Once you know that PhantomJS is available on your machine, let's get started!

A. Using a module

If you want to use a module to work with PhantomJS in Node.js, you can use the phantom module written by @amir20. This module integrates PhantomJS into Node.js. Although the JavaScript you write with it isn't identical to the JavaScript you use to instruct PhantomJS directly, it's still easy to understand.

To install the module in your project, execute the following command in the terminal:

npm install phantom --save

Once the installation of the module finishes, you will be able to access the module using require("phantom").

The workflow (create the page, then do other things with it) remains similar to scripting PhantomJS with plain JavaScript. The page object returned by the createPage method is a proxy that forwards all method calls to PhantomJS. Most method calls should be identical to the PhantomJS API, but remember that each method returns a Promise.

The following script opens the Stack Overflow website and prints the HTML of the homepage in the console:

var phantom = require("phantom");
var _ph, _page, _outObj;

phantom.create().then(function(ph){
    _ph = ph;
    return _ph.createPage();
}).then(function(page){
    _page = page;
    return _page.open('https://stackoverflow.com/');
}).then(function(status){
    console.log(status);
    return _page.property('content');
}).then(function(content){
    console.log(content);
    _page.close();
    _ph.exit();
}).catch(function(e){
    console.log(e);
});

If you're using Node.js v7.6 or later, you can use the async and await features, which are supported natively from that version on.

const phantom = require('phantom');

(async function() {
    const instance = await phantom.create();
    const page = await instance.createPage();
    await page.on("onResourceRequested", function(requestData) {
        console.info('Requesting', requestData.url)
    });

    const status = await page.open('https://stackoverflow.com/');
    console.log(status);

    const content = await page.property('content');
    console.log(content);

    await instance.exit();
}());

It simplifies the code significantly and is much easier to follow than chained then() callbacks.
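One thing the examples above do not guard against is a failure between create() and exit(), which would leave the PhantomJS process running. A minimal sketch of one way to handle that with try/finally follows. The helper name fetchContent is hypothetical, and it takes the phantom module as a parameter purely so that it can be exercised with a stand-in object; it relies only on the calls already used above (create, createPage, open, property, exit).

```javascript
/**
 * Hypothetical helper: `phantomModule` is whatever require('phantom') returns.
 * Resolves with the HTML of the page, or rejects if the page cannot be opened.
 */
async function fetchContent(phantomModule, url) {
    const instance = await phantomModule.create();

    try {
        const page = await instance.createPage();
        const status = await page.open(url);

        if (status !== 'success') {
            throw new Error('Unable to open ' + url + ' (status: ' + status + ')');
        }

        return await page.property('content');
    } finally {
        // Always stop the PhantomJS process, even when something above throws
        await instance.exit();
    }
}

// Usage (requires the phantom module and a PhantomJS binary):
// fetchContent(require('phantom'), 'https://stackoverflow.com/')
//     .then(console.log)
//     .catch(console.error);
```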

B. Own implementation

As you probably (should) know, you work with PhantomJS through a .js file containing instructions, and you run it by passing the path of the script as the first command-line argument (phantomjs /path/to/script-to-execute.js). To learn how to interact with PhantomJS from Node.js, create the following test script (phantom-script.js), which runs directly in PhantomJS. If you want to test it, run the command phantomjs phantom-script.js in a terminal:

/**
 * phantom-script.js
 */
"use strict";
// Example using HTTP POST operation in PhantomJS
// This website exists and is for test purposes, don't post sensitive information
var page = require('webpage').create(),
    server = 'http://posttestserver.com/post.php?dump',
    data = 'universe=expanding&answer=42';

page.open(server, 'post', data, function (status) {
    if (status !== 'success') {
        console.log('Unable to post!');
    } else {
        console.log(page.content);
    }

    phantom.exit();
});

The previous code simply sends a POST request to a website (make sure you have internet access while testing it).
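The data string in the script above is written by hand, which only works while keys and values contain no reserved characters. A minimal sketch of building it from an object follows. The toFormData helper is hypothetical (it is not part of PhantomJS) and sticks to ES5, so it should run inside a PhantomJS script as well as in Node.js.

```javascript
/**
 * Hypothetical helper: turns a flat object into an
 * application/x-www-form-urlencoded string.
 */
function toFormData(params) {
    return Object.keys(params).map(function (key) {
        return encodeURIComponent(key) + '=' + encodeURIComponent(params[key]);
    }).join('&');
}

console.log(toFormData({ universe: 'expanding', answer: 42 }));
// universe=expanding&answer=42

console.log(toFormData({ q: 'a&b=c' }));
// q=a%26b%3Dc
```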

Now we are going to use Node.js to spawn a child process. The Node script should execute the following command (the same one used in the command line):

phantomjs phantom-script.js

To do it, we are going to require the child_process module (available by default in Node.js) and save the spawn property in a variable. The child_process.spawn() method spawns a new process using the given command (as first argument), with command line arguments in args (as second argument). If omitted, args defaults to an empty array.

Declare a variable child that holds the value returned by spawn. The first argument for spawn should be the path to the phantomjs executable (just phantomjs if it's in the PATH), and the second should be an array with a single element: the path of the script that PhantomJS should run. On the child variable, add a data listener for stdout (standard output) and stderr (standard error output). The callbacks of those listeners receive a Node.js Buffer (a subclass of Uint8Array), which you need to convert to a string before you can read it. To do so, we are going to use the Uint8ArrToString method (included in the script below). It's a very simple approach; if you require scalability in your project, we recommend reading more about how to convert this kind of array to a string here.
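As a point of comparison, here is a small sketch of why the conversion method matters. String.fromCharCode treats every byte as a separate character, so multi-byte UTF-8 characters come out garbled, whereas the Buffer's own toString method decodes them. The sample text is made up for illustration.

```javascript
// A chunk as it might arrive from child.stdout (sample text is illustrative)
var chunk = Buffer.from('Café', 'utf8');

// Byte-by-byte conversion: fine for ASCII, wrong for multi-byte characters
var byteWise = String.fromCharCode.apply(null, chunk);

// Buffer-aware conversion: decodes the bytes as UTF-8
var decoded = chunk.toString('utf8');

console.log(byteWise); // the é is split into two wrong characters
console.log(decoded);  // Café
```

For streams, calling setEncoding('utf8') on child.stdout makes the data callback receive strings directly and also takes care of characters that are split across two chunks.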

Create a new script (executing-phantom.js) with the following code inside:

/**
 * executing-phantom.js
 */

var spawn = require('child_process').spawn;
var args = ["./phantom-script.js"];
// In case you want to customize the process, modify the options object
var options = {};

// If phantomjs is in the PATH use 'phantomjs', otherwise provide the full path to the executable
// e.g. for Windows:
// var phantomExecutable = 'E:\\Programs\\PhantomJS\\bin\\phantomjs.exe';
var phantomExecutable = 'phantomjs';

/**
 * This method converts a Uint8Array to its string representation
 */
function Uint8ArrToString(myUint8Arr){
    return String.fromCharCode.apply(null, myUint8Arr);
}

var child = spawn(phantomExecutable, args, options);

// Receive output of the child process
child.stdout.on('data', function(data) {
    var textData = Uint8ArrToString(data);

    console.log(textData);
});

// Receive error output of the child process
child.stderr.on('data', function(err) {
    var textErr = Uint8ArrToString(err);
    console.log(textErr);
});

// Triggered when the process closes
child.on('close', function(code) {
    console.log('Process closed with status code: ' + code);
});

As a final step, execute the previous Node script using:

node executing-phantom.js

And in the console you should get the following output:

<html><head></head><body>Time: Thu, 09 Feb 17 06:24:55 -0800
Source ip: xx.xxx.xxx.xxx

Headers (Some may be inserted by server)
REQUEST_URI = /post.php?dump
QUERY_STRING = dump
REQUEST_METHOD = POST
GATEWAY_INTERFACE = CGI/1.1
REMOTE_PORT = 57200
REMOTE_ADDR = 93.210.203.47
HTTP_HOST = posttestserver.com
HTTP_ACCEPT_LANGUAGE = de-DE,en,*
HTTP_ACCEPT_ENCODING = gzip, deflate
HTTP_CONNECTION = close
CONTENT_TYPE = application/x-www-form-urlencoded
CONTENT_LENGTH = 28
HTTP_ORIGIN = null
HTTP_USER_AGENT = Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/538.1 (KHTML, like Gecko) PhantomJS/2.1.1 Safari/538.1
HTTP_ACCEPT = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
UNIQUE_ID = WJx7t0BaMGUAAHxjI0EAAAAD
REQUEST_TIME_FLOAT = 1486650295.8916
REQUEST_TIME = 1486650295

Post Params:
key: 'universe' value: 'expanding'
key: 'answer' value: '42'
Empty post body.

Upload contains PUT data:
universe=expanding&amp;answer=42</body></html>
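If you would rather have the output as a value than print it chunk by chunk, the spawn call can be wrapped in a Promise. This is a sketch under assumptions: the runProcess name and the shape of its result object are made up for illustration, and the usage line runs Node itself (process.execPath) so that the example works even where PhantomJS is not installed.

```javascript
var spawn = require('child_process').spawn;

/**
 * Hypothetical helper: runs a command and resolves with its exit code
 * and everything it wrote to stdout and stderr.
 */
function runProcess(command, args) {
    return new Promise(function (resolve, reject) {
        var child = spawn(command, args || []);
        var stdout = '';
        var stderr = '';

        // Receive strings instead of Buffers
        child.stdout.setEncoding('utf8');
        child.stderr.setEncoding('utf8');

        child.stdout.on('data', function (chunk) { stdout += chunk; });
        child.stderr.on('data', function (chunk) { stderr += chunk; });

        // Emitted when the process cannot be started, e.g. the executable is missing
        child.on('error', reject);

        child.on('close', function (code) {
            resolve({ code: code, stdout: stdout, stderr: stderr });
        });
    });
}

// For PhantomJS this would be: runProcess('phantomjs', ['./phantom-script.js'])
runProcess(process.execPath, ['-e', 'console.log("hello from the child")'])
    .then(function (result) {
        console.log('Exit code:', result.code);
        console.log(result.stdout);
    })
    .catch(function (err) {
        console.error('Could not start the process:', err.message);
    });
```

Listening for the error event matters here: without it, a missing executable raises an unhandled error instead of a rejected Promise.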

We personally prefer the self-implemented method for working with PhantomJS, as the learning curve of the module is steep (at least for those who already know how to work with PhantomJS directly through scripts), and its documentation isn't very good.

Happy coding!
