How to recursively read a directory in Node.js

Whether you want to build some kind of file search algorithm or get a list of all the files and folders inside a directory in order to compress it with zlib, this is a feature that Node.js developers often look for (and the number of downloads and dependent packages of some well-known modules proves it).

In this article we are going to show you how to loop recursively through a directory and list all of its content with 3 different methods (a custom snippet or open source modules).

A. Using a custom snippet

If you want to loop recursively through a directory in Node.js, you don't necessarily need a module, as a very simple recursive function is enough. The following filewalker function will do the trick for you. It expects as first argument a string with the path of the folder that will be recursively explored, and as second argument a function (the callback) that is executed once there are no more directories to explore inside the provided path. The callback receives 2 arguments: the error as first, and as second the result array that contains the mixed paths of all files and subdirectories.

Note

If you want to ignore the folders and retrieve only the paths of the files, simply comment out the line within the snippet that stores the directory path in the array (it is marked with a comment).

const fs = require('fs');
const path = require('path');

/**
 * Explores recursively a directory and returns all the filepaths and folderpaths in the callback.
 * 
 * @see http://stackoverflow.com/a/5827895/4241030
 * @param {String} dir 
 * @param {Function} done 
 */
function filewalker(dir, done) {
    let results = [];

    fs.readdir(dir, function(err, list) {
        if (err) return done(err);

        var pending = list.length;

        if (!pending) return done(null, results);

        list.forEach(function(file){
            file = path.resolve(dir, file);

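            fs.stat(file, function(err, stat){
                // If directory, execute a recursive call.
                // If fs.stat fails (e.g. a broken symlink), stat is undefined and the path is stored like a file
                if (stat && stat.isDirectory()) {
                    // Add directory to array [comment if you need to remove the directories from the array]
                    results.push(file);

                    filewalker(file, function(err, res){
                        // Propagate errors from subdirectories instead of concatenating undefined
                        if (err) return done(err);

                        results = results.concat(res);
                        if (!--pending) done(null, results);
                    });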
                } else {
                    results.push(file);

                    if (!--pending) done(null, results);
                }
            });
        });
    });
}

Its usage is very simple:

filewalker("./some-existent-path", function(err, data){
    if(err){
        throw err;
    }
    
    // ["c:/some-existent-path/file.txt","c:/some-existent-path/subfolder"]
    console.log(data);
});

This solution is perfect if you don't want to rely on a module to achieve something very simple and directly in your code.
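If you prefer promises over callbacks, the same idea can be sketched with fs.promises. This is only a minimal variant under a few assumptions: the function name filewalkerAsync and the filesOnly option are hypothetical (they are not part of the original snippet), and it requires a Node.js version that supports the withFileTypes option of readdir.

```javascript
const fs = require('fs');
const path = require('path');

/**
 * Promise-based sketch of the filewalker idea (hypothetical helper).
 * Resolves with the absolute paths of everything below `dir`.
 * `filesOnly` is an illustrative option: when true, directories are not stored.
 */
async function filewalkerAsync(dir, { filesOnly = false } = {}) {
    const results = [];

    // withFileTypes returns fs.Dirent objects, so no extra fs.stat call is needed.
    // Unlike fs.stat, a Dirent does not follow symlinks: a symlink that points
    // to a directory is stored like a file and is not explored.
    const entries = await fs.promises.readdir(dir, { withFileTypes: true });

    for (const entry of entries) {
        const fullPath = path.resolve(dir, entry.name);

        if (entry.isDirectory()) {
            if (!filesOnly) results.push(fullPath);

            // Recurse; a rejection here propagates to the caller
            const nested = await filewalkerAsync(fullPath, { filesOnly });
            for (const nestedPath of nested) results.push(nestedPath);
        } else {
            results.push(fullPath);
        }
    }

    return results;
}
```

You could then call it as filewalkerAsync("./some-existent-path").then(console.log), or pass { filesOnly: true } as second argument to skip the folders. Errors reach you through the rejected promise instead of the first callback argument.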

B. Using the readdirp module

If your requirements are more complex, a single snippet may not be enough. In this case you can use the readdirp module (yes, readdirp, not readdir). readdirp is a very useful module that exposes a recursive version of the readdir function available in the filesystem module of Node.js, and it also exposes a stream API.

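To install this module in your project, execute the following command in your terminal. Note that the examples in this section use the readdirp 2.x API (a settings object with a root property, plus optional callbacks); later major versions changed the signature to readdirp(root, options) and removed the callback style, so pin the 2.x line if you want to run them as written:

npm install readdirp@2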

The advantages of this module are clear, although it is not as simple as the first solution shown in this article. It allows you to filter the recursive exploration by file extension and by directory name, and to set a depth (the maximum number of subfolder levels to explore inside the provided directory). It works in the following way: you require the readdirp module, which is basically a function, and use that function to iterate recursively through a folder path. It requires an object that specifies the settings (all the available options are listed in the readdirp documentation):

var settings = {
    root: './a-folder-to-explore',
    entryType: 'all',
    // Filter files with js and json extension
    fileFilter: [ '*.js', '*.json' ],
    // Filter by directory
    directoryFilter: [ '!.git', '!*modules' ],
    // Work with files up to 1 subdirectory deep
    depth: 1
};

In the simplest case you only need to provide the root property, which indicates the directory to explore.
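To make the effect of such filters and of the depth setting more concrete, here is a dependency-free sketch of the same idea. It is not how readdirp is implemented: the function listFiltered and its option names (extensions, skipDirs, depth) are hypothetical, and it matches plain extensions and directory names instead of glob patterns.

```javascript
const fs = require('fs');
const path = require('path');

/**
 * Sketch of a filtered, depth-limited walk (hypothetical helper).
 * - extensions: array like ['.js', '.json'], or null to accept every file
 * - skipDirs:   directory names that must not be explored
 * - depth:      how many subdirectory levels below root may be explored
 */
function listFiltered(root, options = {}, currentDepth = 0) {
    const { extensions = null, skipDirs = [], depth = Infinity } = options;
    const found = [];

    for (const entry of fs.readdirSync(root, { withFileTypes: true })) {
        const fullPath = path.join(root, entry.name);

        if (entry.isDirectory()) {
            // Skip excluded directory names and stop descending past the limit
            if (skipDirs.includes(entry.name) || currentDepth >= depth) continue;

            const nested = listFiltered(fullPath, options, currentDepth + 1);
            for (const nestedPath of nested) found.push(nestedPath);
        } else if (!extensions || extensions.includes(path.extname(entry.name))) {
            found.push(fullPath);
        }
    }

    return found;
}
```

With options such as { extensions: ['.js', '.json'], skipDirs: ['.git'], depth: 1 } the sketch returns the matching files of the root folder and of its direct subfolders only, which is roughly what the settings object above asks readdirp to do.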

The module can be used in 2 ways. The first one is with callbacks:

// Import the module
var readdirp = require('readdirp');

var settings = {
    root: './your-folder-path',
    entryType: 'all'
};

// In this example, this variable will store all the paths of the files and directories inside the provided path
var allFilePaths = [];

// Iterate recursively through a folder
readdirp(settings,
    // This callback is executed every time a file or directory is found inside the provided path
    function(fileInfo) {
        
        // Store the fullPath of the file/directory in our custom array 
        allFilePaths.push(
            fileInfo.fullPath
        );
    }, 

    // This callback is executed once, when the whole exploration has finished
    function (err, res) {
        if(err){
            throw err;
        }

        // res holds the fileEntry objects found inside the folder
        // console.log(res);
        console.log(allFilePaths);
        // ["c:/file.txt", ...]
    }
);

Or through the stream API:

// Import the module
var readdirp = require('readdirp');

var settings = {
    root: './',
    entryType: 'all'
};

// In this example, this variable will store all the paths of the files and directories inside the provided path
var allFilePaths = [];

// Iterate recursively through a folder
readdirp(settings)
    .on('data', function (entry) {
        // Executed every time a file or directory is found in the provided directory

        // Store the fullPath of the file/directory in our custom array 
        allFilePaths.push(
            entry.fullPath
        );
    })
    .on('warn', function(warn){
        console.log("Warn: ", warn);
    })
    .on('error', function(err){
        console.log("Error: ", err);
    })
    .on('end', function(){

        console.log(allFilePaths);
        // ["c:/file.txt","c:/other-file.txt" ...]
    })
;

Every fileEntry object has the following structure, so you get not only the full path but also more useful information about the file or directory:

{
    name: 'index.js',
    path: 'node_modules\\string_decoder\\index.js',
    fullPath: 'C:\\Users\\sdkca\\Desktop\\node-workspace\\node_modules\\string_decoder\\index.js',
    parentDir: 'node_modules\\string_decoder',
    fullParentDir: 'C:\\Users\\sdkca\\Desktop\\node-workspace\\node_modules\\string_decoder',
    stat:
    Stats {
        dev: -469691281,
        mode: 33206,
        nlink: 1,
        uid: 0,
        gid: 0,
        rdev: 0,
        blksize: undefined,
        ino: 562949954035272,
        size: 7796,
        blocks: undefined,
        atime: 2017-03-31T18:27:30.703Z,
        mtime: 2017-03-31T18:27:30.724Z,
        ctime: 2017-03-31T18:27:30.724Z,
        birthtime: 2017-03-31T18:27:30.703Z
    }
}

For more information, please visit the repository of the module on GitHub.

C. Using klaw and klaw-sync modules

Originally, a lot of developers relied on the wrench module (and many still do). However, it is now officially deprecated and, as we love to promote standards, you should not use it anymore (we do not prohibit it, so feel free to explore the module if you want, but keep in mind that it is deprecated). The wrench project now recommends the fs-extra module instead. However, fs-extra doesn't support the walk() and walkSync() functions anymore, which is the reason why developers used the wrench module to explore a directory recursively.

As the recursive functions aren't available anymore, the fs-extra module recommends using the klaw module. This module, originally extracted from fs-extra, exposes an asynchronous Node.js file system walker with a Readable stream interface.

To install this module, execute the following command in your terminal:

npm install klaw

Klaw is very easy to use and customizable. To loop recursively through a directory, use the following snippet:

// Import the klaw module
var klaw = require('klaw');

// an array to store the folder and files inside
var items = [];

var directoryToExplore = "./some-folder";

klaw(directoryToExplore)
    .on('data', function (item) {
        items.push(item.path)
    })
    .on('end', function () {
        console.log(items);
    })
    .on('error', function (err, item) {
        console.log(err.message)
        console.log(item.path) // the file the error occurred on
    })    
;

As it's asynchronous, you need to rely on the end callback to do whatever you want with the list of files and directories found in the provided directory.
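If relying on the end callback feels awkward, one option is to wrap the stream in a Promise. The following is only a sketch: collectPaths is a hypothetical helper that is not part of klaw, and it assumes that every item emitted by the stream has a path property, as the klaw items in the snippet above do. A stand-in stream built with Readable.from is used here so that the example runs without any installed module.

```javascript
const { Readable } = require('stream');

/**
 * Collects the `path` of every item emitted by an object-mode stream
 * and resolves once the stream ends (hypothetical helper, not klaw API).
 */
function collectPaths(stream) {
    return new Promise(function (resolve, reject) {
        const items = [];

        stream
            .on('data', function (item) {
                items.push(item.path);
            })
            .on('error', reject)
            .on('end', function () {
                resolve(items);
            });
    });
}

// Stand-in for a walker stream; with klaw you would pass klaw("./some-folder") instead
const fakeWalker = Readable.from([
    { path: '/some-folder' },
    { path: '/some-folder/file.txt' }
]);

collectPaths(fakeWalker).then(function (paths) {
    console.log(paths);
});
```

The promise rejects on the first error event, so, unlike the error handler in the snippet above, it does not give you access to the item that caused the error.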

For more information about the asynchronous klaw module, please visit the official repository on GitHub.

If you need the same functionality but synchronous, you can use the klaw-sync module. klaw-sync is a recursive file system walker for Node.js and the synchronous counterpart of klaw. It lists all files and directories inside a directory recursively and returns an array of objects, each with two properties: path and stats. path is the full path of the file or directory and stats is an instance of fs.Stats.

To install this module, execute the following command in your terminal:

npm install klaw-sync

The synchronous version of klaw is as easy to use as the asynchronous version, and it's more customizable than its counterpart. You can use the following snippet to explore a directory:

// Require the module
var klawSync = require('klaw-sync');

// Declare the variable outside the try block so it stays accessible afterwards
var paths;

// The directory that you want to explore
var directoryToExplore = "./folder-to-explore";

try {
    paths = klawSync(directoryToExplore);
} catch (err) {
    console.error(err);
}

// [
//   {path:"c:/file.txt", stats: {..File information..}},
//   {path:"c:/folder", stats: {..Folder information..}},
//   {path:"c:/folder/file2.txt", stats: {..File information..}},
// ]
console.log(paths);

klaw-sync allows you to filter directories and files by extension or name. In addition, you can search only for directories or only for files by setting the corresponding options:

var klawSync = require('klaw-sync');

var directoryToExplore = "./some-folder";

var files = klawSync(directoryToExplore, {
    nodir: true
});

// [
//   {path:"c:/file.txt", stats: {..File information..}},
//   {path:"c:/file2.txt", stats: {..File information..}},
//   {path:"c:/file3.txt", stats: {..File information..}},
// ]
console.log(files);
 
var paths = klawSync(directoryToExplore, {
    nofile: true
});

// [
//   {path:"c:/folder", stats: {..Folder information..}},
//   {path:"c:/folder2", stats: {..Folder information..}},
//   {path:"c:/folder3", stats: {..Folder information..}},
// ]
console.log(paths);
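To illustrate what such a synchronous walker does internally, here is a dependency-free sketch that returns the same kind of { path, stats } objects. It is not the klaw-sync implementation: walkSync is a hypothetical function, and only the nodir and nofile options shown above are mimicked.

```javascript
const fs = require('fs');
const path = require('path');

/**
 * Synchronous recursive walker sketch (hypothetical, not klaw-sync itself).
 * Returns an array of { path, stats } objects for everything below `dir`.
 * - options.nodir:  when true, directories are not included in the result
 * - options.nofile: when true, files are not included in the result
 */
function walkSync(dir, options = {}, items = []) {
    for (const name of fs.readdirSync(dir)) {
        const fullPath = path.join(dir, name);

        // lstatSync does not follow symlinks, so a symlink that points to a
        // directory is handled like a file and is never explored
        const stats = fs.lstatSync(fullPath);

        if (stats.isDirectory()) {
            if (!options.nodir) items.push({ path: fullPath, stats: stats });

            // Directories are always explored, even when they are not listed
            walkSync(fullPath, options, items);
        } else if (!options.nofile) {
            items.push({ path: fullPath, stats: stats });
        }
    }

    return items;
}
```

Because everything is synchronous, errors are thrown directly, so a call like walkSync("./some-folder", { nodir: true }) can be wrapped in a try/catch block in the same way as the klawSync calls above.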

For more information about the synchronous klaw module, please visit the official repository on GitHub.

The name klaw is "walk" spelled backwards. As of January 25, 2017, klaw and klaw-sync turned out to be faster than other modules in most cases.

Happy coding!
