Invalid HTML, broken markup and other undesirable side effects of working with HTML strings that are not properly escaped in JavaScript are problems that at least 1 in every 5 web developers (those who work on dynamic apps) has faced.
JavaScript itself doesn't provide native methods to deal with it, unlike PHP (our beautiful server-side language), which offers the htmlentities and html_entity_decode functions ready to use.
Encode and decode everything
If you're one of those psychotic developers (just like me) who don't like adding huge portions of code to their projects, you may want to use the following snippet.
This piece of code works like a charm both ways, encoding and decoding. Each method expects the string as its first parameter (decoded or encoded, depending on the method) and returns the processed string.
It doesn't offer much customization, but it works fine (at least for only a couple of lines). Note that the encode method converts every single character into its numeric HTML character reference.
If you only want to replace those weird characters that break your HTML (<, >, /, \ etc.), keep reading and don't use this method; otherwise this snippet comes in handy.
(function(window){
    window.htmlentities = {
        /**
         * Converts a string to its html characters completely.
         *
         * @param {String} str String with unescaped HTML characters
         **/
        encode : function(str) {
            var buf = [];
            for (var i=str.length-1;i>=0;i--) {
                buf.unshift(['&#', str[i].charCodeAt(), ';'].join(''));
            }
            return buf.join('');
        },
        /**
         * Converts an html characterSet into its original character.
         *
         * @param {String} str htmlSet entities
         **/
        decode : function(str) {
            return str.replace(/&#(\d+);/g, function(match, dec) {
                return String.fromCharCode(dec);
            });
        }
    };
})(window);
The previous code creates a global variable (on the window object) named htmlentities. This object contains two methods, encode and decode.
To convert a normal string to its HTML character references, use the encode method:
htmlentities.encode("Hello, this is a test stríng > < with characters that could break html. Therefore we convert it to its html characters.");
// Output (truncated here; every character becomes a numeric reference)
"&#72;&#101;&#108;&#108;&#111;&#44;&#32;&#116;&#104;&#105;&#115;&#32;&#105;&#115;..."
To convert an encoded HTML string back to readable characters, use the decode method:
htmlentities.decode("&#72;&#101;&#108;&#108;&#111;&#44;&#32;&#116;&#104;&#105;&#115;&#32;&#105;&#115;&#32;&#97;&#32;&#116;&#101;&#115;&#116;");
// Output
"Hello, this is a test"
Note: feel free to copy any of these functions and include them in your project as you wish.
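For the case mentioned earlier, where you only want to replace the handful of characters that break markup instead of encoding everything, a minimal sketch could look like the following. The names escapeHtml and ESCAPE_MAP are hypothetical (they are not part of the snippet above), and the list of characters is an assumption about what usually needs escaping in element content and quoted attribute values.

```javascript
// Hypothetical helper (not part of the htmlentities snippet): escapes only
// the characters that commonly break markup and leaves the rest untouched.
var ESCAPE_MAP = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;'
};

function escapeHtml(str) {
  // String() lets the helper accept numbers or other values as well.
  return String(str).replace(/[&<>"']/g, function (ch) {
    return ESCAPE_MAP[ch];
  });
}

console.log(escapeHtml('<a href="#">Tom & Jerry</a>'));
// → &lt;a href=&quot;#&quot;&gt;Tom &amp; Jerry&lt;/a&gt;
```

Because the ampersand is handled in the same pass as the other characters, already escaped output is not double-processed within a single call, although calling the helper twice on the same string will escape it twice.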
Using a library
Since this task is not easy to get right, there is an awesome library that solves it for you.
He.js (for “HTML entities”) is a robust HTML entity encoder/decoder written in JavaScript. It supports all standardized named character references as per HTML, handles ambiguous ampersands and other edge cases just like a browser would, has an extensive test suite, and contrary to many other JavaScript solutions, he handles astral Unicode symbols just fine. An online demo is available.
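To see why astral symbols matter: the small snippet from the first section uses charCodeAt, which works on UTF-16 code units, so a symbol outside the Basic Multilingual Plane is emitted as two references, one per surrogate half, and browsers do not turn surrogate references back into the original symbol. The sketch below is not how the library works internally; it only illustrates the idea of encoding by code point. The function names are hypothetical.

```javascript
// Hypothetical variants of the earlier snippet that work on code points
// instead of UTF-16 code units.
function encodeByCodePoint(str) {
  // Array.from iterates a string by code point, so a surrogate pair
  // arrives here as one two-unit string rather than two separate halves.
  return Array.from(str, function (ch) {
    return '&#' + ch.codePointAt(0) + ';';
  }).join('');
}

function decodeByCodePoint(str) {
  return str.replace(/&#(\d+);/g, function (match, dec) {
    // Note: String.fromCodePoint throws a RangeError above 0x10FFFF.
    return String.fromCodePoint(Number(dec));
  });
}

var symbol = '\uD834\uDF06'; // U+1D306, an astral symbol
console.log(symbol.length);                             // → 2
console.log(encodeByCodePoint(symbol));                 // → &#119558;
console.log(decodeByCodePoint('&#119558;') === symbol); // → true
```

With the charCodeAt-based encoder, the same symbol would come out as &#55348;&#57094; instead of a single reference.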
Encode
This function takes a string of text and encodes (by default) any symbols that aren’t printable ASCII symbols and &, <, >, ", ', and `, replacing them with character references.
// Using the global default setting (defaults to `false`):
he.encode('foo © bar ≠ baz 𝌆 qux');
// → 'foo &#xA9; bar &#x2260; baz &#x1D306; qux'
// Passing an `options` object to `encode`, to explicitly encode all symbols:
he.encode('foo © bar ≠ baz 𝌆 qux', {
    'encodeEverything': true
});
// → '&#x66;&#x6F;&#x6F;&#x20;&#xA9;&#x20;&#x62;&#x61;&#x72;&#x20;&#x2260;&#x20;&#x62;&#x61;&#x7A;&#x20;&#x1D306;&#x20;&#x71;&#x75;&#x78;'
// This setting can be combined with the `useNamedReferences` option:
he.encode('foo © bar ≠ baz 𝌆 qux', {
    'encodeEverything': true,
    'useNamedReferences': true
});
// → '&#x66;&#x6F;&#x6F;&#x20;&copy;&#x20;&#x62;&#x61;&#x72;&#x20;&ne;&#x20;&#x62;&#x61;&#x7A;&#x20;&#x1D306;&#x20;&#x71;&#x75;&#x78;'
Decode
This function takes a string of HTML and decodes any named and numerical character references in it using the algorithm described in section 12.2.4.69 of the HTML spec.
he.decode('foo &copy; bar &ne; baz &#x1D306; qux');
// → 'foo © bar ≠ baz 𝌆 qux'
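Numerical character references come in two forms, decimal (&#169;) and hexadecimal (&#xA9;), and the small decode snippet from the first section only matches the decimal form. If you are not using the library, a rough sketch that covers both forms might look like this. It is not the algorithm from the HTML spec: the function name is hypothetical, named references such as &copy; are deliberately left alone, and edge cases like missing semicolons are ignored.

```javascript
// Hypothetical helper: decodes decimal (&#169;) and hexadecimal (&#xA9;)
// references only. Named references such as &copy; are left untouched.
function decodeNumericReferences(str) {
  return str.replace(/&#(?:[xX]([0-9a-fA-F]+)|(\d+));/g, function (match, hex, dec) {
    var codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Leave out-of-range values as they are instead of throwing.
    return codePoint <= 0x10FFFF ? String.fromCodePoint(codePoint) : match;
  });
}

console.log(decodeNumericReferences('foo &#xA9; bar &#8800; baz'));
// → foo © bar ≠ baz
```

Anything beyond this (named references, ambiguous ampersands, invalid code points) is where a tested library earns its place.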
Have fun