A couple of months ago, I had to modify dynamically every article of the blog to include the lazy loading feature in images of articles that were written a couple of years ago. I used the hardly criticized DOMDocument class of PHP to do the job and ended up discovering why it's so hated. Besides not parsing correctly HTML5 elements, you will have a problem with the character encoding in most of the cases with special characters and emojis, for example, check the following PHP code that should simply parse the given HTML and print it again, without doing absolutely nothing:
The content printed by the last line, extracted from the saveHTML
method of the DOMDocument will mess up the output in the document, just like the first image of the article. If you test it as well, it won't look at all like the HTML we provided at the beginning, isn't it? In this article, I will show you some options that you may use to export properly the parsed HTML by the DOMDocument class of PHP.
A. utf8_decode the documentElement
In this approach, you will use the same code as usual, however, be sure to pass as the first argument of the saveHTML
method, the current document model, and use utf8_decode
to convert the string with ISO-8859-1 characters encoded with UTF-8:
B. Use the SmartDocument hack
The SmartDocument class is a class that overcomes a few common annoyances with the DOMDocument class of PHP. The source code of the project can be found in this repository at Github. You won't need to include the class in your document, but just as the code of the SmartDocument specifies while loading the HTML, before loading it into the DOMDocument, the HTML encoding needs to be changed using mb_convert_encoding
just like we'll do in the following example:
Happy coding ❤️!
0 Comments