Commit 9104e01

Fixed wrongly decoded non-ASCII (UTF-8) characters and added parser flags
Added mb_convert_encoding() to convert the whole HTML input from UTF-8 to HTML entities before parsing. Added libxml flags to the loadHTML() call.
1 parent c94a730 commit 9104e01

1 file changed, +6 -3 lines changed

includes/utils/getData.php

Lines changed: 6 additions & 3 deletions
@@ -18,9 +18,12 @@ function getDOM(string &$content) {
 
     $dom->validateOnParse = false;
 
-    libxml_use_internal_errors(true);
-    $dom->loadHTML($content);
-    libxml_clear_errors();
+    /**
+     * mb_convert_encoding() is needed as loadHTML itself
+     * does not use UTF-8 and characters like ä are
+     * returned as a wrong character.
+     */
+    $dom->loadHTML(mb_convert_encoding($content, 'HTML-ENTITIES', 'UTF-8'), LIBXML_BIGLINES | LIBXML_COMPACT | LIBXML_NOERROR );
 
     return $dom;
 }
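
For reviewers: the 'HTML-ENTITIES' target of mb_convert_encoding() is deprecated as of PHP 8.2, so the change above may emit deprecation notices on newer runtimes. Below is a minimal sketch of one possible alternative, assuming the input is valid UTF-8. It swaps in mb_encode_numericentity() rather than the call used in this commit; the function name loadUtf8Html() is hypothetical and is not part of the repository.

```php
<?php
// Hypothetical helper, not the author's getDOM(): parses UTF-8 HTML with
// DOMDocument without relying on the deprecated 'HTML-ENTITIES' encoding.
function loadUtf8Html(string $content): DOMDocument
{
    $dom = new DOMDocument();
    $dom->validateOnParse = false;

    // Convert every code point from U+0080 upward to a numeric entity
    // (e.g. "ä" becomes "&#228;"), so the parser only sees ASCII bytes.
    // Map format: [start, end, offset, mask].
    $encoded = mb_encode_numericentity(
        $content,
        [0x80, 0x10FFFF, 0, 0x1FFFFF],
        'UTF-8'
    );

    // Same flags as in the commit; LIBXML_NOERROR suppresses parse error reports.
    $dom->loadHTML($encoded, LIBXML_BIGLINES | LIBXML_COMPACT | LIBXML_NOERROR);

    return $dom;
}
```

Usage would look like `$dom = loadUtf8Html($content);`, after which text read back from the DOM (for example via `textContent`) is UTF-8 again. Whether this fits depends on the PHP versions the project targets.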
