cheerio: Chinese Text Converted to Unicode Entities When Loading HTML

When using cheerio, I found that the html method returns Chinese characters as Unicode numeric character entities (for example, &#x6211; instead of 我) by default.

Usage is as follows:

var cheerio = require('cheerio');
var $ = cheerio.load('<title>我是中文,我将会被解析成unicode</title>');
console.log($('title').html());

With the text method, the problem does not occur:

var cheerio = require('cheerio');
var $ = cheerio.load('<title>我是中文,我将会被解析成unicode</title>');
console.log($('title').text()); // prints the original Chinese text
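
To make the difference between the two outputs concrete, here is a minimal sketch in plain JavaScript that does not use cheerio at all. The helpers encodeNonAscii and decodeHexEntities are hypothetical names invented for this illustration. They only approximate the idea that html() showed the entity-encoded form while text() showed the decoded characters; they are not how cheerio itself is implemented.

```javascript
// Hypothetical helper: rewrite every non-ASCII character as a hex
// numeric character reference, similar in shape to what html() printed.
function encodeNonAscii(str) {
    var out = '';
    for (var ch of str) { // for...of walks the string by code point
        var code = ch.codePointAt(0);
        out += code > 127 ? '&#x' + code.toString(16).toUpperCase() + ';' : ch;
    }
    return out;
}

// Hypothetical helper: turn hex references back into characters,
// which is roughly the view that text() gives.
function decodeHexEntities(str) {
    return str.replace(/&#x([0-9a-fA-F]+);/g, function (match, hex) {
        return String.fromCodePoint(parseInt(hex, 16));
    });
}

var encoded = encodeNonAscii('我是中文');
console.log(encoded);                    // &#x6211;&#x662F;&#x4E2D;&#x6587;
console.log(decodeHexEntities(encoded)); // 我是中文
```

Both strings describe the same characters, so a browser renders them identically. The encoded form only becomes a problem when the HTML source is read or processed as plain text.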

Solution

Default Configuration

When we load HTML content, cheerio applies a default configuration. HTML parsing is handled by the htmlparser2 library, so htmlparser2 options can also be passed to cheerio. Loading with the defaults is equivalent to:

var cheerio = require('cheerio');
var $ = cheerio.load('<title>我是中文,我将会被解析成unicode</title>',{
    withDomLvl1: true,
    normalizeWhitespace: false,
    xmlMode: false,
    decodeEntities: true
});

Modify Default Configuration

Setting decodeEntities to false solves the problem, because the characters are then no longer converted to entities in the html output:

{
    decodeEntities: false
}

The full call looks like this:

var cheerio = require('cheerio');
var $ = cheerio.load('<title>我是中文,我将不会被解析成unicode</title>',{
    decodeEntities: false
});
console.log($('title').html()); // Chinese text is returned without entities
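
Only the one key being changed has to be passed; the other options keep their defaults. The sketch below shows how such a merge could behave. It is an assumption for illustration, not cheerio's actual source: mergeOptions is a hypothetical helper, and defaultOptions simply repeats the values listed under Default Configuration.

```javascript
// Assumed defaults, copied from the configuration listed in this article.
var defaultOptions = {
    withDomLvl1: true,
    normalizeWhitespace: false,
    xmlMode: false,
    decodeEntities: true
};

// Hypothetical helper: keys supplied by the caller win,
// every other key keeps its default value.
function mergeOptions(userOptions) {
    return Object.assign({}, defaultOptions, userOptions);
}

var merged = mergeOptions({ decodeEntities: false });
console.log(merged.decodeEntities); // false (overridden)
console.log(merged.xmlMode);        // false (default kept)
```

Copying into a fresh object with Object.assign leaves defaultOptions untouched, so later calls still start from the same defaults.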

Article Link:

https://alili.tech/en/archive/lbpnt17e1sc/
