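C++ has several string types; see "String Types in C++" for details. This article uses std::string to explain string encoding, for the following reasons:

The byte is the smallest unit of a string's binary encoding; a string is essentially a byte array.
Before C++17 introduced std::byte, C++ had no dedicated byte type, and third-party byte types are usually defined in terms of char (often unsigned char).
A char array converts directly to std::string, so a byte sequence converts directly to a string.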
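To make the "a string is a byte array" point concrete, here is a minimal sketch that dumps the bytes held by a std::string. The helper name to_hex is hypothetical and not part of the original article; the code assumes nothing beyond the standard library.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: returns the bytes of s as space-separated
// lowercase hex pairs, for example "e4 b8 ad".
std::string to_hex(const std::string& s)
{
    std::string out;
    char buf[4];
    for (std::size_t i = 0; i < s.size(); ++i) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        const unsigned value = static_cast<unsigned char>(s[i]);
        std::snprintf(buf, sizeof buf, "%02x", value);
        if (i != 0) out += ' ';
        out += buf;
    }
    return out;
}
```

For example, the character 中 (U+4E2D) is stored as the three bytes E4 B8 AD in UTF-8, so to_hex("\xE4\xB8\xAD") yields "e4 b8 ad". The same character in GB2312 is the two bytes D6 D0, which is why the same std::string type can hold either encoding and the program has to know which one it is dealing with.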

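The code is adapted from "utf8与std::string字符编码转换" (UTF-8 and std::string encoding conversion). Conversions between other encodings work the same way: first convert the source to UTF-16 (the wide-character Unicode form that Windows uses), then convert the UTF-16 text to the target multibyte encoding. The code is as follows: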

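#include <windows.h>
#include <string>

// Converts UTF-8 text to the system ANSI code page (CP_ACP), using UTF-16 as the intermediate form.
std::string UTF8_To_string(const std::string& str)
{
    if (str.empty()) return std::string();
    const int srcLen = static_cast<int>(str.size());

    // First call: query how many wide characters the UTF-16 text needs.
    int nwLen = MultiByteToWideChar(CP_UTF8, 0, str.data(), srcLen, NULL, 0);
    if (nwLen <= 0) return std::string();

    // Second call: write the UTF-16 text into a buffer of that size.
    std::wstring wbuf(nwLen, L'\0');
    MultiByteToWideChar(CP_UTF8, 0, str.data(), srcLen, &wbuf[0], nwLen);

    // Same two-call pattern for UTF-16 to ANSI; the size argument is 0, not NULL.
    int nLen = WideCharToMultiByte(CP_ACP, 0, wbuf.data(), nwLen, NULL, 0, NULL, NULL);
    if (nLen <= 0) return std::string();

    std::string retStr(nLen, '\0');
    WideCharToMultiByte(CP_ACP, 0, wbuf.data(), nwLen, &retStr[0], nLen, NULL, NULL);

    // std::wstring and std::string release their buffers automatically.
    return retStr;
}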

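// Converts text in the system ANSI code page (CP_ACP) to UTF-8, using UTF-16 as the intermediate form.
std::string string_To_UTF8(const std::string& str)
{
    if (str.empty()) return std::string();
    const int srcLen = static_cast<int>(str.size());

    // First call: query how many wide characters the UTF-16 text needs.
    int nwLen = ::MultiByteToWideChar(CP_ACP, 0, str.data(), srcLen, NULL, 0);
    if (nwLen <= 0) return std::string();

    // Second call: write the UTF-16 text into a buffer of that size.
    std::wstring wbuf(nwLen, L'\0');
    ::MultiByteToWideChar(CP_ACP, 0, str.data(), srcLen, &wbuf[0], nwLen);

    // Same two-call pattern for UTF-16 to UTF-8; the size argument is 0, not NULL.
    int nLen = ::WideCharToMultiByte(CP_UTF8, 0, wbuf.data(), nwLen, NULL, 0, NULL, NULL);
    if (nLen <= 0) return std::string();

    std::string retStr(nLen, '\0');
    ::WideCharToMultiByte(CP_UTF8, 0, wbuf.data(), nwLen, &retStr[0], nLen, NULL, NULL);

    // std::wstring and std::string release their buffers automatically.
    return retStr;
}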

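Note: std::string itself carries no encoding information. The functions above treat it as ANSI text, meaning text in the system's active code page (CP_ACP). On Simplified Chinese Windows that is code page 936 (GBK, a superset of GB2312).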
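The Win32 functions above hide what the first half of the conversion (UTF-8 to UTF-16) actually does. As an illustration only, here is a minimal portable sketch of that step. The function name utf8_to_utf16 is hypothetical, the choice to replace malformed input with U+FFFD is an assumption of this sketch, and it is not how MultiByteToWideChar is implemented. The second half (UTF-16 to a code page such as GBK) needs mapping tables, which is why it is normally left to the operating system.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decodes UTF-8 bytes into UTF-16 code units.
// Malformed input bytes are replaced with U+FFFD (an assumption of this sketch).
std::u16string utf8_to_utf16(const std::string& in)
{
    // Smallest code point that legitimately needs a sequence of each length.
    static const char32_t kMinCp[5] = {0, 0, 0x80, 0x800, 0x10000};
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        char32_t cp = 0xFFFD;
        std::size_t len = 1;
        // The lead byte determines the sequence length and the first payload bits.
        if (b0 < 0x80)                { cp = b0; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; }
        // Any other lead byte keeps cp == U+FFFD and len == 1.

        // Each continuation byte must look like 10xxxxxx and adds 6 bits.
        bool ok = (i + len <= in.size());
        for (std::size_t k = 1; ok && k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (b & 0x3F);
        }
        // Reject truncated, overlong, out-of-range, and surrogate values.
        if (!ok || cp < kMinCp[len] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF)) {
            cp = 0xFFFD;
            len = 1;
        }
        if (cp >= 0x10000) {
            // Code points above U+FFFF become a surrogate pair.
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        } else {
            out += static_cast<char16_t>(cp);
        }
        i += len;
    }
    return out;
}
```

With this sketch, the three UTF-8 bytes E4 B8 AD decode to the single code unit 0x4E2D, and the four bytes F0 9F 98 80 (U+1F600) decode to the surrogate pair 0xD83D 0xDE00.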

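For details on MultiByteToWideChar and WideCharToMultiByte, see "MultiByteToWideChar和WideCharToMultiByte用法详解". The first parameter of both functions, CodePage, specifies the encoding of the multibyte (char) string: the source string for MultiByteToWideChar and the destination string for WideCharToMultiByte. Commonly used values are: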

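Value	Description
CP_ACP	System default ANSI code page
CP_MACCP	System Macintosh code page (not supported on Windows CE)
CP_OEMCP	System OEM code page
CP_SYMBOL	Symbol code page (not supported on Windows CE)
CP_THREAD_ACP	ANSI code page of the current thread (not supported on Windows CE)
CP_UTF7	UTF-7 code page
CP_UTF8	UTF-8 code page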

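Each function is called twice. In the first call the destination buffer size (cchWideChar, the last parameter of MultiByteToWideChar; cbMultiByte, the sixth parameter of WideCharToMultiByte) is 0, so the function writes nothing and returns the required size of the destination buffer, in wide characters or bytes respectively. The second call passes a buffer of at least that size, and the function writes the converted string into it. If the source length is passed as -1, the returned size includes the terminating null character; if an explicit length is passed, it does not, and the output is not null-terminated.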

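Note: Linux offers two similar functions, mbstowcs() and wcstombs(), which convert between multibyte strings in the current locale's encoding and wchar_t strings. For usage, see https://blog.csdn.net/yiyaaixuexi/article/details/6174971.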
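The query-then-convert pattern carries over to mbstowcs(). The following is a minimal sketch, not taken from the linked article: the helper name multibyte_to_wide is hypothetical, and it relies on the POSIX behavior (also provided by MSVC) that a null destination makes mbstowcs() return the required length.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper: converts a multibyte string in the current C locale's
// encoding to a wide string. Conversion stops at the first null byte.
std::wstring multibyte_to_wide(const std::string& in)
{
    // First call: with a null destination, the return value is the number
    // of wide characters required, not counting the terminator.
    const std::size_t needed = std::mbstowcs(nullptr, in.c_str(), 0);
    if (needed == static_cast<std::size_t>(-1) || needed == 0) {
        return std::wstring();  // invalid sequence for this locale, or empty input
    }

    // Second call: write exactly that many wide characters.
    std::wstring out(needed, L'\0');
    std::mbstowcs(&out[0], in.c_str(), needed);
    return out;
}
```

The result depends on the active locale. A program that wants the user's encoding (for example a UTF-8 locale) would typically call std::setlocale(LC_ALL, "") from <clocale> once at startup; in the default "C" locale only ASCII input is guaranteed to convert.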
