Convert from UTF-8 to Unicode in C++
Question
How do I convert ú in a C++ application that receives the character as the percent-encoded UTF-8 sequence %C3%BA and needs to store it as the Unicode code point equivalent, %FA? I just want to know how to write the code that performs this conversion.
Recommended answer
I wrote some code to do this just yesterday.
I'm not saying this is the "perfect" way to do it, but it appears to work for all the test cases I've run through it (I wrote both directions for that purpose).
I'll leave it to you to translate "%NN" into an integer value.
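#include <iostream>
#include <deque>

// Encode one Unicode code point as a sequence of UTF-8 byte values.
std::deque<int> unicode_to_utf8(int charcode)
{
    std::deque<int> d;
    if (charcode < 128)
    {
        // ASCII is encoded as a single byte.
        d.push_back(charcode);
    }
    else
    {
        int first_bits = 6;        // bit budget for the lead byte; shrinks by one per continuation byte
        const int other_bits = 6;  // payload bits in each continuation byte
        int first_val = 0xC0;      // lead-byte prefix; grows to 0xE0, 0xF0 as bytes are added
        int t = 0;
        // Peel off 6 bits at a time until the remainder fits in the lead byte.
        while (charcode >= (1 << first_bits))
        {
            t = 128 | (charcode & ((1 << other_bits) - 1));
            charcode >>= other_bits;
            first_val |= 1 << first_bits;
            first_bits--;
            d.push_front(t);
        }
        t = first_val | charcode;
        d.push_front(t);
    }
    return d;
}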
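// Decode one code point from the front of `coded`, consuming its bytes.
// Assumes `coded` is non-empty and holds a well-formed UTF-8 sequence.
int utf8_to_unicode(std::deque<int> &coded)
{
    int charcode = 0;
    int t = coded.front();
    coded.pop_front();
    if (t < 128)
    {
        return t;  // single-byte (ASCII) case
    }
    int high_bit_mask = (1 << 6) - 1;
    int high_bit_shift = 0;
    int total_bits = 0;
    const int other_bits = 6;
    // Each leading 1 bit after the first announces one continuation byte.
    while ((t & 0xC0) == 0xC0)
    {
        t <<= 1;
        t &= 0xff;
        total_bits += 6;
        high_bit_mask >>= 1;
        high_bit_shift++;
        charcode <<= other_bits;
        charcode |= coded.front() & ((1 << other_bits) - 1);
        coded.pop_front();
    }
    // Put the payload bits of the lead byte above the continuation bits.
    charcode |= ((t >> high_bit_shift) & high_bit_mask) << total_bits;
    return charcode;
}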
int main()
{
    int charcode;
    for (;;)
    {
        std::cout << "Enter unicode value:" << std::endl;
        // Stop on end of input or a non-numeric entry instead of looping forever.
        if (!(std::cin >> charcode))
        {
            break;
        }
        auto x = unicode_to_utf8(charcode);
        for (auto c : x)
        {
            // "\\x" prints a literal backslash followed by x.
            std::cout << "\\x" << std::hex << c << " ";
        }
        std::cout << std::endl;
        int c = utf8_to_unicode(x);
        std::cout << "reversed:" << std::dec << c << std::hex << " in hex:" << c << std::endl;
    }
}
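The answer leaves the "%NN" step to the reader, so here is a rough sketch of how the whole round trip from the question (%C3%BA in, %FA out) might look. The helper names percent_decode, decode_one and to_percent_escape are made up for this illustration and are not part of the code above. The decoder is a separate, table-style variant that handles sequences of up to four bytes; it checks for truncated input and bad continuation bytes, but it does not reject overlong encodings or surrogates.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Value of one hex digit, or -1 if the character is not a hex digit.
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: turn "%C3%BA" into the raw bytes {0xC3, 0xBA}.
// Anything that is not a complete %NN escape is copied through unchanged.
std::vector<unsigned char> percent_decode(const std::string& in)
{
    std::vector<unsigned char> out;
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        if (in[i] == '%' && i + 2 < in.size())
        {
            int hi = hex_value(in[i + 1]);
            int lo = hex_value(in[i + 2]);
            if (hi >= 0 && lo >= 0)
            {
                out.push_back(static_cast<unsigned char>(hi * 16 + lo));
                i += 2;  // skip the two hex digits just consumed
                continue;
            }
        }
        out.push_back(static_cast<unsigned char>(in[i]));
    }
    return out;
}

// Hypothetical helper: decode one code point starting at b[pos].
// On success stores it in cp, advances pos and returns true.
bool decode_one(const std::vector<unsigned char>& b, std::size_t& pos, std::uint32_t& cp)
{
    if (pos >= b.size()) return false;
    unsigned char lead = b[pos];
    std::size_t extra = 0;  // number of continuation bytes expected
    if (lead < 0x80)                { cp = lead;        extra = 0; }
    else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
    else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
    else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
    else return false;                          // stray continuation or invalid lead byte
    if (pos + extra >= b.size()) return false;  // sequence is truncated
    for (std::size_t i = 1; i <= extra; ++i)
    {
        unsigned char c = b[pos + i];
        if ((c & 0xC0) != 0x80) return false;   // not a 10xxxxxx continuation byte
        cp = (cp << 6) | (c & 0x3F);
    }
    pos += extra + 1;
    return true;
}

// Hypothetical helper: format a code point in 0..0xFF as "%NN".
// Larger values do not fit in one escape, so an empty string is returned.
std::string to_percent_escape(std::uint32_t cp)
{
    if (cp > 0xFF) return std::string();
    char buf[4];
    std::snprintf(buf, sizeof buf, "%%%02X", static_cast<unsigned>(cp));
    return buf;
}
```

To use it, pass "%C3%BA" to percent_decode, call decode_one on the resulting bytes with pos starting at 0, and hand the code point to to_percent_escape. A single "%NN" escape can only hold code points up to 0xFF, so anything larger needs a different storage format, which is outside the scope of this sketch.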
