Chinese character encoding? |
Try adding this filter to your web.xml:
<filter>
  <filter-name>characterEncodingFilter</filter-name>
  <filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
  <init-param>
    <param-name>encoding</param-name>
    <param-value>UTF-8</param-value>
  </init-param>
  <init-param>
    <param-name>forceEncoding</param-name>
    <param-value>true</param-value>
  </init-param>
</filter>
and map it:
<filter-mapping>
  <filter-name>characterEncodingFilter</filter-name>
  <url-pattern>/*</url-pattern>
</filter-mapping>
I had a similar problem and this solved it.
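If you prefer registering the filter in Java instead of web.xml (Servlet 3.0+), a minimal sketch looks like this; the class name and startup hook are mine, not from the original answer:

import javax.servlet.FilterRegistration;
import javax.servlet.ServletContext;
import org.springframework.web.filter.CharacterEncodingFilter;

public class EncodingConfig {
    // Call this once at container startup, e.g. from a WebApplicationInitializer.
    public static void register(ServletContext ctx) {
        CharacterEncodingFilter filter = new CharacterEncodingFilter();
        filter.setEncoding("UTF-8");
        filter.setForceEncoding(true); // force the response encoding too
        FilterRegistration.Dynamic reg = ctx.addFilter("characterEncodingFilter", filter);
        reg.addMappingForUrlPatterns(null, false, "/*"); // null = default dispatcher types
    }
}

Either way, the filter must run before anything reads the request parameters.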
|
Google translate API v2 encoding Chinese character and guide |
I have no clue; maybe try using Unicode escapes?
橘,色,灰色,深,蓝,绿色
results in
橘,色,灰色,深蓝,绿,色
|
Chinese and Japanese character encoding issues when exporting HTML to PDF |
It sounds like it might be an issue with the fonts on the server. The
webpage version of the timeline renders correctly because you obviously
have the correct font on the client machine that is running the browser.
The PDF on the other hand is generated on the server, and thus has to use a
font available to it there.
If that's the case, then using a font that both exists on the server and
supports the correct CJK characters should fix this issue.
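For example, if the PDF is produced with iText (an assumption; the question doesn't name the library), you would explicitly embed a CJK-capable font that lives on the server rather than rely on whatever fonts happen to be installed; the font path below is a placeholder:

import java.io.FileOutputStream;
import com.itextpdf.text.Document;
import com.itextpdf.text.Font;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.BaseFont;
import com.itextpdf.text.pdf.PdfWriter;

public class CjkPdf {
    public static void main(String[] args) throws Exception {
        Document doc = new Document();
        PdfWriter.getInstance(doc, new FileOutputStream("timeline.pdf"));
        doc.open();
        // IDENTITY_H + EMBEDDED: the glyphs travel inside the PDF itself,
        // so the client no longer needs the font installed.
        BaseFont bf = BaseFont.createFont("/fonts/NotoSansCJK-Regular.ttf",
                BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
        doc.add(new Paragraph("中文測試", new Font(bf, 12)));
        doc.close();
    }
}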
|
Dreaded ???? instead of characters when displaying chinese text. Simplified Chinese Character issue |
The first problem is that setting the LC_TIME category is not enough by
itself. The strftime function will attempt to format the time in the
appropriate language, but it seems to use the 1252 codepage by default, so
it will be incapable of producing the necessary characters for something
like Chinese.
I would have expected that setting LC_TIME and maybe LC_CTYPE would be
enough, but the only way I could get it to work was with LC_ALL. So your
setlocale calls should look like this:
if ($language->language == 'zh-hans') {
    $loc = setlocale(LC_ALL, 'chs');
} else {
    $loc = setlocale(LC_ALL, 'de');
}
The next problem is that mb_detect_encoding won't always detect the correct
encoding. What works for me on Windows is to use setlocale(LC_CTYPE, 0) to
get back the current code page.
$codep
|
C# chinese Encoding/Network |
This answer is "promoted" (by request from the Original Poster) from
comments by myself.
In the .NET Framework, strings are already Unicode strings.
(Don't test Unicode strings by writing to the console, though, since the
terminal window and console typically won't display them correctly.
However, since .NET version 4.5 there is some support for this.)
The thing to be aware of is the Encoding when you get text from an outside
source. In this case, the constructor of BinaryReader offers an overload
that takes in an Encoding:
using (var binaryReader = new BinaryReader(yourStream, Encoding.GetEncoding("GB18030")))
    ...
On the SQL Server, be sure that any column that needs to hold Chinese
strings is of type nvarchar (or nchar), not just varchar (char). Otherwise,
depending on the collation, the Chinese characters may be stored as plain
question marks.
|
URL Encoding - Chinese Characters |
Your website must use UTF-8. In HTML5 that would be (HTML):
<meta charset="UTF-8">
When you connect to your Database, be sure to use (PHP):
$mysqli->set_charset("utf8");
Finally, your table definition that stores Chinese characters must also
have UTF-8 collation (SQL):
`column name` TEXT CHARACTER SET utf8
|
jsf facelet persian character encoding error |
Everything looks fine! You should check the following:
a) Check whether your Glassfish resource has the following inside:
<property name="useUnicode" value="true"/>
<property name="characterEncoding" value="UTF8"/>
b) Check whether your database and table use the UTF-8 CHARACTER SET.
For MySQL, use the following code:
ALTER DATABASE dbname DEFAULT CHARACTER SET utf8;
USE dbname;
ALTER TABLE tblname CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
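If you connect without the pool, the same two settings go on the JDBC URL; a sketch, assuming MySQL Connector/J and placeholder host, database name and credentials:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class Db {
    static Connection open() throws SQLException {
        // Same two properties as the pool config above, on a plain JDBC URL.
        String url = "jdbc:mysql://localhost:3306/dbname"
                   + "?useUnicode=true&characterEncoding=UTF-8";
        return DriverManager.getConnection(url, "user", "password");
    }
}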
|
File encoding for English and Chinese text |
Answering the question in your last comment, here's how to convert from one
encoding to another encoding:
#!/usr/bin/perl
use strict;
use warnings;
sub read_encoded {
    my $file_name = shift;
    my $encoding  = shift;
    my $content;
    if ( open my $fh, "<:encoding($encoding)", $file_name ) {
        $content = do {
            local $/;
            <$fh>;
        };
    }
    else {
        die "Could not open $file_name: $!";
    }
    return $content;
}

sub write_file {
    my $file_name = shift;
    my $content   = shift;
    if ( open my $fh, '>:encoding(UTF-8)', $file_name ) {
        print $fh $content;
    }
    else {
        die "Could not open $file_name: $!";
    }
}
my $content1 = read_encoded( 'file1.txt', 'latin-1' );
my $content2 = read_encoded( 'fi
|
Solving error: unmappable character for encoding UTF8 |
These two lines on their own do not perform any re-encoding:
open(my $INFILE, '<:encoding(cp1252)', $filename) or die $!;
open(my $OUTFILE, '>:encoding(UTF-8)', $filename) or die $!;
Opening a file with > truncates it, which deletes the content. See the
open documentation for further details.
Rather, you have to read the data from the first file (which automatically
decodes it), and write it back to another file (which automatically encodes
it). Because source and target file are identical here, and because of the
quirks of file handling under Windows, we should write our output to a temp
file:
use autodie; # automatic error handling :)
open my $in, '<:encoding(cp1252)', $filename;
open my $out, '>:encoding(UTF-8)', "$filename~"; # or however you'd like to call the tempfile
|
encoding='utf-8' raise UnicodeEncodeError when opening utf-8 file with Chinese char |
The exception you see comes from printing your data. Printing requires that
you encode the data to the encoding used by your terminal or Windows
console.
You can see this from the exception (and from the traceback, but you didn't
include that); if you have a problem with decoding data (which is what
happens when you read from a file) then you would get a UnicodeDecodeError;
you got a UnicodeEncodeError instead.
You need to either adjust your terminal or console encoding, or not print
the data
See http://wiki.python.org/moin/PrintFails for troubleshooting help.
|
UTF-8 Encoding name in downloaded file |
Use the setCharacterEncoding method:
Sets the character encoding (MIME charset) of the response being sent
to the client, for example, to UTF-8. If the character encoding has
already been set by setContentType(java.lang.String) or
setLocale(java.util.Locale), this method overrides it. Calling
setContentType(java.lang.String) with the String of text/html and
calling this method with the String of UTF-8 is equivalent with
calling setContentType with the String of text/html; charset=UTF-8.
This method can be called repeatedly to change the character encoding.
This method has no effect if it is called after getWriter has been
called or after the response has been committed.
Modify your code as follows:
response.setContentType("application/ms-excel; charset=UTF-8");
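Putting it together in a servlet, and assuming the real goal is a non-ASCII download filename (the RFC 5987 filename* parameter below is my addition, not part of the quoted answer):

import java.io.IOException;
import java.net.URLEncoder;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class ExportServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        resp.setContentType("application/ms-excel; charset=UTF-8");
        // Percent-encode the filename so non-ASCII characters survive the header.
        String name = URLEncoder.encode("报表.xls", "UTF-8").replace("+", "%20");
        resp.setHeader("Content-Disposition", "attachment; filename*=UTF-8''" + name);
        resp.getWriter().write("...");
    }
}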
|
using scrapy crawl non-unicode asian language (chinese, for example) website encoding |
You should use unicode.encode to convert content from a unicode object to a
str object using whatever encoding you wish for your output file. Using
your example content:
>>> content = [u'\u4e00\u5927\u5b66\u751f\u88ab\u654c\u4eba\u6293\u4e86\uff0c\u654c\u4eba\u628a\u4ed6\u7ed1\u5728\u4e86\u7535\u7ebf\u6746\u4e0a\uff0c\u7136\u540e...']
>>> print content[0]
一大学生被敌人抓了,敌人把他绑在了电线杆上,然后...
>>> content_utf8 = content[0].encode('utf8')
>>> content_utf8[:10]
'\xe4\xb8\x80\xe5\xa4\xa7\xe5\xad\xa6\xe7'
>>> print content_utf8
一大学生被敌人抓了,敌人把他绑在了电线杆上,然后...
Then you can open the file and write the str object (content_utf8 in the
code above).
|
Add '<br/>' before chinese character |
This is surely not the best solution, but one approach would be to match a
string of ASCII characters via [\x00-\x7F]+ followed by a non-ASCII sequence
(the same pattern negated with ^). It does not target Chinese specifically,
but that is tricky owing to the varied ranges of Chinese Unicode characters.
$string = 'Hello World 自立合作社';
// Capture ASCII sequence into $1 and non-ASCII into $2
echo preg_replace('/([\x00-\x7F]+)([^\x00-\x7F]+)/', '$1<br/>$2', $string);
// Prints:
// Hello World
// 自立合作社
http://codepad.viper-7.com/1kqpOx
Actually, here's an improved version that does specifically target Chinese
characters via \p{Han}. The $2 capture also includes \s for whitespace.
// This matches any non-Chinese in $1 followed by Chinese (and whitespace) in $2
echo preg_replace('/([^\p{Han}]+)([\p{Han}\s]+)/u', '$1<br/>$2', $string);
|
urlencoding Chinese character to url |
I'm working on the same subject and, to avoid that, I decided to make my
Chinese users write addresses in English with Western characters.
(The screenshots of the menu editor with its "on the fly" friendly-URL
builder, and of the three browsers, are omitted here.)
I visited Baidu with Firefox, Chrome and IE. Firefox is able to display
Chinese characters in the address bar, but Chrome and IE are not, so I ask
myself: what is best for SEO?
Use Western characters to display the address in English?
Use percent-encoded codes, as you do?
Use Western characters, but spelling out the phonetic pinyin as in the
Chinese input system?
For now I have chosen the first option, but I'm not sure it's the best one.
|
Chinese character 数 encodes into too many bytes |
The most likely things are either:
There's an issue with the encoding of your source file, or
You have "invisible" characters prior to the 数 in it.
You can check both of those by completely deleting the string literal on
this line:
String s = new String("数");
so it looks like this (note I removed the quotes as well as the character):
String s = new String();
and then adding back "\u6570" to get this:
String s = new String("\u6570");
and seeing if your output changes (as 数 is Unicode code point U+6570 and
so that escape sequence should be the same character). If it changes,
either there's an encoding problem or you had invisible characters in the
string prior to the character. You can probably differentiate the two cases
by then adding back just that character (via copy and paste).
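A small diagnostic sketch (my own helper, not from the answer above) that prints the code points and UTF-8 bytes, so you can see exactly what the compiler put into the literal:

import java.nio.charset.StandardCharsets;

public class Inspect {
    public static void main(String[] args) {
        String s = "\u6570"; // should be 数
        // One line per code point; expect exactly one: U+6570.
        s.codePoints().forEach(cp -> System.out.printf("U+%04X%n", cp));
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            System.out.printf("%02x ", b); // expect exactly: e6 95 b0
        }
    }
}

If you see more than one code point, there were invisible characters; if the single code point isn't U+6570, the source file's encoding is wrong.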
|
Convert pinyin to Chinese Character |
The simplest way to do this is to use javachinesepinyin, a lightweight Chinese
Pinyin Input Method.
You can find related code here.
private String[] pinyinToWord(String[] o) {
    Result ret = null;
    try {
        ret = ptw.labelStateOfNodes(Arrays.asList(o));
    } catch (Exception ex) {
        System.out.println(ex.getMessage());
    }
    Map<Double, String> results = new HashMap<Double, String>();
    if (null != ret && ret.states() != null) {
        for (int pos = 0; pos < ret.states()[o.length - 1].length; pos++) {
            StringBuilder sb = new StringBuilder();
            int[] statePath = Viterbi.getStatePath(ret.states(),
                    ret.psai(), o.length - 1, o.length, pos);
            for (int state : statePath) {
                Character name = ptw.getState
|
Convert Chinese character to Unicode using PHP |
function chineseToUnicode($str) {
    // split the UTF-8 string into individual characters
    preg_match_all('/./u', $str, $matches);
    $c = "";
    foreach ($matches[0] as $m) {
        // trailing ";" added to properly terminate each HTML numeric entity
        $c .= "&#" . base_convert(bin2hex(iconv('UTF-8', "UCS-4", $m)), 16, 10) . ";";
    }
    return $c;
}
//Print result
$str =
"雨傘運動或稱雨傘革命、遮打運動、遮打革命(英語:Umbrella
Movement 或 Umbrella
Revolution)是於2014年9月26日起在香港為爭取真普選而發起的一系列公民抗命,逾60-70萬(人次)香港市民佔據多個主要商業區靜坐及示威,地點包括金鐘、中環、灣仔、銅鑼灣、旺角和尖沙咀,旨在要求包括撤回中國全國人大常委會(人大常委)所確定之2017年行政長官選舉及2016年立法會選舉框架和候選人提名方案,爭取行政
|
Chinese character to ASCII or Hexadecimal |
Your Java code is just doing this:
Take each 16-bit character of the string and add 0xf100 to it.
If you do the same thing in your above Objective-C code you will get the
result you want.
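In Java terms, the transformation described amounts to something like this (a sketch; the hex output format is my assumption based on the question's title):

public class Shift {
    static String shift(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            // each 16-bit UTF-16 unit, plus 0xf100, printed as hex
            sb.append(Integer.toHexString(c + 0xf100));
        }
        return sb.toString();
    }
}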
|
Chinese Character Issue in AJAX with C# & Jquery |
I found the solution: it does not work with a Generic Handler, but it works
fine with a web service.
[WebMethod]
[ScriptMethod(UseHttpGet = false, ResponseFormat = ResponseFormat.Json)]
public Handler1.TicketResponse HelloWorld()
{
    var ticketResponse = new Handler1.TicketResponse();
    ticketResponse.AddedCount = 23;
    // All tickets were available and were added to the cart
    ticketResponse.Success = true;
    ticketResponse.SuccessItems = new List<Handler1.SuccessfullItem>
    {
        new Handler1.SuccessfullItem()
        {
            OrderItemId = 1,
Emacs displays chinese character if I open xml file |
By figure B, it looks like this file is encoded with a mixture of
big-endian and little-endian UTF-16. It starts with fe ff, which is the
byte order mark for big-endian UTF-16, and the XML declaration (<?xml
version=...) is also big-endian, but the part starting with <report is
little-endian. You can tell because the letters appear on even positions
in the first part of the hexl display, but on odd positions further down.
Also, there is a null character (encoded as two bytes, 00 00) right before
<report. Null characters are not allowed in XML documents.
However, since some of the XML elements appear correctly in figure A, it
seems that the mixing of byte orders continues throughout the file. The
file is corrupt, and this probably needs to be resolved manually.
If there are no non-ASCII characters in the file, you may be able to repair
it by stripping the null bytes and re-saving everything in one consistent
encoding.
|
A garbage character appears when using substr() on a Chinese language string |
Assuming that you are using PHP: when working with non-ASCII characters you
should use the multibyte string functions, in this case mb_substr. Remember
to set the internal encoding via the mb_internal_encoding function.
|
trying to export csv in microsoft excel with special character like chinese but failed |
Solution available here:
Open in notepad (or equivalent)
Re-Save CSV as Unicode (not UTF-8)
Open in Excel
Profit
Excel does not handle UTF-8. If you go to the import options for CSV files,
you will notice there is no choice for UTF-8 encoding. Since Unicode is
supported, the above should work (though it is an extra step). The
equivalent can likely be done on the PHP side (if you can save as Unicode
instead of UTF-8). This seems doable according to this page which suggests:
$unicode_str_for_Excel = chr(255).chr(254).mb_convert_encoding($utf8_str, 'UTF-16LE', 'UTF-8');
|
How ensure downloaded pdf is displayed in iframe using firefox? |
There is a default PDF viewer in Firefox 20.0 and above. You need to set the
"Preview in Firefox" option for the Portable Document Format content type.
You might need a different setting in Firefox if you use another Adobe
plugin.
Don't forget to use multipart/form-data encoded requests.
|
Character encoding in R |
I found the answer myself. The problem was with the transformation from
UTF-8 to the system locale (the default encoding in R) through
fileEncoding. As I use RStudio, I just changed the default encoding to
UTF-8 and removed the fileEncoding="UTF-8-BOM" from read.csv. Then, the
entire csv file was read and RStudio displays all characters correctly.
|
Eclipse character encoding |
The file you are reading must contain UTF-8 (or some other encoding's)
characters, and when you try to print them to the console you get
characters like �. This is because the default console encoding in Eclipse
is not UTF-8. You need to set it by going to Run Configuration ->
Common -> Encoding -> select UTF-8 from the drop-down.
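If you cannot change the launch configuration, a workaround (my sketch, not from the answer above) is to wrap System.out in an explicit UTF-8 PrintStream:

import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8Console {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // Re-wrap stdout so Java encodes output as UTF-8.
        System.setOut(new PrintStream(System.out, true, "UTF-8"));
        System.out.println("数"); // displays correctly once the console charset matches
    }
}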
|
How to set character encoding in postscript? |
I highly recommend John Deubert's Acumen Journal.
http://www.acumentraining.com/acumenjournal.html
I'd suggest you look at November/December 2001.
|
Converting character encoding within c++ |
You can use the ICU library to convert between almost all usable encodings.
This library also provides lots of string manipulation facilities.
|
Jquery UTF-8 Character Encoding |
This link helped me to resolve this problem. It turned out that I also
needed to change some configuration on my Tomcat server (server.xml) in
order to successfully receive the Japanese character parameter.
http://tech.top21.de/techblog/20100421-solving-problems-with-request-parameter-encoding.html
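For GET parameters the usual server.xml change is the URIEncoding attribute on the HTTP connector; this is a sketch of that setting, not necessarily the exact change the linked article describes:

<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443"
           URIEncoding="UTF-8" />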
|
C++ File character encoding |
If you have UTF-8 and you output to a window expecting ISO 8859-1, it's not
going to work. If you do have UTF-8 (which will be the case if the global
locale is still the default "C"), then you can either change the window to
code page 65001, or you must convert the encoding before outputting.
With regards to portability, there is no real solution; what you
have to do depends on how the destination interprets the bytes
you output. Under Windows, you can change the code page; under
Unix systems (X Windows), it is the encoding of the font the
window uses which matters. In both cases, they can be different
for different windows on the same machine.
|
Tomcat character encoding |
Check the docs for InputStreamReader; they are very clear:
Creates an InputStreamReader that uses the default charset.
So unless you specify a charset explicitly it will use the system charset,
which seems to be different on the two machines. So always specify charsets
as a rule of thumb and you're fine, as you found out already.
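A minimal sketch of that rule of thumb:

import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class ReadUtf8 {
    static BufferedReader utf8Reader(InputStream in) {
        // Explicit charset: identical behavior on every machine.
        return new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    }
}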
|
Multiple character encoding in Magento |
The problem you're seeing with accent letters doesn't have anything to do
with the change you made in your database. The 'utf8_general_ci' collation
is the correct one for a Magento website, so you've properly set everything
up.
Importing new translations into a Magento website consists of uploading a
bunch of CSV files into a path under the root Magento installation folder,
in app/locale/[some_locale_code] or to the locale folder under one of your
active themes.
Problems with question marks instead of accent characters are usually the
result of editing these CSV files in MS Excel, or in notepad, and saving
them without regard to the chosen encoding. Re-open the original file, and
make sure to save it as UTF-8. Then, re-upload those files to your
Magento installation folder.
|
HTML Character Encoding replaces ' with ’ |
Thanks in part to both @deceze and this SO question it looks like I just
need this meta tag at the top of the HTML file:
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
|
character Encoding in my html jsp page |
Well that is a browser-side setting. You should set it on the server side
directly in the headers.
Put the following in your jsp as the first line:
<%@ page language="java" contentType="text/html; charset=UTF-8"
pageEncoding="UTF-8"%>
|
Jasper Report Character Encoding in PDF |
Can you find out what text rendering engine Jasper-reports uses?
The font 'Myanmar3' relies heavily on OpenType Features to generate the
correct sequences of characters. At a fairly high level, a text rendering
engine relies on low level routines to handle correctly drawing of glyphs
inside the font. That is, at the highest level you give the command to draw
a certain text string. This string is decomposed into separate characters,
which are then possibly re-ordered or replaced with glyphs according to the
OTF rules inside the font. Only after that, a correct string of glyphs --
no longer 'characters' -- is sent back to be displayed.
From your description and screen shots it seems your browser can work with
this font, but Jasper-reports cannot. That is visible in your PDF: your
input string is drawn without the OpenType substitutions, so the wrong
sequence of glyphs appears.
|
jquery autocomplete character encoding |
It sounds like you're passing encoded data directly to your function.
The browser is requesting and being sent JSON data.
In other words, the browser will not parse your data as HTML.
I suggest trying something like this:
$('#some_div').html(dataReturnedFromAjaxCall);
Then set the autocomplete to take some_div as its source.
Note: the jQuery .html function will unencode your data.
Or use a custom conversion function like the htmlUnescape one here:
http://stackoverflow.com/a/7124052/2478282
|
Jersey REST Character encoding |
The problem was in the server-side PHP code. I have a script which was
already encoded in UTF-8. I then encoded the string "Bitte wählen" with
the PHP function utf8_encode(), which led to my problem.
Thank you McDowell, your hints brought me on the right track!
The solution was not to use that function and just send the string "Bitte
wählen" as-is.
|
jQuery FormData - how to set character encoding? |
You're going about this the wrong way entirely. Do not try to change the
Content-Type of a MPE request to something else. Also, do not expect the
data that appears in the dev tools console to reflect the properly decoded
values. For example, if you are on a Mac, the tool you are using to
inspect the form data is likely defaulting to Mac's default character
encoding of Mac OS Roman. This does not necessarily indicate a problem
with the actual form data character encoding.
All you need to do is set the encoding of your page properly:
<meta charset="utf-8" />
If you do this, the form data will be UTF-8 encoded. You must decode it
properly server-side.
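For instance, with a Java servlet backend (one possible stack; the answer doesn't prescribe one), the decode side would look roughly like this:

import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.annotation.MultipartConfig;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@MultipartConfig // needed so getParameter works for multipart/form-data
public class UploadServlet extends HttpServlet {
    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        req.setCharacterEncoding("UTF-8"); // must run before any parameter is read
        String comment = req.getParameter("comment");
        resp.setContentType("text/plain; charset=UTF-8");
        resp.getWriter().println(comment);
    }
}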
|
R on Windows: character encoding hell |
According to Wikipedia:
The byte order mark (BOM) is a Unicode character used to signal the
endianness (byte order) [...]
The Unicode Standard permits the BOM in UTF-8, but does not require nor
recommend its use.
Anyway, in the Windows world UTF-8 is used with a BOM. For example, the
standard Notepad editor uses the BOM when saving as UTF-8.
Many applications born in the Linux world (including LaTeX, e.g. when using
the inputenc package with the utf8 option) have problems reading BOM-UTF-8
files.
Notepad++ is a typical option to convert from encoding types, Linux/DOS/Mac
line endings and removing BOM.
Since the (non-recommended) UTF-8 representation of the BOM is the byte
sequence
0xEF,0xBB,0xBF
at the start of the text stream, why not remove it with R itself?
## Conv
|
character encoding with asp.net dynamic textbox |
Appending a string in your html, with runat="server" does not make it a
server control. You will have to add your control dynamically from code
behind, in page_init like this:
Add a PlaceHolder control:
<asp:PlaceHolder runat="server" ID="myPlaceHolder">
</asp:PlaceHolder>
Then this code in your Page_Init event to create the TextBox control:
protected void Page_Init(object sender, EventArgs e)
{
    TextBox txt = new TextBox();
    txt.ID = "myTxt";
    myPlaceHolder.Controls.Add(txt);
}
To get the Control from the Page_Load event:
TextBox txt = (TextBox)myPlaceHolder.FindControl("myTxt");
now you can access the Text property like you would with any other control:
txt.Text
A couple of things: adding controls dynamically can sometimes be a painful
experience. Asp.
|
Character encoding of GET request parameter |
If the HTML is in Windows-1252 (or the "subset" ISO-8859-1), then %E4 is
okay.
If however the HTML is in Unicode, UTF-8, then not.
String auml = "\u00e4";
String aumlPerc = URLEncoder.encode(auml, "UTF-8");
String decoded = URLDecoder.decode(aumlPerc, "UTF-8");
Besides the HTML page having charset UTF-8, you can have <form
accept-charset="UTF-8" ...>.
It seems the page erroneously sends %E4, is accepted as ISO-8859-1 (the
default), converted to a multi-byte UTF-8 sequence, but that then is
wrongly considered to be ISO-8859-1.
There are some knobs for setting the encoding, like
request.setCharacterEncoding, but with the limited information I cannot say
where to look. Maybe this information suffices.
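To see concretely why the charset matters for ä (U+00E4):

import java.net.URLEncoder;

public class EncodeDemo {
    public static void main(String[] args) throws Exception {
        System.out.println(URLEncoder.encode("\u00e4", "ISO-8859-1")); // %E4
        System.out.println(URLEncoder.encode("\u00e4", "UTF-8"));      // %C3%A4
    }
}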
|