Chinese character encoding problem when passing values over CORBA
2009-06-10 23:53

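When passing values between distributed components with CORBA, English values display correctly, but values containing Chinese fail with: WARNING: "IOP02400001: (DATA_CONVERSION) Character does not map to negotiated transmission code set" ...... I searched Baidu for a long time without finding an answer. Later, through Google, I found that someone had run into the same problem and posted about it on the Sun forums, and his thread pointed me to the fix; the more global resource turned out to be the useful one. The forum thread is recorded below: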

A:

Here is the situation.
Using JDK/JRE 1.5.0_09.
Using Sun ORB on my client PC, which is running Windows XP Professional Version 2002, Service Pack 2.
My program connects and communicates to another device that uses omniORB from SourceForge.
Things seem to work fine (with the CORBA communications) as long as I stick to languages that can be represented in ISO 8859-1.

I used the Windows "Regional and Language Options", and installed the files for East Asian languages.
I then changed the "Default input language" at startup time to be "Chinese (Singapore) - Chinese (Simplified) - US Keyboard", and rebooted.

If I try to send some Chinese text through the ORB interface, I get this error:
WARNING: "IOP02400001: (DATA_CONVERSION) Character does not map to negotiated transmission code set"
org.omg.CORBA.DATA_CONVERSION: vmcid: OMG minor code: 1 completed: No
at com.sun.corba.se.impl.logging.OMGSystemException.charNotInCodeset(OMGSystemException.java:2093)
at com.sun.corba.se.impl.logging.OMGSystemException.charNotInCodeset(OMGSystemException.java:2111)
at com.sun.corba.se.impl.encoding.CodeSetConversion$JavaCTBConverter.convertCharArray(CodeSetConversion.java:259)
at com.sun.corba.se.impl.encoding.CodeSetConversion$JavaCTBConverter.convert(CodeSetConversion.java:206)
at com.sun.corba.se.impl.encoding.CDROutputStream_1_0.writeString(CDROutputStream_1_0.java:478)
at com.sun.corba.se.impl.encoding.CDROutputStream_1_0.write_string(CDROutputStream_1_0.java:467)
at com.sun.corba.se.impl.encoding.CDROutputStream.write_string(CDROutputStream.java:153)
at vdm._ITaskStub.SetDescription(_ITaskStub.java:2214) // this last line is in our code

When I step into the org.omg.CORBA_2_3.ORB.init() call with my debugger, the (Sun) client ORB is deciding to use
a native code set of ISO 8859-1, and conversion sets of UTF-8 and ISO 646. This is for char data.
omniORB comes up with a native code set of ISO 8859-1 and a conversion set of UTF-8.
The negotiation is simple: they decide to use ISO 8859-1, since it is native to both.
The problem is, my Chinese characters aren't in that code set.
My question is: why is my ORB deciding to use ISO 8859-1 as the native code set when I have my language set to Chinese?
Am I forgetting to do something?
Is the Sun ORB implementation lacking in this area?
Any ideas would be appreciated.
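
The underlying failure can be reproduced without any ORB. The following is a minimal sketch using only java.nio.charset; the class name CodeSetCheck and the method firstUnmappable are made up for illustration, and this is not what the Sun ORB does internally. It reports the first character of a string that a given charset cannot encode, which is the condition the DATA_CONVERSION warning complains about.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of any ORB.
final class CodeSetCheck {

    // Returns the index of the first char the charset cannot encode, or -1 if all can.
    // Checks one char at a time, so a surrogate pair would be reported as unmappable.
    static int firstUnmappable(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        for (int i = 0; i < text.length(); i++) {
            if (!encoder.canEncode(text.charAt(i))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String description = "Task \u4efb\u52a1"; // "Task " followed by two Chinese characters
        System.out.println(firstUnmappable(description, StandardCharsets.ISO_8859_1)); // prints 5
        System.out.println(firstUnmappable(description, StandardCharsets.UTF_8));      // prints -1
    }
}
```

Any string for which the ISO 8859-1 check returns something other than -1 cannot be sent while ISO 8859-1 is the negotiated transmission code set.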

B:

Your default input language has nothing to do with the default encoding of the system. For that to change you need to pick the Language for non-Unicode programs on the Advanced tab (used to be Default system code page or something like that).

A:

I tried changing the "Language for non-Unicode programs" on the "Advanced" tab of "Regional and Language Options" to "Chinese (Singapore)". It did not seem to make any difference.
You are right in saying that what I am changing is not causing the default encoding to change. I guess the question is: what do I have to do to get this to happen?

B:

I don't know anything about CORBA, but is there a setting in Sun's version similar to this:

com.ibm.CORBA.ORBCharEncoding: (default: ISO8859_1)

Specifies the ORB's native encoding set for character data.

This is from IBM's specification, but I would assume Sun has a similar setting?

http://publib.boulder.ibm.com/infocenter/javasdk/v5r0/index.jsp?topic=/com.ibm.java.doc.diagnostics.50/html/orb_using.html

A:

Posting this on the chance that it will cause a bell to ring in somebody's head. Feel free to throw out ideas.
After I made the change in the previous post, I added a Locale.getDefault() in the first line of my main, and it reports that I am running en-US.
???
So I added a line:
Locale.setDefault(Locale.CHINA);
ahead of it, now it reports zh-CN.
This all happens well before org.omg.CORBA_2_3.ORB.init();
I still get the same error as stated in the first post, but at least now the OK button on the MessageBox has Chinese characters on it.
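
For what it is worth, the default locale and the default charset are separate settings in Java, which would explain why the button text changed while the error did not. A small sketch (the class name LocaleVsCharset is made up for illustration) that prints both before and after Locale.setDefault:

```java
import java.nio.charset.Charset;
import java.util.Locale;

// Hypothetical demo class; it only inspects JVM defaults and does not touch CORBA.
final class LocaleVsCharset {

    // Describes the current default locale and default charset on one line.
    static String describe() {
        return Locale.getDefault().toLanguageTag() + " / " + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("before: " + describe());
        Locale.setDefault(Locale.CHINA); // affects locale-sensitive formatting and UI resources
        System.out.println("after:  " + describe()); // locale is now zh-CN, charset is unchanged
    }
}
```

Neither value appears to be what the Sun ORB consults when it picks its native code set; as the rest of the thread shows, that is controlled through ORB properties.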

This is the code that calls ORB.init:

String interfaceIpAddress = "192.168.1.1";
java.util.Properties p = new java.util.Properties();
p.put("com.sun.CORBA.ORBServerHost", interfaceIpAddress);
final String[] args = null;
orb = org.omg.CORBA_2_3.ORB.init(args, p);

Per your suggestion, I found this in ORBConstants.java:

// The CHAR_CODESETS and WCHAR_CODESETS allow the user to override the default
// connection code sets. The value should be a comma separated list of OSF
// registry numbers. The first number in the list will be the native code
// set.
//
// Number can be specified as hex if preceded by 0x, otherwise they are
// interpreted as decimal.
//
// Code sets that we accept currently (see core/OSFCodeSetRegistry):
//
// char/string:
//
// ISO8859-1 (Latin-1) 0x00010001
// ISO646 (ASCII) 0x00010020
// UTF-8 0x05010001
//
// wchar/string:
//
// UTF-16 0x00010109
// UCS-2 0x00010100
// UTF-8 0x05010001
//
// Note: The ORB will let you assign any of the above values to
// either of the following properties, but the above assignments
// are the only ones that won't get you into trouble.
public static final String CHAR_CODESETS = SUN_PREFIX + "codeset.charsets";
public static final String WCHAR_CODESETS = SUN_PREFIX + "codeset.wcharsets";

I'll have to get back to this tomorrow, I'll let you know what I find.
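
The ORBConstants.java comment describes the property format: a comma-separated list of OSF registry numbers, hex when prefixed with 0x and decimal otherwise, with the first entry taken as the native code set. Below is a sketch of how such a value could be read. CodeSetList and its methods are illustrative names, not the ORB's own parser, and the name table holds only the numbers listed in that comment.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; the Sun ORB's real parsing code may behave differently.
final class CodeSetList {

    // Registry numbers copied from the ORBConstants.java comment.
    static final Map<Integer, String> NAMES = new LinkedHashMap<>();
    static {
        NAMES.put(0x00010001, "ISO8859-1");
        NAMES.put(0x00010020, "ISO646");
        NAMES.put(0x05010001, "UTF-8");
        NAMES.put(0x00010109, "UTF-16");
        NAMES.put(0x00010100, "UCS-2");
    }

    // Parses values such as "0x05010001, 65537".
    // Integer.decode accepts the 0x prefix and plain decimal; note that it treats
    // a leading 0 without x as octal, which may not match the ORB.
    static List<Integer> parse(String propertyValue) {
        List<Integer> ids = new ArrayList<>();
        for (String token : propertyValue.split(",")) {
            String trimmed = token.trim();
            if (!trimmed.isEmpty()) {
                ids.add(Integer.decode(trimmed));
            }
        }
        return ids;
    }

    // Name of the first (native) code set in the list; assumes the list is non-empty.
    static String nativeName(String propertyValue) {
        int first = parse(propertyValue).get(0);
        return NAMES.getOrDefault(first, "unknown");
    }

    public static void main(String[] args) {
        System.out.println(nativeName("0x05010001, 0x00010001")); // prints UTF-8
        System.out.println(nativeName("65537"));                  // prints ISO8859-1
    }
}
```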

In case anyone is wondering, I got it to work, here's how:

For the ORB on the client side (the Sun one) I added:


       java.util.Properties p = new java.util.Properties();
       p.setProperty("com.sun.CORBA.codeset.charsets", "0x05010001, 0x00010109"); // UTF-8, UTF-16
       p.setProperty("com.sun.CORBA.codeset.wcharsets", "0x00010109, 0x05010001"); // UTF-16, UTF-8
       orb = org.omg.CORBA_2_3.ORB.init(args, p);


Also, for the ORB on the server side (omniORB), I had to change the native code set from ISO-8859-1 to UTF-8; otherwise the two would get together and negotiate themselves into talking ISO-8859-1.

Now everything seems to be working fine.
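
A simplified model of code set negotiation may help show why a shared native code set wins even when both sides list UTF-8 as a conversion set. The sketch below follows the general order of the negotiation rules in the CORBA specification as I understand them: equal native sets first, then one side converting, then a shared conversion set, then a UTF-8 fallback for char data. It is an illustration with made-up names (Negotiation, chooseTcs), not the code of the Sun ORB or omniORB, and it omits the compatibility check that precedes the fallback. The server's conversion set in the "after" case is an assumption.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of choosing a transmission code set (TCS) for char data.
final class Negotiation {
    static final int ISO_8859_1 = 0x00010001;
    static final int ISO_646 = 0x00010020;
    static final int UTF_8 = 0x05010001;
    static final int UTF_16 = 0x00010109;

    static int chooseTcs(int clientNative, List<Integer> clientConv,
                         int serverNative, List<Integer> serverConv) {
        if (clientNative == serverNative) return clientNative;      // same native set, no conversion
        if (clientConv.contains(serverNative)) return serverNative; // client converts
        if (serverConv.contains(clientNative)) return clientNative; // server converts
        for (Integer cs : clientConv) {
            if (serverConv.contains(cs)) return cs;                 // shared conversion set
        }
        return UTF_8; // fallback for char data (compatibility check omitted in this sketch)
    }

    public static void main(String[] args) {
        // Setup from the first post: both ORBs have ISO 8859-1 as native code set.
        int before = chooseTcs(ISO_8859_1, Arrays.asList(UTF_8, ISO_646),
                               ISO_8859_1, Arrays.asList(UTF_8));
        // After the fix: both native UTF-8 (server conversion set assumed for illustration).
        int after = chooseTcs(UTF_8, Arrays.asList(UTF_16),
                              UTF_8, Arrays.asList(ISO_8859_1));
        System.out.println(String.format("0x%08x", before)); // prints 0x00010001
        System.out.println(String.format("0x%08x", after));  // prints 0x05010001
    }
}
```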

The battle continues on a different front now.

Things work fine when making calls that are defined as passing strings, even when those Strings contain multibyte characters, since we have the 2 ORBs talking UTF-8.

They don't when making calls that are defined as passing an Any type that happens to be a String, that contains multibyte characters.

Specifically, I cannot seem to be able to create an Any from a String that contains multibyte characters.

// I get an Any from my UTF-8 initialized orb:
org.omg.CORBA.Any myAny = myOrb.create_any();

// I insert my String into it...
WTStringHelper.insert(myAny, theString);

// Here is the code for WTStringHelper
// Generated by the IDL-to-Java compiler (portable), version "3.1"
public static void insert (org.omg.CORBA.Any a, String that)
{
    org.omg.CORBA.portable.OutputStream out = a.create_output_stream ();
    a.type (type ());
    write (out, that);
    a.read_value (out.create_input_stream (), type ());
}

public static void write (org.omg.CORBA.portable.OutputStream ostream, String value)
{
    ostream.write_string (value);
}

The problem is that a.create_output_stream (), which is found in com.sun.corba.se.impl.corba.AnyImpl.java, creates an AnyOutputStream, which extends EncapsOutputStream, which is hard-coded to use ISO-8859-1 encoding. Therefore, when I get to the line ostream.write_string (value), an unmappable-character exception is thrown.

There is this comment at the top of the com.sun.corba.se.impl.encoding.EncapsOutputStream.java file:

/**
 * Encapsulations are supposed to explicitly define their
 * code sets and GIOP version. The original resolution to issue 2784
 * said that the defaults were UTF-8 and UTF-16, but that was not
 * agreed upon.
 *
 * These streams currently use CDR 1.2 with ISO8859-1 for char/string and
 * UTF16 for wchar/wstring. If no byte order marker is available,
 * the endianness of the encapsulation is used.
 *
 * When more encapsulations arise that have their own special code
 * sets defined, we can make all constructors take such parameters.
 */

I believe that if this class were changed to use UTF-8 encoding, everything would work. As it is now, there does not seem to be an easy way to create an Any from a String that has multibyte characters in it.
Is this a bug?
Does someone have a work-around?
Am I missing something?
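
The effect of a stream that is fixed to ISO-8859-1 for string data and UTF-16 for wstring data, as the quoted comment describes, can be illustrated with plain java.nio. This is only a sketch of the charset behaviour; EncapsulationCharsets and its methods are made-up names, and the thread does not confirm whether passing the value as a wstring is a workable route with these two ORBs.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it does not use or modify any ORB stream.
final class EncapsulationCharsets {

    // Encodes strictly: unmappable characters raise an exception instead of becoming '?'.
    static byte[] encodeStrict(String text, Charset charset) throws CharacterCodingException {
        ByteBuffer buf = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(CharBuffer.wrap(text));
        byte[] bytes = new byte[buf.remaining()];
        buf.get(bytes);
        return bytes;
    }

    // True if the text survives a strict encode followed by a decode in the given charset.
    static boolean roundTrips(String text, Charset charset) {
        try {
            return new String(encodeStrict(text, charset), charset).equals(text);
        } catch (CharacterCodingException e) {
            return false; // analogous to the unmappable-character failure in write_string
        }
    }

    public static void main(String[] args) {
        String chinese = "\u4efb\u52a1"; // two Chinese characters
        System.out.println(roundTrips(chinese, StandardCharsets.ISO_8859_1)); // prints false
        System.out.println(roundTrips(chinese, StandardCharsets.UTF_16));     // prints true
    }
}
```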

Reference link: http://forums.sun.com/thread.jspa?threadID=5147616&tstart=0

