That recent BOM issue, and how browsers detect the charset
2011-02-13 23:13

http://127.0.0.1/bom.html

The response header is set to: Content-Type: text/html;charset=utf-8

Page content:

<meta HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=big5">

<script> 

alert(document.charset);//IE chrome

alert(document.defaultCharset);//IE chrome

alert(document.characterSet);//FF

</script>

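In Notepad, save this as bom.html with the encoding set to "Unicode" (which in Notepad means UTF-16 little-endian). The page therefore starts with the BOM FF FE.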
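
The Notepad step can also be reproduced without Notepad. Below is a minimal Node.js sketch, assuming Node's global `Buffer` and its built-in `http` module. The names `PAGE_HTML`, `buildBomPage` and `startServer` are made up for illustration and were not part of the original test.

```javascript
// Hypothetical copy of the test page as a string.
const PAGE_HTML =
  '<meta HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=big5">\n' +
  "<script>\n" +
  "alert(document.charset);\n" +
  "alert(document.defaultCharset);\n" +
  "alert(document.characterSet);\n" +
  "</script>\n";

// Build the bytes Notepad writes for "Unicode": FF FE followed by UTF-16LE text.
function buildBomPage(html) {
  const bom = Buffer.from([0xff, 0xfe]);
  return Buffer.concat([bom, Buffer.from(html, "utf16le")]);
}

// Serve the page with the conflicting header. Nothing calls this automatically.
function startServer(port) {
  const http = require("http");
  const body = buildBomPage(PAGE_HTML);
  const server = http.createServer((req, res) => {
    res.writeHead(200, { "Content-Type": "text/html;charset=utf-8" });
    res.end(body);
  });
  return server.listen(port, "127.0.0.1");
}
```

Calling `startServer(8080)` (the port is an arbitrary choice) should give a page with all three conflicting signals: a utf-8 header, a UTF-16LE BOM and a big5 meta.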

Open the page in IE, Chrome, Opera and Firefox in turn.

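It turns out that not only IE (tested on IE8) but even Chrome ignores the header and sets the charset according to the BOM. Only Opera and Firefox go by the charset in the header first. (Update, Feb 14: Opera gives priority to the header.)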

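I went through the W3C documents and found no clear statement in HTML4 about where the BOM sits in the priority order. (If you find one, please let me know, many thanks.)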

http://www.w3.org/TR/html4/charset.html#h-5.2.2

5.2.2 Specifying the character encoding

To sum up, conforming user agents must observe the following priorities when determining a document's character encoding (from highest priority to lowest):

1. An HTTP "charset" parameter in a "Content-Type" field.

2. A META declaration with "http-equiv" set to "Content-Type" and a value set for "charset".

3. The charset attribute set on an element that designates an external resource.

In addition to this list of priorities, the user agent may use heuristics and user settings.
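
As a rough illustration of the quoted list, here is a small JavaScript sketch. The function names `charsetFromContentType` and `html4Charset` are my own, and the regular expression is a simplification rather than a full Content-Type parser. Note that the BOM does not appear anywhere in this order.

```javascript
// Pull the charset label out of a Content-Type value (header or meta content).
function charsetFromContentType(value) {
  const m = /charset\s*=\s*"?([^";\s]+)/i.exec(value || "");
  return m ? m[1].toLowerCase() : null;
}

// HTML 4.01 section 5.2.2 order: HTTP header, then META, then the charset
// attribute on the linking element. null means "fall back to heuristics".
function html4Charset(httpCharset, metaCharset, linkCharset) {
  return httpCharset || metaCharset || linkCharset || null;
}

const header = charsetFromContentType("text/html;charset=utf-8");
const meta = charsetFromContentType("text/html; CHARSET=big5");
console.log(html4Charset(header, meta, null)); // prints "utf-8"
```

By this list alone, the test page would be treated as utf-8.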

The HTML5-related documents, on the other hand, say the following (http://www.w3.org/TR/2011/WD-html5-diff-20110113/):

For the HTML syntax of HTML5, authors have three means of setting the character encoding:

  At the transport level. By using the HTTP Content-Type header for instance.

  Using a Unicode Byte Order Mark (BOM) character at the start of the file. This character provides a signature for the encoding used.

  Using a meta element with a charset attribute that specifies the encoding within the first 512 bytes of the document. E.g. <meta charset="UTF-8"> could be used to specify the UTF-8 encoding. This replaces the need for <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> although that syntax is still allowed.

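The passage above does not state a priority order. The HTML5 documents have a few similar passages, and in none of them did I find an explicit statement of the relative priority of the HTTP Content-Type header and the BOM (although my feeling is that the HTTP Content-Type header should rank higher). What is fairly certain is that the BOM takes priority over meta.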
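
The two behaviours seen in the test can be modelled in a few lines. This is only a sketch: `sniffBom` and `pickCharset` are hypothetical names, the `bomFirst` flag is my own way of expressing the difference between browsers, and UTF-32 BOMs are not handled.

```javascript
// Return the encoding implied by a leading BOM, or null if there is none.
function sniffBom(bytes) {
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return "utf-8";
  }
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) return "utf-16le";
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) return "utf-16be";
  return null;
}

// bomFirst = true models what IE8 and Chrome did in the test;
// bomFirst = false models Firefox and Opera. Either way, meta comes last.
function pickCharset(bytes, httpCharset, metaCharset, bomFirst) {
  const bom = sniffBom(bytes);
  const order = bomFirst
    ? [bom, httpCharset, metaCharset]
    : [httpCharset, bom, metaCharset];
  for (const label of order) {
    if (label) return label;
  }
  return null;
}
```

For the test page (BOM FF FE, header utf-8, meta big5), `pickCharset(bytes, "utf-8", "big5", true)` gives "utf-16le" and the same call with `false` gives "utf-8", which matches the split observed between the browsers.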


There are many more related tests to run, such as how the parent page's charset affects a child page.

Apart from that, if a browser finds out while rendering that it detected the encoding wrongly, it loads the original page all over again.

http://code.google.com/intl/zh-CN/speed/page-speed/docs/rendering.html

Browsers differ with the respect to the number of bytes buffered and the default encoding assumed if no character set is found. However, once they have buffered the requisite number of bytes and begun to render the page, if they encounter a character set specification that doesn't match their default, they need to reparse the input and redraw the page. Sometimes, they may even have to rerequest resources, if the mismatch affects the URLs of external resources. 

http://simon.html5.org/test/html/parsing/encoding/charset-reload-200k.htm
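
To make the reparse idea concrete, here is a toy model in Node.js. It is not how any real browser is implemented: the default encoding, the function names `findMetaCharset` and `decodeWithReparse`, and the restriction to Node's built-in "latin1" and "utf8" Buffer encodings are all assumptions chosen to keep the sketch short.

```javascript
// Assumed default; real browsers differ in what they fall back to.
const DEFAULT_ENCODING = "latin1";

// Look for a charset declared in a meta element (either syntax).
function findMetaCharset(text) {
  const m = /<meta[^>]*charset\s*=\s*["']?([\w-]+)/i.exec(text);
  return m ? m[1].toLowerCase() : null;
}

// Decode with the default first; if the page declares utf-8, decode the
// same bytes a second time and report that a reparse was needed.
function decodeWithReparse(bytes) {
  const buf = Buffer.from(bytes);
  let text = buf.toString(DEFAULT_ENCODING);
  const declared = findMetaCharset(text);
  let reparsed = false;
  if (declared === "utf-8" || declared === "utf8") {
    text = buf.toString("utf8");
    reparsed = true;
  }
  return { text, declared, reparsed };
}
```

The second decode is the cost the quoted passage talks about. Declaring the charset early (or in the header) avoids it.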

 

Could this have a similar cause to the one in "FF 3.6 bug?"?


 

