About encoding in detail
Character encodings provide a map between a series of numbers and the characters people expect to see when they enter text into computers. The capital letter "A", for example, is represented by the decimal number 65 (41 in hexadecimal) in a variety of character encodings, including the ASCII text familiar to many Western programmers and Windows code page 1252, the default encoding used by most Microsoft® Windows® Western systems.
Character encodings are not fonts. Fonts provide glyphs, the graphic representations of characters, and map those glyphs to a particular character encoding. Microsoft Word, for example, includes a version of Arial (Arial Unicode MS) with tens of thousands of characters.
All XML processors are required to understand two transformations of the Unicode character encoding: UTF-8 and UTF-16. The Microsoft XML Parser (MSXML) supports more encodings, but all text in XML documents is treated internally as the Unicode UCS-2 character encoding.
Even different platforms representing the same set of Western characters can use different bytes to represent the same character, as shown in the following table.
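
One way to check these byte values yourself is a short Python sketch. It uses only the codecs that ship with Python's standard library; the choice of sample characters and encodings, and the helper name `byte_table`, are assumptions made for illustration and do not come from MSXML.

```python
# Encode the same characters under several encodings and compare the bytes.
# SAMPLES, ENCODINGS, and byte_table are illustrative names, not part of any parser API.
SAMPLES = ["A", "é"]
ENCODINGS = ["ascii", "cp1252", "mac_roman", "cp437", "utf-8", "utf-16-le"]


def byte_table(chars, encodings):
    """Return {char: {encoding: hex string, or None if not representable}}."""
    table = {}
    for ch in chars:
        row = {}
        for enc in encodings:
            try:
                row[enc] = ch.encode(enc).hex()
            except UnicodeEncodeError:
                # The character has no mapping in this encoding (e.g. "é" in ASCII).
                row[enc] = None
        table[ch] = row
    return table


if __name__ == "__main__":
    for ch, row in byte_table(SAMPLES, ENCODINGS).items():
        for enc, hex_bytes in row.items():
            print(f"{ch!r:5} {enc:10} {hex_bytes or 'not representable'}")
```

In this sketch "A" comes out as the single byte 41 in every 8-bit encoding listed, while "é" is a different byte in each Western code page, two bytes in UTF-8, and not representable in ASCII at all.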
Parsers can read in documents written in ISO-8859-1, Big-5, or Shift-JIS, but the processing rules treat everything as Unicode. XML parsers perform the conversion while loading XML documents. There are some limitations to auto-detecting character encodings. For example, 7-bit ASCII text is acceptable UTF-8, but UTF-8 is more than 7-bit ASCII text.
For reliable processing, XML documents that use character encodings other than UTF-8 or UTF-16 must include an encoding declaration in the XML declaration. This makes it possible for a parser to read the characters correctly or report errors when it cannot process an encoding. Because the XML declaration is written in basic ASCII text, parsers can read its contents even if the document is in a very different encoding. The encoding declaration significantly increases the likelihood that documents in encodings other than UTF-8 and UTF-16 will be interpreted correctly.
Some transports, for example HTTP and e-mail protocols, also provide information about character encodings. Microsoft Internet Explorer uses that information in document processing, but it isn't available if, for example, you load an XML document from a local hard drive or even a file server.
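
The role of the encoding declaration can be sketched in a few lines of Python. This is a simplified illustration, not how MSXML or any particular parser is implemented: the function names `sniff_xml_encoding` and `decode_xml` are hypothetical, and the sketch assumes an ASCII-compatible encoding or a byte order mark. It does not cover the remaining cases (such as UTF-16 without a byte order mark) described in the autodetection appendix of the XML 1.0 specification.

```python
import codecs
import re

# Matches an XML declaration at the very start of the data and captures the
# encoding name. This only works when the declaration itself is ASCII-readable.
_DECL_RE = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_xml_encoding(data: bytes) -> str:
    """Guess the encoding of raw XML bytes (hypothetical helper)."""
    # A byte order mark identifies UTF-8 or UTF-16 without any declaration.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # Otherwise read the encoding declaration, if there is one.
    match = _DECL_RE.match(data[:200])
    if match:
        return match.group(1).decode("ascii").lower()
    # No byte order mark and no declaration: assume UTF-8.
    return "utf-8"


def decode_xml(data: bytes) -> str:
    """Convert raw XML bytes to a Unicode string, or report an unknown encoding."""
    encoding = sniff_xml_encoding(data)
    try:
        return data.decode(encoding)
    except LookupError as exc:
        raise ValueError(f"unsupported encoding: {encoding}") from exc


if __name__ == "__main__":
    raw = '<?xml version="1.0" encoding="ISO-8859-1"?><a>é</a>'.encode("iso-8859-1")
    print(sniff_xml_encoding(raw))
    print(decode_xml(raw))
```

Without the declaration, the ISO-8859-1 byte for "é" in this example would be decoded as UTF-8 and rejected as malformed, which is the failure the encoding declaration is meant to prevent.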
Copyright © HiBase Group, 2002-2016