Character encoding in OPML, Manila and Radio

Sat, Dec 30, 2000; by Jake Savin.

We've been working to resolve some bugs in Radio UserLand and Manila related to the way characters in OPML documents are encoded and decoded on Macintosh vs. Windows.

This page details how character encoding and decoding will work when OPML is:

  • Transmitted via XML-RPC or SOAP,
  • Stored in Manila sites (in a Frontier object database),
  • Saved as files on disk,
  • Edited in Radio UserLand.

Guiding Principles

There are three principles that we're adhering to in our implementation:

1) OPML is based upon XML 1.0 -- it's an XML format. All UserLand generated OPML documents will be valid XML 1.0 documents.

2) OPML is always stored and transmitted using the ISO-8859-1 character set, in order to maintain XML 1.0 validity. (UserLand-generated OPML documents specify an encoding declaration of ISO-8859-1.)

3) In Radio UserLand, outlines will always be presented for editing using the platform's native character set: Latin-1 (ISO-8859-1) for Windows clients, and Macintosh text for Macintosh clients. (See: RFC 1345.)

Transmission and storage of OPML 

Whenever OPML text is transmitted via RPC, or stored in an object database, the text will be in the ISO-8859-1 character set, in conformance with its encoding declaration.

Since servers can assume that both incoming and outgoing OPML is encoded as ISO-8859-1 text, no translation is ever necessary on the server when OPML enters or exits the system.

Radio will save OPML files as ISO-8859-1 text to ensure cross-platform compatibility when OPML files are moved from platform-to-platform via email, FTP, floppy disk, file-sharing, etc.
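To make the storage rule concrete, here is a minimal sketch in Python (not Radio's actual implementation) that serializes a small outline as ISO-8859-1 bytes with a matching encoding declaration. The function name opml_bytes, the sample headlines and the version="1.0" attribute are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

def opml_bytes(title, headlines):
    """Serialize a tiny outline as ISO-8859-1 bytes (hypothetical helper)."""
    opml = ET.Element("opml", version="1.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for text in headlines:
        ET.SubElement(body, "outline", text=text)
    # For a non-UTF-8 encoding, ElementTree writes the XML declaration
    # itself, so the bytes and the declared encoding always agree.
    return ET.tostring(opml, encoding="ISO-8859-1")

data = opml_bytes("Café notes", ["Déjà vu", "naïve"])

# Any XML parser that honors the declaration reads the text back intact.
roundtrip = ET.fromstring(data)
print(roundtrip.find("head/title").text)
```

Because the declaration travels with the bytes, the same file can be read on either platform without guessing the character set.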

All &, <, >, " and ' characters in attribute values and character data will be represented as named entities (&amp;, &lt;, &gt;, &quot; and &apos;).

Note: According to the XML 1.0 specification, a valid XML document isn't required to declare these entities in a DTD. (See Section 4.1.)
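The escaping rule can be sketched with Python's standard library. This is only an illustration, and the helper names escape_opml_text and unescape_opml_text are our own, not part of any UserLand API.

```python
from xml.sax.saxutils import escape, unescape

# escape() handles &, < and > itself; the two quote characters
# have to be supplied as extra replacements.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}
QUOTE_CHARS = {entity: char for char, entity in QUOTE_ENTITIES.items()}

def escape_opml_text(text):
    """Replace the five special characters with XML's predefined entities."""
    return escape(text, QUOTE_ENTITIES)

def unescape_opml_text(text):
    """Reverse escape_opml_text()."""
    return unescape(text, QUOTE_CHARS)

raw = 'a & b < c > d "e" \'f\''
encoded = escape_opml_text(raw)
print(encoded)
```

The ampersand is replaced first, so the entities produced for the other four characters are never escaped a second time.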

Editing outlines in Radio UserLand 

When a user edits an outline using Radio UserLand, editing takes place using the native platform's character set.

Since ISO-8859-1 is the native character set on Windows, there is no need for Windows clients to do any character translation.

On Macintosh, translation from ISO-8859-1 to the native Macintosh character set takes place when the OPML is converted to an outline for editing, and is reversed when the outline is converted back to OPML for storage or transmission via RPC.
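The Macintosh round trip can be sketched as follows, assuming Python's built-in "mac_roman" codec is an acceptable stand-in for the native Macintosh character set. The function names are hypothetical, and substituting "?" for characters that exist in only one of the two sets is our assumption; this page does not say how Radio handles them.

```python
def opml_to_mac(data: bytes) -> bytes:
    """ISO-8859-1 bytes from OPML -> Macintosh text for editing."""
    # errors="replace" substitutes "?" for characters Mac Roman lacks
    # (an assumption, not documented behavior).
    return data.decode("iso-8859-1").encode("mac_roman", errors="replace")

def mac_to_opml(data: bytes) -> bytes:
    """Macintosh text from the editor -> ISO-8859-1 bytes for OPML."""
    return data.decode("mac_roman").encode("iso-8859-1", errors="replace")

latin1 = b"caf\xe9"             # "café": e-acute is 0xE9 in ISO-8859-1
mac = opml_to_mac(latin1)       # the same letter is 0x8E in Mac Roman
restored = mac_to_opml(mac)
print(mac, restored)
```

Characters shared by both sets survive the round trip unchanged. Characters found in only one set (curly quotes on the Macintosh side, for example) cannot be carried through ISO-8859-1 and would need some other representation.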




© Copyright 2000-2008, Scripting News, Inc.
OPML is a trademark of Scripting News, Inc.
Last update: Sunday, December 31, 2000 at 11:58:24 AM.
