#include <TextEncoding.h>
Public Types | |
enum | { MAX_SEQUENCE_LENGTH = 6 } |
typedef int | CharacterMap [256] |
typedef SharedPtr< TextEncoding > | Ptr |
Public Member Functions | |
virtual const char * | canonicalName () const =0 |
virtual const CharacterMap & | characterMap () const =0 |
virtual int | convert (const unsigned char *bytes) const |
virtual int | convert (int ch, unsigned char *bytes, int length) const |
virtual bool | isA (const std::string &encodingName) const =0 |
virtual | ~TextEncoding () |
Destroys the encoding. | |
Static Public Member Functions | |
static void | add (TextEncoding::Ptr encoding) |
static void | add (TextEncoding::Ptr encoding, const std::string &name) |
static TextEncoding & | byName (const std::string &encodingName) |
static TextEncoding::Ptr | find (const std::string &encodingName) |
static TextEncoding::Ptr | global (TextEncoding::Ptr encoding) |
static TextEncoding & | global () |
Return the current global TextEncoding object. | |
static void | remove (const std::string &encodingName) |
Static Public Attributes | |
static const std::string | GLOBAL |
Name of the global TextEncoding, which is the empty string. | |
Static Protected Member Functions | |
static TextEncodingManager & | manager () |
An abstract base class for implementing text encodings like UTF-8 or ISO 8859-1.
Subclasses must override the canonicalName(), isA(), characterMap() and convert() methods and need to be thread safe and stateless.
TextEncoding also provides static member functions for managing mappings from encoding names to TextEncoding objects.
Definition at line 53 of file TextEncoding.h.
typedef int Poco::TextEncoding::CharacterMap[256] |
The map[b] member gives information about byte sequences whose first byte is b. If map[b] is c where c is >= 0, then b by itself encodes the Unicode scalar value c. If map[b] is -1, then the byte sequence is malformed. If map[b] is -n, where n >= 2, then b is the first byte of an n-byte sequence that encodes a single Unicode scalar value. Byte sequences up to 6 bytes (MAX_SEQUENCE_LENGTH) in length are supported.
Definition at line 73 of file TextEncoding.h.
typedef SharedPtr<TextEncoding> Poco::TextEncoding::Ptr |
Definition at line 66 of file TextEncoding.h.
anonymous enum |
Definition at line 68 of file TextEncoding.h.
Poco::TextEncoding::~TextEncoding | ( | ) | [virtual] |
Destroys the encoding.
Definition at line 141 of file TextEncoding.cpp.
void Poco::TextEncoding::add | ( | TextEncoding::Ptr | encoding | ) | [static] |
Adds the given TextEncoding to the table of text encodings, under the encoding's canonical name.
If an encoding with that name is already registered, it is replaced.
Definition at line 174 of file TextEncoding.cpp.
void Poco::TextEncoding::add | ( | TextEncoding::Ptr | encoding, |
const std::string & | name | ||
) | [static] |
Adds the given TextEncoding to the table of text encodings, under the given name.
If an encoding with the given name is already registered, it is replaced.
Definition at line 180 of file TextEncoding.cpp.
TextEncoding & Poco::TextEncoding::byName | ( | const std::string & | encodingName | ) | [static] |
Returns the TextEncoding object for the given encoding name.
Throws a NotFoundException if no encoding with the given name is available.
Definition at line 158 of file TextEncoding.cpp.
virtual const char* Poco::TextEncoding::canonicalName | ( | ) | const [pure virtual] |
Returns the canonical name of this encoding, e.g. "ISO-8859-1". Encoding name comparisons are case insensitive.
Implemented in Poco::UTF16Encoding, Poco::Latin9Encoding, Poco::ASCIIEncoding, Poco::Latin1Encoding, Poco::UTF8Encoding, and Poco::Windows1252Encoding.
virtual const CharacterMap& Poco::TextEncoding::characterMap | ( | ) | const [pure virtual] |
Returns the CharacterMap for the encoding. The CharacterMap should be kept in a static member. As characterMap() can be called frequently, it should be implemented in such a way that it just returns a static map. If the map is built at runtime, this should be done in the constructor.
Implemented in Poco::UTF16Encoding, Poco::Latin9Encoding, Poco::ASCIIEncoding, Poco::Latin1Encoding, Poco::UTF8Encoding, and Poco::Windows1252Encoding.
int Poco::TextEncoding::convert | ( | const unsigned char * | bytes | ) | const [virtual] |
The convert function is used to convert multibyte sequences; bytes will point to a byte sequence of n bytes where characterMap()[*bytes] == -n, with n >= 2.
The convert function must return the Unicode scalar value represented by this byte sequence or -1 if the byte sequence is malformed. The default implementation returns (int) bytes[0].
Reimplemented in Poco::UTF16Encoding, Poco::Latin9Encoding, Poco::ASCIIEncoding, Poco::Latin1Encoding, Poco::UTF8Encoding, and Poco::Windows1252Encoding.
Definition at line 146 of file TextEncoding.cpp.
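As a rough illustration of how the CharacterMap rules and the multibyte convert() overload fit together, the sketch below walks a byte buffer and decodes it into Unicode scalar values. It does not use Poco itself: the local CharacterMap typedef is a stand-in, and decodeAll, fillToyMap and toyConvert are hypothetical names introduced only for this example. Reporting a truncated trailing sequence as -1 is an assumption, not documented behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Local stand-in for Poco::TextEncoding::CharacterMap (assumption: same shape).
typedef int CharacterMap[256];

// Hypothetical function pointer mirroring convert(const unsigned char* bytes).
typedef int (*MultibyteConvert)(const unsigned char* bytes);

// Toy map: ASCII maps to itself, 0xC2..0xDF start a 2-byte sequence, the rest is malformed.
void fillToyMap(CharacterMap& map)
{
    for (int b = 0; b < 256; ++b)
    {
        if (b < 0x80) map[b] = b;
        else if (b >= 0xC2 && b <= 0xDF) map[b] = -2;
        else map[b] = -1;
    }
}

// Toy decoder for 2-byte sequences in UTF-8 style; returns -1 for a bad trail byte.
int toyConvert(const unsigned char* bytes)
{
    if ((bytes[1] & 0xC0) != 0x80) return -1;
    return ((bytes[0] & 0x1F) << 6) | (bytes[1] & 0x3F);
}

// Decodes len bytes; malformed or truncated input is reported as -1 entries.
std::vector<int> decodeAll(const CharacterMap& map, MultibyteConvert convert,
                           const unsigned char* data, std::size_t len)
{
    assert(data != nullptr || len == 0);
    std::vector<int> out;
    std::size_t i = 0;
    while (i < len)
    {
        int m = map[data[i]];
        if (m >= 0)                 // the byte by itself encodes scalar value m
        {
            out.push_back(m);
            i += 1;
        }
        else if (m == -1)           // malformed lead byte
        {
            out.push_back(-1);
            i += 1;
        }
        else                        // m == -n: first byte of an n-byte sequence
        {
            std::size_t n = static_cast<std::size_t>(-m);
            if (len - i < n)        // not enough bytes left (assumed handling)
            {
                out.push_back(-1);
                break;
            }
            out.push_back(convert(data + i));
            i += n;
        }
    }
    return out;
}
```

The map lookup handles the common single-byte case without a virtual call, which is presumably why characterMap() is expected to be cheap; convert() is only consulted for lead bytes whose map entry is -n.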
int Poco::TextEncoding::convert | ( | int | ch, |
unsigned char * | bytes, | ||
int | length | ||
) | const [virtual] |
Transforms the Unicode character ch into the encoding's byte sequence and returns the number of bytes used. The method must not use more than length bytes. bytes and length can also be null; in this case only the number of bytes required to represent ch is returned. If the character cannot be converted, 0 is returned and the byte sequence remains unchanged. The default implementation simply returns 0.
Reimplemented in Poco::UTF16Encoding, Poco::Latin9Encoding, Poco::ASCIIEncoding, Poco::Latin1Encoding, Poco::UTF8Encoding, and Poco::Windows1252Encoding.
Definition at line 152 of file TextEncoding.cpp.
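The contract of the encoding overload, convert(int ch, unsigned char* bytes, int length), can be sketched with a free function. toyEncode is a hypothetical name, and it only covers the one- and two-byte UTF-8 forms to stay short. Returning the required size without writing when the buffer is too small is an assumption made for this sketch.

```cpp
#include <cassert>

// Hypothetical free function mirroring convert(int ch, unsigned char* bytes, int length).
// Encodes ch as UTF-8, limited to code points up to 0x7FF for brevity.
int toyEncode(int ch, unsigned char* bytes, int length)
{
    assert(length >= 0);
    // Not representable in this toy encoding: report 0 and leave bytes untouched.
    if (ch < 0 || ch > 0x7FF) return 0;

    int needed = (ch <= 0x7F) ? 1 : 2;

    // Null buffer: only report the required size.
    // Assumption: a too-small buffer is treated the same way.
    if (bytes == nullptr || length < needed) return needed;

    if (needed == 1)
    {
        bytes[0] = static_cast<unsigned char>(ch);
    }
    else
    {
        bytes[0] = static_cast<unsigned char>(0xC0 | (ch >> 6));
        bytes[1] = static_cast<unsigned char>(0x80 | (ch & 0x3F));
    }
    return needed;
}
```

A caller can first pass a null buffer to learn the required size, then call again with a buffer of at least that many bytes.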
TextEncoding::Ptr Poco::TextEncoding::find | ( | const std::string & | encodingName | ) | [static] |
Returns a pointer to the TextEncoding object for the given encodingName, or NULL if no such TextEncoding object exists.
Definition at line 168 of file TextEncoding.cpp.
TextEncoding::Ptr Poco::TextEncoding::global | ( | TextEncoding::Ptr | encoding | ) | [static] |
Sets the global TextEncoding object.
This function sets the global encoding to the argument and returns a pointer to the previous global encoding.
Definition at line 192 of file TextEncoding.cpp.
TextEncoding & Poco::TextEncoding::global | ( | ) | [static] |
Returns the current global TextEncoding object.
Definition at line 200 of file TextEncoding.cpp.
virtual bool Poco::TextEncoding::isA | ( | const std::string & | encodingName | ) | const [pure virtual] |
Returns true if the given name is one of the names of this encoding. For example, the "ISO-8859-1" encoding is also known as "Latin-1".
Encoding name comparisons are case insensitive.
Implemented in Poco::UTF16Encoding, Poco::Latin9Encoding, Poco::ASCIIEncoding, Poco::Latin1Encoding, Poco::UTF8Encoding, and Poco::Windows1252Encoding.
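One plausible way for a subclass to implement case-insensitive name matching is to keep a static list of aliases and compare each one without regard to ASCII case. The sketch below is self-contained and does not reflect the actual Poco implementation; equalsIgnoreCase, LATIN1_NAMES and isLatin1 are hypothetical names, and the alias list is illustrative only.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Compares two names without regard to ASCII case.
bool equalsIgnoreCase(const std::string& a, const std::string& b)
{
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        if (std::tolower(static_cast<unsigned char>(a[i])) !=
            std::tolower(static_cast<unsigned char>(b[i])))
            return false;
    }
    return true;
}

// Hypothetical alias list; the first entry plays the role of the canonical name.
static const char* const LATIN1_NAMES[] = { "ISO-8859-1", "Latin1", "Latin-1", nullptr };

// Stand-in for an isA() override: true if encodingName matches any alias.
bool isLatin1(const std::string& encodingName)
{
    assert(LATIN1_NAMES[0] != nullptr);
    for (const char* const* name = LATIN1_NAMES; *name; ++name)
    {
        if (equalsIgnoreCase(encodingName, *name)) return true;
    }
    return false;
}
```

Because the alias list is static and immutable, a function like this keeps no per-object state, in line with the requirement that subclasses be thread safe and stateless.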
TextEncodingManager & Poco::TextEncoding::manager | ( | ) | [static, protected] |
Returns the TextEncodingManager.
Definition at line 206 of file TextEncoding.cpp.
void Poco::TextEncoding::remove | ( | const std::string & | encodingName | ) | [static] |
Removes the encoding with the given name from the table of text encodings.
Definition at line 186 of file TextEncoding.cpp.
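The static registry functions (add, find, byName, remove, global) can be pictured as operations on a single name-to-encoding table. The following is a minimal stand-in, not the Poco implementation: ToyEncoding, ToyRegistry, CaseInsensitiveLess and GLOBAL_NAME are hypothetical names, std::out_of_range substitutes for NotFoundException, and both the case-insensitive table lookup and storing the global encoding under the empty-string name are assumptions.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Minimal stand-in for an encoding: only the canonical name matters here.
struct ToyEncoding
{
    std::string canonical;
};
typedef std::shared_ptr<ToyEncoding> ToyPtr;

// Assumption: the global encoding is stored under the empty-string name.
const std::string GLOBAL_NAME = "";

// Orders names without regard to ASCII case (assumed lookup behavior).
struct CaseInsensitiveLess
{
    bool operator()(const std::string& a, const std::string& b) const
    {
        std::size_t n = a.size() < b.size() ? a.size() : b.size();
        for (std::size_t i = 0; i < n; ++i)
        {
            int ca = std::tolower(static_cast<unsigned char>(a[i]));
            int cb = std::tolower(static_cast<unsigned char>(b[i]));
            if (ca != cb) return ca < cb;
        }
        return a.size() < b.size();
    }
};

class ToyRegistry
{
public:
    // Registers under the canonical name; an existing entry is replaced.
    void add(ToyPtr enc)
    {
        assert(enc != nullptr);
        add(enc, enc->canonical);
    }

    // Registers under an explicit name; an existing entry is replaced.
    void add(ToyPtr enc, const std::string& name) { _table[name] = enc; }

    // Returns a null pointer if the name is unknown.
    ToyPtr find(const std::string& name) const
    {
        auto it = _table.find(name);
        return it == _table.end() ? ToyPtr() : it->second;
    }

    // Throws if the name is unknown (std::out_of_range stands in for NotFoundException).
    ToyEncoding& byName(const std::string& name) const
    {
        ToyPtr p = find(name);
        if (!p) throw std::out_of_range("encoding not found: " + name);
        return *p;
    }

    void remove(const std::string& name) { _table.erase(name); }

    // Sets the global encoding and returns the previous one (possibly null).
    ToyPtr global(ToyPtr enc)
    {
        ToyPtr prev = find(GLOBAL_NAME);
        add(enc, GLOBAL_NAME);
        return prev;
    }

private:
    std::map<std::string, ToyPtr, CaseInsensitiveLess> _table;
};
```

The split between find() and byName() lets callers choose between a null check and exception handling when a name may be missing.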
const std::string Poco::TextEncoding::GLOBAL [static] |
Name of the global TextEncoding, which is the empty string.
Definition at line 159 of file TextEncoding.h.