Package rosdeb :: Module BeautifulSoup :: Class UnicodeDammit

Class UnicodeDammit


A class for detecting the encoding of a *ML document and converting it to a Unicode string. If the source encoding is windows-1252, it can also replace Microsoft smart quotes with their HTML or XML equivalents.
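Encoding detection of this kind usually starts with cheap, reliable signals before falling back to guesses. The sketch below is a simplified illustration, not UnicodeDammit's own code: the hypothetical `sniff_encoding` helper checks for a byte-order mark, then for an `encoding="..."` attribute in a leading XML declaration, and returns `None` when neither is present.

```python
import codecs
import re

# Hypothetical sniffer. UTF-32 BOMs are listed before UTF-16 because the
# UTF-32-LE BOM begins with the same two bytes as the UTF-16-LE BOM.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Matches e.g. <?xml version="1.0" encoding="ISO-8859-1"?> at the very start.
XML_DECL = re.compile(rb"""^<\?xml[^>]*\bencoding=["']([A-Za-z0-9._-]+)["']""")


def sniff_encoding(markup: bytes):
    """Return a lower-cased encoding name from a BOM or XML declaration, else None."""
    for bom, name in BOMS:
        if markup.startswith(bom):
            return name
    match = XML_DECL.match(markup)
    if match:
        return match.group(1).decode("ascii").lower()
    return None
```

A fuller detector would presumably also look at HTML `<meta>` charset declarations (the `isHTML` flag hints at a separate HTML path); that is omitted here.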

Instance Methods

__init__(self, markup, overrideEncodings=[], smartQuotesTo='xml', isHTML=False)

find_codec(self, charset)
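The name of the `overrideEncodings` parameter suggests that caller-supplied encodings take priority over anything the class detects on its own. Assuming that ordering, the core decoding loop can be sketched as trying each candidate in turn and keeping the first one that decodes cleanly. `to_unicode` and its fallback list are hypothetical and are not part of this class.

```python
def to_unicode(markup: bytes, override_encodings=(), fallbacks=("utf-8", "windows-1252")):
    """Decode markup with the first candidate encoding that works.

    Returns (text, encoding_used), or (None, None) if every candidate fails.
    """
    for encoding in [*override_encodings, *fallbacks]:
        try:
            return markup.decode(encoding), encoding
        except (UnicodeDecodeError, LookupError):
            # Invalid bytes for this codec, or an unknown codec name: try the next.
            continue
    return None, None
```

Note that Python's `windows-1252` codec rejects the five bytes that encoding leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D), so even the last fallback can fail.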
Class Variables
  CHARSET_ALIASES = {'macintosh': 'mac-roman', 'x-sjis': 'shift-...
  EBCDIC_TO_ASCII_MAP = None
  MS_CHARS = {'\x80': ('euro', '20AC'), '\x81': ' ', '\x82': ('s...
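`EBCDIC_TO_ASCII_MAP` defaults to `None`, which suggests a translation table that is built on first use and then cached on the class. A minimal sketch of that lazy-initialization pattern follows. It assumes EBCDIC code page 037 and derives the table from Python's `cp037` codec, which may not match the module's own table; the `EbcdicConverter` class is hypothetical.

```python
class EbcdicConverter:
    """Hypothetical helper illustrating a lazily built EBCDIC-to-ASCII table."""

    EBCDIC_TO_ASCII_MAP = None  # built on the first call to to_ascii()

    @classmethod
    def to_ascii(cls, data: bytes) -> bytes:
        if cls.EBCDIC_TO_ASCII_MAP is None:
            # Decode all 256 byte values as code page 037 (an assumption).
            decoded = bytes(range(256)).decode("cp037")
            # Keep results that are ASCII; map everything else to "?".
            cls.EBCDIC_TO_ASCII_MAP = bytes(
                ord(ch) if ord(ch) < 128 else ord("?") for ch in decoded
            )
        return data.translate(cls.EBCDIC_TO_ASCII_MAP)
```

Caching the table on the class means the 256-entry build cost is paid once, no matter how many documents are converted.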
Class Variable Details

CHARSET_ALIASES

Value:
{'macintosh': 'mac-roman', 'x-sjis': 'shift-jis'}
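`find_codec` turns a declared charset name into one the codec machinery can use. A minimal sketch of one way to do that, assuming the method consults `CHARSET_ALIASES` first and then checks whether Python recognizes the name: the module-level `find_codec` function below is a stand-in for the method, and returning `None` for unrecognized names is a choice made for this sketch.

```python
import codecs

CHARSET_ALIASES = {"macintosh": "mac-roman", "x-sjis": "shift-jis"}


def find_codec(charset):
    """Return a codec name Python can use for `charset`, or None."""
    if not charset:
        return None
    candidates = [
        CHARSET_ALIASES.get(charset, charset),  # alias table first
        charset.replace("-", ""),               # e.g. "utf-8" -> "utf8"
        charset.replace("-", "_"),              # e.g. "shift-jis" -> "shift_jis"
    ]
    for name in candidates:
        try:
            codecs.lookup(name)  # raises LookupError for unknown names
        except LookupError:
            continue
        return name
    return None
```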

MS_CHARS

Value:
{'\x80': ('euro', '20AC'),
 '\x81': ' ',
 '\x82': ('sbquo', '201A'),
 '\x83': ('fnof', '192'),
 '\x84': ('bdquo', '201E'),
 '\x85': ('hellip', '2026'),
 '\x86': ('dagger', '2020'),
 '\x87': ('Dagger', '2021'),
...
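Each `MS_CHARS` entry maps a windows-1252 byte either to a tuple of (HTML entity name, hexadecimal code point) or to a plain replacement string. Assuming `smartQuotesTo='xml'` selects a numeric character reference and any other value a named entity (the parameter names suggest this, but the exact behavior is not documented here), the substitution can be sketched as follows. Only the entries listed above are copied; `sub_ms_chars` is hypothetical.

```python
import re

# Subset of MS_CHARS copied from the table above; the full table is longer.
MS_CHARS = {
    b"\x80": ("euro", "20AC"),
    b"\x81": " ",
    b"\x82": ("sbquo", "201A"),
    b"\x83": ("fnof", "192"),
    b"\x84": ("bdquo", "201E"),
    b"\x85": ("hellip", "2026"),
    b"\x86": ("dagger", "2020"),
    b"\x87": ("Dagger", "2021"),
}


def sub_ms_chars(markup: bytes, smart_quotes_to: str = "xml") -> bytes:
    """Replace windows-1252 bytes 0x80-0x9F with XML or HTML equivalents."""

    def replace(match):
        sub = MS_CHARS.get(match.group(0))
        if sub is None:
            return match.group(0)  # not in this subset: leave the byte alone
        if isinstance(sub, tuple):
            name, hexcode = sub
            text = f"&#x{hexcode};" if smart_quotes_to == "xml" else f"&{name};"
        else:
            text = sub  # plain replacement string
        return text.encode("ascii")

    return re.sub(rb"[\x80-\x9f]", replace, markup)
```

Emitting ASCII-only references means the result can then be decoded as windows-1252 without those bytes turning into the wrong characters.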