Package rosdeb :: Module BeautifulSoup :: Class UnicodeDammit

Class UnicodeDammit

A class for detecting the encoding of a *ML document (HTML, XML, or similar markup) and converting it to a Unicode string. If the source encoding is windows-1252, it can also replace Microsoft smart quotes with their HTML or XML equivalents.
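The detection step can be pictured as trial decoding: try candidate encodings in order and keep the first one that decodes the bytes cleanly. The sketch below is a simplified illustration of that idea, not UnicodeDammit's actual code. The function name `detect_and_decode` and the fallback order (`utf-8`, then `windows-1252`, then `latin-1`) are assumptions; UnicodeDammit itself likely consults more hints, such as an encoding declared inside the document, than this fixed list.

```python
from typing import Iterable, Tuple


def detect_and_decode(
    markup: bytes,
    override_encodings: Iterable[str] = (),
) -> Tuple[str, str]:
    """Hypothetical helper: return (decoded_text, encoding_used).

    Caller-supplied encodings are tried first, mirroring the role of the
    overrideEncodings parameter. The fallback list is an assumption.
    """
    candidates = list(override_encodings) + ["utf-8", "windows-1252"]
    for encoding in candidates:
        try:
            return markup.decode(encoding), encoding
        except (UnicodeDecodeError, LookupError):
            # Wrong guess, or a codec name Python does not know: move on.
            continue
    # Last resort: latin-1 maps every byte value, so this cannot fail.
    return markup.decode("latin-1"), "latin-1"
```

Trying `utf-8` before `windows-1252` matters: nearly any byte string decodes as windows-1252, so putting it first would turn valid UTF-8 such as `b"caf\xc3\xa9"` into mojibake instead of `"café"`.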

Instance Methods

__init__(self, markup, overrideEncodings=[], smartQuotesTo='xml', isHTML=False)

find_codec(self, charset)

Class Variables
  CHARSET_ALIASES = {"macintosh": "mac-roman", "x-sjis": "shift-...
  EBCDIC_TO_ASCII_MAP = None
  MS_CHARS = {'\x80':('euro', '20AC'), '\x81': ' ', '\x82':('sbq...
Class Variable Details

CHARSET_ALIASES

Value:
{"macintosh": "mac-roman", "x-sjis": "shift-jis"}
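The alias table suggests how `find_codec` can normalize a declared charset before handing it to Python's codec registry. The sketch below assumes that approach: map a known alias first, then check the name with `codecs.lookup`, retrying with hyphens removed or turned into underscores, and fall back to the original name. It illustrates the idea under those assumptions and is not a copy of the method; `find_codec_sketch` and `_known_codec` are hypothetical names.

```python
import codecs
from typing import Optional

# Same two entries as the CHARSET_ALIASES value shown above.
CHARSET_ALIASES = {"macintosh": "mac-roman", "x-sjis": "shift-jis"}


def _known_codec(name: Optional[str]) -> Optional[str]:
    """Return name if Python's codec registry recognizes it, else None."""
    if not name:
        return None
    try:
        codecs.lookup(name)
    except LookupError:
        return None
    return name


def find_codec_sketch(charset: Optional[str]) -> Optional[str]:
    """Hypothetical: resolve a declared charset to a usable codec name."""
    if not charset:
        return charset
    return (
        _known_codec(CHARSET_ALIASES.get(charset, charset))
        or _known_codec(charset.replace("-", ""))
        or _known_codec(charset.replace("-", "_"))
        # Nothing matched: hand back the original name unchanged.
        or charset
    )
```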

MS_CHARS

Value:
{'\x80':('euro', '20AC'),
 '\x81': ' ',
 '\x82':('sbquo', '201A'),
 '\x83':('fnof', '192'),
 '\x84':('bdquo', '201E'),
 '\x85':('hellip', '2026'),
 '\x86':('dagger', '2020'),
 '\x87':('Dagger', '2021'),
 '\x88':('circ', '2C6'),
 '\x89':('permil', '2030'),
 '\x8A':('Scaron', '160'),
 '\x8B':('lsaquo', '2039'),
 '\x8C':('OElig', '152'),
 '\x8D': '?',
 '\x8E':('#x17D', '17D'),
 '\x8F': '?',
 '\x90': '?',
 '\x91':('lsquo', '2018'),
 '\x92':('rsquo', '2019'),
 '\x93':('ldquo', '201C'),
 '\x94':('rdquo', '201D'),
 '\x95':('bull', '2022'),
 '\x96':('ndash', '2013'),
 '\x97':('mdash',
...
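Each MS_CHARS entry pairs a windows-1252 byte with an HTML entity name and a hexadecimal code point, which is enough to emit either an HTML or an XML replacement. The sketch below assumes that `smartQuotesTo='xml'` selects numeric references such as `&#x201C;` and any other value selects named entities such as `&ldquo;`. It uses a few entries copied from the table above (keys written as Python 3 `bytes`), and `replace_ms_chars` is a hypothetical helper, not the class's own method.

```python
import re
from typing import Dict, Tuple, Union

# Subset of the MS_CHARS value above: byte -> (entity name, hex code point),
# or a plain replacement string.
MS_CHARS: Dict[bytes, Union[Tuple[str, str], str]] = {
    b"\x80": ("euro", "20AC"),
    b"\x81": " ",
    b"\x91": ("lsquo", "2018"),
    b"\x92": ("rsquo", "2019"),
    b"\x93": ("ldquo", "201C"),
    b"\x94": ("rdquo", "201D"),
}

# One character class matching any byte that has a table entry.
_MS_PATTERN = re.compile(b"[" + b"".join(re.escape(k) for k in MS_CHARS) + b"]")


def replace_ms_chars(markup: bytes, smart_quotes_to: str = "xml") -> bytes:
    """Hypothetical: swap windows-1252 smart characters for entity references."""

    def substitute(match: "re.Match[bytes]") -> bytes:
        sub = MS_CHARS[match.group(0)]
        if isinstance(sub, tuple):
            name, code_point = sub
            # XML mode: numeric reference; otherwise: named HTML entity.
            sub = f"&#x{code_point};" if smart_quotes_to == "xml" else f"&{name};"
        # Plain-string entries (like the space for 0x81) are used verbatim.
        return sub.encode("ascii")

    return _MS_PATTERN.sub(substitute, markup)
```

Numeric references are the safer choice for XML output, because XML itself predefines only five named entities (`&amp;`, `&lt;`, `&gt;`, `&quot;`, `&apos;`); a name like `&ldquo;` would need a DTD to be valid.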