
Public Member Functions
  def __init__
  def start_meta

Public Attributes
  declaredHTMLEncoding
  originalEncoding

Static Public Attributes
  tuple       CHARSET_RE = re.compile("((^|;)\s*charset=)([^;]*)")
  list        NESTABLE_BLOCK_TAGS = ['blockquote', 'div', 'fieldset', 'ins', 'del']
  list        NESTABLE_INLINE_TAGS
  dictionary  NESTABLE_LIST_TAGS
  dictionary  NESTABLE_TABLE_TAGS
  tuple       NESTABLE_TAGS
  list        NON_NESTABLE_BLOCK_TAGS = ['address', 'form', 'p', 'pre']
  dictionary  QUOTE_TAGS = {'script': None}
  tuple       RESET_NESTING_TAGS
  tuple       SELF_CLOSING_TAGS

This parser knows the following facts about HTML:
* Some tags have no closing tag and should be interpreted as being
  closed as soon as they are encountered.
* The text inside some tags (e.g. 'script') may contain tags which
  are not really part of the document and which should be parsed
  as text, not tags. If you want to parse the text as tags, you can
  always fetch it and parse it explicitly.
* Tag nesting rules:
  Most tags can't be nested at all. For instance, the occurrence of
  a <p> tag should implicitly close the previous <p> tag.
   <p>Para1<p>Para2
    should be transformed into:
   <p>Para1</p><p>Para2
  Some tags can be nested arbitrarily. For instance, the occurrence
  of a <blockquote> tag should _not_ implicitly close the previous
  <blockquote> tag.
   Alice said: <blockquote>Bob said: <blockquote>Blah
    should NOT be transformed into:
   Alice said: <blockquote>Bob said: </blockquote><blockquote>Blah
  Some tags can be nested, but the nesting is reset by the
  interposition of other tags. For instance, a <tr> tag should
  implicitly close the previous <tr> tag within the same <table>,
  but not close a <tr> tag in another table.
   <table><tr>Blah<tr>Blah
    should be transformed into:
   <table><tr>Blah</tr><tr>Blah
    but,
   <tr>Blah<table><tr>Blah
    should NOT be transformed into
   <tr>Blah<table></tr><tr>Blah
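
The three nesting rules above can be illustrated with a small stand-alone model. This is only a sketch, not the parser's actual code: the helper names `open_tag` and `insert_implicit_closes` are hypothetical, the tag tables are copied from the static attributes documented on this page, and the model handles start tags only (it ignores end tags, attributes, and self-closing tags).

```python
# Tag tables copied from the static attributes documented on this page.
NESTABLE_INLINE_TAGS = ['span', 'font', 'q', 'object', 'bdo', 'sub', 'sup', 'center']
NESTABLE_BLOCK_TAGS = ['blockquote', 'div', 'fieldset', 'ins', 'del']
NON_NESTABLE_BLOCK_TAGS = ['address', 'form', 'p', 'pre']
NESTABLE_LIST_TAGS = {'ol': [], 'ul': [], 'li': ['ul', 'ol'],
                      'dl': [], 'dd': ['dl'], 'dt': ['dl']}
NESTABLE_TABLE_TAGS = {'table': [], 'tr': ['table', 'tbody', 'tfoot', 'thead'],
                       'td': ['tr'], 'th': ['tr'], 'thead': ['table'],
                       'tbody': ['table'], 'tfoot': ['table']}

# Nestable tag -> list of enclosing tags that reset its nesting.
NESTABLE_TAGS = {name: [] for name in NESTABLE_INLINE_TAGS + NESTABLE_BLOCK_TAGS}
NESTABLE_TAGS.update(NESTABLE_LIST_TAGS)
NESTABLE_TAGS.update(NESTABLE_TABLE_TAGS)

# Tags that reset nesting (modeled as a set; only membership matters here).
RESET_NESTING_TAGS = set(NESTABLE_BLOCK_TAGS + ['noscript'] + NON_NESTABLE_BLOCK_TAGS
                         + list(NESTABLE_LIST_TAGS) + list(NESTABLE_TABLE_TAGS))


def open_tag(stack, name):
    """Push `name` onto the stack of open tags (hypothetical helper).

    Returns the tags that were implicitly closed, innermost first.
    """
    triggers = NESTABLE_TAGS.get(name)
    is_nestable = triggers is not None
    is_reset = name in RESET_NESTING_TAGS
    cut = None  # index from which open tags are closed
    for i in range(len(stack) - 1, -1, -1):
        parent = stack[i]
        if parent == name and not is_nestable:
            cut = i  # close back to, and including, the previous occurrence
            break
        if (is_nestable and parent in triggers) or (
                not is_nestable and is_reset and parent in RESET_NESTING_TAGS):
            cut = i + 1  # close everything inside the trigger, keep the trigger
            break
    closed = []
    if cut is not None:
        closed = stack[cut:][::-1]
        del stack[cut:]
    stack.append(name)
    return closed


def insert_implicit_closes(tokens):
    """Join tokens, adding the end tags implied by each start tag."""
    stack, out = [], []
    for tok in tokens:
        if tok.startswith('<') and tok.endswith('>'):
            out.extend('</%s>' % name for name in open_tag(stack, tok[1:-1]))
        out.append(tok)
    return ''.join(out)


print(insert_implicit_closes(['<p>', 'Para1', '<p>', 'Para2']))
print(insert_implicit_closes(['<table>', '<tr>', 'Blah', '<tr>', 'Blah']))
print(insert_implicit_closes(['<tr>', 'Blah', '<table>', '<tr>', 'Blah']))
```

Under these assumptions the first call closes the earlier <p>, the second closes the earlier <tr> inside the same <table>, and the third leaves the outer <tr> open because the inner <table> resets the nesting.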
Differing assumptions about tag nesting rules are a major source
of problems with the BeautifulSoup class. If BeautifulSoup does not
treat as nestable a tag that your page author treats as nestable,
try ICantBelieveItsBeautifulSoup, MinimalSoup, or
BeautifulStoneSoup before writing your own subclass.

Definition at line 1231 of file BeautifulSoup.py.
def BeautifulSoup.BeautifulSoup.__init__(self, args, kwargs)
Definition at line 1279 of file BeautifulSoup.py.
def BeautifulSoup.BeautifulSoup.start_meta(self, attrs)
Beautiful Soup can detect a charset included in a META tag, try to convert the document to that charset, and re-parse the document from the beginning.
Definition at line 1334 of file BeautifulSoup.py.
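
As a rough illustration of the charset detection described for start_meta, the sketch below applies the CHARSET_RE pattern documented on this page to the content attribute of a META tag. The helper names `declared_charset` and `rewrite_charset` are hypothetical and are not part of BeautifulSoup; the re-parse step itself is not shown.

```python
import re

# Same pattern as the CHARSET_RE attribute, written as a raw string.
CHARSET_RE = re.compile(r"((^|;)\s*charset=)([^;]*)")


def declared_charset(content):
    """Return the charset named in a Content-Type value, or None."""
    match = CHARSET_RE.search(content)
    return match.group(3) if match else None


def rewrite_charset(content, new_charset):
    """Swap the declared charset, keeping the 'charset=' prefix (group 1)."""
    return CHARSET_RE.sub(lambda m: m.group(1) + new_charset, content)


content = "text/html; charset=ISO-8859-1"
charset = declared_charset(content)

# Decode the raw bytes with the declared charset (assumed to be a codec
# name that Python recognizes).
raw = b"caf\xe9"
print(charset, raw.decode(charset))
print(rewrite_charset(content, "utf-8"))
```

Group 3 of the pattern holds the charset name, and group 1 holds the separator and the `charset=` prefix, which is why the rewrite can replace the name alone.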
tuple BeautifulSoup.BeautifulSoup::CHARSET_RE = re.compile("((^|;)\s*charset=)([^;]*)") [static]
Definition at line 1332 of file BeautifulSoup.py.
BeautifulSoup.BeautifulSoup::declaredHTMLEncoding
Definition at line 1336 of file BeautifulSoup.py.
list BeautifulSoup.BeautifulSoup::NESTABLE_BLOCK_TAGS = ['blockquote', 'div', 'fieldset', 'ins', 'del'] [static]
Definition at line 1299 of file BeautifulSoup.py.
list BeautifulSoup.BeautifulSoup::NESTABLE_INLINE_TAGS [static]
['span', 'font', 'q', 'object', 'bdo', 'sub', 'sup', 'center']
Definition at line 1293 of file BeautifulSoup.py.
dictionary BeautifulSoup.BeautifulSoup::NESTABLE_LIST_TAGS [static]
{ 'ol' : [],
  'ul' : [],
  'li' : ['ul', 'ol'],
  'dl' : [],
  'dd' : ['dl'],
  'dt' : ['dl'] }
Definition at line 1302 of file BeautifulSoup.py.
dictionary BeautifulSoup.BeautifulSoup::NESTABLE_TABLE_TAGS [static]
{ 'table' : [],
  'tr' : ['table', 'tbody', 'tfoot', 'thead'],
  'td' : ['tr'],
  'th' : ['tr'],
  'thead' : ['table'],
  'tbody' : ['table'],
  'tfoot' : ['table'],
}
Definition at line 1310 of file BeautifulSoup.py.
tuple BeautifulSoup.BeautifulSoup::NESTABLE_TAGS [static]
buildTagMap([], NESTABLE_INLINE_TAGS, NESTABLE_BLOCK_TAGS, NESTABLE_LIST_TAGS, NESTABLE_TABLE_TAGS)
Reimplemented from BeautifulSoup.BeautifulStoneSoup.
Reimplemented in BeautifulSoup.MinimalSoup, and BeautifulSoup.ICantBelieveItsBeautifulSoup.
Definition at line 1328 of file BeautifulSoup.py.
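
The buildTagMap helper used for NESTABLE_TAGS is not documented on this page. Judging only from how it is called here (a default value followed by a mix of lists, dictionaries, and single tag names), it presumably merges its arguments into one dictionary. The sketch below shows one way such a helper could work; `build_tag_map` is a hypothetical name and this is not necessarily the library's implementation.

```python
def build_tag_map(default, *portions):
    """Merge dictionaries, lists, and single names into one dictionary.

    Dictionary entries are copied as-is; list items and single names
    are mapped to `default`.
    """
    built = {}
    for portion in portions:
        if hasattr(portion, 'items'):
            built.update(portion)          # dictionary: keep its own values
        elif isinstance(portion, list):
            for name in portion:           # list: every item gets the default
                built[name] = default
        else:
            built[portion] = default       # single name
    return built


# Inputs copied from the static attributes documented on this page.
NESTABLE_INLINE_TAGS = ['span', 'font', 'q', 'object', 'bdo', 'sub', 'sup', 'center']
NESTABLE_BLOCK_TAGS = ['blockquote', 'div', 'fieldset', 'ins', 'del']
NON_NESTABLE_BLOCK_TAGS = ['address', 'form', 'p', 'pre']
NESTABLE_LIST_TAGS = {'ol': [], 'ul': [], 'li': ['ul', 'ol'],
                      'dl': [], 'dd': ['dl'], 'dt': ['dl']}
NESTABLE_TABLE_TAGS = {'table': [], 'tr': ['table', 'tbody', 'tfoot', 'thead'],
                       'td': ['tr'], 'th': ['tr'], 'thead': ['table'],
                       'tbody': ['table'], 'tfoot': ['table']}

NESTABLE_TAGS = build_tag_map([], NESTABLE_INLINE_TAGS, NESTABLE_BLOCK_TAGS,
                              NESTABLE_LIST_TAGS, NESTABLE_TABLE_TAGS)
RESET_NESTING_TAGS = build_tag_map(None, NESTABLE_BLOCK_TAGS, 'noscript',
                                   NON_NESTABLE_BLOCK_TAGS, NESTABLE_LIST_TAGS,
                                   NESTABLE_TABLE_TAGS)
SELF_CLOSING_TAGS = build_tag_map(None, ['br', 'hr', 'input', 'img', 'meta',
                                         'spacer', 'link', 'frame', 'base'])

print(NESTABLE_TAGS['li'])
print('p' in NESTABLE_TAGS, 'p' in RESET_NESTING_TAGS)
```

With this reading, a tag's presence as a key in NESTABLE_TAGS marks it as nestable, and the mapped list names the enclosing tags that reset its nesting; 'p' is absent from NESTABLE_TAGS but present in RESET_NESTING_TAGS.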
list BeautifulSoup.BeautifulSoup::NON_NESTABLE_BLOCK_TAGS = ['address', 'form', 'p', 'pre'] [static]
Definition at line 1319 of file BeautifulSoup.py.
BeautifulSoup.BeautifulSoup::originalEncoding
Reimplemented from BeautifulSoup.BeautifulStoneSoup.
Definition at line 1336 of file BeautifulSoup.py.
dictionary BeautifulSoup.BeautifulSoup::QUOTE_TAGS = {'script': None} [static]
Reimplemented from BeautifulSoup.BeautifulStoneSoup.
Definition at line 1288 of file BeautifulSoup.py.
tuple BeautifulSoup.BeautifulSoup::RESET_NESTING_TAGS [static]
buildTagMap(None, NESTABLE_BLOCK_TAGS, 'noscript', NON_NESTABLE_BLOCK_TAGS, NESTABLE_LIST_TAGS, NESTABLE_TABLE_TAGS)
Reimplemented from BeautifulSoup.BeautifulStoneSoup.
Reimplemented in BeautifulSoup.MinimalSoup.
Definition at line 1323 of file BeautifulSoup.py.
tuple BeautifulSoup.BeautifulSoup::SELF_CLOSING_TAGS [static]
buildTagMap(None, ['br' , 'hr', 'input', 'img', 'meta', 'spacer', 'link', 'frame', 'base'])
Reimplemented from BeautifulSoup.BeautifulStoneSoup.
Definition at line 1284 of file BeautifulSoup.py.