
Public Member Functions
| def | __get_current_state__ |
| def | __handle_body_tag__ |
| def | __init__ |
| def | __update_state_machine_end__ |
| def | __update_state_machine_start__ |
| def | get_breadcrumbs |
| def | get_doc |
| def | get_links |
| def | get_title |
| def | handle_charref |
| def | handle_data |
| def | handle_decl |
| def | handle_endtag |
| def | handle_entityref |
| def | handle_starttag |
Public Attributes
| breadcrumbs | |
| current_state | |
| div_bookmark | |
| div_level | |
| div_state_map | |
| links | |
| out_doc | |
| page_title | |
| state | |
| toc | |
WikidotParser cleans a page from www.wikidot.com, keeping only its interesting content and discarding the rest.
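The member names listed above (`handle_starttag`, `handle_endtag`, `handle_data`, `handle_charref`, `handle_entityref`, `handle_decl`) match the callbacks of Python's `html.parser.HTMLParser`. This suggests, although this page does not state it, that WikidotParser subclasses that parser. The following is a minimal sketch under that assumption, written for Python 3. The class `CleaningParserSketch` and its filtering rules are hypothetical: it records the `<title>` text and the `href` of every link, and keeps only the text found inside `<body>`. The real class narrows its output further with the `<div>` state machine documented under `handle_starttag`.

```python
from html.parser import HTMLParser


class CleaningParserSketch(HTMLParser):
    """Hypothetical stand-in for WikidotParser: keeps the <title> text,
    the href of every <a> tag, and the text found inside <body>."""

    def __init__(self):
        # convert_charrefs=False makes HTMLParser call handle_charref and
        # handle_entityref instead of decoding references inside handle_data.
        super().__init__(convert_charrefs=False)
        self.page_title = ""
        self.links = []
        self.out_doc = []
        self._in_title = False
        self._in_body = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "body":
            self._in_body = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "body":
            self._in_body = False

    def handle_data(self, data):
        if self._in_title:
            self.page_title += data
        elif self._in_body:
            self.out_doc.append(data)

    def handle_entityref(self, name):
        # Re-emit named references such as &amp; unchanged.
        self.handle_data("&%s;" % name)

    def handle_charref(self, name):
        # Re-emit numeric references such as &#169; unchanged.
        self.handle_data("&#%s;" % name)

    def get_title(self):
        return self.page_title.strip()

    def get_links(self):
        return list(self.links)

    def get_doc(self):
        return "".join(self.out_doc)


page = CleaningParserSketch()
page.feed('<html><head><title>Home</title></head>'
          '<body><p>Tom &amp; Jerry</p><a href="/about">About</a></body></html>')
page.close()
print(page.get_title())  # Home
print(page.get_doc())    # Tom &amp; JerryAbout
print(page.get_links())  # ['/about']
```

Passing `convert_charrefs=False` is what makes the `handle_charref` and `handle_entityref` overrides reachable at all. With the default setting, `HTMLParser` decodes references itself and passes the decoded text to `handle_data`.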
def wikidot.parser.WikidotParser.__init__(self)
def wikidot.parser.WikidotParser.__handle_body_tag__(self, tag, attrs)
def wikidot.parser.WikidotParser.__update_state_machine_end__(self, tag)
def wikidot.parser.WikidotParser.__update_state_machine_start__(self, tag, attrs)
def wikidot.parser.WikidotParser.get_breadcrumbs(self)
def wikidot.parser.WikidotParser.get_doc(self)
def wikidot.parser.WikidotParser.get_links(self)
def wikidot.parser.WikidotParser.get_title(self)
def wikidot.parser.WikidotParser.handle_charref(self, name)
def wikidot.parser.WikidotParser.handle_data(self, data)
def wikidot.parser.WikidotParser.handle_decl(self, decl)
def wikidot.parser.WikidotParser.handle_endtag(self, tag)
def wikidot.parser.WikidotParser.handle_entityref(self, name)
def wikidot.parser.WikidotParser.handle_starttag(self, tag, attrs)
handle_starttag: overridden; called when a start tag is parsed. The heart of this function is the state machine. When a <div> tag is detected, its attributes are compared against a map of the form (name, value) -> state. If a match occurs, that state is pushed onto the stack. Depending on the current state, the start tag is then either queued for output or skipped.
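The state machine described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the class's actual code. The class `DivStateMachineSketch`, the two states `KEEP` and `SKIP`, and the map entries are all hypothetical. The sketch also assumes that each stack entry must record the `<div>` nesting depth at which it was pushed, so that the matching `</div>` knows when to pop it. The real class may use its `state`, `div_level` and `div_bookmark` attributes differently.

```python
from html.parser import HTMLParser

# Hypothetical state values; the real ones are not documented on this page.
SKIP, KEEP = "skip", "keep"


class DivStateMachineSketch(HTMLParser):
    def __init__(self, div_state_map):
        super().__init__()
        # Map of (attribute name, attribute value) -> state.
        self.div_state_map = div_state_map
        self.div_level = 0        # number of currently open <div> elements
        self.stack = [(SKIP, 0)]  # entries: (state, div_level when pushed)
        self.out_doc = []

    def _current_state(self):
        return self.stack[-1][0]

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.div_level += 1
            for name, value in attrs:
                state = self.div_state_map.get((name, value))
                if state is not None:
                    # Push the matched state, remembering the depth of this <div>.
                    self.stack.append((state, self.div_level))
                    break
        # Queue the start tag for output only if the current state is KEEP.
        if self._current_state() == KEEP:
            self.out_doc.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> have no separate end tag to emit.
        if self._current_state() == KEEP:
            self.out_doc.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self._current_state() == KEEP:
            self.out_doc.append("</%s>" % tag)
        if tag == "div":
            # Pop the state pushed by the <div> that is closing now, if any.
            if len(self.stack) > 1 and self.stack[-1][1] == self.div_level:
                self.stack.pop()
            self.div_level -= 1

    def handle_data(self, data):
        if self._current_state() == KEEP:
            self.out_doc.append(data)

    def get_doc(self):
        return "".join(self.out_doc)


parser = DivStateMachineSketch({("id", "page-content"): KEEP,
                                ("class", "page-tags"): SKIP})
parser.feed('<div id="header">menu</div>'
            '<div id="page-content"><p>Body</p>'
            '<div class="page-tags">tags</div></div>'
            '<div id="footer">footer</div>')
parser.close()
print(parser.get_doc())  # <div id="page-content"><p>Body</p></div>
```

Each stack entry remembers its depth. As a result, a `SKIP` div nested inside a `KEEP` region (here, the tag list) is dropped without ending the surrounding region, and the `KEEP` state is popped only when its own `</div>` closes.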