wikidot::parser::WikidotParser Class Reference
Detailed Description
WikidotParser is used to clean a page from www.wikidot.com,
keeping only the interesting content.
Definition at line 53 of file parser.py.
Member Function Documentation
def wikidot::parser::WikidotParser::__get_current_state__(self)
def wikidot::parser::WikidotParser::__handle_body_tag__(self, tag, attrs)
def wikidot::parser::WikidotParser::__init__(self)
Initialize internal variables
Definition at line 56 of file parser.py.
def wikidot::parser::WikidotParser::__update_state_machine_end__(self, tag)
def wikidot::parser::WikidotParser::__update_state_machine_start__(self, tag, attrs)
Update the state machine.
Definition at line 212 of file parser.py.
def wikidot::parser::WikidotParser::get_breadcrumbs(self)
def wikidot::parser::WikidotParser::get_doc(self)
Retrieve the parsed and cleaned document
Definition at line 79 of file parser.py.
def wikidot::parser::WikidotParser::get_links(self)
Retrieve the links embedded in the page (including images)
Definition at line 92 of file parser.py.
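The source of get_links is not reproduced in this reference. As a rough sketch of how links and image sources can be gathered while parsing, assuming Python 3's html.parser and a hypothetical LinkCollectorSketch class (neither the class name nor the choice of href and src attributes comes from parser.py):

```python
from html.parser import HTMLParser


class LinkCollectorSketch(HTMLParser):
    """Illustrative stand-in; not the WikidotParser source."""

    # Assumption: anchors contribute "href", images contribute "src".
    LINK_ATTR = {"a": "href", "img": "src"}

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        wanted = self.LINK_ATTR.get(tag)
        for name, value in attrs:
            # Tags outside LINK_ATTR give wanted=None, which never matches.
            if name == wanted and value:
                self.links.append(value)

    def get_links(self):
        # Return a copy so callers cannot mutate the parser's own list.
        return list(self.links)
```

Feeding a page to such a collector and calling get_links() afterwards would return the targets in document order.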
def wikidot::parser::WikidotParser::get_title(self)
def wikidot::parser::WikidotParser::handle_charref(self, name)
Overridden - Called when a character reference (&#xyz;) is parsed.
Depending on the current state, the charref is either queued for
output or left out.
Definition at line 172 of file parser.py.
def wikidot::parser::WikidotParser::handle_data(self, data)
Overridden - Called when some data is parsed.
Depending on the current state, the data is either queued for
output or left out.
Definition at line 157 of file parser.py.
def wikidot::parser::WikidotParser::handle_decl(self, decl)
Overridden - Called when an SGML declaration (<!...>) is parsed.
Depending on the current state, the declaration is either queued for
output or left out.
Definition at line 202 of file parser.py.
def wikidot::parser::WikidotParser::handle_endtag(self, tag)
Overridden - Called when an end tag is parsed.
The state machine is updated when a </div> tag is encountered.
Depending on the current state, the end tag is either queued for
output or left out.
Definition at line 138 of file parser.py.
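The end-tag behaviour described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in parser.py: the class name DivStackSketch, the state names KEEP and SKIP, the "page-content" id, and the rule that an unmatched <div> inherits the enclosing state are all hypothetical. The sketch targets Python 3's html.parser.

```python
from html.parser import HTMLParser

KEEP, SKIP = "keep", "skip"  # hypothetical state names


class DivStackSketch(HTMLParser):
    """Illustrative stand-in; not the WikidotParser source."""

    def __init__(self):
        super().__init__()
        self.stack = [SKIP]  # start outside any kept region
        self.out = []        # queued output fragments

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            # Assumed rule: id="page-content" switches output on;
            # any other <div> inherits the enclosing state.
            matched = ("id", "page-content") in attrs
            self.stack.append(KEEP if matched else self.stack[-1])
        if self.stack[-1] == KEEP:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        # Queue first, so the </div> closing a kept region is emitted
        # under the same state as its opening tag.
        if self.stack[-1] == KEEP:
            self.out.append("</%s>" % tag)
        # Then update the state machine: one pop per </div>.
        if tag == "div" and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        if self.stack[-1] == KEEP:
            self.out.append(data)
```

Pushing one entry per <div> keeps the stack balanced, so each </div> can pop without checking which <div> it closes. Whether the real parser queues before or after updating the state is not stated here; the order above is a choice made for the sketch.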
def wikidot::parser::WikidotParser::handle_entityref(self, name)
Overridden - Called when an entity reference (&xyz;) is parsed.
Depending on the current state, the entityref is either queued for
output or left out.
Definition at line 187 of file parser.py.
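Because the handlers receive only the reference name, queuing a reference for output means rebuilding its markup. The sketch below shows one way to do that for both entity and character references. It assumes Python 3's html.parser, where convert_charrefs must be set to False for these two handlers to be called at all; the class name RefEchoSketch and the boolean keep flag (standing in for the parser's state) are hypothetical.

```python
from html.parser import HTMLParser


class RefEchoSketch(HTMLParser):
    """Illustrative stand-in; not the WikidotParser source."""

    def __init__(self):
        # With the default convert_charrefs=True, references are decoded
        # into handle_data and the two handlers below are never called.
        super().__init__(convert_charrefs=False)
        self.keep = True  # hypothetical stand-in for the current state
        self.out = []

    def handle_data(self, data):
        if self.keep:
            self.out.append(data)

    def handle_charref(self, name):
        # name is e.g. "62" or "x3E"; restore the "&#" prefix and ";".
        if self.keep:
            self.out.append("&#%s;" % name)

    def handle_entityref(self, name):
        # name is e.g. "lt"; restore the "&" prefix and ";".
        if self.keep:
            self.out.append("&%s;" % name)
```

Re-emitting the references unchanged, rather than decoding them, keeps the cleaned document valid HTML when it is written back out.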
def wikidot::parser::WikidotParser::handle_starttag(self, tag, attrs)
Overridden - Called when a start tag is parsed.
The heart of this function is the state machine.
When a <div> tag is detected, its attributes are compared with
a map of the form (name, value) -> state. If a match occurs,
the matching state is pushed on top of the stack.
Depending on the current state, the start tag is either queued for
output or left out.
Definition at line 103 of file parser.py.
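The attribute lookup described above can be sketched on its own, independent of the parser class. The map entries, the state names, and the fallback for a <div> that matches nothing are assumptions made for illustration; the actual map in parser.py is not shown in this reference.

```python
# Hypothetical states; the real state names are not documented here.
KEEP, SKIP = "keep", "skip"

# Hypothetical map of (attribute name, attribute value) -> state.
DIV_STATES = {
    ("id", "page-content"): KEEP,
    ("id", "side-bar"): SKIP,
}


def push_div_state(stack, attrs):
    """Push the state for a newly opened <div> and return it.

    attrs is a list of (name, value) tuples, the form in which
    HTMLParser passes attributes to handle_starttag.
    """
    for pair in attrs:
        if pair in DIV_STATES:
            state = DIV_STATES[pair]
            break
    else:
        # Assumption: with no match, the <div> inherits the current state.
        state = stack[-1]
    stack.append(state)
    return state
```

For example, with stack = [SKIP], calling push_div_state(stack, [("id", "page-content")]) pushes the keep state, and a nested <div> with unrelated attributes then inherits it.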
The documentation for this class was generated from the file parser.py.