wikidot.parser.WikidotParser Class Reference

Public Member Functions

def __get_current_state__
def __handle_body_tag__
def __init__
def __update_state_machine_end__
def __update_state_machine_start__
def get_breadcrumbs
def get_doc
def get_links
def get_title
def handle_charref
def handle_data
def handle_decl
def handle_endtag
def handle_entityref
def handle_starttag

Public Attributes

 breadcrumbs
 current_state
 div_bookmark
 div_level
 div_state_map
 links
 out_doc
 page_title
 state
 toc

Detailed Description

WikidotParser is used to clean a page from www.wikidot.com,
keeping only the interesting content.

Definition at line 53 of file parser.py.


Constructor & Destructor Documentation

def wikidot.parser.WikidotParser.__init__
Initialize internal variables

Definition at line 56 of file parser.py.


Member Function Documentation

def wikidot.parser.WikidotParser.__get_current_state__

Definition at line 252 of file parser.py.

def wikidot.parser.WikidotParser.__handle_body_tag__ (   self,
  tag,
  attrs 
)

Definition at line 255 of file parser.py.

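def wikidot.parser.WikidotParser.__update_state_machine_end__

Definition at line 235 of file parser.py.

def wikidot.parser.WikidotParser.__update_state_machine_start__
Update the state machine.

Definition at line 212 of file parser.py.

def wikidot.parser.WikidotParser.get_breadcrumbs

Definition at line 99 of file parser.py.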

def wikidot.parser.WikidotParser.get_doc
Retrieve the parsed and cleaned document

Definition at line 79 of file parser.py.

def wikidot.parser.WikidotParser.get_links
Retrieve the links embedded in the page (including images)

Definition at line 92 of file parser.py.
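
As an illustration of what "links embedded in the page (including images)" can mean, here is a minimal sketch built on Python 3's html.parser. It is not the code from parser.py: the class name LinkCollectorSketch and the choice to record href values of <a> tags and src values of <img> tags are assumptions made for the example.

```python
from html.parser import HTMLParser


class LinkCollectorSketch(HTMLParser):
    """Hypothetical stand-in: records link targets while parsing."""

    def __init__(self):
        super().__init__()
        self.links = []  # link targets, in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Assumption: a "link" is an <a href=...> or an <img src=...>.
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            self.links.append(attrs["src"])

    def get_links(self):
        # Return a copy so callers cannot mutate the parser's own list.
        return list(self.links)
```

Self-closing tags such as <img ... /> are also routed through handle_starttag by HTMLParser, so they are collected as well.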

def wikidot.parser.WikidotParser.get_title

Definition at line 96 of file parser.py.

def wikidot.parser.WikidotParser.handle_charref (   self,
  name 
)
Overridden - Called when a charref (&#xyz) is parsed

Depending on the current state, the charref is queued for output,
or not.

Definition at line 172 of file parser.py.

def wikidot.parser.WikidotParser.handle_data (   self,
  data 
)
Overridden - Called when some data is parsed

Depending on the current state, the data is queued for output,
or not.

Definition at line 157 of file parser.py.

def wikidot.parser.WikidotParser.handle_decl (   self,
  decl 
)
Overridden - Called when a SGML declaration (<!) is parsed

Depending on the current state, the declaration is queued for output,
or not.

Definition at line 202 of file parser.py.

def wikidot.parser.WikidotParser.handle_endtag (   self,
  tag 
)
Overridden - Called when an end tag is parsed

The state machine is updated when a </div> tag is encountered.
Depending on the current state, the end tag is queued for output,
or not.

Definition at line 138 of file parser.py.

def wikidot.parser.WikidotParser.handle_entityref (   self,
  name 
)
Overridden - Called when an entityref (&xyz) is parsed

Depending on the current state, the entityref is queued for output,
or not.

Definition at line 187 of file parser.py.
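
The behaviour described for handle_charref and handle_entityref (re-emit the reference only when the current state allows output) can be sketched as follows. This is an assumption-laden Python 3 illustration, not the original code: the class RefPassthroughSketch, the helper passthrough and the boolean keep flag (a stand-in for the real state check) are all hypothetical.

```python
from html.parser import HTMLParser


class RefPassthroughSketch(HTMLParser):
    """Hypothetical stand-in: copies references through unchanged."""

    def __init__(self, keep=True):
        # On Python 3, convert_charrefs defaults to True, in which case
        # references are decoded and delivered through handle_data and the
        # two handlers below are never called. Turn that off explicitly.
        super().__init__(convert_charrefs=False)
        self.keep = keep  # stand-in for "the current state allows output"
        self.out = []     # queued output fragments

    def handle_charref(self, name):
        if self.keep:
            self.out.append("&#%s;" % name)  # e.g. "169" -> "&#169;"

    def handle_entityref(self, name):
        if self.keep:
            self.out.append("&%s;" % name)   # e.g. "amp" -> "&amp;"

    def handle_data(self, data):
        if self.keep:
            self.out.append(data)


def passthrough(html, keep=True):
    parser = RefPassthroughSketch(keep)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Re-emitting the reference in its original escaped form, rather than decoding it, keeps the cleaned document valid HTML.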

def wikidot.parser.WikidotParser.handle_starttag (   self,
  tag,
  attrs 
)
Overridden - Called when a start tag is parsed

The heart of this function is the state machine.
When a <div> tag is detected, the attributes are compared with
a map of the form (name,value) -> state. If a match occurs,
the state is pushed on top of the stack.

Depending on the current state, the start tag is queued for output,
or not.

Definition at line 103 of file parser.py.
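
The state machine described above can be sketched roughly as follows, in Python 3. This is a minimal illustration under stated assumptions, not the code from parser.py: the state names SKIP and KEEP, the contents of DIV_STATE_MAP, the class DivStateSketch and the helper clean are hypothetical, and the way div_level and div_bookmark are used to pop a state on the matching </div> is a guess based only on the attribute names.

```python
from html.parser import HTMLParser

# Hypothetical states and (name, value) -> state map; the real values are
# defined in parser.py and are not shown in this reference page.
SKIP, KEEP = "skip", "keep"
DIV_STATE_MAP = {
    ("id", "page-content"): KEEP,
    ("id", "page-options-bottom"): SKIP,
}


class DivStateSketch(HTMLParser):
    """Keeps only markup found inside <div> blocks mapped to KEEP."""

    def __init__(self):
        super().__init__()
        self.state = [SKIP]     # state stack; the top is the current state
        self.div_level = 0      # nesting depth of currently open <div> tags
        self.div_bookmark = []  # depth at which each pushed state was entered
        self.out = []           # queued output fragments

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.div_level += 1
            for pair in attrs:  # attrs is a list of (name, value) tuples
                if pair in DIV_STATE_MAP:
                    self.state.append(DIV_STATE_MAP[pair])
                    self.div_bookmark.append(self.div_level)
                    break
        if self.state[-1] == KEEP:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.state[-1] == KEEP:
            self.out.append("</%s>" % tag)
        if tag == "div":
            # Pop the state when the <div> that pushed it is closed.
            if self.div_bookmark and self.div_bookmark[-1] == self.div_level:
                self.div_bookmark.pop()
                self.state.pop()
            self.div_level -= 1

    def handle_data(self, data):
        if self.state[-1] == KEEP:
            self.out.append(data)


def clean(html):
    parser = DivStateSketch()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Tracking the nesting depth is what lets a nested, unmapped <div> inside the kept block close without ending the kept region too early.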


Member Data Documentation

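wikidot.parser.WikidotParser.breadcrumbs

Definition at line 56 of file parser.py.

wikidot.parser.WikidotParser.current_state

Definition at line 56 of file parser.py.

wikidot.parser.WikidotParser.div_bookmark

Definition at line 56 of file parser.py.

wikidot.parser.WikidotParser.div_level

Definition at line 56 of file parser.py.

wikidot.parser.WikidotParser.div_state_map

Definition at line 56 of file parser.py.

wikidot.parser.WikidotParser.links

Definition at line 56 of file parser.py.

wikidot.parser.WikidotParser.out_doc

Definition at line 79 of file parser.py.

wikidot.parser.WikidotParser.page_title

Definition at line 56 of file parser.py.

wikidot.parser.WikidotParser.state

Definition at line 56 of file parser.py.

wikidot.parser.WikidotParser.toc

Definition at line 56 of file parser.py.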


The documentation for this class was generated from the following file:

parser.py

aseba
Author(s): Stéphane Magnenat
autogenerated on Thu Jan 2 2014 11:17:18