length-constrained maximum-sum subtree algorithms
- "= \ " to ""
 - "\ " to " "
 - "=\r\n" to ""
 - "\r\n=" to ""
 - "[IMAGE]" to ""
 - "\r\n" to " "
 - "\n" to " " # collapse to one line
 
Def of word:
- only alphabetic letters: 13041 unique tokens
 - otherwise: 21862 unique tokens
 
- davis utilities san plant plants times million utility blackouts generators commission customers trading companies percent electric officials federal wed edison California eletricity crisis
 - ect iso enronxgate amto confidential report draft enroncc susan joe communications ken comments order david june transmission markets language chairman Ken's email
 - bush jones president dow stock bank companies trading dynegy confidential news service natural oil credit services copyright deal percent policies Bush and Ken Lay: Slip Slidin' Away
 - davis utilities edison billion federal generators utility commission governor plan million crisis san plants electric pay companies thursday iso southern. Davis buy transmission lines
 
should contain fields:
- message_id
 - subject
 - body
 - timestamp
 - datetime(optional)
 - sender_id
 - recipient_ids
 
For the last two fields, it can be replaced by participant_ids
gen_cand_trees.sh and check_events.sh