Crawls and downloads algorithmic problems from the Topcoder problem archive and compiles them into PDF files (one for SRMs, one for all other contests) for portability.
Note: After cloning the repository, run `git submodule update --init --recursive` to fetch the PyPDF2 submodule, and install the version of PyPDF2 provided in this repository. Newer versions do not appear to work with these scripts.
`topcoderParse.py` crawls the Topcoder archive and saves the HTML pages in the folder `htmls`.
Downloading all the problems can take a long time and may even fail partway through. In that case, stop the program and rerun it. Before rerunning, set `done = x` in `topcoderParse.py`, where `x` is the number of problems to skip. The program prints the number of each problem as it downloads it, so set `done` to the number of the last successfully downloaded problem; the rerun will then skip the problems that are already downloaded.
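The resume logic above amounts to skipping the first `done` problems and continuing from `done + 1`. The sketch below is a minimal illustration, not the code in `topcoderParse.py`: it assumes a hypothetical `fetch(n)` callable that returns the HTML of problem `n` and a hypothetical `problem_<n>.html` file naming scheme, since the real URL layout and file names are not described here.

```python
import os
from typing import Callable, List


def crawl(
    total: int,
    fetch: Callable[[int], str],
    done: int = 0,
    out_dir: str = "htmls",
) -> List[int]:
    """Download problems done+1..total and save each one as HTML in out_dir.

    `fetch` and the problem_<n>.html naming are hypothetical stand-ins.
    """
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for n in range(done + 1, total + 1):
        print(n)  # progress: number of the problem currently being downloaded
        html = fetch(n)  # may raise on network failure; rerun with done = n - 1
        path = os.path.join(out_dir, f"problem_{n}.html")
        with open(path, "w", encoding="utf-8") as f:
            f.write(html)
        saved.append(n)
    return saved
```

If the run dies while `7` is the last number printed, problem 6 was the last one saved, so the rerun would use `done = 6`.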
`topcoderGenPdf.py` cleans the HTML files and uses `pdfkit` to generate a PDF for each of them in the `PDFs` folder.
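A rough sketch of the conversion loop follows, assuming every input file ends in `.html` and that each PDF keeps the HTML file's base name (both assumptions). The cleaning step is omitted because its details are not documented here. `pdfkit.from_file(input, output_path, options=...)` is the real pdfkit call; the `convert` parameter is a hypothetical hook that lets the loop run without `wkhtmltopdf` installed.

```python
import glob
import os
from typing import Callable, Dict, List, Optional

# Page settings mirroring the options dict in topcoderGenPdf.py (proxy omitted).
OPTIONS: Dict[str, str] = {
    "page-size": "A5",
    "margin-top": "0.30in",
    "margin-right": "0.0in",
    "margin-bottom": "0.30in",
    "margin-left": "0.0in",
    "cache-dir": "html_cache",
}


def pdf_path_for(html_path: str, pdf_dir: str = "PDFs") -> str:
    """Map htmls/foo.html to PDFs/foo.pdf (hypothetical naming)."""
    stem = os.path.splitext(os.path.basename(html_path))[0]
    return os.path.join(pdf_dir, stem + ".pdf")


def convert_all(
    html_dir: str = "htmls",
    pdf_dir: str = "PDFs",
    convert: Optional[Callable[[str, str], None]] = None,
) -> List[str]:
    """Convert every .html file in html_dir into a PDF in pdf_dir."""
    if convert is None:
        import pdfkit  # needs the wkhtmltopdf binary on PATH

        def _pdfkit_convert(src: str, dst: str) -> None:
            pdfkit.from_file(src, dst, options=OPTIONS)

        convert = _pdfkit_convert

    os.makedirs(pdf_dir, exist_ok=True)
    outputs = []
    for html_path in sorted(glob.glob(os.path.join(html_dir, "*.html"))):
        dst = pdf_path_for(html_path, pdf_dir)
        convert(html_path, dst)
        outputs.append(dst)
    return outputs
```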
`filemerger.py` merges the PDFs into single files. It produces two files, `srmmerged.pdf` and `othermerged.pdf`, for SRMs and non-SRMs respectively.
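The merge step can be pictured as a split followed by concatenation. This sketch assumes, hypothetically, that SRM problems can be recognized by `SRM` appearing in the PDF file name; the real script may classify them differently. The merging itself uses `PdfFileMerger` from the older PyPDF2 API (`append` and `write`), imported lazily so the split logic can run on its own.

```python
import os
from typing import Iterable, List, Tuple


def split_srm(pdf_paths: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split PDF paths into (SRM, non-SRM) lists, sorted by path.

    Hypothetical rule: a file is an SRM problem if its name contains 'SRM'.
    """
    srm: List[str] = []
    other: List[str] = []
    for path in sorted(pdf_paths):
        target = srm if "SRM" in os.path.basename(path).upper() else other
        target.append(path)
    return srm, other


def merge(paths: List[str], out_path: str) -> None:
    """Concatenate PDFs into out_path with PyPDF2's (older) PdfFileMerger."""
    from PyPDF2 import PdfFileMerger  # the bundled PyPDF2 version

    merger = PdfFileMerger()
    for path in paths:
        merger.append(path)
    merger.write(out_path)


def merge_all(pdf_paths: Iterable[str]) -> None:
    srm, other = split_srm(pdf_paths)
    merge(srm, "srmmerged.pdf")
    merge(other, "othermerged.pdf")
```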
`createindex.py` generates the LaTeX source for the two final PDFs, including a generated index for easy navigation.
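One way to build such an index is to compute where each problem starts inside the merged PDF and pass those pages to the `addtotoc` option of the `pdfpages` package, which adds table-of-contents entries (and PDF bookmarks via `hyperref`) pointing into an included PDF. The sketch below takes that approach as an assumption; it is not necessarily what `createindex.py` does. The per-problem titles and page counts are hypothetical inputs, and titles are assumed to contain no commas, since `addtotoc` separates its fields with commas.

```python
from typing import List

# Characters that must be backslash-escaped in LaTeX text.
_SPECIAL = {c: "\\" + c for c in "&%$#_{}"}


def latex_escape(text: str) -> str:
    return "".join(_SPECIAL.get(ch, ch) for ch in text)


def start_pages(page_counts: List[int]) -> List[int]:
    """1-based first page of each problem inside the merged PDF."""
    starts, page = [], 1
    for count in page_counts:
        starts.append(page)
        page += count
    return starts


def toc_entries(titles: List[str], page_counts: List[int]) -> str:
    """Build the addtotoc list: page,section,level,heading,label per entry."""
    entries = []
    pairs = zip(titles, start_pages(page_counts))
    for i, (title, page) in enumerate(pairs, 1):
        entries.append(f"{page},section,1,{latex_escape(title)},prob{i}")
    return ",".join(entries)


def make_document(merged_pdf: str, titles: List[str], page_counts: List[int]) -> str:
    toc = toc_entries(titles, page_counts)
    return "\n".join([
        r"\documentclass{article}",
        r"\usepackage{pdfpages}",
        r"\usepackage{hyperref}",
        r"\begin{document}",
        r"\tableofcontents",
        rf"\includepdf[pages=-,addtotoc={{{toc}}}]{{{merged_pdf}}}",
        r"\end{document}",
    ])
```

For three problems of 3, 2 and 4 pages, the entries point at pages 1, 4 and 6 of the merged file.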
The command `pdflatex Topcoder<X>.tex` compiles the LaTeX documents into the final PDFs, where `<X>` stands for `SRMs` or `Others`. The final PDFs are named `TopcoderSRMs.pdf` and `TopcoderOthers.pdf`.
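Both documents can be compiled in one go from Python. This is a small sketch rather than part of the repository; running `pdflatex` twice per file is an assumption based on common LaTeX practice, since a table of contents is usually only filled in on the second pass. The `run` parameter is a hypothetical hook that defaults to `subprocess.run`.

```python
import subprocess
from typing import Callable, List, Sequence

VARIANTS = ("SRMs", "Others")


def pdflatex_commands(variants: Sequence[str] = VARIANTS) -> List[List[str]]:
    """One `pdflatex Topcoder<X>.tex` command per variant."""
    return [["pdflatex", f"Topcoder{v}.tex"] for v in variants]


def compile_all(run: Callable[..., object] = subprocess.run, passes: int = 2) -> None:
    """Run pdflatex `passes` times per document so the TOC resolves."""
    for cmd in pdflatex_commands():
        for _ in range(passes):
            run(cmd, check=True)  # check=True raises if pdflatex fails
```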
To make this work behind a proxy, uncomment the proxy option in the following lines of `topcoderGenPdf.py` and set the proper proxy address:

```python
options = {
    'page-size': 'A5',
    'margin-top': '0.30in',
    'margin-right': '0.0in',
    'margin-bottom': '0.30in',
    'margin-left': '0.0in',
    'cache-dir': 'html_cache',
    # 'proxy': '10.3.100.207:8080'
}
```
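pdfkit passes each key of `options` to `wkhtmltopdf` as a command-line option, so adding a `'proxy'` entry corresponds to `wkhtmltopdf --proxy`. Rather than editing the file each time, the options could be built with the proxy added only when one is configured. This is a hypothetical variant: the `TOPCODER_PROXY` environment variable name is an assumption, not something the scripts read.

```python
import os
from typing import Dict, Optional


def build_options(proxy: Optional[str] = None) -> Dict[str, str]:
    """pdfkit options; 'proxy' is included only if a proxy is configured."""
    options = {
        'page-size': 'A5',
        'margin-top': '0.30in',
        'margin-right': '0.0in',
        'margin-bottom': '0.30in',
        'margin-left': '0.0in',
        'cache-dir': 'html_cache',
    }
    if proxy is None:
        # Hypothetical environment variable, e.g. TOPCODER_PROXY=10.3.100.207:8080
        proxy = os.environ.get('TOPCODER_PROXY')
    if proxy:
        options['proxy'] = proxy
    return options
```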