Stars
3 stars written in Java
The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
Apache Nutch is an extensible and scalable web crawler.



