News-Crawler

Category: Feature extraction
Development tool: Java
File size: 34 KB
Downloads: 0
Upload date: 2017-03-31 18:45:14
Uploader: sh-1993
Description: Final Year Project. News Extraction and Summarization

File list:
.classpath (2698, 2017-04-01)
.project (370, 2017-04-01)
.settings (0, 2017-04-01)
.settings\org.eclipse.jdt.core.prefs (587, 2017-04-01)
bin (0, 2017-04-01)
bin\AnaphoraAndTagging$1.class (920, 2017-04-01)
bin\AnaphoraAndTagging.class (16279, 2017-04-01)
bin\BasicCrawlController.class (4021, 2017-04-01)
bin\BasicCrawler.class (4955, 2017-04-01)
bin\DbConnector.class (3289, 2017-04-01)
bin\NoiseRemover.class (3234, 2017-04-01)
bin\Stoplist.txt (5102, 2017-04-01)
src (0, 2017-04-01)
src\AnaphoraAndTagging.java (17246, 2017-04-01)
src\BasicCrawlController.java (2267, 2017-04-01)
src\BasicCrawler.java (4713, 2017-04-01)
src\DbConnector.java (2590, 2017-04-01)
src\NoiseRemover.java (3034, 2017-04-01)
src\Stoplist.txt (5102, 2017-04-01)

# News Crawler

News Crawler is the offline phase of the News Extraction and Summarization project.

## Tech

News Crawler uses a number of open source projects to work properly:

* [Crawler4j](https://github.com/yasserg/crawler4j) - an open source web crawler for Java
* [JSoup](https://jsoup.org/) - a Java library for working with real-world HTML
* [Stanford CoreNLP](http://stanfordnlp.github.io/CoreNLP/) - a set of natural language analysis tools

## Installation

News Crawler requires the following JARs to run:

* crawler4j-4.1-jar-with-dependencies.jar
* slf4j-simple-1.6.1.jar
* jsoup-1.10.2.jar
* mysql-connector-java-5.1.40-bin.jar
* All JARs in the Stanford CoreNLP suite

## Instructions

```sh
- Download the dependencies and import the project into Eclipse
- Right click on the project -> Build Path -> Configure Build Path -> Libraries -> Add External JARs
- Add the JARs to the class path
- Create a database and relations according to the schema diagram
- Modify the default file locations for storing temporary crawl data and the file repository
- Run BasicCrawlController as a Java application
- Run AnaphoraAndTagging as a Java application
```

### Authors

* [Abha Suman](mailto:abhasuman2@gmail.com?Subject=Hello%20again)
* [Hariprasad KR](mailto:krhp2236@gmail.com?Subject=Hello%20again)
* [Kailash Karthik](mailto:kailashkarthik9@gmail.com?Subject=Hello%20again)
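
### Checking the class path

The Instructions ask for several external JARs to be added to the class path before the crawler is run. A quick way to confirm that step worked might look like the sketch below. It is not part of the project: the class name `ClasspathCheck` and its helper methods are hypothetical, and the marker class chosen for each library is an assumption about what the listed JAR versions contain.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the News Crawler sources.
class ClasspathCheck {

    // Returns true if the named class can be located on the current class path.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Maps a library label to one marker class and returns the labels whose class is missing.
    static List<String> missingLibraries(Map<String, String> markers) {
        List<String> missing = new ArrayList<>();
        for (Map.Entry<String, String> entry : markers.entrySet()) {
            if (!isOnClasspath(entry.getValue())) {
                missing.add(entry.getKey());
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Marker classes are assumptions: one well-known class per required library.
        Map<String, String> markers = new LinkedHashMap<>();
        markers.put("crawler4j", "edu.uci.ics.crawler4j.crawler.WebCrawler");
        markers.put("jsoup", "org.jsoup.Jsoup");
        markers.put("mysql-connector-java", "com.mysql.jdbc.Driver");
        markers.put("Stanford CoreNLP", "edu.stanford.nlp.pipeline.StanfordCoreNLP");

        List<String> missing = missingLibraries(markers);
        if (missing.isEmpty()) {
            System.out.println("All checked libraries were found on the class path.");
        } else {
            System.out.println("Missing from the class path: " + missing);
        }
    }
}
```

Running this as a Java application inside the same Eclipse project uses the project's configured build path, so any library it reports as missing is likely one that still needs to be added under Libraries. The check only looks up one class per library, so it cannot tell whether the JAR versions match those listed under Installation.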
