thesis-server

Category: Search Engine
Development tool: JavaScript
File size: 64KB
Downloads: 0
Upload date: 2016-10-14 21:16:56
Uploader: sh-1993
Description: A blogger-driven search engine for programming content

File list:
.travis.yml (308, 2016-10-15)
CONTRIBUTING.md (8782, 2016-10-15)
PRESS-RELEASE.md (1221, 2016-10-15)
STYLE-GUIDE.md (10384, 2016-10-15)
crawlerService (0, 2016-10-15)
crawlerService\baseUrls.json (14824, 2016-10-15)
crawlerService\childProc.js (643, 2016-10-15)
crawlerService\frontPageCrawler.js (9508, 2016-10-15)
crawlerService\loadWL.js (633, 2016-10-15)
crawlerService\postCrawler.js (9654, 2016-10-15)
crawlerService\scheduler.js (4908, 2016-10-15)
crawlerService\startCrawlers.js (4014, 2016-10-15)
crawlerService\whitelist.json (36790, 2016-10-15)
crawlerService\workerUtils (0, 2016-10-15)
crawlerService\workerUtils\postUtils.js (3691, 2016-10-15)
crawlerService\workerUtils\wlUtils.js (772, 2016-10-15)
cron.js (821, 2016-10-15)
db (0, 2016-10-15)
db\database.js (3508, 2016-10-15)
gruntfile.js (1048, 2016-10-15)
indexService (0, 2016-10-15)
indexService\main.js (4535, 2016-10-15)
indexService\solver.js (2142, 2016-10-15)
karma.conf.js (1675, 2016-10-15)
package.json (1803, 2016-10-15)
server (0, 2016-10-15)
server\controllers (0, 2016-10-15)
server\controllers\authController.js (1023, 2016-10-15)
server\controllers\postController.js (2269, 2016-10-15)
server\index.js (177, 2016-10-15)
server\router.js (1130, 2016-10-15)
server\server.js (1003, 2016-10-15)
server\utils (0, 2016-10-15)
server\utils\tagQuery.js (2935, 2016-10-15)
test (0, 2016-10-15)
test\coverage (0, 2016-10-15)
test\coverage\file (0, 2016-10-15)
... ...

# BlogRank

BlogRank is your go-to source for discovering programming knowledge from bloggers. Independent bloggers power the world of development, often disseminating the best new ideas, practices, and even novel language constructs that the creators of a technology didn't think of. The core idea of BlogRank is that the articles you want to see most are the ones vetted by other independent bloggers: the more highly cited a blog post is by other authors, the more highly we rank it. You can also search by author; authors are ranked according to their h-index, as inspired by the world of academia. Our search engine is powered by the graph data structure you see visualized in the background, along with a web crawler and indexing service running behind the scenes. The project was built with React, Node.js, and PostgreSQL.
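
For readers unfamiliar with the metric: an author has h-index `h` when `h` of their posts have at least `h` citations each. Here is a minimal sketch of that computation using inbound links as citations; it illustrates the concept and is not the project's actual ranking code:

```js
// h-index: the largest h such that the author has h posts
// with at least h inbound links (citations) each.
function hIndex(citationCounts) {
  const sorted = [...citationCounts].sort((a, b) => b - a);
  let h = 0;
  while (h < sorted.length && sorted[h] >= h + 1) {
    h += 1;
  }
  return h;
}

console.log(hIndex([10, 8, 5, 4, 3])); // 4: four posts have >= 4 citations
```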

## Team

- __Product Owner__: Amir Bandeali
- __Scrum Master__: Nick Olszowy
- __Development Team Members__: Amir Bandeali, Nick Olszowy, Pete Herbert

## Table of Contents

1. [Usage](#usage)
1. [Requirements](#requirements)
1. [Development](#development)
    1. [Installing Dependencies](#installing-dependencies)
    1. [Server-side](#server-side)
    1. [Client-side](#client-side)
    1. [Worker Service and Index Service](#worker-service-and-index-service)
1. [Roadmap](#roadmap)
1. [Contributing](#contributing)
1. [API](#api)

## Usage

Just enter your search term or phrase and see what we give you back! When you click on a search result, the app shows you relevant information, mainly the blog posts that have cited the result you are on. This is useful knowledge: while visiting a website you can never see the pages that link to it, only the links from the page outwards. Having that information helps guide your search toward relevant, well-written material.

## Requirements

- Node (v6.6+)
- PostgreSQL (v9.5+)

## Development

There are several steps to take before you can start developing. First, you should have a local PostgreSQL server up and running with a database named `testgraph`. See [this page](https://www.postgresql.org/docs/9.0/static/tutorial-createdb.html) to get started with PostgreSQL locally. From there, use `brew` or another package manager to install the grunt and mocha command-line interfaces if you don't have them already. Once you have a working postgres server up, move on to installing dependencies.
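
If PostgreSQL is installed locally, creating the database is a one-liner (this assumes the stock local setup; adjust user and host flags to match your install):

```sh
createdb testgraph
# or, equivalently, through psql:
# psql -c 'CREATE DATABASE testgraph;'
```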

### Installing Dependencies

Clone the client repo into the top level of the server repo. From within the server directory, and then again within the client directory:

```sh
npm install
```

That's it!

### Server-side

To develop server-side, run `npm run start:dev` within your server-side directory. This initializes nodemon, which starts the server connected to your local postgres DB and watches the files for changes. [Postman](https://www.getpostman.com/) is a very useful app for testing routes.
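
Under the hood, `start:dev` is just an npm script; a plausible wiring in package.json is sketched below (hypothetical; check the repo's actual package.json for the real command and entry point):

```json
{
  "scripts": {
    "start:dev": "nodemon server/index.js"
  }
}
```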

### Client-side

ES6 syntax and JSX notation must be transpiled to vanilla JavaScript. To achieve this we use Babel within grunt to transpile and browserify files into the compiled directory. To transpile the files just once, run `grunt` in the terminal within the client directory. To watch the files for changes and transpile on every change, run `grunt watch`.

### Worker Service and Index Service

This is where the magic happens. These files are invoked by cron jobs on the deployed server, and as such export their functionality. If you want to test the services against your local DB, go into the top-level file for each (startCrawlers.js and main.js, respectively) and uncomment the code at the bottom that calls the main function. You can then run either one with the `node` command. Note that before the crawler will work, you must load the whitelist into the database with the loadWL.js file. Finally, the worker must find posts and the index service must populate the query lists before your local client will be able to search for anything.

The crawler accepts a few arguments. If the crawler ever crashes or you hit Ctrl-C in the terminal, the crawler's current queue will be written to a JSON file; you can restart the crawler with this queue using the `--continue` argument. Additionally, if you wish to interactively add to the whitelist used by your local crawler, run the crawler with the `--add` argument. This gives you a prompt-driven mode in which random sites are chosen and presented to you. You have four options at any given link (typical invocations are sketched after this list):

- `y` to add the site to the whitelist
- `n` to skip it and move on to the next link in the queue
- `a` to add this base URL to the list of URLs you know are not blogs, so it can be filtered out automatically
- `e` to exit interactive mode and continue crawling with the new links you have amassed
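
As a rough guide, a local run looks something like the following. The entry points are the files named above, but treat the exact invocations as a sketch: they assume you have uncommented the initializer code first.

```sh
# Load the whitelist into your local DB (required before the first crawl).
node crawlerService/loadWL.js

# Start the crawlers; --continue resumes from the saved queue JSON,
# and --add enters the interactive whitelist prompt described above.
node crawlerService/startCrawlers.js
node crawlerService/startCrawlers.js --continue
node crawlerService/startCrawlers.js --add

# Run the index service so the client has query lists to search against.
node indexService/main.js
```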

### Roadmap

View the project roadmap [here](LINK_TO_PROJECT_ISSUES).

## Contributing

See [CONTRIBUTING.md](https://github.com/truncatedAvocados/thesis-server/blob/master/CONTRIBUTING.md) for contribution guidelines.

## API

### **Search Posts or Authors**

----

Fetches an array of posts or authors that best match the provided tags and page number. Also returns a count of the total number of matching posts or authors in our database.

* **URL**

  `/api/posts` or `/api/authors`

* **Method:**

  `GET`

* **URL Params**

  **Required:** `tags=[string]`, e.g. `tags=["javascript", "node"]`

  **Optional:** `page=number` (default: 1)

* **Post Response:**
  * **Code:** 200
    **Content:**

    ~~~
    {
      "results": [
        {
          "url": ...,
          "postId": ...,
          "inLinks": [ ... ],
          "title": ...,
          "oldTags": [ ... ],
          "author": ...,
          "publishDate": ...,
          "rank": ...,
          "createdAt": ...,
          "updatedAt": ...
        },
        ...
      ],
      "count": ...
    }
    ~~~

* **Author Response:**
  * **Code:** 200
    **Content:**

    ~~~
    {
      "results": [
        {
          "id": ...,
          "name": ...,
          "hIndex": ...,
          "createdAt": ...,
          "updatedAt": ...,
          "posts": [ ... ]
        },
        ...
      ],
      "count": ...
    }
    ~~~

* **Sample Call:**

  ~~~
  $.ajax({
    url: 'blogrank.io/api/posts?tags=["javascript"]&page=2',
    method: 'GET',
    success: data => console.log(data)
  });
  ~~~
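
Note that the brackets and quotes in the `tags` value are not valid raw URL characters, so stricter HTTP clients will want them percent-encoded. A `fetch`-based variant of the call above (a sketch; it assumes the server decodes the query string in the usual way):

```js
// Serialize the tags array as JSON, then percent-encode it for the URL.
const tags = encodeURIComponent(JSON.stringify(['javascript', 'node']));

fetch(`/api/posts?tags=${tags}&page=2`)
  .then(res => res.json())
  .then(({ results, count }) => console.log(count, results));
```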

### **Fetch Post Inlinks**

----

Fetches an array of posts that link to the post with the given id.

* **URL**

  `/api/posts/:id`

* **Method:**

  `GET`

* **URL Params**

  **Required:** `id=number`

* **Response:**
  * **Code:** 200
    **Content:**

    ~~~
    [
      {
        "url": ...,
        "postId": ...,
        "inLinks": [ ... ],
        "title": ...,
        "oldTags": [ ... ],
        "author": ...,
        "publishDate": ...,
        "rank": ...,
        "createdAt": ...,
        "updatedAt": ...
      },
      ...
    ]
    ~~~

* **Sample Call:**

  ~~~
  $.ajax({
    url: 'blogrank.io/api/posts/22882',
    method: 'GET',
    success: data => console.log(data)
  });
  ~~~

### **Get Database Statistics**

----

Returns a JSON object with database statistics.

* **URL**

  `/api/stats`

* **Method:**

  `GET`

* **Response:**
  * **Code:** 200
    **Content:**

    ~~~
    {
      "posts": ...,
      "connected": ...,
      "authors": ...
    }
    ~~~

* **Sample Call:**

  ~~~
  $.ajax({
    url: 'blogrank.io/api/stats',
    method: 'GET',
    success: data => console.log(data)
  });
  ~~~
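
Since search results include each post's `postId`, the search and inlinks endpoints compose naturally: search first, then look up who cites the top hit. A small sketch against the routes documented above:

```js
// Search for posts tagged "node", then fetch the posts citing the top result.
const tags = encodeURIComponent(JSON.stringify(['node']));

fetch(`/api/posts?tags=${tags}`)
  .then(res => res.json())
  .then(({ results }) => fetch(`/api/posts/${results[0].postId}`))
  .then(res => res.json())
  .then(citing => console.log('posts citing the top result:', citing));
```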
