workshops-setup_cloud_analytics_machine

Category: Numerical Algorithms / Artificial Intelligence
Development tool: SCSS
File size: 1378KB
Downloads: 0
Upload date: 2023-02-24 01:22:18
Uploader: sh-1993
Description: Tips and tricks to set up a cloud machine for analytics and data science with R, RStudio and Shiny Servers, Python and JupyterLab

File list:
Dockerfile (2843, 2023-02-24)
Dockerfiles (0, 2023-02-24)
Dockerfiles\Dockerfile.base (1270, 2023-02-24)
Dockerfiles\Dockerfile.conf (2735, 2023-02-24)
Dockerfiles\Dockerfile.rstudio (822, 2023-02-24)
Dockerfiles\Dockerfile.shiny (1157, 2023-02-24)
LICENSE (1074, 2023-02-24)
R.conf (719, 2023-02-24)
Renviron (496, 2023-02-24)
Rprofile.site (404, 2023-02-24)
TL;DR.md (3418, 2023-02-24)
app.R (24160, 2023-02-24)
libs (0, 2023-02-24)
libs\font-awesome-5.3.1 (0, 2023-02-24)
libs\font-awesome-5.3.1\css (0, 2023-02-24)
libs\font-awesome-5.3.1\css\fontawesome-all.css (60985, 2023-02-24)
libs\font-awesome-5.3.1\css\fontawesome-all.min.css (48649, 2023-02-24)
libs\font-awesome-5.3.1\less (0, 2023-02-24)
libs\font-awesome-5.3.1\less\_animated.less (297, 2023-02-24)
libs\font-awesome-5.3.1\less\_bordered-pulled.less (422, 2023-02-24)
libs\font-awesome-5.3.1\less\_core.less (291, 2023-02-24)
libs\font-awesome-5.3.1\less\_fixed-width.less (119, 2023-02-24)
libs\font-awesome-5.3.1\less\_icons.less (80836, 2023-02-24)
libs\font-awesome-5.3.1\less\_larger.less (454, 2023-02-24)
libs\font-awesome-5.3.1\less\_list.less (320, 2023-02-24)
libs\font-awesome-5.3.1\less\_mixins.less (1264, 2023-02-24)
libs\font-awesome-5.3.1\less\_rotated-flipped.less (711, 2023-02-24)
libs\font-awesome-5.3.1\less\_screen-reader.less (118, 2023-02-24)
libs\font-awesome-5.3.1\less\_shims.less (60686, 2023-02-24)
libs\font-awesome-5.3.1\less\_stacked.less (478, 2023-02-24)
libs\font-awesome-5.3.1\less\_variables.less (34071, 2023-02-24)
libs\font-awesome-5.3.1\less\brands.less (759, 2023-02-24)
libs\font-awesome-5.3.1\less\fontawesome.less (504, 2023-02-24)
libs\font-awesome-5.3.1\less\regular.less (778, 2023-02-24)
libs\font-awesome-5.3.1\less\solid.less (771, 2023-02-24)
libs\font-awesome-5.3.1\less\v4-shims.less (235, 2023-02-24)
libs\font-awesome-5.3.1\scss (0, 2023-02-24)
... ...

# How to Set Up a Cloud Server for Data Science

**Author**: [Luca Valnegri](https://www.linkedin.com/in/lucavalnegri/)
**Last Updated**: 01 November 2021
**Last Addition**: [OSRM Routing Server](#osrm)

---

* [Motivations](#motivations)
* [Create a Virtual Private Server](#create-a-virtual-private-server)
  + [Sign up to Digital Ocean](#sign-up-do)
  + [Login into Digital Ocean](#login-do)
  + [Secure Digital Ocean Account](#secure-do)
  + [Create Your First *droplet*](#droplet-without-ssh-key)
  + [First Connection](#first-connection)
    - [Windows Users](#without-key-windows)
    - [Linux and macOS Users](#without-key-linux-macos)
* [Customize Your New Server](#customize-your-new-server)
  + [Upgrade the System](#upgrade-system)
  + [Adding swap space](#swap)
  + [Changing localization](#locale)
  + [Add Admin User](#add-admin-user)
  + [Add *public* Group and Repository](#add-public)
  + [Add Security Layers](#add-security)
  + [Install *Webmin*](#install-webmin)
  + [Add Domain Name](#domain-name)
  + [Take Your First *Snapshot*](#first-snapshot)
* [The *R* Stack](#r-stack)
  + [Install core *R*](#install-r)
  + [Install *RStudio Server*](#install-rstudio-server)
    - [Using Projects with Version Control](#rstudio-using-projects-with-version-control)
  + [Install *Shiny Server*](#install-shiny-server)
  + [Testing the *R* stack](#testing-the-r-stack)
  + [Install *Ubuntu* Dependencies for *R* packages](#install-linux-dependencies-for-r-packages)
  + [Install *R* packages](#install-r-packages)
* [Nginx](#ngnix)
  + [Install Nginx](#nginx-install)
  + [Install php preprocessor](#nginx-php)
  + [Dedicated URLs for RStudio Server and Shiny Server](#nginx-url)
  + [Add SSL Certificate for encrypted *https* connection](#nginx-ssl)
  + [Add Basic Authentication to Shiny Apps](#nginx-auth)
  + [Nginx Configuration](#nginx-conf)
* [The *Python* Data Science Stack](#python)
  + [Install *Python*](#python-install)
  + [Install the Data Science Stack](#python-stack)
  + [Install *Jupyterlab*](#jupyterlab)
* [Install *code-server* as general IDE](#vscode)
* [Storage engines](#storage-engines)
  + [MySQL Server](#mysql)
    - [MySQL Web Client: *DbNinja*](#dbninja)
    - [MySQL Web Client: *PhpMyAdmin*](#phpmyadmin)
  + [MS SQL Server](#mssql)
  + [PostgreSQL](#postgres)
  + [Neo4j](#neo4j)
  + [MongoDB](#mongodb)
  + [Redis](#redis)
  + [MonetDB Lite](#monetdb)
  + [Influx DB](#influxdb)
* [Docker](#docker)
  + [Install Docker](#docker-install)
  + [Basic Commands](#docker-commands)
  + [Dockerfile](#dockerfile)
  + [Example: *Selenium* for Web Driving](#docker-selenium)
  + [Resources](#docker-resources)
* [OSRM Routing Server](#osrm)
* [Nominatim Geoserver](#nominatim)
  + [Dependencies](#dep-nominatim)
  + [Postgres](#psql-nominatim)
  + [Apache](#apache-nominatim)
  + [Download Data](#data-nominatim)
  + [Build Software](#build-nominatim)
  + [Populate Database](#pop-nominatim)
* [Additional Tools](#additional-tools)
  + [Samba Server](#samba)
  + [Personal Cloud Storage](#next-cloud)
  + [Fonts](#fonts)
  + [Spark](#spark)
  + [Add SSH Key Pair for Enhanced Security](#droplet-with-ssh-key)
    - [Windows users](#with-key-windows)
    - [Linux and macOS users](#with-key-linux-macos)
  + [Node.js](#node)
* [Appendix: Linux Basics](#linux-basics)
  + [Files and Folders](#linux-files-folders)
    - [Root Directory Structure](#linux-directory-structure)
    - [File Compressing](#linux-file-compressing)
  + [Users, Groups, Permissions, Ownerships](#linux-users-groups-permissions-ownerships)
  + [Software Management](#linux-software-management)
  + [Scheduling Tasks](#linux-cronjobs)
  + [Bash Shell](#linux-shell)
* [Resources](#resources)

---

## Motivations

If you’ve always wanted to have:
- an *RStudio Server* of your own, so that you can access your *R* projects from anywhere (provided you have an internet connection)
- your own *Shiny Server* to host your data visualizations or dashboards, present the results of statistical modeling, monitor your machine learning algorithms, or simply deploy some *RMarkdown* documents or reports
- a *JupyterLab* server to share your knowledge with your team colleagues
- one or more *database management systems* to store any kind of data, small and/or big, relational and/or schema-less
- your own *cloud storage* to access your files from everywhere without paying another company to do so (on top of the actual storage, obviously!)

the following notes will help you!

This tutorial is quite lengthy, as it was written with plenty of details useful for the complete novice. If you just want the step-by-step list, a sort of cloud server setup cheat-sheet, it's more convenient to follow [this document](https://github.com/lvalnegri/workshops-setup_cloud_analytics_machine/blob/master/TL%3BDR.md) instead.
:point_up_2:[Back to Index](#index)
## Create a Virtual Private Server

### Sign up to Digital Ocean
- go to https://m.do.co/c/ef1c7bc80083 (you'll be credited $100 lasting 60 days, offer valid at the time of writing)
- insert your email and a sufficiently strong password (you can generate a suitable one [here](https://www.random.org/passwords/?num=1&len=15&format=html&rnd=new)). I advise you to use a password manager to securely collect, store and organize all your credentials. My suggestion is the free open source [KeePass Password Safe](https://keepass.info/).
- you'll be asked for a credit card, but no money will be taken from your account. Just remember to check in at the end of the grace period!
- check your email and validate your new account

### Login into Digital Ocean
- go to https://cloud.digitalocean.com/login
- click *Login* top right
- enter username and password

### Secure Digital Ocean Account
- go to `Account` (bottom left) > `Settings` > `Security` (tab) > `Secure your account` > `Enable Two-Factor Authentication`
- choose which system you prefer, then follow the corresponding instructions
- in both cases, remember to generate the backup codes, and save them in some secure place

### Create Your First *droplet*
- Click the green *Create* button in the top right
- Click *Droplets* from the unfolding menu
- For the installation step, you should create a *** with at least 2GB of RAM, because a few packages require more than 1GB of RAM to compile. You can always scale the number of CPUs or the amount of RAM up or down later. For the time being, choose the following (moving top to bottom):
  - Image / Distributions: `Ubuntu 20.04 (LTS) x***`
  - Plan / Starter (Standard): `RAM 2GB`, `Power 1CPU`, `Storage 50GB`, `Transfer 2TB`, `cost $10 monthly`
  - Datacenter Region: `London`
  - Authentication: `Password` (we'll move to `SSH key` later)
  - Create *root* password
  - Hostname: choose a memorable name for your server. You can always change it later from inside the machine
  - Tags: choose the reference project. I guess you only have the default one at the moment though. You can build more structure into your account later, if you decide to stick with Digital Ocean and become a devOps guru!
- Click `Create`
- Wait for the droplet to be created. Once it's done, the process should end by showing the project page opened on the *Resources* tab. In the *Droplets* list you can easily find the droplet by its name, with the IP address on the right.

### First Connection

**SSH** stands for ***S**ecure **SH**ell*, a [cryptographic](https://en.wikipedia.org/wiki/Cryptography "Cryptography") [network protocol](https://en.wikipedia.org/wiki/Network_protocol "Network protocol") that allows secure access over an otherwise unsecured network. SSH encrypts the whole session with its own transport-layer protocol (it does not rely on SSL/TLS), which makes it difficult for these communications to be intercepted and read.

Any *** could be accessed with a typical user/password exchange, but it's also possible to set up *SSH keys* that identify trusted computers without the need for passwords. For additional security, it's also possible to add a *passphrase* to the key pair, which acts as a password to access the key itself.

#### Windows users

Older Windows versions ship without an ssh client (recent Windows 10 and 11 builds include OpenSSH), and in any case many free clients are available. One of the most famous is [PuTTY](https://www.putty.org/), but we are going to use the more feature-rich [MobaXTerm](https://mobaxterm.mobatek.net/), which is free for personal use and allows, among other functionalities, *sftp*, tunnelling, multi-tabbing and saving sessions.
- [Download](https://mobaxterm.mobatek.net/download-home-edition.html) the *Home* edition of *MobaXTerm*. If you prefer, you can use the *portable* edition that doesn't need any installation. Just unzip the downloaded file in some folder of your choice, then run the included executable.
- Open *MobaXTerm*
- For a more standard copy and paste behaviour, click `Settings` towards the far right of the button bar, then click the tab `Terminal`. Uncheck the option *Paste using right-click*, then click `OK`. Now you can paste content in any terminal window using the standard `SHIFT+INS` key combination (but you can't *copy* and *paste* using the more frequent `CTRL+C` and `CTRL+V`). In addition, a right-click of the mouse exposes a quite extensive actions menu.
- Click `Session` in the upper left button bar, then `SSH` in the upper left button bar of the new window
- Paste the IP address you received with the email into the *Remote host* textbox, then click OK
- type in `root` when asked to *login as*, then copy the password you received with the email and paste it into the terminal. _**Notice**_ that by default Linux systems do not give any feedback from the password field. So don't try to paste again and again only because you feel the need to see some feedback, just paste the password once and hit enter!

#### Linux and macOS users

Both Linux *distros* and *macOS* ship with an SSH client that can be used from the built-in *Terminal* to connect to remote servers:
- **macOS**. *Terminal.app* is located in the `Applications > Utilities` folder. Double-click on the icon to start it.
- **Linux**. A Terminal window can be easily opened using the shortcut `CTRL+ALT+T`.

At the prompt you would type in general: `ssh usrname@ip_address`. At the moment there is no other user than *root*, so to connect to your droplet just type: `ssh root@ip_address`

If the IP address and the user name are correctly recognized, the system then prompts you to enter the password associated with the specified user.
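
A pasted IP address with a stray character is an easy mistake at this step. The following is only a sketch, not part of the original instructions: it wraps the `ssh usrname@ip_address` call in a small check that the host looks like an IPv4 address. The function names `is_ipv4` and `connect_droplet` are hypothetical helpers introduced here for illustration.

```shell
# Succeeds only for four dot-separated decimal numbers, each in the range 0..255.
is_ipv4() {
    case $1 in
        "" | *[!0-9.]* | .* | *. | *..*) return 1 ;;
    esac
    # Split on dots; IFS is changed for this single read only.
    IFS=. read -r o1 o2 o3 o4 rest <<EOF
$1
EOF
    [ -n "$o4" ] && [ -z "$rest" ] || return 1
    for octet in "$o1" "$o2" "$o3" "$o4"; do
        [ "${#octet}" -le 3 ] && [ "$octet" -le 255 ] || return 1
    done
}

# Hypothetical wrapper: connect_droplet USER IP_ADDRESS
connect_droplet() {
    user=$1
    host=$2
    if ! is_ipv4 "$host"; then
        echo "'$host' does not look like an IPv4 address" >&2
        return 1
    fi
    ssh "$user@$host"
}
```

With these functions loaded in your shell, `connect_droplet root ip_address` behaves like the plain `ssh` command above, but refuses to start when the address is malformed.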
:point_up_2:[Back to Index](#index)
## Customize Your New Server

### Upgrade the System
- To enable monitoring from the DO dashboard enter the following command (or simply copy and paste, it doesn't hurt):
  ~~~
  curl -sSL https://repos.insights.digitalocean.com/install.sh | sudo bash
  ~~~
  After a few minutes, you'll start to see a bunch of graphs and KPIs populating your droplet's dashboard.
- Enter the command `date` to test if the timezone is correct. If it doesn't show the correct time and/or desired timezone, run the following commands:
  ~~~
  dpkg --configure -a
  dpkg-reconfigure tzdata
  ~~~
  then enter the correct zone for your location. Notice that if you leave the timezone as **UTC**, there will be no automatic switch between winter and summer time (the UK uses **GMT** from late October to late March, and **BST** for the rest of the year).
- Before proceeding any further, let's thoroughly upgrade the system:
  ~~~
  apt-get update
  apt-get -y full-upgrade
  apt-get -y autoremove
  ~~~
  If during the above upgrading session a window pops up and asks about any changes, be sure to accept the choice: `keep the local version currently installed`
- install some needed *basic* libraries that could be missing from the system (this depends very much on how your chosen provider has decided to install the OS):
  ~~~
  apt-get -y install apt-transport-https software-properties-common htop file nano dos2unix man-db ufw git-core libgit2-dev libauthen-oath-perl openssh-server build-essential libsocket6-perl dirmngr
  ~~~
- restart the system:
  ~~~
  shutdown -r now
  ~~~
  You should now wait a few seconds, to give the server sufficient time to reboot, then reconnect. If you're using *MobaXTerm* you can simply press the *R* key every now and then until it asks for the login step.
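
On later upgrades you may want to know whether a restart is actually pending before issuing one. As a hedged sketch: Ubuntu normally creates the flag file `/var/run/reboot-required` when an installed update needs a reboot, although a minimal image might lack the package that writes it. The function name `reboot_required` and its optional path argument are assumptions made here for illustration.

```shell
# Succeeds when the reboot flag file exists.
# The optional argument overrides the default path (useful for a dry run).
reboot_required() {
    flag_file=${1:-/var/run/reboot-required}
    [ -f "$flag_file" ]
}

if reboot_required; then
    echo "A restart is pending: run 'shutdown -r now' when convenient."
else
    echo "No restart flag found."
fi
```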
If you're on a different service than Digital Ocean, it's also a good idea to disable the boot menu, or reduce the time it shows up:
- open the conf file for editing:
  ~~~
  nano /etc/default/grub
  ~~~
- if you can find the line `GRUB_TIMEOUT=0`, then you don't need any changes and you can exit the editor by pressing the key combination `CTRL+x`. Otherwise, add or change the following lines:
  ~~~
  GRUB_TIMEOUT=3
  GRUB_RECORDFAIL_TIMEOUT=3
  ~~~
- save the file by pressing the *exit* key combination `CTRL+x`, then `y` followed by `Enter`
- update the boot loader:
  ~~~
  update-grub
  ~~~

### Adding *swap* space

Sometimes you can find yourself without sufficient memory to run your scripts, or even to install some packages, and for whatever reason you can't upsize the RAM of your machine. You can then add what is called *swap space*, which essentially uses part of the storage as if it were memory. You can see if the system has any configured swap by typing:
```
swapon --show
```
If you don't get back any output, your system doesn't have any swap space available. You can also verify the memory activity using either the `free -h` command or the more complete `top` or `htop` utilities.

Before starting the operation, you should run the `df` command to be sure you have the necessary storage space for the planned swap file. Although there are many opinions about the appropriate size of a swap space, it really depends on application requirements. Generally, an amount equal to or double the amount of RAM on the system is a good starting point.

The following are the instructions to add 1GB of swap space; you can easily change the value to add less or more than that (`G` stands for gigabyte, `M` would stand for megabyte).
- allocate a file called `swapfile` of the desired size:
  ```
  fallocate -l 1G /swapfile
  ```
- make the file accessible only by `root`:
  ```
  chmod 600 /swapfile
  ```
- mark the file as actual swap space:
  ```
  mkswap /swapfile
  ```
- enable the swap operation, allowing the system to start using the file for that purpose:
  ```
  swapon /swapfile
  ```
- to make the swap operation permanent after reboot, add the swap file information to the end of the `/etc/fstab` file:
  ```
  echo '/swapfile none swap sw 0 0' | tee -a /etc/fstab
  ```
- add some configuration information to the services control, to better adapt the swap operation to a server:
  ```
  echo '
  ###########################
  # ADDED BY user
  vm.swappiness=10
  vm.vfs_cache_pressure=40
  ###########################
  ' | tee -a /etc/sysctl.conf
  ```

Notice that Digital Ocean, like many other *** providers, strongly discourages the creation of *swap* space, a practice often used to keep down the size, and hence the cost, of the droplet. This is because their system is entirely made up of SSD storage, which is heavily degraded by the continuous read/write access typical of swapping. Besides, upgrading the droplet leads to much better results in general.

### Changing localization
- `locale` to check for your current locale (the default at OS installation is *en_US.UTF-8*).
- `locale -a` to list all the installed locales
- `sudo locale-gen ` to generate and install a new locale (see [here]() for the available strings)
- `` to change a locale value on a temporary basis
- `sudo nano /etc/default/locale` and add or modify the instruction(s) therein to change locale on a permanent basis (needs a fresh login)

### Add admin user

The Linux system is well known for its strong management of users, and of file and directory permissions and ownerships. In particular, it's an absolute no-no to work as the default admin user, called *root*: it could be a disaster if you get something even slightly wrong in a command (a mistyped `rm -rf` on the wrong path can wipe your system disk with no possibility of return), and for security you want to avoid giving complete control of the machine to anyone stealing your password. It's customary instead to use a group called *sudo* whose members can act as temporary admins.
- create a new user (replace *usrname* with the actual user name):
  ~~~
  adduser usrname
  ~~~
- enter a password twice (try not to include any special character, as they can cause problems down the line), and then the required information (you can simply leave all the fields empty)
- add the new user as *sudoer* to the *sudo* group:
  ~~~
  usermod -aG sudo usrname
  ~~~
- switch control to the new user *usrname* (you can recognize the change in the prompt):
  ~~~
  su - usrname
  ~~~
- check if *usrname* can actually run admin commands, trying to switch control to the *root* user (you can run any admin command by prefixing it with `sudo`):
  ~~~
  sudo su
  ~~~
- always remember to exit from *root* when you're done (`CTRL+D` also works as a shortcut):
  ~~~
  exit
  ~~~

From now on you should forget there exists a user called *root*, and try not to run the command `sudo su` unless you actually need to. Always use *usrname* instead, running admin stuff through the `sudo` prefix.

If you need to change a user's password, run the command:
~~~
sudo passwd usrname
~~~
then enter the new password. Notice that only the *root* user, or a *sudoer*, can change the password of any other user.

If you want instead to completely delete a user, you need to properly log in as *root*, or switch to the *root* user:
~~~
sudo su -
~~~
then run the command:
~~~
userdel -r usrname
~~~
You would drop the `-r` option if you want to keep the user's *home* directory.
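
Before closing the *root* session it can be reassuring to confirm that the new user really landed in the *sudo* group. The snippet below is a minimal sketch of such a check built on the standard `id -nG` command; the function name `in_group` is a hypothetical helper, and *usrname* is the same placeholder used above.

```shell
# Succeeds when user $1 belongs to group $2 (primary or supplementary).
in_group() {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -Fqx -- "$2"
}

# Example (commented out because 'usrname' is only a placeholder):
# if in_group usrname sudo; then
#     echo "usrname can use sudo"
# else
#     echo "usrname is NOT in the sudo group" >&2
# fi
```

Note that group changes made with `usermod` only apply to sessions started afterwards, so an already logged-in user has to log out and back in before `sudo` works.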
### Add *public* group and repository

One of the main problems beginners encounter when they start using Linux, and the *Shiny* Server in particular, is related to the much dreaded *file permissions*. Briefly explained, everything in Linux is a file, and each file admits three operations: **r**ead, **w**rite, e**x**ecute, that can be carried out by three (groups of) users: the *owner* of the file, any user belonging to one or more specific *groups*, and all the *other* users.

When you list the content of a directory, using for example the `ls -l` command, you can see the permissions attached to each entry as nine flags, which can be read as nine binary digits where 0 means *not permitted* (shown as `-`) and 1 means *permitted* (shown as the letter of the operation). These flags must be read in groups of three (see also the picture below): the first three (from the left) are the operations allowed to the *owner*, the next three are for the *group*, the last three for *others*. Besides the binary mode, there i ... ...
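
To make the grouping in threes concrete, here is a small sketch that converts a nine-character permission string, as printed by `ls -l` without its leading file-type character, into the three-digit form used earlier by `chmod 600 /swapfile`. Each triplet is treated as a binary number (r = 4, w = 2, x = 1). The function name `perm_to_octal` is a hypothetical helper, and special flags such as setuid or the sticky bit are deliberately ignored.

```shell
# Convert a string such as "rwxr-xr--" into its three-digit numeric form.
perm_to_octal() {
    perms=$1
    if [ "${#perms}" -ne 9 ]; then
        echo "expected 9 characters, e.g. rwxr-xr--" >&2
        return 1
    fi
    octal=""
    for start in 1 4 7; do
        # Take the triplet for owner, group and others in turn.
        triplet=$(printf '%s' "$perms" | cut -c "${start}-$((start + 2))")
        digit=0
        case $triplet in r??) digit=$((digit + 4)) ;; esac
        case $triplet in ?w?) digit=$((digit + 2)) ;; esac
        case $triplet in ??x) digit=$((digit + 1)) ;; esac
        octal="${octal}${digit}"
    done
    printf '%s\n' "$octal"
}

perm_to_octal rw-------   # prints 600
perm_to_octal rwxr-xr--   # prints 754
```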
