# python-statista-programming-challenge

Category: LeetCode / problem bank
Development tool: Python
File size: 0KB
Downloads: 0
Upload date: 2023-11-18 22:43:41
Uploader: sh-1993
Description: Statista Programming Challenge

File list (size in bytes, date):
.docker.env (345, 2023-11-20)
.dockerignore (134, 2023-11-20)
Dockerfile (348, 2023-11-20)
config.py (1490, 2023-11-20)
core/ (0, 2023-11-20)
core/__init__.py (0, 2023-11-20)
core/helpers.py (1087, 2023-11-20)
core/models.py (7183, 2023-11-20)
data/ (0, 2023-11-20)
data/Input_Dataset.csv (5277526, 2023-11-20)
db_create.py (2365, 2023-11-20)
db_create_import.sh (60, 2023-11-20)
db_import.py (5559, 2023-11-20)
db_test.py (1553, 2023-11-20)
docker-compose.yml (485, 2023-11-20)
docs/ (0, 2023-11-20)
docs/Statista_Programming Challenge_OPS_task_description.pdf (142002, 2023-11-20)
docs/dbdiagram.dbml (1766, 2023-11-20)
docs/dbdiagram.pdf (288985, 2023-11-20)
docs/queries.sql (1268, 2023-11-20)
docs/task_20.ipnb (253970, 2023-11-20)
docs/task_20.pdf (209194, 2023-11-20)
docs/task_24.json (1101, 2023-11-20)
instance/ (0, 2023-11-20)
main.py (3777, 2023-11-20)
pytest.ini (135, 2023-11-20)
requirements.txt (174, 2023-11-20)
run_coverage.sh (35, 2023-11-20)
run_pytest.sh (36, 2023-11-20)
statista-challenge-app.sh (2184, 2023-11-20)
templates/ (0, 2023-11-20)
templates/export.html (1287, 2023-11-20)
templates/index.html (249, 2023-11-20)
... ...

## Statista Programming Challenge [![tests](https://github.com/dsuprunov/python-statista-programming-challenge/actions/workflows/tests.yml/badge.svg)](https://github.com/dsuprunov/python-statista-programming-challenge/actions/workflows/tests.yml) [![docker build](https://github.com/dsuprunov/python-statista-programming-challenge/actions/workflows/docker.yml/badge.svg)](https://github.com/dsuprunov/python-statista-programming-challenge/actions/workflows/docker.yml)

## Table of Contents

- [Task 1: Ingesting the data into a database](#task-1-ingesting-the-data-into-a-database)
- [Task 2: Data insights](#task-2-data-insights)
- [Task 3: Web Application](#task-3-web-application)
- [Task 4: Making our app distribution ready](#task-4-making-our-app-distribution-ready)
- [Task 5: Updating, testing, and documenting](#task-5-updating-testing-and-documenting)
- [Bonus task](#bonus-task)

If you are a Data Engineer or a DevOps Engineer, you can proceed directly to [Task 5: Updating, testing, and documenting](#task-5-updating-testing-and-documenting) to review the instructions for launching the application and fulfilling all the infrastructure requirements.

---

## Task 1: Ingesting the data into a database

- ### Preparation for the next two steps

  The project was developed with **Python** `3.10.12` and additionally tested with `3.12.0`.

  For the next two steps, performed on your local Linux PC, create a virtual environment, activate it, and install all packages listed in the `requirements.txt` file:

  ```bash
  python -m venv .venv
  source .venv/bin/activate
  pip install -r requirements.txt
  ```

  By default (without environment variables), a local SQLite database `./instance/census.db.sqlite3` will be created from the local CSV file `./data/Input_Dataset.csv`.
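For illustration, that default-with-override pattern typically looks like the sketch below. This is a generic Python illustration only; the repo's actual logic lives in `config.py`, and the variable names `DATABASE_URI` and `CSV_PATH` here are hypothetical, not taken from that file:

```python
import os

# Generic sketch of "SQLite unless overridden by environment variables".
# DATABASE_URI and CSV_PATH are hypothetical names, not taken from config.py.
DATABASE_URI = os.environ.get(
    "DATABASE_URI", "sqlite:///instance/census.db.sqlite3"  # SQLite fallback
)
CSV_PATH = os.environ.get("CSV_PATH", "data/Input_Dataset.csv")
```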

If you prefer to use PostgreSQL and/or change the location of the input CSV file, please check the `.docker.env` file, and remember to import it into the current environment before running the script(s).
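In a shell, `set -a; source .docker.env; set +a` exports plain `KEY=VALUE` entries into the current environment. The sketch below does the same for a single Python process, assuming the `python-dotenv` package is installed (it is not necessarily part of this repo's `requirements.txt`):

```python
# Sketch: load .docker.env into os.environ for the current process.
# Requires python-dotenv (pip install python-dotenv).
import os

from dotenv import load_dotenv

load_dotenv(".docker.env")
print(os.environ.get("POSTGRES_USER"))  # hypothetical key, shown for illustration
```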
If you decide to use PostgreSQL, please create both the user and the database:
```sql
CREATE USER census WITH PASSWORD 'secret';
CREATE DATABASE census OWNER census;
```

- ### Subtask: Normalized database schema

  - [As PDF](docs/dbdiagram.pdf)
  - [As interactive view](https://dbdiagram.io/d/655516d67d8bbd6465445e36)
  - [As DBML (Database Markup Language) file](docs/dbdiagram.dbml)

- ### Subtask: Database schema creation script

  ```console
  ./db_create.py
  ```

- ### Subtask: Database data import

  ```console
  ./db_import.py
  ```

## Task 2: Data insights

- ### Subtask: [Jupyter Notebook](docs/task_20.ipnb)

  I added the [Jupyter Notebook as a PDF](docs/task_20.pdf) because sometimes my local instance of Jupyter Notebook does not display it correctly.

- ### Subtask: [JSON with questions and answers](docs/task_24.json)

## Task 3: Web Application

- ### At a local Linux PC

  Start the web application:

  ```bash
  ./main.py
  ```

  The application will be accessible at [http://localhost:8181](http://localhost:8181)

- ### As a dockerized service

  - Make sure to complete all the steps outlined in [Task 5: Updating, testing, and documenting](#task-5-updating-testing-and-documenting)
  - The application will be accessible at `http://-YOUR-DOCKER-PC-ADDRESS-OR-NAME-:8181`

- ### You can find 'all data', dynamically joined into one table (in a format similar to the original CSV), in the `unit` table.

- ### For graphical visualizations, I found several appealing charts related to the `unit` table (see the sketch after this list):

  - Age vs Workclass
  - Gender vs Workclass
  - Age vs Income
  - etc.
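As an illustration of one such chart, here is a minimal sketch, assuming a pandas/matplotlib environment and the default SQLite database path; the column names `age` and `income` are assumptions about the `unit` table, not taken from this repo's schema:

```python
# Minimal sketch: Age vs Income from the `unit` table.
# Assumes the default SQLite path; `age` and `income` are assumed column names.
import sqlite3

import matplotlib.pyplot as plt
import pandas as pd

conn = sqlite3.connect("instance/census.db.sqlite3")
df = pd.read_sql_query("SELECT age, income FROM unit", conn)
conn.close()

df.boxplot(column="age", by="income")
plt.suptitle("")  # drop pandas' automatic "grouped by" title
plt.title("Age vs Income")
plt.xlabel("income")
plt.ylabel("age")
plt.show()
```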

## Task 4: Making our app distribution ready

- ### Subtask: [Dockerfile](Dockerfile)
- ### Subtask: [Docker compose file](docker-compose.yml)

## Task 5: Updating, testing, and documenting

### Comments

The entire process is built around the `./data/Input_Dataset.csv` file. Under normal circumstances, it is part of the `statista-challenge-app:latest` image. If you want to work with your own dataset (ensuring it follows the same format and structure), you have two options:

1. Replace `./data/Input_Dataset.csv` with your file and rebuild the Docker image. \
   You will need to restart the docker-compose service named `statista-challenge-app`.
2. Alternatively, you can mount `./data/` into your container. \
   This will allow you to change the content of `./data/Input_Dataset.csv` without the need to rebuild and restart the service. \
   Please uncomment the relevant code in `docker-compose.yml`:

   ```yaml
   # volumes:
   #   - ./data/:/app/data
   ```

   But, once again, you will need to restart the docker-compose service `statista-challenge-app`.

By default (without environment variables), a local (inside the `statista-challenge-app` container) SQLite database `/app/instance/census.db.sqlite3` will be created from a local (again, inside the `statista-challenge-app` container) CSV file `/app/data/Input_Dataset.csv`.

If you prefer to use your own PostgreSQL server, please check the `.docker.env` file. The values in it will be automatically imported and used inside the containers.

### Typical workflow

1. **This step is necessary only if you have not downloaded and run this project before.** \
   Download [statista-challenge-app.sh](statista-challenge-app.sh) if it has not been downloaded yet:

   ```bash
   wget https://raw.githubusercontent.com/dsuprunov/python-statista-programming-challenge/main/statista-challenge-app.sh
   chmod 755 statista-challenge-app.sh
   ./statista-challenge-app.sh pull
   rm statista-challenge-app.sh
   cd python-statista-programming-challenge
   ```

   The other option is to simply clone the GitHub repository. \
   You need to have the Git client installed:

   ```bash
   git clone https://github.com/dsuprunov/python-statista-programming-challenge.git
   cd python-statista-programming-challenge
   ```

2. Pull the latest code from the GitHub repo:

   ```bash
   ./statista-challenge-app.sh pull
   ```

3. Build an image:

   ```bash
   ./statista-challenge-app.sh build
   ```

4. Start the services:

   ```bash
   ./statista-challenge-app.sh start
   ```

5. Test the database connection and the import script:

   ```bash
   ./statista-challenge-app.sh test
   ```

6. Import data from the CSV file:

   ```bash
   ./statista-challenge-app.sh init
   ```

7. Do something...

   ```
   # For example, work with data or the database.
   ```

8. Stop the services (if you do not need them):

   ```bash
   ./statista-challenge-app.sh stop
   ```

### General usage

```bash
./statista-challenge-app.sh {pull|start|init|help|stop|restart|build|test}
```

Commands:

- **pull** - Pull the latest code from the GitHub repository
- **start** - Create and start the containers in the background and leave them running
- **init** - Populate the project's own/containerized PostgreSQL database with data from the CSV file
- **help** - Display this help message
- **stop** - Stop running containers without removing them
- **restart** - Restart all stopped and running containers
- **build** - Build or rebuild services
- **test** - Test the connection to the database and the import script

## Bonus task

The bonus task, unfortunately, was not completed because I decided to invest my time in completing all the primary tasks, bringing the code into compliance with PEP 8 (with a few exceptions, such as docstrings), formatting the documentation into a more readable state, and releasing a new version. Additionally, time was spent searching for a more optimal way to import a CSV file into the database. Two additional solutions were found (caching, and bulk upload directly through the database); implementing them was beyond the scope of this assignment.
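For reference, the bulk-upload idea mentioned above could look roughly like the following sketch. It is not implemented in this repo, and the connection parameters and staging table name are hypothetical:

```python
# Sketch only: bulk-load the CSV straight into PostgreSQL via COPY.
# Not part of this repo; `staging_census` and the DSN are hypothetical.
import psycopg2

conn = psycopg2.connect("dbname=census user=census password=secret host=localhost")
with conn, conn.cursor() as cur, open("data/Input_Dataset.csv") as f:
    cur.copy_expert(
        "COPY staging_census FROM STDIN WITH (FORMAT csv, HEADER true)", f
    )
conn.close()
```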
