# TableCrow
[![tests](https://github.com/zacharyburnett/TableCrow/workflows/tests/badge.svg)](https://github.com/zacharyburnett/TableCrow/actions?query=workflow%3Atests)
[![codecov](https://codecov.io/gh/zacharyburnett/tablecrow/branch/master/graph/badge.svg?token=FCJ2HXV0K6)](https://codecov.io/gh/zacharyburnett/tablecrow)
[![build](https://github.com/zacharyburnett/TableCrow/workflows/build/badge.svg)](https://github.com/zacharyburnett/TableCrow/actions?query=workflow%3Abuild)
[![version](https://img.shields.io/pypi/v/tablecrow)](https://pypi.org/project/tablecrow)
[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
[![style](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
`tablecrow` is an abstraction over a generalized database table. Currently, `tablecrow` offers an abstraction for PostGreSQL and SQLite tables with
simple PostGIS and SpatiaLite operations.
```bash
pip install tablecrow
```
## Data Model
`tablecrow` sees a database schema as a mapping of field names to Python types, and a database record / row as a mapping of field names to values:
```python
from datetime import datetime
fields = {'id': int, 'time': datetime, 'length': float, 'name': str}
record = {'id': 1, 'time': datetime(2020, 1, 1), 'length': 4.4, 'name': 'long boi'}
```
For databases with a spatial extension, you can use [Shapely geometries](https://shapely.readthedocs.io/en/stable/manual.html#geometric-objects):
```python
from shapely.geometry import Polygon
fields = {'id': int, 'polygon': Polygon}
record = {'id': 1, 'polygon': Polygon([(-77.1, 39.65), (-77.1, 39.725), (-77.4, 39.725), (-77.4, 39.65), (-77.1, 39.65)])}
```
## Usage
#### connect to an existing database
```python
import tablecrow
# list all tables in a SQLite database file
sqlite_tables = tablecrow.connect('test_database.db')
# connect to a PostGres database table
postgres_table = tablecrow.connect(
'https://user:password@test.com/database:5432',
database='postgres',
table_names=['test_table'],
)
```
#### create a simple table (single primary key, no geometries)
```python
from datetime import datetime
from tablecrow import PostGresTable
table = PostGresTable(
hostname='localhost:5432',
database='postgres',
table_name='testing',
fields={'id': int, 'time': datetime, 'length': float, 'name': str},
primary_key='id',
username='postgres',
password='
',
)
# add a list of records
table.insert([
{'id': 1, 'time': datetime(2020, 1, 1), 'length': 4.4, 'name': 'long boi'},
{'id': 3, 'time': datetime(2020, 1, 3), 'length': 2, 'name': 'short boi'},
{'id': 2},
{'id': 15, 'time': datetime(2020, 3, 3)},
])
# set, access, or delete a single record using its primary key value
table[4] = {'time': datetime(2020, 1, 4), 'length': 5, 'name': 'long'}
record = table[3]
del table[2]
# list of records in the table
num_records = len(table)
records = table.records
# query the database with a dictionary, or a SQL `WHERE` clause as a string
records = table.records_where({'name': 'short boi'})
records = table.records_where({'name': None})
records = table.records_where({'name': '%long%'})
records = table.records_where("time <= '20200102'::date")
records = table.records_where("length > 2 OR name ILIKE '%short%'")
# delete records with a query
table.delete_where({'name': None})
```
#### create a table with multiple primary key fields
```python
from datetime import datetime
from tablecrow import SQLiteTable
table = SQLiteTable(
path='test_database.db',
table_name='testing',
fields={'id': int, 'time': datetime, 'length': float, 'name': str},
primary_key=('id', 'name'),
)
# a compound primary key allows more flexibility in ID
table.insert([
{'id': 1, 'time': datetime(2020, 1, 1), 'length': 4.4, 'name': 'long boi'},
{'id': 1, 'time': datetime(2020, 1, 1), 'length': 3, 'name': 'short boi'},
{'id': 3, 'time': datetime(2020, 1, 3), 'length': 2, 'name': 'short boi'},
{'id': 3, 'time': datetime(2020, 1, 3), 'length': 6, 'name': 'long boi'},
{'id': 2, 'name': 'short boi'},
])
# key accessors must include entire primary key
table[4, 'long'] = {'time': datetime(2020, 1, 4), 'length': 5}
record = table[3, 'long boi']
```
#### create a table with geometry fields
the database must have a spatial extension (such as PostGIS) installed
```python
from pyproj import CRS
from shapely.geometry import MultiPolygon, Polygon, box
from tablecrow import PostGresTable
table = PostGresTable(
hostname='localhost:5432',
database='postgres',
table_name='testing',
fields={'id': int, 'polygon': Polygon, 'multipolygon': MultiPolygon},
primary_key='id',
username='postgres',
password='',
crs=CRS.from_epsg(4326),
)
big_box = box(-77.4, 39.65, -77.1, 39.725)
little_box_inside_big_box = box(-77.7, 39.725, -77.4, 39.8)
little_box_touching_big_box = box(-77.1, 39.575, -76.8, 39.65)
disparate_box = box(-77.7, 39.425, -77.4, 39.5)
big_box_in_utm18n = box(268397.8, 4392279.8, 320292.0, 4407509.6)
multi_box = MultiPolygon([little_box_inside_big_box, little_box_touching_big_box])
table.insert([
{'id': 1, 'polygon': little_box_inside_big_box},
{'id': 2, 'polygon': little_box_touching_big_box},
{'id': 3, 'polygon': disparate_box, 'multipolygon': multi_box},
])
# find all records with any geometry intersecting the given geometry
records = table.records_intersecting(big_box)
# find all records with only specific geometry fields intersecting the given geometry
records = table.records_intersecting(big_box, geometry_fields=['polygon'])
# you can also provide geometries in a different CRS
records = table.records_intersecting(
big_box_in_utm18n,
crs=CRS.from_epsg(32618),
geometry_fields=['polygon'],
)
```
## Extending
to write your own custom table interface, extend `DatabaseTable`:
```python
from typing import Any, Mapping, Sequence, Union
from tablecrow.tables.base import DatabaseTable
class CustomDatabaseTable(DatabaseTable):
# mapping from Python types to database types
FIELD_TYPES = {
'NoneType': 'NotImplemented',
'bool': 'NotImplemented',
'float': 'NotImplemented',
'int': 'NotImplemented',
'str': 'NotImplemented',
'bytes': 'NotImplemented',
'date': 'NotImplemented',
'time': 'NotImplemented',
'datetime': 'NotImplemented',
'timedelta': 'NotImplemented',
}
def __init__(self, database: str, name: str, fields: {str: type}, primary_key: Union[str, Sequence[str]] = None, hostname: str = None,
username: str = None, password: str = None, users: [str] = None):
super().__init__(database, name, fields, primary_key, hostname, username, password, users)
raise NotImplementedError('implement database connection and table creation here')
@property
def exists(self) -> bool:
raise NotImplementedError('implement database table existence check here')
@property
def schema(self) -> str:
raise NotImplementedError('implement string generation for the database schema here')
@property
def remote_fields(self) -> {str: type}:
raise NotImplementedError('implement accessor for database fields here')
def records_where(self, where: Union[Mapping[str, Any], str, Sequence[str]]) -> [{str: Any}]:
raise NotImplementedError('implement database record query here')
def insert(self, records: [{str: Any}]):
raise NotImplementedError('implement database record insertion here')
def delete_where(self, where: Union[Mapping[str, Any], str, Sequence[str]]):
raise NotImplementedError('implement database record deletion here')
def delete_table(self):
raise NotImplementedError('implement database table deletion here')
```
## Acknowledgements
The original core code and methodology of `tablecrow` was developed for the National Bathymetric Source project under
the [Office of Coast Survey of the National Oceanic and Atmospheric Administration (NOAA)](https://nauticalcharts.noaa.gov), a part of the United
States Department of Commerce, by [@glenrice-noaa](https://github.com/glenrice-noaa), [@Casiano-Koprowski](https://github.com/Casiano-Koprowski),
and [@zacharyburnett](https://github.com/zacharyburnett).