smartcsv: CSVs are awesome, yet they're pretty dumb. Let's get them smarter!

# Smart and awesome CSV utils

**CSVs are awesome, yet they're pretty dumb. Let's get them smarter!**

smartcsv is a Python utility to read and parse CSVs based on model definitions. Instead of just parsing the CSV into lists (like the builtin `csv` module), it lets you specify models with attribute names. On top of that it adds nice features like **validation, custom parsing, failure control and nice error messages**.

```python
>>> reader = smartcsv.reader(file_object, columns=COLUMNS, fail_fast=False)
>>> my_object = next(reader)
>>> my_object['title']  # Accessed by model name.
'iPhone 5c Blue'
>>> my_object['price']  # Value transform included
Decimal("799.99")
>>> my_object['currency']  # Based on choices = ['USD', 'YEN']
'USD'
>>> my_object['url']  # custom validator lambda v: v.startswith('http')

# Nice errors
>>> from pprint import pprint as pp
>>> pp(my_object.errors)
{
    17: {  # The row number
        'row': ['', '', ...],  # The complete row for reference
        'errors': {  # Description of the errors
            'url': 'Validation failed',
            'currency': "Invalid choice. Expected ['USD', 'YEN']. Got 'AUD' instead."
        }
    }
}
```

### Installation

    pip install smartcsv

### Usage

For the full set of usages, check the `test` package (99% coverage). The basic idea is to define a spec for the columns of your CSV. Assume the following CSV file:

    title,category,subcategory,currency,price,url,image_url
    iPhone 5c blue,Phones,Smartphones,USD,399,,
    iPad mini,Tablets,Apple,USD,699,,

First you need to define the spec for your columns.
This is an example (the one used in `tests`):

```python
CURRENCIES = ('USD', 'ARS', 'JPY')

COLUMNS_1 = [
    {'name': 'title', 'required': True},
    {'name': 'category', 'required': True},
    {'name': 'subcategory', 'required': False},
    {
        'name': 'currency',
        'required': True,
        'choices': CURRENCIES
    },
    {
        'name': 'price',
        'required': True,
        'validator': is_number
    },
    {
        'name': 'url',
        'required': True,
        'validator': lambda c: c.startswith('http')
    },
    {
        'name': 'image_url',
        'required': False,
        'validator': lambda c: c.startswith('http')
    },
]
```

You can then use `smartcsv` to parse the CSV:

```python
import smartcsv

with open('my-csv.csv', 'r') as f:
    reader = smartcsv.reader(f, columns=COLUMNS_1)
    for obj in reader:
        print(obj['title'])
```

`smartcsv.reader` uses the builtin `csv` module and accepts a dialect to use.

### More advanced usage

**Errors**

By default `smartcsv` raises a `smartcsv.exceptions.InvalidCSVException` when it encounters an error in a column (a missing required field, a value outside the allowed choices, a validation failure, etc.). The exception carries a descriptive error message:

```python
from smartcsv.exceptions import InvalidCSVException

# Assuming the price field is missing
try:
    item = next(reader)
except InvalidCSVException as e:
    print(e.errors)  # {'price': 'Field required and not provided.'}
```

To avoid failing fast (raising an exception on the first failure), pass `fail_fast=False`. No exception is raised; instead, the errors are reported on the reader object, with the row number and the details of each error. For example, assuming a CSV with an error in the second row:

```python
reader = smartcsv.reader(f, columns=COLUMNS_1, fail_fast=False)
for obj in reader:
    # All the processing is done OK without exceptions raised.
    print(obj['title'])

error_row = reader.errors['rows'][1]  # Second row has index = 1. Errors are 0-indexed.
print(error_row['row'])  # Print original row data
print(error_row['errors'].keys())  # currency (the currency column)
print(error_row['errors']['currency'])  # Invalid currency... (nice error explanation)
```

You can also specify a `max_failures` parameter. It counts failures and raises an exception when that threshold is exceeded.

**Strip white spaces**

By default the `strip_white_spaces` option is set to `True`. Example, given `sample.csv`:

```
title,price
Some Product , 55.5
```

`row['title']` will be `"Some Product"` and `row['price']` will be `"55.5"` (spaces stripped).

**Skip lines**

Given `sample.csv`:

```
GENERATED BY AWESOME SCRIPT
2014-08-12

title,price
Some Product,55.5
```

The first 3 lines don't contain any valuable data, so we'll skip them.

```python
reader = smartcsv.reader(f, columns=COLUMNS_1, fail_fast=False, skip_lines=3)
for obj in reader:
    print(obj['title'])
```

**Break (stop) on occurrence of first error**

By default, `fail_fast` is `True`; you can also set it explicitly with `fail_fast=True`. The reader then stops as soon as it encounters an error in the CSV file, such as a mismatch between your column spec and the value found in the file. A data-validation failure also triggers `fail_fast`.

```python
reader = smartcsv.reader(f, columns=COLUMNS_1, fail_fast=True)
for obj in reader:
    print(obj['title'])
```

### Contributing

Fork, code, watch your tests pass, submit PR. To test:

```bash
$ python test  # Run tests in your venv
$ tox  # Make sure it passes in all versions.
```

### Integration tests

There are "integration" tests included under `tests/integration`. They are not run by the default test runner. The idea of those tests is to have real examples of use cases for `smartcsv` documented. You'll have to run them manually:

```bash
py.test tests/integration/lpnk/
```
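
**Sketch: the checks behind a column spec**

The column spec in the Usage section refers to an `is_number` validator without showing it, and the Errors section describes three kinds of checks (required fields, choices, custom validators). The following is a minimal sketch of those ideas using only the standard library. It is not smartcsv's own code: `is_number` is an assumed implementation, `check_row` is a hypothetical helper, and the sample rows are made up for illustration. Only the error message wording is borrowed from the examples above.

```python
import csv
import io
from decimal import Decimal, InvalidOperation


def is_number(value):
    """Assumed validator: True if `value` parses as a finite decimal."""
    try:
        return Decimal(value).is_finite()
    except (InvalidOperation, TypeError, ValueError):
        return False


CURRENCIES = ('USD', 'ARS', 'JPY')

# A shortened spec in the same shape as COLUMNS_1.
COLUMNS = [
    {'name': 'title', 'required': True},
    {'name': 'currency', 'required': True, 'choices': CURRENCIES},
    {'name': 'price', 'required': True, 'validator': is_number},
]


def check_row(row, columns):
    """Hypothetical helper: return {column_name: message} for one row."""
    errors = {}
    for column in columns:
        name = column['name']
        value = (row.get(name) or '').strip()
        if not value:
            # Empty values only matter for required columns.
            if column.get('required'):
                errors[name] = 'Field required and not provided.'
            continue
        choices = column.get('choices')
        if choices and value not in choices:
            errors[name] = 'Invalid choice. Expected {}. Got {!r} instead.'.format(
                list(choices), value)
            continue
        validator = column.get('validator')
        if validator and not validator(value):
            errors[name] = 'Validation failed'
    return errors


# Made-up sample data: one valid row followed by three faulty ones.
SAMPLE = """title,currency,price
iPhone 5c blue,USD,399
iPad mini,AUD,699
Kindle,USD,
Nexus 7,ARS,cheap
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Collect errors per 0-based row index, keeping only rows with problems.
report = {}
for index, row in enumerate(rows):
    row_errors = check_row(row, COLUMNS)
    if row_errors:
        report[index] = row_errors

for index, row_errors in sorted(report.items()):
    print(index, row_errors)
```

Collecting every row's errors into a dict keyed by row index resembles what `fail_fast=False` is described as doing; raising on the first non-empty result would correspond to the default fail-fast behaviour.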