feedletter

所属分类:其他
开发工具:Scala
文件大小:0KB
下载次数:0
上传日期:2024-01-22 20:47:04
上 传 者sh-1993
说明:  将RSS提要转换为电子邮件新闻稿和通知桥的服务。
(A service that converts RSS feeds into e-mail newsletters and notification bridges.)

文件列表:
mill-build/
playpen/
resources/
src/com/mchange/feedletter/
untemplate/com/mchange/feedletter/default/email/
.mill-version
LICENSE
build.sc
feedletter
feedletter-style
millw

# feedletter ## Turn an RSS feed into a newsletter or notification bot ### Introduction **feedletter** is a service that - **watches RSS feeds with great care** * works great with feeds generated by static-site generators! * distinguishes between new items and older stuff or stuff already seen that flakily reappears * awaits "finalization" of items, meaning their stabilization (and nondeletion) over specified time intervals - lets you **define a wide variety of subscriptions** to those feeds * Over different media - e-mail - Post to Mastodon - SMS (coming soon!) - etc * In different arrangements - each item as newsletter - daily or weekly digests - compilalions of every `n` posts - etc - which are **formatted via rich, customizable [`untemplates`](https://github.com/swaldman/untemplate-doc#readme)** - which are **managed via a web API** for easy subscription, confirmation, and unsubscription by users ### Prerequisites The application requres a Java 17+ JVM and a Postgres database. Typical installations will proxy the web API behind e.g. `nginx`, and run the daemon as a `systemd` service. ### Getting Started A tutorial is coming soon! ### Developer Notes #### Lifecycle 1. A `feed` is added 2. One or more `subscribable`s is defined against the feed 3. One or more `destination`s subscribe to the feed. 4. `item`s are observed in the feed, and are added in the `Unassigned` state 5. Each `item` is assigned, in a single transaction, to all the collections (`assignable`s) to which they will ever belong. (Steps 4 and 5 can repeat arbitrarily as new items come in.) 6. Separately, collections (`assignable`s) are periodically marked "complete" and, in the same transaction forwarded to subscribers. 7. Complete `assignable`s are deleted, along with their `assignment`s 8. `item`s that are... * Already assigned * No longer belong to not-yet-completed `assignables` can drop their cached contents, and then move into the `Cleared` state. #### Database Schema I want to sketch the not-so-obvious db schema I've adopted for this project while I still understand it. ##### feed First there are feeds: ```sql CREATE TABLE feed( id INTEGER, url VARCHAR(1024), min_delay_minutes INTEGER NOT NULL, await_stabilization_minutes INTEGER NOT NULL, max_delay_minutes INTEGER NOT NULL, assign_every_minutes INTEGER NOT NULL, added TIMESTAMP NOT NULL, last_assigned TIMESTAMP NOT NULL, -- we'll start at added PRIMARY KEY(id) ) ``` Feeds must be defined before subscriptions can be created against them. They are defined by a URL, and they define what it means for a feed to "finalize", in the sense of being ready for notification. Feeds are permanent and basically unchanging until (when someday I implement this) they are manually removed. (`last_assigned` changes, but so far it's just informational, has no role in the application.) ##### item Next there are items: ```sql CREATE TABLE item( feed_id INTEGER, guid VARCHAR(1024), single_item_rss TEXT, content_hash INTEGER, -- ItemContent.contentHash link VARCHAR(1024), first_seen TIMESTAMP NOT NULL, last_checked TIMESTAMP NOT NULL, stable_since TIMESTAMP NOT NULL, assignability ItemAssignability NOT NULL, PRIMARY KEY(feed_id, guid), FOREIGN KEY(feed_id) REFERENCES feed(id) )""".stripMargin ``` * `feed_id` and `guid` identify an item. * `single_item_rss` _caches_ the RSS item. We want to cache this, in case by the time we get around to notifying, the item is no longer available in the feed. * `content_hash` is a hash based on the prior five fields. We use it to identify whether an item has changed. * `link` may eventually be used as a neurotic double-check so we never notify the same human-perceived item twice * `first_seen`, `last_checked`, and `stable_since` are pretty self-explanatory timestamps, We use these to calculate whether an item has stabilized and so can be "assigned". (See below.) * `assignability`: items can be in one of four states * `Unassigned` — The item has not yet been assigned to the collections (including single member collections) to which it will eventually belong, but is eligible for assignment. * `Assigned` — The item _hash_ been assigned to _all_ the collections (including single member collections) to which it will eventually belong. The application may not be done assigning to those collections, and the items may not yet be distributed to subscribers. * `Cleared` — This is the terminal state for an item. The item has been assigned to all collections, and have already been distributed to subscribers. The cache fields (`title`, `author`, `article`, `publication_date`, and `link`) should all be cleared in this state. `Cleared` items are not deleted, but retained indefinitely, so that we don't renotify if the item (an item with the same `guid`) reappears in the feed. * `Excluded` — Items which are marked to always be ignored. Items are marke `Excluded` only upon initial insert. Items can be manually updated from `Excluded` to `Unassigned` (timestamps should be reset to the tie of the update), to cause `Excluded` posts to be published. ##### subscribable (subscription definition) Next there is `subscribable`, which represents the definition of a subscription by which parties will be notified of items or collections of items. ```sql CREATE TABLE subscribable( subscribable_name VARCHAR(64), feed_id INTEGER NOT NULL, subscription_manager_json JSONB NOT NULL, last_completed_wti VARCHAR(1024), PRIMARY KEY (subscribable_name), FOREIGN KEY (feed_id) REFERENCES feed(id) ) ``` A subscribable maps a name to a feed and a `SubscriptionManager`. For our purposes here, the main role of a `SubscriptionManager` (a serialization of a Scala ADT) is to 1. Generate for items a `within_type_id`, which is really just a collection identifier. All items in a collection of items that will be distributed will share the same `within_type_id`. 2. Determine whether a collection (identified by its `within_type_id`) is "complete" — that is, no further items need by assigned the same `within_type_id`. 3. When a collection has been notified, it is deleted from the database. However, some `SubscriptionManagers` need to maintain a sequence of `within_type_id` identifiers. So for each subscribable, the `last_completed_wti` is retained. `SubscriptionManager` determines how collections are compiled, to what kind of destination (e-mail, Mastodon, mobile message, whatever) notifications will be sent, and how they will be formatted. Names are scoped on a per-feed-URL basis. Users subscribe to a `(feed_url, subscribable_name)` pair. ##### assignable (a collection of items) Next there is `assignable`, which represents a collection. They essentially map `subscribables` (subscription definitions) to `within_type_id`s (the collections generated by the subscription definition and notified to subscribers). ```sql CREATE TABLE assignable( subscribable_name VARCHAR(64), within_type_id VARCHAR(1024), opened TIMESTAMP NOT NULL, PRIMARY KEY(subscribable_name, within_type_id), FOREIGN KEY(subscribable_name) REFERENCES subscribable(subscribable_name) ) ``` `opened` is the timestamp of the first assignment to the collection. Once an assignable has been notified ("completed"), it is simply deleted from the database. For each subscribable, the `within_type_id` of only the most recently completed assignable is retained (see `subscribable` table above). ##### assignment (an item in a collection) Next there is `assignment`, which represents an item in an `assignable`, i.e. a collection. It's pretty self-explanatory I think. ```sql CREATE TABLE assignment( subscribable_name VARCHAR(64), within_type_id VARCHAR(1024), guid VARCHAR(1024), PRIMARY KEY( subscribable_name, within_type_id, guid ), FOREIGN KEY( subscribable_name, within_type_id ) REFERENCES assignable( subscribable_name, within_type_id ) ) ``` ##### subscription Next there is `subscription`, which just maps a destination to a `subscribable`. the destination is JSON blob that can refer to a variety of things: e-mail addresses, SMS numbers, mastodon instances, etc. Each `SubscriptionManager` works with a destination subtype. ```sql CREATE TABLE subscription( subscription_id BIGINT, destination_json JSONB NOT NULL, destination_unique VARCHAR(1024) NOT NULL, subscribable_name VARCHAR(64) NOT NULL, confirmed BOOLEAN NOT NULL, added TIMESTAMP NOT NULL, PRIMARY KEY( subscription_id ), FOREIGN KEY( subscribable_name ) REFERENCES subscribable( subscribable_name ) ) ``` Since destinations can have ornamentation (an e-mail address, for example, might have a personal part (e.g. Buffy in "Buffy "), it's not sufficient to prevent multiple subscriptions to insist that the JSON entities be unique. So destinations declare a unique core, whose uniqueness within a subscription the database enforces: ```sql CREATE UNIQUE INDEX destination_unique_subscribable_name ON subscription(destination_unique, subscribable_name) ``` That's it for the base schema! There are also tables that convert destinations specific to subscription types into their various queues for notification. I'm omitting those for now.

近期下载者

相关文件


收藏者