Duplicate Content · User Defined Rules #14

Open
opened 2024-08-16 08:17:00 +00:00 by corbz · 2 comments
Owner

I believe it's a race condition.

The first article is sent and a request is pushed to the API to process it, but before it can be processed and stored in the database, the next article is processed and checked against existing content.

The best fix might be to keep a local set of sent content for each run of the task, then clear it once the run finishes.
It must be async-friendly.
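A minimal sketch of that idea, not the bot's actual code: a per-run tracker that lets each coroutine "claim" a content key before sending, so a second article with the same key is skipped even if the API has not stored the first one yet. The names `SentContentTracker`, `claim`, `process_article` and `run_task` are hypothetical, and `asyncio.sleep(0)` stands in for the real API request.

```python
import asyncio


class SentContentTracker:
    """Tracks content keys already claimed during a single task run (hypothetical helper)."""

    def __init__(self) -> None:
        self._seen: set[str] = set()
        self._lock = asyncio.Lock()

    async def claim(self, key: str) -> bool:
        """Return True only for the first caller to claim `key` in this run."""
        # The lock keeps check-and-add atomic even if an await is added here later.
        async with self._lock:
            if key in self._seen:
                return False
            self._seen.add(key)
            return True

    def clear(self) -> None:
        """Forget everything; call once the task run has finished."""
        self._seen.clear()


async def process_article(tracker: SentContentTracker, key: str, sent: list) -> None:
    # Claim the key BEFORE sending, so concurrent duplicates are skipped.
    if not await tracker.claim(key):
        return
    await asyncio.sleep(0)  # placeholder for the real send + API request
    sent.append(key)


async def run_task(keys: list) -> list:
    tracker = SentContentTracker()
    sent: list = []
    try:
        await asyncio.gather(*(process_article(tracker, k, sent) for k in keys))
    finally:
        tracker.clear()  # the set only lives for one run
    return sent
```

Claiming the key before the request is sent is what closes the window described above: the check no longer depends on the database being up to date.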

image
corbz added the bug label 2024-08-16 08:17:00 +00:00
corbz self-assigned this 2024-08-16 08:17:00 +00:00
corbz added this to the PYRSS project 2024-08-16 08:17:00 +00:00
Author
Owner

This problem is even worse. It's entirely the fault of feed creators who don't stick to any standard for uniquely identifying feed items, as seen with this duplicate from the same RSS feed.

image

I'm changing the scope of this issue: the user must now be able to define rules for identifying duplicate RSS items. I'll move the data model and the user input side to the webapp, but it will need to be supported here.

corbz changed title from Duplicate Content From Different Feeds to Duplicate Content · User Defined Rules 2024-09-13 18:50:42 +00:00
corbz added a new dependency 2024-09-13 19:59:06 +00:00
Author
Owner

I've added several options to the web UI for identifying duplicate articles:

  • GUID
  • ID
  • URL
  • Title
  • Content Hash

The user can select one or many per subscription.

This change needs to be reflected in this repository, via the API.
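As a rough illustration of how the bot side could apply those options, the sketch below builds a comparison key from whichever fields a subscription has selected. Everything here is an assumption: the rule names (`guid`, `id`, `url`, `title`, `content_hash`), the item dictionary keys, and in particular the choice to treat two items as duplicates only when all selected fields match. The issue does not say whether multiple rules combine as "all" or "any".

```python
import hashlib
from typing import Callable, Dict, Mapping, Sequence, Tuple


def content_hash(text: str) -> str:
    """SHA-256 hex digest of the item's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical mapping from rule name to the value it reads from a parsed item.
# The item keys ("guid", "id", "link", "title", "content") are assumed, not confirmed.
EXTRACTORS: Dict[str, Callable[[Mapping[str, str]], str]] = {
    "guid": lambda item: item.get("guid", ""),
    "id": lambda item: item.get("id", ""),
    "url": lambda item: item.get("link", ""),
    "title": lambda item: item.get("title", ""),
    "content_hash": lambda item: content_hash(item.get("content", "")),
}


def dedup_key(item: Mapping[str, str], rules: Sequence[str]) -> Tuple[Tuple[str, str], ...]:
    """Build a hashable key from the selected rules.

    Assumption: items are duplicates when ALL selected fields match.
    """
    return tuple((rule, EXTRACTORS[rule](item)) for rule in rules)


# Usage: two items with different GUIDs but the same URL and title.
first = {"guid": "1", "link": "https://example.com/x", "title": "Hello", "content": "body"}
second = {"guid": "2", "link": "https://example.com/x", "title": "Hello", "content": "body"}

seen = {dedup_key(first, ["url", "title"])}
is_duplicate = dedup_key(second, ["url", "title"]) in seen
```

Because the key is a plain tuple, it can be stored in a set or compared directly against keys built for previously sent items.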

Reference: corbz/PYRSS-Bot#14