sfeed
-----

RSS and Atom parser (and some format programs).

It converts RSS or Atom feeds from XML to a TAB-separated file. There are
formatting programs included to convert this TAB-separated format to various
other formats. There are also some programs and scripts included to import and
export OPML and to fetch, filter, merge and order feed items.


gearsix changes
---------------

I've added the sfeed_read script, which runs sfeed_update and formats the
result with the specified sfeed_* tool (html by default, set with -e), writing
the output to the xdg-user-dir DOCUMENTS directory (or ~/Documents if unset).
Output files are named after the current time.
Finally, it opens the generated output file using xdg-open.


Build and install
-----------------

	$ make
	# make install


To build sfeed without sfeed_curses set SFEED_CURSES to an empty string:

	$ make SFEED_CURSES=""
	# make SFEED_CURSES="" install


To change the theme for sfeed_curses you can set SFEED_THEME. See the themes/
directory for the theme names.

	$ make SFEED_THEME="templeos"
	# make SFEED_THEME="templeos" install


Usage
-----

Initial setup:

	mkdir -p "$HOME/.sfeed/feeds"
	cp sfeedrc.example "$HOME/.sfeed/sfeedrc"

Edit the sfeedrc(5) configuration file and change any RSS/Atom feeds. This file
is included and evaluated as a shellscript for sfeed_update, so its functions
and behaviour can be overridden:

	$EDITOR "$HOME/.sfeed/sfeedrc"

or you can import existing OPML subscriptions using sfeed_opml_import(1):

	sfeed_opml_import < file.opml > "$HOME/.sfeed/sfeedrc"

An example to export from another RSS/Atom reader called newsboat and import
for sfeed_update:

	newsboat -e | sfeed_opml_import > "$HOME/.sfeed/sfeedrc"

An example to export from another RSS/Atom reader called rss2email (3.x+) and
import for sfeed_update:

	r2e opmlexport | sfeed_opml_import > "$HOME/.sfeed/sfeedrc"

Update feeds; this script merges the new items. See sfeed_update(1) for more
information on what it can do:

	sfeed_update

Format feeds:

Plain-text list:

	sfeed_plain $HOME/.sfeed/feeds/* > "$HOME/.sfeed/feeds.txt"

HTML view (no frames), copy style.css for a default style:

	cp style.css "$HOME/.sfeed/style.css"
	sfeed_html $HOME/.sfeed/feeds/* > "$HOME/.sfeed/feeds.html"

HTML view with the menu as frames, copy style.css for a default style:

	mkdir -p "$HOME/.sfeed/frames"
	cp style.css "$HOME/.sfeed/frames/style.css"
	cd "$HOME/.sfeed/frames" && sfeed_frames $HOME/.sfeed/feeds/*

To automatically update your feeds periodically and format them in a way you
like, you can make a wrapper script and add it as a cronjob.
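
For example, a small wrapper script along these lines (the script name and
paths are only an illustration, using the example locations above) updates the
feeds and regenerates the plain-text and HTML views:

	#!/bin/sh
	# sfeed_cron.sh: example wrapper, assumes the default paths used above.
	sfeed_update
	sfeed_plain "$HOME/.sfeed/feeds/"* > "$HOME/.sfeed/feeds.txt"
	sfeed_html "$HOME/.sfeed/feeds/"* > "$HOME/.sfeed/feeds.html"

It could then be run hourly with a crontab(5) entry such as:

	0 * * * * /path/to/sfeed_cron.sh >/dev/null 2>&1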

Most protocols are supported because curl(1) is used by default and proxy
settings from the environment (such as the $http_proxy environment variable)
are used as well.

The sfeed(1) program itself is just a parser that parses XML data from stdin
and is therefore network protocol-agnostic. It can be used with HTTP, HTTPS,
Gopher, SSH, etc.
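
For example, the following sketch fetches a feed with curl(1) and parses it to
the TSV format (the URL and output filename are just examples); any other
command that writes the XML to stdout, for example over SSH or Gopher, works
the same way:

	curl -s "https://codemadness.org/atom.xml" | sfeed > "$HOME/.sfeed/feeds/codemadness"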

See the section "Usage and examples" below and the man-pages for more
information on how to use sfeed(1) and the additional tools.


Dependencies
------------

- C compiler (C99).
- libc (recommended: C99 and POSIX >= 200809).


Optional dependencies
---------------------

- POSIX make(1) for the Makefile.
- POSIX sh(1),
  used by sfeed_update(1) and sfeed_opml_export(1).
- POSIX utilities such as awk(1) and sort(1),
  used by sfeed_content(1), sfeed_markread(1) and sfeed_update(1).
- curl(1) binary: https://curl.haxx.se/ ,
  used by sfeed_update(1), but can be replaced with any tool like wget(1),
  OpenBSD ftp(1) or hurl(1): https://git.codemadness.org/hurl/
- iconv(1) command-line utilities,
  used by sfeed_update(1). If the text in your RSS/Atom feeds is already UTF-8
  encoded then you don't need this. For a minimal iconv implementation:
  https://git.etalabs.net/cgit/noxcuse/tree/src/iconv.c
- mandoc for documentation: https://mdocml.bsd.lv/
- curses (typically ncurses), otherwise see minicurses.h,
  used by sfeed_curses(1).
- a terminal (emulator) supporting UTF-8 and the used capabilities,
  used by sfeed_curses(1).


Optional run-time dependencies for sfeed_curses
-----------------------------------------------

- xclip for yanking the URL or enclosure. See $SFEED_YANKER to change it.
- xdg-open, used as a plumber by default. See $SFEED_PLUMBER to change it.
- awk, used by the sfeed_content and sfeed_markread scripts.
  See the ENVIRONMENT VARIABLES section in the man page to change it.
- lynx, used by the sfeed_content script to convert HTML content.
  See the ENVIRONMENT VARIABLES section in the man page to change it.


Formats supported
-----------------

sfeed supports a subset of XML 1.0 and a subset of:

- Atom 1.0 (RFC 4287): https://datatracker.ietf.org/doc/html/rfc4287
- Atom 0.3 (draft, historic).
- RSS 0.91+.
- RDF (when used with RSS).
- MediaRSS extensions (media:).
- Dublin Core extensions (dc:).

Other formats like JSONfeed, twtxt or certain RSS/Atom extensions are supported
by converting them to RSS/Atom or to the sfeed(5) format directly.


OS tested
---------

- Linux,
  compilers: clang, gcc, chibicc, cproc, lacc, pcc, scc, tcc,
  libc: glibc, musl.
- OpenBSD (clang, gcc).
- NetBSD (with NetBSD curses).
- FreeBSD.
- DragonFlyBSD.
- GNU/Hurd.
- Illumos (OpenIndiana).
- Windows (cygwin gcc + mintty, mingw).
- HaikuOS.
- SerenityOS.
- FreeDOS (djgpp).
- FUZIX (sdcc -mz80, with the sfeed parser program).


Architectures tested
--------------------

amd64, ARM, aarch64, HPPA, i386, MIPS32-BE, RISCV64, SPARC64, Z80.


Files
-----

sfeed             - Read XML RSS or Atom feed data from stdin. Write feed data
                    in TAB-separated format to stdout.
sfeed_atom        - Format feed data (TSV) to an Atom feed.
sfeed_content     - View item content, for use with sfeed_curses.
sfeed_curses      - Format feed data (TSV) to a curses interface.
sfeed_frames      - Format feed data (TSV) to HTML file(s) with frames.
sfeed_gopher      - Format feed data (TSV) to Gopher files.
sfeed_html        - Format feed data (TSV) to HTML.
sfeed_opml_export - Generate an OPML XML file from a sfeedrc config file.
sfeed_opml_import - Generate a sfeedrc config file from an OPML XML file.
sfeed_markread    - Mark items as read/unread, for use with sfeed_curses.
sfeed_mbox        - Format feed data (TSV) to mbox.
sfeed_plain       - Format feed data (TSV) to a plain-text list.
sfeed_twtxt       - Format feed data (TSV) to a twtxt feed.
sfeed_update      - Update feeds and merge items.
sfeed_web         - Find URLs to RSS/Atom feeds from a webpage.
sfeed_xmlenc      - Detect character-set encoding from an XML stream.
sfeedrc.example   - Example config file. Can be copied to $HOME/.sfeed/sfeedrc.
style.css         - Example stylesheet to use with sfeed_html(1) and
                    sfeed_frames(1).


Files read at runtime by sfeed_update(1)
----------------------------------------

sfeedrc - Config file. This file is evaluated as a shellscript in
          sfeed_update(1).

At least the following functions can be overridden per feed:

- fetch: to use wget(1), OpenBSD ftp(1) or another download program.
- filter: to filter on fields.
- merge: to change the merge logic.
- order: to change the sort order.

See also the sfeedrc(5) man page documentation for more details.

The feeds() function is called to process the feeds. The default feed()
function is executed concurrently as a background job in your sfeedrc(5) config
file to make updating faster. The variable maxjobs can be changed to limit or
increase the number of concurrent jobs (8 by default).
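
For illustration, a minimal sfeedrc could look like the sketch below; the feed
names and URLs are just examples, see the sfeedrc.example file for the
canonical template:

	#sfeedpath="$HOME/.sfeed/feeds"

	# limit the number of concurrent update jobs (8 by default).
	#maxjobs=8

	# list of feeds to fetch:
	feeds() {
		# feed <name> <feedurl> [basesiteurl] [encoding]
		feed "codemadness" "https://codemadness.org/atom_content.xml"
		feed "xkcd" "https://xkcd.com/atom.xml"
	}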


Files written at runtime by sfeed_update(1)
-------------------------------------------

feedname - TAB-separated format containing all items per feed. The
           sfeed_update(1) script merges new items with this file.
           The format is documented in sfeed(5).


File format
-----------

	man 5 sfeed
	man 5 sfeedrc
	man 1 sfeed
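
Because the per-feed files are plain TAB-separated values they are easy to
inspect with standard tools. For example, printing the title (field 2) and
link (field 3) of all stored items:

	awk -F '\t' '{ print $2 ": " $3 }' ~/.sfeed/feeds/*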


Usage and examples
------------------

Find RSS/Atom feed URLs from a webpage:

	url="https://codemadness.org"; curl -L -s "$url" | sfeed_web "$url"

output example:

	https://codemadness.org/atom.xml	application/atom+xml
	https://codemadness.org/atom_content.xml	application/atom+xml

- - -

Make sure your sfeedrc config file exists; see the sfeedrc.example file. To
update your feeds (configfile argument is optional):

	sfeed_update "configfile"

Format the feed files:

	# Plain-text list.
	sfeed_plain $HOME/.sfeed/feeds/* > $HOME/.sfeed/feeds.txt
	# HTML view (no frames), copy style.css for a default style.
	sfeed_html $HOME/.sfeed/feeds/* > $HOME/.sfeed/feeds.html
	# HTML view with the menu as frames, copy style.css for a default style.
	mkdir -p somedir && cd somedir && sfeed_frames $HOME/.sfeed/feeds/*

View the formatted output in your browser:

	$BROWSER "$HOME/.sfeed/feeds.html"

View the formatted output in your editor:

	$EDITOR "$HOME/.sfeed/feeds.txt"

- - -

View the formatted output in a curses interface. The interface has a look
inspired by the mutt mail client. It has a sidebar panel for the feeds, a panel
with a listing of the items and a small statusbar for the selected item/URL.
Some functions like searching and scrolling are integrated in the interface
itself.

Just like the other format programs included in sfeed you can run it like
this:

	sfeed_curses ~/.sfeed/feeds/*

... or by reading from stdin:

	sfeed_curses < ~/.sfeed/feeds/xkcd

By default sfeed_curses marks the items of the last day as new/bold. To manage
read/unread items in a different way, a plain-text file with a list of the read
URLs can be used. To enable this behaviour, set the environment variable
$SFEED_URL_FILE to the path of this URL file:

	export SFEED_URL_FILE="$HOME/.sfeed/urls"
	[ -f "$SFEED_URL_FILE" ] || touch "$SFEED_URL_FILE"
	sfeed_curses ~/.sfeed/feeds/*

It then uses the shellscript "sfeed_markread" to process the read and unread
items.

- - -

Example script to view feed items in a vertical list/menu in dmenu(1). It opens
the selected URL in the browser set in $BROWSER:

	#!/bin/sh
	url=$(sfeed_plain "$HOME/.sfeed/feeds/"* | dmenu -l 35 -i | \
		sed -n 's@^.* \([a-zA-Z]*://\)\(.*\)$@\1\2@p')
	test -n "${url}" && $BROWSER "${url}"

dmenu can be found at: https://git.suckless.org/dmenu/

- - -

Generate a sfeedrc config file from your exported list of feeds in OPML
format:

	sfeed_opml_import < opmlfile.xml > $HOME/.sfeed/sfeedrc

- - -

Export an OPML file of your feeds from a sfeedrc config file (configfile
argument is optional):

	sfeed_opml_export configfile > myfeeds.opml

- - -

The filter function can be overridden in your sfeedrc file. This allows
filtering items per feed. It can be used to shorten URLs, filter away
advertisements, strip tracking parameters and more.

	# filter fields.
	# filter(name)
	filter() {
		case "$1" in
		"tweakers")
			awk -F '\t' 'BEGIN { OFS = "\t"; }
			# skip ads.
			$2 ~ /^ADV:/ {
				next;
			}
			# shorten link.
			{
				if (match($3, /^https:\/\/tweakers\.net\/[a-z]+\/[0-9]+\//)) {
					$3 = substr($3, RSTART, RLENGTH);
				}
				print $0;
			}';;
		"yt BSDNow")
			# filter only BSD Now from channel.
			awk -F '\t' '$2 ~ / \| BSD Now/';;
		*)
			cat;;
		esac | \
		# replace youtube links with embed links.
		sed 's@www.youtube.com/watch?v=@www.youtube.com/embed/@g' | \

		awk -F '\t' 'BEGIN { OFS = "\t"; }
		function filterlink(s) {
			# protocol must start with http, https or gopher.
			if (match(s, /^(http|https|gopher):\/\//) == 0) {
				return "";
			}

			# shorten feedburner links.
			if (match(s, /^(http|https):\/\/[^\/]+\/~r\/.*\/~3\/[^\/]+\//)) {
				s = substr(s, RSTART, RLENGTH);
			}

			# strip tracking parameters
			# urchin, facebook, piwik, webtrekk and generic.
			gsub(/\?(ad|campaign|fbclid|pk|tm|utm|wt)_([^&]+)/, "?", s);
			gsub(/&(ad|campaign|fbclid|pk|tm|utm|wt)_([^&]+)/, "", s);

			gsub(/\?&/, "?", s);
			gsub(/[\?&]+$/, "", s);

			return s;
		}
		{
			$3 = filterlink($3); # link
			$8 = filterlink($8); # enclosure

			# try to remove tracking pixels: <img/> tags with 1px width or height.
			gsub("<img[^>]*(width|height)[[:space:]]*=[[:space:]]*[\"'"'"' ]?1[\"'"'"' ]?[^0-9>]+[^>]*>", "", $4);

			print $0;
		}'
	}

- - -

Aggregate feeds. This filters new entries (maximum one day old), sorts them by
newest first and prefixes the feed name to the title. Convert the TSV output
data to an Atom XML feed (again):

	#!/bin/sh
	cd ~/.sfeed/feeds/ || exit 1

	awk -F '\t' -v "old=$(($(date +'%s') - 86400))" '
	BEGIN { OFS = "\t"; }
	int($1) >= old {
		$2 = "[" FILENAME "] " $2;
		print $0;
	}' * | \
	sort -k1,1rn | \
	sfeed_atom

- - -

To have a "tail(1) -f"-like FIFO stream that filters new unique feed items and
shows them as plain-text lines similar to sfeed_plain(1):

Create a FIFO:

	fifo="/tmp/sfeed_fifo"
	mkfifo "$fifo"

On the reading side:

	# This keeps track of unique lines so it might consume a lot of memory.
	# It tries to reopen the $fifo after 1 second if it fails.
	while :; do cat "$fifo" || sleep 1; done | awk '!x[$0]++'

On the writing side:

	feedsdir="$HOME/.sfeed/feeds/"
	cd "$feedsdir" || exit 1
	test -p "$fifo" || exit 1

	# 1 day is old news, don't write older items.
	awk -F '\t' -v "old=$(($(date +'%s') - 86400))" '
	BEGIN { OFS = "\t"; }
	int($1) >= old {
		$2 = "[" FILENAME "] " $2;
		print $0;
	}' * | sort -k1,1n | sfeed_plain | cut -b 3- > "$fifo"

cut -b is used to trim the "N " prefix of sfeed_plain(1).

- - -

For some podcast feeds the following code can be used to filter the latest
enclosure URL (probably some audio file):

	awk -F '\t' 'BEGIN { latest = 0; }
	length($8) {
		ts = int($1);
		if (ts > latest) {
			url = $8;
			latest = ts;
		}
	}
	END { if (length(url)) { print url; } }'

... or on a file already sorted from newest to oldest:

	awk -F '\t' '$8 { print $8; exit }'

- - -

Over time your feed files might become quite big. You can archive a feed,
keeping only the items of (roughly) the last week, by doing for example:

	awk -F '\t' -v "old=$(($(date +'%s') - 604800))" 'int($1) > old' < feed > feed.new
	mv feed feed.bak
	mv feed.new feed

This could also be run weekly in a crontab to archive the feeds, like throwing
away old newspapers. It keeps the feeds list tidy and the formatted output
small.
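
A sketch of such a weekly job, looping over all feed files (the script name
and the .bak/.new filename suffixes just follow the example above):

	#!/bin/sh
	# archive_feeds.sh: keep only items of (roughly) the last week per feed.
	cd "$HOME/.sfeed/feeds" || exit 1
	old=$(($(date +'%s') - 604800))
	for feed in *; do
		# skip backups and temporary files from a previous run.
		case "${feed}" in
		*.bak|*.new) continue;;
		esac
		awk -F '\t' -v "old=${old}" 'int($1) > old' < "${feed}" > "${feed}.new" &&
		mv "${feed}" "${feed}.bak" &&
		mv "${feed}.new" "${feed}"
	done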

- - -

Convert mbox to separate maildirs per feed and filter duplicate messages using
the fdm program.
fdm is available at: https://github.com/nicm/fdm

fdm config file (~/.sfeed/fdm.conf):

	set unmatched-mail keep

	account "sfeed" mbox "%[home]/.sfeed/mbox"

	$cachepath = "%[home]/.sfeed/fdm.cache"
	cache "${cachepath}"
	$maildir = "%[home]/feeds/"

	# Check if message is in the cache by Message-ID.
	match case "^Message-ID: (.*)" in headers
		action {
			tag "msgid" value "%1"
		}
		continue

	# If it is in the cache, stop.
	match matched and in-cache "${cachepath}" key "%[msgid]"
		action {
			keep
		}

	# Not in the cache, process it and add to cache.
	match case "^X-Feedname: (.*)" in headers
		action {
			# Store to local maildir.
			maildir "${maildir}%1"

			add-to-cache "${cachepath}" key "%[msgid]"
			keep
		}

Now run:

	$ sfeed_mbox ~/.sfeed/feeds/* > ~/.sfeed/mbox
	$ fdm -f ~/.sfeed/fdm.conf fetch

Now you can view the feeds in mutt(1) for example.

- - -

Read from mbox and filter duplicate messages using the fdm program and deliver
them to a SMTP server. This works similarly to the rss2email program.
fdm is available at: https://github.com/nicm/fdm

fdm config file (~/.sfeed/fdm.conf):

	set unmatched-mail keep

	account "sfeed" mbox "%[home]/.sfeed/mbox"

	$cachepath = "%[home]/.sfeed/fdm.cache"
	cache "${cachepath}"

	# Check if message is in the cache by Message-ID.
	match case "^Message-ID: (.*)" in headers
		action {
			tag "msgid" value "%1"
		}
		continue

	# If it is in the cache, stop.
	match matched and in-cache "${cachepath}" key "%[msgid]"
		action {
			keep
		}

	# Not in the cache, process it and add to cache.
	match case "^X-Feedname: (.*)" in headers
		action {
			# Connect to a SMTP server and attempt to deliver the
			# mail to it.
			# Of course change the server and e-mail below.
			smtp server "codemadness.org" to "hiltjo@codemadness.org"

			add-to-cache "${cachepath}" key "%[msgid]"
			keep
		}

Now run:

	$ sfeed_mbox ~/.sfeed/feeds/* > ~/.sfeed/mbox
	$ fdm -f ~/.sfeed/fdm.conf fetch

Now you can view the feeds in mutt(1) for example.

- - -

Convert mbox to separate maildirs per feed and filter duplicate messages using
procmail(1).

procmail_maildirs.sh file:

	maildir="$HOME/feeds"
	feedsdir="$HOME/.sfeed/feeds"
	procmailconfig="$HOME/.sfeed/procmailrc"

	# message-id cache to prevent duplicates.
	mkdir -p "${maildir}/.cache"

	if ! test -r "${procmailconfig}"; then
		printf "Procmail configuration file \"%s\" does not exist or is not readable.\n" "${procmailconfig}" >&2
		echo "See procmailrc.example for an example." >&2
		exit 1
	fi

	find "${feedsdir}" -type f -exec printf '%s\n' {} \; | while read -r d; do
		name=$(basename "${d}")
		mkdir -p "${maildir}/${name}/cur"
		mkdir -p "${maildir}/${name}/new"
		mkdir -p "${maildir}/${name}/tmp"
		printf 'Mailbox %s\n' "${name}"
		sfeed_mbox "${d}" | formail -s procmail "${procmailconfig}"
	done

Procmailrc(5) file:

	# Example for use with sfeed_mbox(1).
	# The header X-Feedname is used to split into separate maildirs. It is
	# assumed this name is sane.

	MAILDIR="$HOME/feeds/"

	:0
	* ^X-Feedname: \/.*
	{
		FEED="$MATCH"

		:0 Wh: "msgid_$FEED.lock"
		| formail -D 1024000 ".cache/msgid_$FEED.cache"

		:0
		"$FEED"/
	}

Now run:

	$ procmail_maildirs.sh

Now you can view the feeds in mutt(1) for example.

- - -

The fetch function can be overridden in your sfeedrc file. This allows
replacing the default curl(1) for sfeed_update with any other client to fetch
the RSS/Atom data, or changing the default curl options:

	# fetch a feed via HTTP/HTTPS etc.
	# fetch(name, url, feedfile)
	fetch() {
		hurl -m 1048576 -t 15 "$2" 2>/dev/null
	}

- - -

Caching, incremental data updates and bandwidth-saving

For servers that support it some incremental updates and bandwidth-saving can
be done by using the "ETag" HTTP header.

Create a directory for storing the ETags per feed:

	mkdir -p ~/.sfeed/etags/

The curl ETag options (--etag-save and --etag-compare) can be used to store and
send the previous ETag header value. curl version 7.73+ is recommended for it
to work properly.

The curl -z option can be used to send the modification date of a local file as
an HTTP "If-Modified-Since" request header. The server can then respond whether
the data is modified or not, or respond with only the incremental data.

The curl --compressed option can be used to indicate the client supports
decompression. Because RSS/Atom feeds are textual XML content this generally
compresses very well.

These options can be set by overriding the fetch() function in the sfeedrc
file:

	# fetch(name, url, feedfile)
	fetch() {
		etag="$HOME/.sfeed/etags/$(basename "$3")"
		curl \
			-L --max-redirs 0 -H "User-Agent:" -f -s -m 15 \
			--compressed \
			--etag-save "${etag}" --etag-compare "${etag}" \
			-z "${etag}" \
			"$2" 2>/dev/null
	}

These options can come at the cost of some privacy, because they expose
additional metadata from the previous request.

- - -

CDNs blocking requests due to a missing HTTP User-Agent request header

sfeed_update will not send the "User-Agent" header by default for privacy
reasons. Some CDNs like Cloudflare don't like this and will block such HTTP
requests.

A custom User-Agent can be set by using the curl -H option, like so:

	curl -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0'

The above example string pretends to be a Windows 10 (x86-64) machine running
Firefox 78.
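
For sfeed_update this can be combined with a fetch() override in the sfeedrc
file; a sketch, where the other curl options mirror the earlier examples and
the User-Agent string is just the example above:

	# fetch(name, url, feedfile)
	fetch() {
		curl -L --max-redirs 0 -f -s -m 15 \
			-H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0' \
			"$2" 2>/dev/null
	}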

- - -

Page redirects

For security and efficiency reasons redirects are not allowed by default and
are treated as an error.

For example to prevent hijacking an unencrypted http:// to https:// redirect or
to not add the time of an unnecessary page redirect each time. It is encouraged
to use the final redirected URL in the sfeedrc config file.

If you want to ignore this advice you can override the fetch() function in the
sfeedrc file and change the curl options "-L --max-redirs 0".

- - -

Shellscript to update feeds in parallel more efficiently using xargs -P.

It creates a queue of the feeds with their settings, then uses xargs to process
them in parallel using the common, but non-POSIX, -P option. This is more
efficient than the more portable solution in sfeed_update which can stall a
batch of $maxjobs in the queue if one item is slow.

sfeed_update_xargs shellscript:

	#!/bin/sh
	# update feeds, merge with old feeds using xargs in parallel mode (non-POSIX).

	# include script and reuse its functions, but do not start main().
	SFEED_UPDATE_INCLUDE="1" . sfeed_update
	# load config file, sets $config.
	loadconfig "$1"

	# process a single feed.
	# args are: config, tmpdir, name, feedurl, basesiteurl, encoding
	if [ "${SFEED_UPDATE_CHILD}" = "1" ]; then
		sfeedtmpdir="$2"
		_feed "$3" "$4" "$5" "$6"
		exit $?
	fi

	# ...else parent mode:

	# feed(name, feedurl, basesiteurl, encoding)
	feed() {
		# workaround: *BSD xargs doesn't handle empty fields in the middle.
		name="${1:-$$}"
		feedurl="${2:-http://}"
		basesiteurl="${3:-${feedurl}}"
		encoding="$4"

		printf '%s\0%s\0%s\0%s\0%s\0%s\0' "${config}" "${sfeedtmpdir}" \
			"${name}" "${feedurl}" "${basesiteurl}" "${encoding}"
	}

	# fetch feeds and store in temporary directory.
	sfeedtmpdir="$(mktemp -d '/tmp/sfeed_XXXXXX')"
	mkdir -p "${sfeedtmpdir}/feeds"
	touch "${sfeedtmpdir}/ok"
	# make sure path exists.
	mkdir -p "${sfeedpath}"
	# print feeds for parallel processing with xargs.
	feeds | SFEED_UPDATE_CHILD="1" xargs -r -0 -P "${maxjobs}" -L 6 "$(readlink -f "$0")"
	status=$?
	# check error exit status indicator for parallel jobs.
	test -f "${sfeedtmpdir}/ok" || status=1
	# cleanup temporary files etc.
	cleanup
	exit ${status}

- - -

Shellscript to handle URLs and enclosures in parallel using xargs -P.

This can be used to download and process URLs: for example to download
podcasts, webcomics or webpages, convert them, mirror videos, etc. It uses a
plain-text cache file for remembering processed URLs. The match patterns are
defined in the shellscript fetch() function and in the awk script and can be
modified to handle items differently depending on their context.

The arguments for the script are files in the sfeed(5) format. If no file
arguments are specified then the data is read from stdin.

	#!/bin/sh
	# sfeed_download: downloader for URLs and enclosures in sfeed(5) files.
	# Dependencies: awk, curl, flock, xargs (-P), youtube-dl.

	cachefile="${SFEED_CACHEFILE:-$HOME/.sfeed/downloaded_urls}"
	jobs="${SFEED_JOBS:-4}"
	lockfile="${HOME}/.sfeed/sfeed_download.lock"

	# log(feedname, s, status)
	log() {
		if [ "$1" != "-" ]; then
			s="[$1] $2"
		else
			s="$2"
		fi
		printf '[%s]: %s: %s\n' "$(date +'%H:%M:%S')" "${s}" "$3"
	}

	# fetch(url, feedname)
	fetch() {
		case "$1" in
		*youtube.com*)
			youtube-dl "$1";;
		*.flac|*.ogg|*.m3u|*.m3u8|*.m4a|*.mkv|*.mp3|*.mp4|*.wav|*.webm)
			# allow 2 redirects, hide User-Agent, connect timeout is 15 seconds.
			curl -O -L --max-redirs 2 -H "User-Agent:" -f -s --connect-timeout 15 "$1";;
		esac
	}

	# downloader(url, title, feedname)
	downloader() {
		url="$1"
		title="$2"
		feedname="${3##*/}"

		msg="${title}: ${url}"

		# download directory.
		if [ "${feedname}" != "-" ]; then
			mkdir -p "${feedname}"
			if ! cd "${feedname}"; then
				log "${feedname}" "${msg}: ${feedname}" "DIR FAIL" >&2
				return 1
			fi
		fi

		log "${feedname}" "${msg}" "START"
		if fetch "${url}" "${feedname}"; then
			log "${feedname}" "${msg}" "OK"

			# append it safely in parallel to the cachefile on a
			# successful download.
			(flock 9 || exit 1
			printf '%s\n' "${url}" >> "${cachefile}"
			) 9>"${lockfile}"
		else
			log "${feedname}" "${msg}" "FAIL" >&2
			return 1
		fi
		return 0
	}

	if [ "${SFEED_DOWNLOAD_CHILD}" = "1" ]; then
		# Downloader helper for parallel downloading.
		# Receives arguments: $1 = URL, $2 = title, $3 = feed filename or "-".
		# It should write the URI to the cachefile if it is successful.
		downloader "$1" "$2" "$3"
		exit $?
	fi

	# ...else parent mode:

	tmp=$(mktemp)
	trap "rm -f ${tmp}" EXIT

	[ -f "${cachefile}" ] || touch "${cachefile}"
	cat "${cachefile}" > "${tmp}"
	echo >> "${tmp}" # force it to have one line for awk.

	LC_ALL=C awk -F '\t' '
	# fast prefilter what to download or not.
	function filter(url, field, feedname) {
		u = tolower(url);
		return (match(u, "youtube\\.com") ||
			match(u, "\\.(flac|ogg|m3u|m3u8|m4a|mkv|mp3|mp4|wav|webm)$"));
	}
	function download(url, field, title, filename) {
		if (!length(url) || urls[url] || !filter(url, field, filename))
			return;
		# NUL-separated for xargs -0.
		printf("%s%c%s%c%s%c", url, 0, title, 0, filename, 0);
		urls[url] = 1; # print once
	}
	{
		FILENR += (FNR == 1);
	}
	# lookup table from cachefile which contains downloaded URLs.
	FILENR == 1 {
		urls[$0] = 1;
	}
	# feed file(s).
	FILENR != 1 {
		download($3, 3, $2, FILENAME); # link
		download($8, 8, $2, FILENAME); # enclosure
	}
	' "${tmp}" "${@:--}" | \
	SFEED_DOWNLOAD_CHILD="1" xargs -r -0 -L 3 -P "${jobs}" "$(readlink -f "$0")"

- - -

Shellscript to export existing newsboat cached items from sqlite3 to the sfeed
TSV format.

	#!/bin/sh
	# Export newsbeuter/newsboat cached items from sqlite3 to the sfeed TSV format.
	# The data is split per file per feed with the name of the newsboat title/url.
	# It writes the URLs of the read items line by line to a "urls" file.
	#
	# Dependencies: sqlite3, awk.
	#
	# Usage: create some directory to store the feeds then run this script.

	# newsboat cache.db file.
	cachefile="$HOME/.newsboat/cache.db"
	test -n "$1" && cachefile="$1"

	# dump data.
	# .mode ascii: Columns/rows delimited by 0x1F and 0x1E
	# get the first fields in the order of the sfeed(5) format.
	sqlite3 "$cachefile" <<!EOF |
	.headers off
	.mode ascii
	.output
	SELECT
		i.pubDate, i.title, i.url, i.content, i.content_mime_type,
		i.guid, i.author, i.enclosure_url,
		f.rssurl AS rssurl, f.title AS feedtitle, i.unread
		-- i.id, i.enclosure_type, i.enqueued, i.flags, i.deleted, i.base
	FROM rss_feed f
	INNER JOIN rss_item i ON i.feedurl = f.rssurl
	ORDER BY
		i.feedurl ASC, i.pubDate DESC;
	.quit
	!EOF
	# convert to sfeed(5) TSV format.
	LC_ALL=C awk '
	BEGIN {
		FS = "\x1f";
		RS = "\x1e";
	}
	# normal non-content fields.
	function field(s) {
		gsub("^[[:space:]]*", "", s);
		gsub("[[:space:]]*$", "", s);
		gsub("[[:space:]]", " ", s);
		gsub("[[:cntrl:]]", "", s);
		return s;
	}
	# content field.
	function content(s) {
		gsub("^[[:space:]]*", "", s);
		gsub("[[:space:]]*$", "", s);
		# escape chars in content field.
		gsub("\\\\", "\\\\", s);
		gsub("\n", "\\n", s);
		gsub("\t", "\\t", s);
		return s;
	}
	function feedname(feedurl, feedtitle) {
		if (feedtitle == "") {
			gsub("/", "_", feedurl);
			return feedurl;
		}
		gsub("/", "_", feedtitle);
		return feedtitle;
	}
	{
		fname = feedname($9, $10);
		if (!feed[fname]++) {
			print "Writing file: \"" fname "\" (title: " $10 ", url: " $9 ")" > "/dev/stderr";
		}

		contenttype = field($5);
		if (contenttype == "")
			contenttype = "html";
		else if (index(contenttype, "/html") || index(contenttype, "/xhtml"))
			contenttype = "html";
		else
			contenttype = "plain";

		print $1 "\t" field($2) "\t" field($3) "\t" content($4) "\t" \
			contenttype "\t" field($6) "\t" field($7) "\t" field($8) "\t" \
			> fname;

		# write URLs of the read items to a file line by line.
		if ($11 == "0") {
			print $3 > "urls";
		}
	}'

- - -

Progress indicator
------------------

The below sfeed_update wrapper script counts the number of feeds in a sfeedrc
config. It then calls sfeed_update and pipes the output lines to a function
that counts the current progress. It writes the total progress to stderr.
An alternative is: pv -l -s totallines

	#!/bin/sh
	# Progress indicator script.

	# Pass lines as input to stdin and write progress status to stderr.
	# progress(totallines)
	progress() {
		total="$(($1 + 0))" # must be a number, no divide by zero.
		test "${total}" -le 0 -o "$1" != "${total}" && return
		LC_ALL=C awk -v "total=${total}" '
		{
			counter++;
			percent = (counter * 100) / total;
			printf("\033[K") > "/dev/stderr"; # clear EOL
			print $0;
			printf("[%s/%s] %.0f%%\r", counter, total, percent) > "/dev/stderr";
			fflush(); # flush all buffers per line.
		}
		END {
			printf("\033[K") > "/dev/stderr";
		}'
	}

	# Counts the feeds from the sfeedrc config.
	countfeeds() {
		count=0
		. "$1"
		feed() {
			count=$((count + 1))
		}
		feeds
		echo "${count}"
	}

	config="${1:-$HOME/.sfeed/sfeedrc}"
	total=$(countfeeds "${config}")
	sfeed_update "${config}" 2>&1 | progress "${total}"

- - -

Counting unread and total items
-------------------------------

It can be useful to show the number of unread and total items, for example in
a window manager or status bar.

The below example script counts the items of the last day in the same way the
formatting tools do:

	#!/bin/sh
	# Count the new items of the last day.
	LC_ALL=C awk -F '\t' -v "old=$(($(date +'%s') - 86400))" '
	{
		total++;
	}
	int($1) >= old {
		totalnew++;
	}
	END {
		print "New: " totalnew;
		print "Total: " total;
	}' ~/.sfeed/feeds/*

The below example script counts the unread items using the sfeed_curses URL
file:

	#!/bin/sh
	# Count the unread and total items from feeds using the URL file.
	LC_ALL=C awk -F '\t' '
	# URL file: amount of fields is 1.
	NF == 1 {
		u[$0] = 1; # lookup table of URLs.
		next;
	}
	# feed file: check by URL or id.
	{
		total++;
		if (length($3)) {
			if (u[$3])
				read++;
		} else if (length($6)) {
			if (u[$6])
				read++;
		}
	}
	END {
		print "Unread: " (total - read);
		print "Total: " total;
	}' ~/.sfeed/urls ~/.sfeed/feeds/*

- - -

sfeed.c: adding new XML tags or sfeed(5) fields to the parser
-------------------------------------------------------------

sfeed.c contains definitions to parse XML tags and map them to sfeed(5) TSV
fields. Parsed RSS and Atom tag names are first stored as a TagId, which is a
number. This TagId is then mapped to the output field index.

Steps to modify the code:

* Add a new TagId enum for the tag.

* (optional) Add a new FeedField* enum for the new output field or you can map
  it to an existing field.

* Add the new XML tag name to the array variable of parsed RSS or Atom
  tags: rsstags[] or atomtags[].

  These must be defined in alphabetical order, because a binary search is used
  which uses the strcasecmp() function.

* Add the parsed TagId to the output field in the array variable fieldmap[].

  When another tag is also mapped to the same output field then the tag with
  the highest TagId number value overrides the mapped field: the order is from
  least to most important.

* If this defined tag is just using the inner data of the XML tag, then this
  definition is enough. If it for example has to parse a certain attribute you
  have to add a check for the TagId to the xmlattr() callback function.

* (optional) Print the new field in the printfields() function.

Below is a patch example to add the MRSS "media:content" tag as a new field:

	diff --git a/sfeed.c b/sfeed.c
	--- a/sfeed.c
	+++ b/sfeed.c
	@@ -50,7 +50,7 @@ enum TagId {
	 	RSSTagGuidPermalinkTrue,
	 	/* must be defined after GUID, because it can be a link (isPermaLink) */
	 	RSSTagLink,
	-	RSSTagEnclosure,
	+	RSSTagMediaContent, RSSTagEnclosure,
	 	RSSTagAuthor, RSSTagDccreator,
	 	RSSTagCategory,
	 	/* Atom */
	@@ -81,7 +81,7 @@ typedef struct field {
	 enum {
	 	FeedFieldTime = 0, FeedFieldTitle, FeedFieldLink, FeedFieldContent,
	 	FeedFieldId, FeedFieldAuthor, FeedFieldEnclosure, FeedFieldCategory,
	-	FeedFieldLast
	+	FeedFieldMediaContent, FeedFieldLast
	 };
	 
	 typedef struct feedcontext {
	@@ -137,6 +137,7 @@ static const FeedTag rsstags[] = {
	 	{ STRP("enclosure"),         RSSTagEnclosure },
	 	{ STRP("guid"),              RSSTagGuid },
	 	{ STRP("link"),              RSSTagLink },
	+	{ STRP("media:content"),     RSSTagMediaContent },
	 	{ STRP("media:description"), RSSTagMediaDescription },
	 	{ STRP("pubdate"),           RSSTagPubdate },
	 	{ STRP("title"),             RSSTagTitle }
	@@ -180,6 +181,7 @@ static const int fieldmap[TagLast] = {
	 	[RSSTagGuidPermalinkFalse] = FeedFieldId,
	 	[RSSTagGuidPermalinkTrue]  = FeedFieldId, /* special-case: both a link and an id */
	 	[RSSTagLink]               = FeedFieldLink,
	+	[RSSTagMediaContent]       = FeedFieldMediaContent,
	 	[RSSTagEnclosure]          = FeedFieldEnclosure,
	 	[RSSTagAuthor]             = FeedFieldAuthor,
	 	[RSSTagDccreator]          = FeedFieldAuthor,
	@@ -677,6 +679,8 @@ printfields(void)
	 	string_print_uri(&ctx.fields[FeedFieldEnclosure].str);
	 	putchar(FieldSeparator);
	 	string_print_trimmed_multi(&ctx.fields[FeedFieldCategory].str);
	+	putchar(FieldSeparator);
	+	string_print_trimmed(&ctx.fields[FeedFieldMediaContent].str);
	 	putchar('\n');
	 
	 	if (ferror(stdout)) /* check for errors but do not flush */
	@@ -718,7 +722,7 @@ xmlattr(XMLParser *p, const char *t, size_t tl, const char *n, size_t nl,
	 	}
	 
	 	if (ctx.feedtype == FeedTypeRSS) {
	-		if (ctx.tag.id == RSSTagEnclosure &&
	+		if ((ctx.tag.id == RSSTagEnclosure || ctx.tag.id == RSSTagMediaContent) &&
	 		    isattr(n, nl, STRP("url"))) {
	 			string_append(&tmpstr, v, vl);
	 		} else if (ctx.tag.id == RSSTagGuid &&

- - -

Running custom commands inside the sfeed_curses program
-------------------------------------------------------

Running commands inside the sfeed_curses program can be useful for example to
sync items or mark all items across all feeds as read. It can be convenient to
have a keybind for this inside the program to perform a scripted action and
then reload the feeds by sending the signal SIGHUP.

In the input handling code you can then add a case:

	case 'M':
		forkexec((char *[]) { "markallread.sh", NULL }, 0);
		break;

or

	case 'S':
		forkexec((char *[]) { "syncnews.sh", NULL }, 1);
		break;

The specified script should be in $PATH or be an absolute path.

Example of a `markallread.sh` shellscript to mark all URLs as read:

	#!/bin/sh
	# mark all items/URLs as read.
	tmp=$(mktemp)
	(cat ~/.sfeed/urls; cut -f 3 ~/.sfeed/feeds/*) | \
	awk '!x[$0]++' > "$tmp" &&
	mv "$tmp" ~/.sfeed/urls &&
	pkill -SIGHUP sfeed_curses # reload feeds.

Example of a `syncnews.sh` shellscript to update the feeds and reload them:

	#!/bin/sh
	sfeed_update
	pkill -SIGHUP sfeed_curses


Running programs in a new session
---------------------------------

By default processes are spawned in the same session and process group as
sfeed_curses. When sfeed_curses is closed this can also close the spawned
process in some cases.

When the setsid command-line program is available the following wrapper command
can be used to run the program in a new session, for a plumb program:

	setsid -f xdg-open "$@"

Alternatively the code can be changed to call setsid() before execvp().


Open a URL directly in the same terminal
----------------------------------------

To open a URL directly in the same terminal using the text-mode lynx browser:

	SFEED_PLUMBER=lynx SFEED_PLUMBER_INTERACTIVE=1 sfeed_curses ~/.sfeed/feeds/*


Yank to tmux buffer
-------------------

This changes the yank command to set the tmux buffer, instead of X11 xclip:

	SFEED_YANKER="tmux set-buffer \`cat\`"


Known terminal issues
---------------------

Listed below are some bugs or missing features in terminals that were found
while testing sfeed_curses. Some of them might already be fixed upstream:

- cygwin + mintty: the xterm mouse-encoding of the mouse position is broken for
  scrolling.
- HaikuOS terminal: the xterm mouse-encoding of the mouse button number of the
  middle-button, right-button is incorrect / reversed.
- putty: the full reset attribute (ESC c, typically `rs1`) does not reset the
  window title.
- Mouse button encoding for extended buttons (like side-buttons) in some
  terminals is unsupported or maps to the same button: for example side-buttons
  7 and 8 map to the scroll buttons 4 and 5 in urxvt.


License
-------

ISC, see LICENSE file.


Author
------

Hiltjo Posthuma <hiltjo@codemadness.org>