sfeed
-----

RSS and Atom parser (and some format programs).

It converts RSS or Atom feeds from XML to a TAB-separated file. There are
formatting programs included to convert this TAB-separated format to various
other formats. There are also some programs and scripts included to import and
export OPML and to fetch, filter, merge and order feed items.


gearsix changes
---------------

I've just added the sfeed_read script, which runs sfeed_update and writes the
output of the specified sfeed_* format tool (defaults to html, specified with
-e) to the xdg-user-dir DOCUMENTS directory (or ~/Documents if missing).
Output files are named after the current time.
Finally, it opens the generated output file using xdg-open.


Build and install
-----------------

	$ make
	# make install


To build sfeed without sfeed_curses, set SFEED_CURSES to an empty string:

	$ make SFEED_CURSES=""
	# make SFEED_CURSES="" install


To change the theme for sfeed_curses, you can set SFEED_THEME. See the themes/
directory for the theme names.

	$ make SFEED_THEME="templeos"
	# make SFEED_THEME="templeos" install


Usage
-----

Initial setup:

	mkdir -p "$HOME/.sfeed/feeds"
	cp sfeedrc.example "$HOME/.sfeed/sfeedrc"

Edit the sfeedrc(5) configuration file and change any RSS/Atom feeds. This file
is included and evaluated as a shellscript for sfeed_update, so its functions
and behaviour can be overridden:

	$EDITOR "$HOME/.sfeed/sfeedrc"

or you can import existing OPML subscriptions using sfeed_opml_import(1):

	sfeed_opml_import < file.opml > "$HOME/.sfeed/sfeedrc"

an example to export from another RSS/Atom reader called newsboat and import
for sfeed_update:

	newsboat -e | sfeed_opml_import > "$HOME/.sfeed/sfeedrc"

an example to export from another RSS/Atom reader called rss2email (3.x+) and
import for sfeed_update:

	r2e opmlexport | sfeed_opml_import > "$HOME/.sfeed/sfeedrc"

Update feeds. This script merges the new items; see sfeed_update(1) for more
information about what it can do:

	sfeed_update

Format feeds:

Plain-text list:

	sfeed_plain $HOME/.sfeed/feeds/* > "$HOME/.sfeed/feeds.txt"

HTML view (no frames), copy style.css for a default style:

	cp style.css "$HOME/.sfeed/style.css"
	sfeed_html $HOME/.sfeed/feeds/* > "$HOME/.sfeed/feeds.html"

HTML view with the menu as frames, copy style.css for a default style:

	mkdir -p "$HOME/.sfeed/frames"
	cp style.css "$HOME/.sfeed/frames/style.css"
	cd "$HOME/.sfeed/frames" && sfeed_frames $HOME/.sfeed/feeds/*

To automatically update your feeds periodically and format them in a way you
like, you can make a wrapper script and add it as a cronjob.
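
A minimal sketch of such a wrapper (the output paths and the schedule below
are only examples, adjust them to your own setup):

	#!/bin/sh
	# update feeds and regenerate the formatted output.
	sfeed_update
	sfeed_plain "$HOME/.sfeed/feeds/"* > "$HOME/.sfeed/feeds.txt"
	sfeed_html "$HOME/.sfeed/feeds/"* > "$HOME/.sfeed/feeds.html"

and a crontab(5) entry to run it every hour:

	0 * * * * /path/to/the/wrapper-script
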
Most protocols are supported because curl(1) is used by default, and proxy
settings from the environment (such as the $http_proxy environment variable)
are used as well.

The sfeed(1) program itself is just a parser that parses XML data from stdin
and is therefore network protocol-agnostic. It can be used with HTTP, HTTPS,
Gopher, SSH, etc.

See the section "Usage and examples" below and the man-pages for more
information on how to use sfeed(1) and the additional tools.


Dependencies
------------

- C compiler (C99).
- libc (recommended: C99 and POSIX >= 200809).


Optional dependencies
---------------------

- POSIX make(1) for the Makefile.
- POSIX sh(1),
  used by sfeed_update(1) and sfeed_opml_export(1).
- POSIX utilities such as awk(1) and sort(1),
  used by sfeed_content(1), sfeed_markread(1), sfeed_opml_export(1) and
  sfeed_update(1).
- curl(1) binary: https://curl.haxx.se/ ,
  used by sfeed_update(1), but can be replaced with any tool like wget(1),
  OpenBSD ftp(1) or hurl(1): https://git.codemadness.org/hurl/
- iconv(1) command-line utilities,
  used by sfeed_update(1). If the text in your RSS/Atom feeds is already UTF-8
  encoded then you don't need this. For a minimal iconv implementation:
  https://git.etalabs.net/cgit/noxcuse/tree/src/iconv.c
- xargs with support for the -P and -0 options,
  used by sfeed_update(1).
- mandoc for documentation: https://mdocml.bsd.lv/
- curses (typically ncurses), otherwise see minicurses.h,
  used by sfeed_curses(1).
- a terminal (emulator) supporting UTF-8 and the used capabilities,
  used by sfeed_curses(1).


Optional run-time dependencies for sfeed_curses
-----------------------------------------------

- xclip for yanking the URL or enclosure. See $SFEED_YANKER to change it.
- xdg-open, used as a plumber by default. See $SFEED_PLUMBER to change it.
- awk, used by the sfeed_content and sfeed_markread scripts.
  See the ENVIRONMENT VARIABLES section in the man page to change it.
- lynx, used by the sfeed_content script to convert HTML content.
  See the ENVIRONMENT VARIABLES section in the man page to change it.


Formats supported
-----------------

sfeed supports a subset of XML 1.0 and a subset of:

- Atom 1.0 (RFC 4287): https://datatracker.ietf.org/doc/html/rfc4287
- Atom 0.3 (draft, historic).
- RSS 0.90+.
- RDF (when used with RSS).
- MediaRSS extensions (media:).
- Dublin Core extensions (dc:).

Other formats like JSON Feed, twtxt or certain RSS/Atom extensions are
supported by converting them to RSS/Atom or to the sfeed(5) format directly.


OS tested
---------

- Linux,
  compilers: clang, gcc, chibicc, cproc, lacc, pcc, scc, tcc,
  libc: glibc, musl.
- OpenBSD (clang, gcc).
- NetBSD (with NetBSD curses).
- FreeBSD.
- DragonFlyBSD.
- GNU/Hurd.
- Illumos (OpenIndiana).
- Windows (cygwin gcc + mintty, mingw).
- HaikuOS.
- SerenityOS.
- FreeDOS (djgpp, Open Watcom).
- FUZIX (sdcc -mz80, with the sfeed parser program).


Architectures tested
--------------------

amd64, ARM, aarch64, HPPA, i386, MIPS32-BE, RISCV64, SPARC64, Z80.


Files
-----

sfeed             - Read XML RSS or Atom feed data from stdin. Write feed data
                    in TAB-separated format to stdout.
sfeed_atom        - Format feed data (TSV) to an Atom feed.
sfeed_content     - View item content, for use with sfeed_curses.
sfeed_curses      - Format feed data (TSV) to a curses interface.
sfeed_frames      - Format feed data (TSV) to HTML file(s) with frames.
sfeed_gopher      - Format feed data (TSV) to Gopher files.
sfeed_html        - Format feed data (TSV) to HTML.
sfeed_json        - Format feed data (TSV) to JSON Feed.
sfeed_opml_export - Generate an OPML XML file from a sfeedrc config file.
sfeed_opml_import - Generate a sfeedrc config file from an OPML XML file.
sfeed_markread    - Mark items as read/unread, for use with sfeed_curses.
sfeed_mbox        - Format feed data (TSV) to mbox.
sfeed_plain       - Format feed data (TSV) to a plain-text list.
sfeed_twtxt       - Format feed data (TSV) to a twtxt feed.
sfeed_update      - Update feeds and merge items.
sfeed_web         - Find URLs to RSS/Atom feeds from a webpage.
sfeed_xmlenc      - Detect character-set encoding from an XML stream.
sfeedrc.example   - Example config file. Can be copied to $HOME/.sfeed/sfeedrc.
style.css         - Example stylesheet to use with sfeed_html(1) and
                    sfeed_frames(1).


Files read at runtime by sfeed_update(1)
----------------------------------------

sfeedrc - Config file. This file is evaluated as a shellscript in
          sfeed_update(1).

At least the following functions can be overridden per feed:

- fetch: to use wget(1), OpenBSD ftp(1) or another download program.
- filter: to filter on fields.
- merge: to change the merge logic.
- order: to change the sort order.

See also the sfeedrc(5) man page documentation for more details.

The feeds() function is called to process the feeds. The default feed()
function called in your sfeedrc(5) config file is executed concurrently as a
background job to make updating faster. The variable maxjobs can be changed to
limit or increase the number of concurrent jobs (8 by default).


Files written at runtime by sfeed_update(1)
-------------------------------------------

feedname - TAB-separated format containing all items per feed. The
           sfeed_update(1) script merges new items with this file.
           The format is documented in sfeed(5).


File format
-----------

	man 5 sfeed
	man 5 sfeedrc
	man 1 sfeed


Usage and examples
------------------

Find RSS/Atom feed URLs from a webpage:

	url="https://codemadness.org"; curl -L -s "$url" | sfeed_web "$url"

output example:

	https://codemadness.org/atom.xml	application/atom+xml
	https://codemadness.org/atom_content.xml	application/atom+xml

- - -

Make sure your sfeedrc config file exists, see the sfeedrc.example file. To
update your feeds (the configfile argument is optional):

	sfeed_update "configfile"

Format the feeds files:

	# Plain-text list.
	sfeed_plain $HOME/.sfeed/feeds/* > $HOME/.sfeed/feeds.txt
	# HTML view (no frames), copy style.css for a default style.
	sfeed_html $HOME/.sfeed/feeds/* > $HOME/.sfeed/feeds.html
	# HTML view with the menu as frames, copy style.css for a default style.
	mkdir -p somedir && cd somedir && sfeed_frames $HOME/.sfeed/feeds/*

View formatted output in your browser:

	$BROWSER "$HOME/.sfeed/feeds.html"

View formatted output in your editor:

	$EDITOR "$HOME/.sfeed/feeds.txt"

- - -

View formatted output in a curses interface. The interface has a look inspired
by the mutt mail client. It has a sidebar panel for the feeds, a panel with a
listing of the items and a small statusbar for the selected item/URL. Some
functions like searching and scrolling are integrated in the interface itself.

Just like the other format programs included in sfeed you can run it like
this:

	sfeed_curses ~/.sfeed/feeds/*

... or by reading from stdin:

	sfeed_curses < ~/.sfeed/feeds/xkcd

By default sfeed_curses marks the items of the last day as new/bold. This
limit can be overridden by setting the environment variable $SFEED_NEW_AGE to
the desired maximum age in seconds.
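
For example, to mark items of (roughly) the last week as new instead (a
minimal illustration; 604800 is 7 days in seconds):

	SFEED_NEW_AGE=604800 sfeed_curses ~/.sfeed/feeds/*
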
To manage read/unread items in a different way, a plain-text file with a list
of the read URLs can be used. To enable this behaviour, set the environment
variable $SFEED_URL_FILE to the path of this URL file:

	export SFEED_URL_FILE="$HOME/.sfeed/urls"
	[ -f "$SFEED_URL_FILE" ] || touch "$SFEED_URL_FILE"
	sfeed_curses ~/.sfeed/feeds/*

It then uses the shellscript "sfeed_markread" to process the read and unread
items.

- - -

Example script to view feed items in a vertical list/menu in dmenu(1). It
opens the selected URL in the browser set in $BROWSER:

	#!/bin/sh
	url=$(sfeed_plain "$HOME/.sfeed/feeds/"* | dmenu -l 35 -i | \
		sed -n 's@^.* \([a-zA-Z]*://\)\(.*\)$@\1\2@p')
	test -n "${url}" && $BROWSER "${url}"

dmenu can be found at: https://git.suckless.org/dmenu/

- - -

Generate a sfeedrc config file from your exported list of feeds in OPML
format:

	sfeed_opml_import < opmlfile.xml > $HOME/.sfeed/sfeedrc

- - -

Export an OPML file of your feeds from a sfeedrc config file (the configfile
argument is optional):

	sfeed_opml_export configfile > myfeeds.opml

- - -

The filter function can be overridden in your sfeedrc file. This allows
filtering items per feed. It can be used to shorten URLs, filter away
advertisements, strip tracking parameters and more.

	# filter fields.
	# filter(name, url)
	filter() {
		case "$1" in
		"tweakers")
			awk -F '\t' 'BEGIN { OFS = "\t"; }
			# skip ads.
			$2 ~ /^ADV:/ {
				next;
			}
			# shorten link.
			{
				if (match($3, /^https:\/\/tweakers\.net\/[a-z]+\/[0-9]+\//)) {
					$3 = substr($3, RSTART, RLENGTH);
				}
				print $0;
			}';;
		"yt BSDNow")
			# filter only BSD Now from channel.
			awk -F '\t' '$2 ~ / \| BSD Now/';;
		*)
			cat;;
		esac | \
		# replace youtube links with embed links.
		sed 's@www.youtube.com/watch?v=@www.youtube.com/embed/@g' | \

		awk -F '\t' 'BEGIN { OFS = "\t"; }
		function filterlink(s) {
			# protocol must start with http, https or gopher.
			if (match(s, /^(http|https|gopher):\/\//) == 0) {
				return "";
			}

			# shorten feedburner links.
			if (match(s, /^(http|https):\/\/[^\/]+\/~r\/.*\/~3\/[^\/]+\//)) {
				s = substr(s, RSTART, RLENGTH);
			}

			# strip tracking parameters
			# urchin, facebook, piwik, webtrekk and generic.
			gsub(/\?(ad|campaign|fbclid|pk|tm|utm|wt)_([^&]+)/, "?", s);
			gsub(/&(ad|campaign|fbclid|pk|tm|utm|wt)_([^&]+)/, "", s);

			gsub(/\?&/, "?", s);
			gsub(/[\?&]+$/, "", s);

			return s
		}
		{
			$3 = filterlink($3); # link
			$8 = filterlink($8); # enclosure

			# try to remove tracking pixels: <img/> tags with 1px width or height.
			gsub("<img[^>]*(width|height)[[:space:]]*=[[:space:]]*[\"'"'"' ]?1[\"'"'"' ]?[^0-9>]+[^>]*>", "", $4);

			print $0;
		}'
	}

- - -

Aggregate feeds. This filters new entries (maximum one day old) and sorts them
by newest first. The feed name is prefixed to the title.
Convert the TSV output data to an Atom XML feed (again):

	#!/bin/sh
	cd ~/.sfeed/feeds/ || exit 1

	awk -F '\t' -v "old=$(($(date +'%s') - 86400))" '
	BEGIN { OFS = "\t"; }
	int($1) >= old {
		$2 = "[" FILENAME "] " $2;
		print $0;
	}' * | \
	sort -k1,1rn | \
	sfeed_atom

- - -

To have a "tail(1) -f"-like FIFO stream that filters new unique feed items and
shows them as plain-text, one per line, similar to sfeed_plain(1):

Create a FIFO:

	fifo="/tmp/sfeed_fifo"
	mkfifo "$fifo"

On the reading side:

	# This keeps track of unique lines so it might consume a lot of memory.
	# It tries to reopen the $fifo after 1 second if it fails.
	while :; do cat "$fifo" || sleep 1; done | awk '!x[$0]++'

On the writing side:

	feedsdir="$HOME/.sfeed/feeds/"
	cd "$feedsdir" || exit 1
	test -p "$fifo" || exit 1

	# 1 day is old news, don't write older items.
	awk -F '\t' -v "old=$(($(date +'%s') - 86400))" '
	BEGIN { OFS = "\t"; }
	int($1) >= old {
		$2 = "[" FILENAME "] " $2;
		print $0;
	}' * | sort -k1,1n | sfeed_plain | cut -b 3- > "$fifo"

cut -b is used to trim the "N " prefix of sfeed_plain(1).

- - -

For some podcast feeds the following code can be used to filter the latest
enclosure URL (probably an audio file):

	awk -F '\t' 'BEGIN { latest = 0; }
	length($8) {
		ts = int($1);
		if (ts > latest) {
			url = $8;
			latest = ts;
		}
	}
	END { if (length(url)) { print url; } }'

... or on a file already sorted from newest to oldest:

	awk -F '\t' '$8 { print $8; exit }'

- - -

Over time your feeds file might become quite big. You can archive the items of
a feed and keep only (roughly) the last week by doing, for example:

	awk -F '\t' -v "old=$(($(date +'%s') - 604800))" 'int($1) > old' < feed > feed.new
	mv feed feed.bak
	mv feed.new feed

This could also be run weekly in a crontab to archive the feeds, like throwing
away old newspapers. It keeps the feeds list tidy and the formatted output
small.

- - -

Convert mbox to separate maildirs per feed and filter duplicate messages using
the fdm program.
fdm is available at: https://github.com/nicm/fdm

fdm config file (~/.sfeed/fdm.conf):

	set unmatched-mail keep

	account "sfeed" mbox "%[home]/.sfeed/mbox"
	$cachepath = "%[home]/.sfeed/fdm.cache"
	cache "${cachepath}"
	$maildir = "%[home]/feeds/"

	# Check if message is in the cache by Message-ID.
	match case "^Message-ID: (.*)" in headers
		action {
			tag "msgid" value "%1"
		}
		continue

	# If it is in the cache, stop.
	match matched and in-cache "${cachepath}" key "%[msgid]"
		action {
			keep
		}

	# Not in the cache, process it and add to cache.
	match case "^X-Feedname: (.*)" in headers
		action {
			# Store to local maildir.
			maildir "${maildir}%1"

			add-to-cache "${cachepath}" key "%[msgid]"
			keep
		}

Now run:

	$ sfeed_mbox ~/.sfeed/feeds/* > ~/.sfeed/mbox
	$ fdm -f ~/.sfeed/fdm.conf fetch

Now you can view feeds in mutt(1) for example.

- - -

Read from the mbox, filter duplicate messages using the fdm program and
deliver them to an SMTP server. This works similarly to the rss2email program.
fdm is available at: https://github.com/nicm/fdm

fdm config file (~/.sfeed/fdm.conf):

	set unmatched-mail keep

	account "sfeed" mbox "%[home]/.sfeed/mbox"
	$cachepath = "%[home]/.sfeed/fdm.cache"
	cache "${cachepath}"

	# Check if message is in the cache by Message-ID.
	match case "^Message-ID: (.*)" in headers
		action {
			tag "msgid" value "%1"
		}
		continue

	# If it is in the cache, stop.
	match matched and in-cache "${cachepath}" key "%[msgid]"
		action {
			keep
		}

	# Not in the cache, process it and add to cache.
	match case "^X-Feedname: (.*)" in headers
		action {
			# Connect to a SMTP server and attempt to deliver the
			# mail to it.
			# Of course change the server and e-mail below.
			smtp server "codemadness.org" to "hiltjo@codemadness.org"

			add-to-cache "${cachepath}" key "%[msgid]"
			keep
		}

Now run:

	$ sfeed_mbox ~/.sfeed/feeds/* > ~/.sfeed/mbox
	$ fdm -f ~/.sfeed/fdm.conf fetch

Now you can view feeds in mutt(1) for example.

- - -

Convert mbox to separate maildirs per feed and filter duplicate messages using
procmail(1).

procmail_maildirs.sh file:

	maildir="$HOME/feeds"
	feedsdir="$HOME/.sfeed/feeds"
	procmailconfig="$HOME/.sfeed/procmailrc"

	# message-id cache to prevent duplicates.
	mkdir -p "${maildir}/.cache"

	if ! test -r "${procmailconfig}"; then
		printf "Procmail configuration file \"%s\" does not exist or is not readable.\n" "${procmailconfig}" >&2
		echo "See procmailrc.example for an example." >&2
		exit 1
	fi

	find "${feedsdir}" -type f -exec printf '%s\n' {} \; | while read -r d; do
		name=$(basename "${d}")
		mkdir -p "${maildir}/${name}/cur"
		mkdir -p "${maildir}/${name}/new"
		mkdir -p "${maildir}/${name}/tmp"
		printf 'Mailbox %s\n' "${name}"
		sfeed_mbox "${d}" | formail -s procmail "${procmailconfig}"
	done

Procmailrc(5) file:

	# Example for use with sfeed_mbox(1).
	# The header X-Feedname is used to split into separate maildirs. It is
	# assumed this name is sane.

	MAILDIR="$HOME/feeds/"

	:0
	* ^X-Feedname: \/.*
	{
		FEED="$MATCH"

		:0 Wh: "msgid_$FEED.lock"
		| formail -D 1024000 ".cache/msgid_$FEED.cache"

		:0
		"$FEED"/
	}

Now run:

	$ procmail_maildirs.sh

Now you can view feeds in mutt(1) for example.

- - -

The fetch function can be overridden in your sfeedrc file. This allows
replacing the default curl(1) used by sfeed_update with any other client to
fetch the RSS/Atom data, or changing the default curl options:

	# fetch a feed via HTTP/HTTPS etc.
	# fetch(name, url, feedfile)
	fetch() {
		hurl -m 1048576 -t 15 "$2" 2>/dev/null
	}

- - -

Caching, incremental data updates and bandwidth saving

For servers that support it, incremental data updates and bandwidth saving can
be done by using the "ETag" HTTP header.

Create a directory for storing the ETags and modification timestamps per feed:

	mkdir -p ~/.sfeed/etags ~/.sfeed/lastmod

The curl ETag options (--etag-save and --etag-compare) can be used to store
and send the previous ETag header value. curl version 7.73+ is recommended for
it to work properly.

The curl -z option can be used to send the modification date of a local file
as an HTTP "If-Modified-Since" request header.
The server can then respond whether the data is modified, or respond with only
the incremental data.

The curl --compressed option can be used to indicate the client supports
decompression. Because RSS/Atom feeds are textual XML content, this generally
compresses very well.

These options can be set by overriding the fetch() function in the sfeedrc
file:

	# fetch(name, url, feedfile)
	fetch() {
		basename="$(basename "$3")"
		etag="$HOME/.sfeed/etags/${basename}"
		lastmod="$HOME/.sfeed/lastmod/${basename}"
		output="${sfeedtmpdir}/feeds/${filename}.xml"

		curl \
			-f -s -m 15 \
			-L --max-redirs 0 \
			-H "User-Agent: sfeed" \
			--compressed \
			--etag-save "${etag}" --etag-compare "${etag}" \
			-R -o "${output}" \
			-z "${lastmod}" \
			"$2" 2>/dev/null || return 1

		# successful, but no file written: assume it is OK and Not Modified.
		[ -e "${output}" ] || return 0

		# use server timestamp from curl -R to set Last-Modified.
		touch -r "${output}" "${lastmod}" 2>/dev/null
		cat "${output}" 2>/dev/null
		# use write output status, other errors are ignored here.
		fetchstatus="$?"
		rm -f "${output}" 2>/dev/null
		return "${fetchstatus}"
	}

These options can come at the cost of some privacy, because they expose
additional metadata from the previous request.

- - -

CDNs blocking requests due to a missing HTTP User-Agent request header

sfeed_update will not send the "User-Agent" header by default for privacy
reasons. Some CDNs like Cloudflare or websites like Reddit.com don't like this
and will block such HTTP requests.

A custom User-Agent can be set by using the curl -H option, like so:

	curl -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0'

The above example string pretends to be a Windows 10 (x86-64) machine running
Firefox 78.

- - -

Page redirects

For security and efficiency reasons, redirects are not allowed by default and
are treated as an error: for example to prevent hijacking of an unencrypted
http:// to https:// redirect, or to avoid the extra delay of an unnecessary
page redirect on each request. It is encouraged to use the final redirected
URL in the sfeedrc config file.

If you want to ignore this advice you can override the fetch() function in the
sfeedrc file and change the curl options "-L --max-redirs 0".
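
A minimal sketch of such an override, assuming curl-based fetching and
allowing up to 3 redirects (the other options mirror the curl examples above
and can be adjusted):

	# fetch(name, url, feedfile)
	fetch() {
		# allow a limited number of redirects.
		curl -L --max-redirs 3 -H "User-Agent:" -f -s -m 15 \
			"$2" 2>/dev/null
	}
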
- - -

Shellscript to handle URLs and enclosures in parallel using xargs -P.

This can be used to download and process URLs: for example to download
podcasts and webcomics, download and convert webpages, mirror videos, etc. It
uses a plain-text cache file for remembering processed URLs. The match
patterns are defined in the shellscript fetch() function and in the awk script
and can be modified to handle items differently depending on their context.

The arguments for the script are files in the sfeed(5) format. If no file
arguments are specified then the data is read from stdin.

	#!/bin/sh
	# sfeed_download: downloader for URLs and enclosures in sfeed(5) files.
	# Dependencies: awk, curl, flock, xargs (-P), yt-dlp.

	cachefile="${SFEED_CACHEFILE:-$HOME/.sfeed/downloaded_urls}"
	jobs="${SFEED_JOBS:-4}"
	lockfile="${HOME}/.sfeed/sfeed_download.lock"

	# log(feedname, s, status)
	log() {
		if [ "$1" != "-" ]; then
			s="[$1] $2"
		else
			s="$2"
		fi
		printf '[%s]: %s: %s\n' "$(date +'%H:%M:%S')" "${s}" "$3"
	}

	# fetch(url, feedname)
	fetch() {
		case "$1" in
		*youtube.com*)
			yt-dlp "$1";;
		*.flac|*.ogg|*.m3u|*.m3u8|*.m4a|*.mkv|*.mp3|*.mp4|*.wav|*.webm)
			# allow 2 redirects, hide User-Agent, connect timeout is 15 seconds.
			curl -O -L --max-redirs 2 -H "User-Agent:" -f -s --connect-timeout 15 "$1";;
		esac
	}

	# downloader(url, title, feedname)
	downloader() {
		url="$1"
		title="$2"
		feedname="${3##*/}"

		msg="${title}: ${url}"

		# download directory.
		if [ "${feedname}" != "-" ]; then
			mkdir -p "${feedname}"
			if ! cd "${feedname}"; then
				log "${feedname}" "${msg}: ${feedname}" "DIR FAIL" >&2
				return 1
			fi
		fi

		log "${feedname}" "${msg}" "START"
		if fetch "${url}" "${feedname}"; then
			log "${feedname}" "${msg}" "OK"

			# append it safely in parallel to the cachefile on a
			# successful download.
			(flock 9 || exit 1
			printf '%s\n' "${url}" >> "${cachefile}"
			) 9>"${lockfile}"
		else
			log "${feedname}" "${msg}" "FAIL" >&2
			return 1
		fi
		return 0
	}

	if [ "${SFEED_DOWNLOAD_CHILD}" = "1" ]; then
		# Downloader helper for parallel downloading.
		# Receives arguments: $1 = URL, $2 = title, $3 = feed filename or "-".
		# It should write the URI to the cachefile if it is successful.
		downloader "$1" "$2" "$3"
		exit $?
	fi

	# ...else parent mode:

	tmp="$(mktemp)" || exit 1
	trap "rm -f ${tmp}" EXIT

	[ -f "${cachefile}" ] || touch "${cachefile}"
	cat "${cachefile}" > "${tmp}"
	echo >> "${tmp}" # force it to have one line for awk.

	LC_ALL=C awk -F '\t' '
	# fast prefilter what to download or not.
	function filter(url, field, feedname) {
		u = tolower(url);
		return (match(u, "youtube\\.com") ||
			match(u, "\\.(flac|ogg|m3u|m3u8|m4a|mkv|mp3|mp4|wav|webm)$"));
	}
	function download(url, field, title, filename) {
		if (!length(url) || urls[url] || !filter(url, field, filename))
			return;
		# NUL-separated for xargs -0.
		printf("%s%c%s%c%s%c", url, 0, title, 0, filename, 0);
		urls[url] = 1; # print once
	}
	{
		FILENR += (FNR == 1);
	}
	# lookup table from cachefile which contains downloaded URLs.
	FILENR == 1 {
		urls[$0] = 1;
	}
	# feed file(s).
	FILENR != 1 {
		download($3, 3, $2, FILENAME); # link
		download($8, 8, $2, FILENAME); # enclosure
	}
	' "${tmp}" "${@:--}" | \
	SFEED_DOWNLOAD_CHILD="1" xargs -r -0 -L 3 -P "${jobs}" "$(readlink -f "$0")"
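
For example, assuming the script above is saved as sfeed_download, is
executable and is in $PATH, it could be run like this to process all feeds
with 8 parallel downloads:

	# download new links/enclosures from all feeds, 8 jobs in parallel.
	SFEED_JOBS=8 sfeed_download ~/.sfeed/feeds/*
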
- - -

Shellscript to export existing newsboat cached items from sqlite3 to the sfeed
TSV format.

	#!/bin/sh
	# Export newsbeuter/newsboat cached items from sqlite3 to the sfeed TSV format.
	# The data is split per file per feed with the name of the newsboat title/url.
	# It writes the URLs of the read items line by line to a "urls" file.
	#
	# Dependencies: sqlite3, awk.
	#
	# Usage: create some directory to store the feeds, then run this script.

	# newsboat cache.db file.
	cachefile="$HOME/.newsboat/cache.db"
	test -n "$1" && cachefile="$1"

	# dump data.
	# .mode ascii: columns/rows delimited by 0x1F and 0x1E.
	# get the first fields in the order of the sfeed(5) format.
	sqlite3 "$cachefile" <<!EOF |
	.headers off
	.mode ascii
	.output
	SELECT
		i.pubDate, i.title, i.url, i.content, i.content_mime_type,
		i.guid, i.author, i.enclosure_url,
		f.rssurl AS rssurl, f.title AS feedtitle, i.unread
		-- i.id, i.enclosure_type, i.enqueued, i.flags, i.deleted, i.base
	FROM rss_feed f
	INNER JOIN rss_item i ON i.feedurl = f.rssurl
	ORDER BY
		i.feedurl ASC, i.pubDate DESC;
	.quit
	!EOF
	# convert to sfeed(5) TSV format.
	LC_ALL=C awk '
	BEGIN {
		FS = "\x1f";
		RS = "\x1e";
	}
	# normal non-content fields.
	function field(s) {
		gsub("^[[:space:]]*", "", s);
		gsub("[[:space:]]*$", "", s);
		gsub("[[:space:]]", " ", s);
		gsub("[[:cntrl:]]", "", s);
		return s;
	}
	# content field.
	function content(s) {
		gsub("^[[:space:]]*", "", s);
		gsub("[[:space:]]*$", "", s);
		# escape chars in content field.
		gsub("\\\\", "\\\\", s);
		gsub("\n", "\\n", s);
		gsub("\t", "\\t", s);
		return s;
	}
	function feedname(feedurl, feedtitle) {
		if (feedtitle == "") {
			gsub("/", "_", feedurl);
			return feedurl;
		}
		gsub("/", "_", feedtitle);
		return feedtitle;
	}
	{
		fname = feedname($9, $10);
		if (!feed[fname]++) {
			print "Writing file: \"" fname "\" (title: " $10 ", url: " $9 ")" > "/dev/stderr";
		}

		contenttype = field($5);
		if (contenttype == "")
			contenttype = "html";
		else if (index(contenttype, "/html") || index(contenttype, "/xhtml"))
			contenttype = "html";
		else
			contenttype = "plain";

		print $1 "\t" field($2) "\t" field($3) "\t" content($4) "\t" \
			contenttype "\t" field($6) "\t" field($7) "\t" field($8) "\t" \
			> fname;

		# write URLs of the read items to a file line by line.
		if ($11 == "0") {
			print $3 > "urls";
		}
	}'

- - -

Progress indicator
------------------

The below sfeed_update wrapper script counts the number of feeds in a sfeedrc
config. It then calls sfeed_update and pipes the output lines to a function
that counts the current progress. It writes the total progress to stderr.
Alternative: pv -l -s totallines

	#!/bin/sh
	# Progress indicator script.

	# Pass lines as input to stdin and write progress status to stderr.
	# progress(totallines)
	progress() {
		total="$(($1 + 0))" # must be a number, no divide by zero.
		test "${total}" -le 0 -o "$1" != "${total}" && return
		LC_ALL=C awk -v "total=${total}" '
		{
			counter++;
			percent = (counter * 100) / total;
			printf("\033[K") > "/dev/stderr"; # clear EOL
			print $0;
			printf("[%s/%s] %.0f%%\r", counter, total, percent) > "/dev/stderr";
			fflush(); # flush all buffers per line.
		}
		END {
			printf("\033[K") > "/dev/stderr";
		}'
	}

	# Counts the feeds from the sfeedrc config.
	countfeeds() {
		count=0
		. "$1"
		feed() {
			count=$((count + 1))
		}
		feeds
		echo "${count}"
	}

	config="${1:-$HOME/.sfeed/sfeedrc}"
	total=$(countfeeds "${config}")
	sfeed_update "${config}" 2>&1 | progress "${total}"

- - -

Counting unread and total items
-------------------------------

It can be useful to show the counts of unread items, for example in a
windowmanager or statusbar.

The below example script counts the items of the last day in the same way the
formatting tools do:

	#!/bin/sh
	# Count the new items of the last day.
	LC_ALL=C awk -F '\t' -v "old=$(($(date +'%s') - 86400))" '
	{
		total++;
	}
	int($1) >= old {
		totalnew++;
	}
	END {
		print "New: " totalnew;
		print "Total: " total;
	}' ~/.sfeed/feeds/*

The below example script counts the unread items using the sfeed_curses URL
file:

	#!/bin/sh
	# Count the unread and total items from feeds using the URL file.
	LC_ALL=C awk -F '\t' '
	# URL file: amount of fields is 1.
	NF == 1 {
		u[$0] = 1; # lookup table of URLs.
		next;
	}
	# feed file: check by URL or id.
	{
		total++;
		if (length($3)) {
			if (u[$3])
				read++;
		} else if (length($6)) {
			if (u[$6])
				read++;
		}
	}
	END {
		print "Unread: " (total - read);
		print "Total: " total;
	}' ~/.sfeed/urls ~/.sfeed/feeds/*

- - -

sfeed.c: adding new XML tags or sfeed(5) fields to the parser
-------------------------------------------------------------

sfeed.c contains definitions to parse XML tags and map them to sfeed(5) TSV
fields. Parsed RSS and Atom tag names are first stored as a TagId, which is a
number. This TagId is then mapped to the output field index.

Steps to modify the code:

* Add a new TagId enum for the tag.

* (optional) Add a new FeedField* enum for the new output field, or you can
  map it to an existing field.

* Add the new XML tag name to the array variable of parsed RSS or Atom
  tags: rsstags[] or atomtags[].

  These must be defined in alphabetical order, because a binary search is used
  which uses the strcasecmp() function.

* Add the parsed TagId to the output field in the array variable fieldmap[].

  When another tag is also mapped to the same output field then the tag with
  the highest TagId number value overrides the mapped field: the order is from
  least important to most important.

* If this defined tag is just using the inner data of the XML tag, then this
  definition is enough. If it, for example, has to parse a certain attribute,
  you have to add a check for the TagId to the xmlattr() callback function.

* (optional) Print the new field in the printfields() function.

Below is a patch example to add the MRSS "media:content" tag as a new field:

diff --git a/sfeed.c b/sfeed.c
--- a/sfeed.c
+++ b/sfeed.c
@@ -50,7 +50,7 @@ enum TagId {
 	RSSTagGuidPermalinkTrue,
 	/* must be defined after GUID, because it can be a link (isPermaLink) */
 	RSSTagLink,
-	RSSTagEnclosure,
+	RSSTagMediaContent, RSSTagEnclosure,
 	RSSTagAuthor, RSSTagDccreator,
 	RSSTagCategory,
 	/* Atom */
@@ -81,7 +81,7 @@ typedef struct field {
 enum {
 	FeedFieldTime = 0, FeedFieldTitle, FeedFieldLink, FeedFieldContent,
 	FeedFieldId, FeedFieldAuthor, FeedFieldEnclosure, FeedFieldCategory,
-	FeedFieldLast
+	FeedFieldMediaContent, FeedFieldLast
 };
 
 typedef struct feedcontext {
@@ -137,6 +137,7 @@ static const FeedTag rsstags[] = {
 	{ STRP("enclosure"), RSSTagEnclosure },
 	{ STRP("guid"), RSSTagGuid },
 	{ STRP("link"), RSSTagLink },
+	{ STRP("media:content"), RSSTagMediaContent },
 	{ STRP("media:description"), RSSTagMediaDescription },
 	{ STRP("pubdate"), RSSTagPubdate },
 	{ STRP("title"), RSSTagTitle }
@@ -180,6 +181,7 @@ static const int fieldmap[TagLast] = {
 	[RSSTagGuidPermalinkFalse] = FeedFieldId,
 	[RSSTagGuidPermalinkTrue] = FeedFieldId, /* special-case: both a link and an id */
 	[RSSTagLink] = FeedFieldLink,
+	[RSSTagMediaContent] = FeedFieldMediaContent,
 	[RSSTagEnclosure] = FeedFieldEnclosure,
 	[RSSTagAuthor] = FeedFieldAuthor,
 	[RSSTagDccreator] = FeedFieldAuthor,
@@ -677,6 +679,8 @@ printfields(void)
 	string_print_uri(&ctx.fields[FeedFieldEnclosure].str);
 	putchar(FieldSeparator);
 	string_print_trimmed_multi(&ctx.fields[FeedFieldCategory].str);
+	putchar(FieldSeparator);
+	string_print_trimmed(&ctx.fields[FeedFieldMediaContent].str);
 	putchar('\n');
 
 	if (ferror(stdout)) /* check for errors but do not flush */
@@ -718,7 +722,7 @@ xmlattr(XMLParser *p, const char *t, size_t tl, const char *n, size_t nl,
 	}
 
 	if (ctx.feedtype == FeedTypeRSS) {
-		if (ctx.tag.id == RSSTagEnclosure &&
+		if ((ctx.tag.id == RSSTagEnclosure || ctx.tag.id == RSSTagMediaContent) &&
 		    isattr(n, nl, STRP("url"))) {
 			string_append(&tmpstr, v, vl);
 		} else if (ctx.tag.id == RSSTagGuid &&

- - -

Running custom commands inside the sfeed_curses program
--------------------------------------------------------

Running commands inside the sfeed_curses program can be useful, for example to
sync items or mark all items across all feeds as read. It can be convenient to
have a keybind for this inside the program to perform a scripted action and
then reload the feeds by sending the signal SIGHUP.

In the input handling code you can then add a case:

	case 'M':
		forkexec((char *[]) { "markallread.sh", NULL }, 0);
		break;

or

	case 'S':
		forkexec((char *[]) { "syncnews.sh", NULL }, 1);
		break;

The specified script should be in $PATH or be an absolute path.

Example of a `markallread.sh` shellscript to mark all URLs as read:

	#!/bin/sh
	# mark all items/URLs as read.
	tmp="$(mktemp)" || exit 1
	(cat ~/.sfeed/urls; cut -f 3 ~/.sfeed/feeds/*) | \
		awk '!x[$0]++' > "$tmp" &&
		mv "$tmp" ~/.sfeed/urls &&
		pkill -SIGHUP sfeed_curses # reload feeds.

Example of a `syncnews.sh` shellscript to update the feeds and reload them:

	#!/bin/sh
	sfeed_update
	pkill -SIGHUP sfeed_curses


Running programs in a new session
---------------------------------

By default processes are spawned in the same session and process group as
sfeed_curses. When sfeed_curses is closed this can also close the spawned
process in some cases.

When the setsid command-line program is available, the following wrapper
command can be used to run the program in a new session, for a plumb program:

	setsid -f xdg-open "$@"

Alternatively the code can be changed to call setsid() before execvp().


Open a URL directly in the same terminal
-----------------------------------------

To open a URL directly in the same terminal using the text-mode lynx browser:

	SFEED_PLUMBER=lynx SFEED_PLUMBER_INTERACTIVE=1 sfeed_curses ~/.sfeed/feeds/*


Yank to tmux buffer
-------------------

This changes the yank command to set the tmux buffer, instead of X11 xclip:

	SFEED_YANKER="tmux set-buffer \`cat\`"


Alternative for xargs -P and -0
-------------------------------

Most xargs implementations support the options -P and -0.
GNU and *BSD xargs have supported them for over 20 years!

These functions in sfeed_update can be overridden in sfeedrc, if you don't
want to use xargs:

	feed() {
		# wait until ${maxjobs} are finished: will stall the queue if an item
		# is slow, but it is portable.
		[ ${signo} -ne 0 ] && return
		[ $((curjobs % maxjobs)) -eq 0 ] && wait
		[ ${signo} -ne 0 ] && return
		curjobs=$((curjobs + 1))

		_feed "$@" &
	}

	runfeeds() {
		# job counter.
		curjobs=0
		# fetch feeds specified in config file.
		feeds
		# wait till all feeds are fetched (concurrently).
		[ ${signo} -eq 0 ] && wait
	}


Known terminal issues
---------------------

Below is a list of some bugs or missing features in terminals that were found
while testing sfeed_curses. Some of them might already be fixed upstream:

- cygwin + mintty: the xterm mouse-encoding of the mouse position is broken
  for scrolling.
- HaikuOS terminal: the xterm mouse-encoding of the mouse button number of the
  middle-button and right-button is incorrect / reversed.
- putty: the full reset attribute (ESC c, typically `rs1`) does not reset the
  window title.
- Mouse button encoding for extended buttons (like side-buttons) in some
  terminals is unsupported or maps to the same button: for example
  side-buttons 7 and 8 map to the scroll buttons 4 and 5 in urxvt.


License
-------

ISC, see LICENSE file.


Author
------

Hiltjo Posthuma <hiltjo@codemadness.org>