Published: 2020-10-28 Last modified: 2022-02-27
Why another SSG and why this post?
1. Main features
The first step in building an SSG is to decide what features it should have. I settled on the following:
1.1. For the reader (how the website looks and feels)
- Responsive layout;
- Tags;
- Syndication feed (Atom), because feeds are the right way to surf the webs, and because Atom > RSS;
- No JavaScript, HTML5/CSS only, because JavaScript is a trap;
- No external resources (CDN, fonts etc.), except search;
- Absolutely no cookies, not even gluten-free ones;
- History of posts;
- Search (via an external search engine);
- SEO-friendly web page filenames;
- Pure HTML math formulae.
1.2. For the author (me or you if you choose to adopt this tool)
- Writing posts in AsciiDoctor markup with live preview,[1] since AsciiDoc > Markdown and AsciiDoctor > AsciiDoc (AsciiDoc uses JS for the TOC, smh), so by transitivity AsciiDoctor > Markdown;
- Fully automatic workflow, incl. management of tags and of the "published" and "last modified" dates; zero manual tinkering with files (after the initial setup);
- Drafts, with a one-key-press option to publish;
- An optional "read more" line to cut long posts for syndication and the homepage;
- Global tag renaming;
- Simple listing of all posts/drafts/tags;
- Fuzzy search for a post to edit (no need to type the full title);
- Local offline writing, with simple one-folder rsync-ing to the Web Server;
- Math notation in KaTeX.
2. Changelog
23.02.2022
- Added TeX support.

16.02.2022
- Switched from Gedit to Geany (so much better!).
- Added the "unpublish post" command.
- Automatic sitemap.xml creation.
- Speed optimisation.
3. Implementation
3.1. Tenets
3.2. Decisions
Static and Local
Only plain HTML5/CSS: no server-side processing, no JavaScript, no CDN. That poses some challenges:

What language to use, and why is it Bash?
I started prototyping in Bash. They say that once you pass 100 LOC you should rewrite your script in some proper language. But,

How to manage the metadata (tags, dates…)?
At some stage I made an SQLite DB to keep all the filenames, tags, and dates for simple retrieval, but then I realized that all the metadata was now duplicated (in the source files and in the DB) and I had to take care of synchronization. So, following DTSTTCPW, I decided to abandon the DB and continue retrieving the metadata from the source files unless I reached an impasse. I have not reached it. So the metadata appears in the source files as AsciiDoctor comments; see Source file (*.adoc) structure.

How should the files be organized?
3.3. Tools
There is one Bash script, which uses (automatically launching them when needed) some FOSS tools:
- asciidoctor → converts AsciiDoctor markup to HTML;
- for writing posts with live preview:
  - Geany → lightweight IDE with an AsciiDoc highlighter;[2]
  - inotifywait → watches for changes in the edited file;
  - Epiphany → browser used for live preview, automatically reloads on file changes;
- recode → produces escaped HTML for the Atom syndication page;
- katex → converts KaTeX code (math notation) to HTML (without any JS needed).
See Install tools.
3.4. Files and directories
3.4.1. Source file (*.adoc) structure
A source file (one file per post) is where the author actually writes a post using the AsciiDoctor markup language; hence the .adoc extension. This is how a source file looks:
//title: My First Post!
//tags: Digital Garden, Pulitzer
//published: 2020-01-20T12:09:15
//modified: 2020-02-07T15:40:37

AsciiDoctor-formatted text of the post, for example:

This is *bold*

== This is header

etc. The next line is optional

//readmore

This part of the post will appear only in the post's own page; in syndication and the website homepage it will be discarded, instead a "Read more" link will appear.
The metadata (title, comma-separated tags, 'Published' and 'Last modified' datetimes) appears in the four-line header, formatted as AsciiDoctor comments (prepended with //), so it isn't processed by the AsciiDoctor converter; instead, the NoPress script parses and updates it. The author may change the title and/or tags, but normally shouldn't touch the datetime lines, since they are managed automatically.
Below those lines the author writes the post in AsciiDoctor markup.
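Reading a field back from that header needs nothing beyond standard text tools; a minimal sketch of the idea (the helper name get_meta is mine, not the script's):

```shell
# Print the value of one header field of a post's source file.
# Usage: get_meta <field> <file>, e.g. get_meta title my-post.adoc
get_meta() {
  local field=$1 file=$2
  # match the "//<field>: " comment line and strip the prefix
  sed -n "s|^//${field}: ||p" "$file" | head -n 1
}
```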
In addition, the author has the option to insert a //readmore line anywhere in the text. If this line appears in the source file, then in the syndication feed and on the homepage (index.html) it is replaced with a "Read more…" link to the full post, and the rest of the post is discarded. On the full post's page the line (which is also an AsciiDoctor comment) is simply ignored.
3.4.2. Project directory structure
The project folder (e.g., ~/np/, used in this post as an example) is the folder on the local machine where all the files (except the script) needed for the website creation reside.
There are two main folders in the project directory: drafts and published, with a similar internal structure, although the latter is more complicated. The idea is that many functions, such as editing and metadata management, are agnostic to whether they process a draft or a published post. Publishing a draft simply means moving all the related files from drafts to published.
Here, publish or published doesn't mean that the post becomes immediately available to your website's visitors. It rather means that it appears in the local docroot folder inside the local project directory. Only after the local docroot is synced to your Web Server does the post become actually visible to your readers.
The docroot inside published is the folder that should be transferred to the Web Server.
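That transfer is a single command; the paths and host below are examples, not the script's actual configuration:

```shell
# Mirror the local docroot to the web server's document root;
# --delete removes remote files that no longer exist locally.
rsync -avz --delete ~/np/published/docroot/ user@example.com:/var/www/html/
```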
~$ tree ~/np
├── drafts                    (1)
│   ├── adoc                  (2)
│   │   ├── the-theory-of-everything.adoc
│   │   ├── close-encounters-of-the-fifth-kind.adoc
│   │   └── ...
│   ├── main-part             (3)
│   │   ├── the-theory-of-everything.html
│   │   ├── close-encounters-of-the-fifth-kind.html
│   │   └── ...
│   └── docroot               (4)
│       ├── the-theory-of-everything.html
│       ├── close-encounters-of-the-fifth-kind.html
│       └── ...
├── misc                      (5)
│   ├── header_1.html
│   ├── header_2.html
│   ├── footer.html
│   └── atom-header.xml
└── published                 (6)
    ├── right-column.html     (7)
    ├── adoc                  (8)
    │   ├── dear-diary.adoc
    │   ├── deer-dairy.adoc
    │   ├── how-to-build-simple-but-powerful-static-site-generator.adoc
    │   └── ...
    ├── main-part             (9)
    │   ├── dear-diary.html
    │   ├── deer-dairy.html
    │   ├── how-to-build-simple-but-powerful-static-site-generator.html
    │   └── ...
    ├── truncated-parts       (10)
    │   ├── dear-diary.html
    │   ├── deer-dairy.html
    │   ├── how-to-build-simple-but-powerful-static-site-generator.html
    │   └── ...
    ├── atom-entries          (11)
    │   ├── dear-diary.xml
    │   ├── deer-dairy.xml
    │   ├── how-to-build-simple-but-powerful-static-site-generator.xml
    │   └── ...
    └── docroot               (12)
        ├── index.html        (13)
        ├── feed.xml          (14)
        ├── dear-diary.html   (15)
        ├── deer-dairy.html   (15)
        ├── how-to-build-simple-but-powerful-static-site-generator.html (15)
        ├── ...
        ├── sitemap.xml       (16)
        ├── robots.txt        (17)
        ├── tags              (18)
        │   ├── tag-bash.html
        │   ├── tag-deep-learning.html
        │   ├── tag-deep-forest.html
        │   ├── tag-deep-space.html
        │   └── ...
        └── includes          (19)
            ├── css
            │   ├── asciidoctor.css
            │   ├── magic-colors.css
            │   └── ...
            ├── font-awesome
            │   ├── css
            │   │   └── font-awesome.min.css
            │   └── fonts
            │       ├── fontawesome-webfont.woff2
            │       └── ...
            └── images
                ├── shakyamuni-introduces-pythagoras-to-sensimilla.png
                └── ...
(1) Unpublished posts; not used in building the right column (which consists of the tags and history listings).
(2) Source AsciiDoctor files: this is where you actually write/edit the posts before you decide to publish them.
(3) Draft .adoc source files converted to HTML by AsciiDoctor (without header, right column, and footer).
(4) Full HTML pages of the drafts = header + main-part + right-column + footer. Note that the right column (Tags and History) is constructed from the published posts only.
(5) Template (permanent) files used in constructing the docroot/*.html files.
(6) Published posts, from which the right column is built. This is not the directory to be transferred to the web server (see docroot below).
(7) HTML code for the tags and history listings, created on the fly.
(8) Source AsciiDoctor files: this is where you edit already published posts.
(9) Published .adoc source files converted to HTML by AsciiDoctor (without header, right column, and footer).
(10) Similar to the previous main-part folder, except that here the source files are truncated at the //readmore line before converting. The resulting HTML pages are used for building the homepage (index.html).
(11) Similar to the previous truncated-parts folder, but holding the Atom entries built from the truncated source files. These .xml entries are used for fast construction of the syndication feed.
(12) The folder you rsync to the web server's document root; it contains all the files (and only the files) needed to serve the website.
(13) Homepage, consisting of a few recent posts plus (as with all other .html pages in this folder) header, right column, and footer.
(14) Atom syndication feed.
(15) Individual posts' pages.
(16) Automatically created sitemap (for search bots).
(17) See robots.txt.
(18) Tags' pages, each listing all the posts tagged with the given tag.
(19) CSS, fonts, media etc.
4. Workflow
4.1. Commands summary
Assuming you installed NoPress as explained in the Prepare to use section, simply type the np command without any arguments to get the help message:

~$ np
Usage:
  If you defined an alias to this script, e.g., 'np', use it:
    np <command> [<argument>]
  Otherwise, use the full path to the script:
    /home/user/bin/np.sh <command> [<argument>]
  Where the combination of command and argument is one of:
    n <title>
        write a new post, then choose whether to save as a draft or to publish
    e <title or some part thereof, case-insensitive>
        edit (and optionally publish) an existing draft post; if no exact match is found, fuzzy search will be performed
    E <title or some part thereof, case-insensitive>
        edit an already published post; if no exact match is found, fuzzy search will be performed
    d <title>
        delete a draft post
    D <title>
        delete a published post
    u <title>
        unpublish a published post (move to drafts)
    t <tag>
        rename tag (you will be asked for a new tag name)
    l d|p|t
        list all: drafts|published posts|tags
    r
        rebuild the docroot, for manual interventions only, not needed in a proper workflow
  NOTE: if an argument (title or tag) has character(s) having special meaning in double quotes (e.g. '$'), quote the argument with single quotes. For example:
    np n 'Where is my $100?'
So to start writing a new post one types something like:
~$ np n Simple solution of the Collatz conjecture
The editing session with live preview will start, see Editing with live preview section.
Most of the aforementioned commands are quite trivial, and the processes behind them are explained in the heavily commented NoPress script file. For additional clarity, the E command (edit published post), probably the least trivial one, is explained in the next section.
4.2. Example: "Edit published post" command
4.2.1. From the user’s perspective
Step 1: Invoking the editing session
- Type the np E They Live command in a terminal, where: np is an alias to the NoPress script; E is the "edit published post" command; They Live is the title of the post, or part thereof.
- If there is a post with an exactly matching title, the editing session starts. To be precise, the condition is that there is a post whose filename equals the filename derived from the given title; filenames are all lower-case and contain no non-alphanumeric characters, so the titles "Refrain; not to kill King Edward is right…" and "REFRAIN NOT! To kill King Edward is right!!!" both map to the same filename, refrain-not-to-kill-king-edward-is-right.adoc.
- If not, a fuzzy search is performed: for each word of the provided title, all titles containing that word are found (case-insensitively). The user is presented with the list of found titles and chooses one to edit. That's quite convenient: for example, this very post you're reading now has a rather long title, but to edit it I just type np E nopress, then choose its number from the list.
- If the fuzzy search returns no results, the script quits.
Step 2: Editing a post with live preview
- If a post was found or chosen successfully, two windows open automatically:
  - a text editor (Geany) with the post's AsciiDoctor source file to edit;
  - a browser (Epiphany) showing a preview of the post's webpage; the page auto-refreshes whenever you save the source file in Geany.
- When you want to finish the editing session, type CTRL+C in the terminal where the script runs, to proceed to the finalizing stage.
Step 3: Finalizing the edit
When the script receives the CTRL+C signal, it does several things to properly finish the editing session:
- If the post's title has been changed, the script renames the post's files and informs the author about it.
- If the post being edited is a draft, the script asks whether the author wishes to publish it or keep it as a draft.
- The script asks whether the author wishes to update the "Last Modified" date. The rationale: for typos and other minor changes there is little sense in updating it.
- For published posts, the script performs additional automatic tasks; see Under the hood.
4.2.2. Under the hood
The blocks represent the script's functions: block headers are the function names, block contents show what each function does. Some details are omitted for clarity; see the comments in the script for the full description.
[Flowchart: main() sets working_dir='published' and title='They Live'. find_post_and_edit() checks whether $working_dir contains a source file named 'they-live.adoc'; if not, it searches (case-insensitively) for titles containing "they" or "live", lists the found titles for the user to choose one, or exits if nothing is found. edit_adoc() opens the source file in the text editor and the converted .html file in the browser (with auto-refresh), while a watchdog rebuilds the HTML page on every save, so the user sees how the post will look. When the user hits CTRL+C, the editor and browser are terminated and the trapped SIGINT calls finalize(): 1. if the title has been changed, rename the files; 2. if the post is a draft, ask whether to publish and, if yes, move the post's files from $drafts_dir to $pub_dir; 3. ask whether to update the "Last modified" date; 4. if the edited post is published, call rebuild(), which rebuilds the right column (Tags and Posts History), rebuilds all other webpages using the updated right column, and builds the syndication feed; otherwise exit.]
5. The NoPress script
See the next section for the preparations to be made before using this script.
#!/bin/bash
# so-called 'bash strict mode'
set -euo pipefail
# extended globbing syntax, needed for string_to_filename() etc.
shopt -s extglob
## External tools that need to be installed:
# sudo apt install asciidoctor geany epiphany inotify-tools recode katex
#####################################################################
# #
# TABLE of CONTENTS: #
# #
# 1. Settings #
# 2. Usage #
# 3. Global Variables #
# 4. Principal functions: main() and functions called by main() #
# 5. Helper functions: all other functions #
# 6. End of definitions, main() is called #
# #
#####################################################################
#################
# #
# 1. SETTINGS #
# #
#################
# MAX number of recent posts to include in homepage and in syndication feed.
num_of_recent_posts=10
# Website general settings
readonly site_url='https://borsh.ch'
readonly site_name='Borshch'
readonly default_keywords='World, Mind, Controlled Sanity'
# Local folder where all the files will reside
readonly project_dir=/home/user/np
## editor and browser for editing with preview:
editor() {
# as of Feb 2022 Geany has AsciiDoc highlighting in master branch
# but not yet in release, need to compile from source
geany --new-instance --no-session "$adoc_file" &>/dev/null
}
browser() {
# epiphany does autorefresh out of the box
epiphany --private-instance "$full_html" &>/dev/null
}
my_asciidoctor() {
# asciidoctor with some command-line options (see 'man asciidoctor' and
# https://docs.asciidoctor.org/asciidoc/latest/attributes/document-attributes-ref/)
# converts AsciiDoc code to HTML.
# the last '-' is to get input from stdin
asciidoctor -o - \
-a linkcss -a icons=font -a toc=macro \
-a source-highlighter=pygments -a pygments-css=class -a pygments-style=monokai \
-a prewrap! -a source-linenums-option -a pygments-linenums-mode=table \
-
}
# Anything enclosed by tex#...# will be processed as KaTeX code, see asciidoc_converter()
readonly tex_regex='tex#([^#]+)#'
#################
# #
# 2. USAGE #
# #
#################
usage() {
# Shows help and exits.
# Called by main() when arguments to the script don't add up.
cat << EOF_USAGE
Usage:
If you defined an alias to this script, e.g., 'np', use it:
np <command> [<argument>]
Otherwise, use the full path to the script:
$0 <command> [<argument>]
Where the combination of command and argument is one of:
n <title>
write a new post, then choose whether to save as a draft or to publish
e <title or some part thereof, case-insensitive>
edit (and optionally publish) an existing draft post; \
if no exact match is found, fuzzy search will be performed
E <title or some part thereof, case-insensitive>
edit an already published post; \
if no exact match is found, fuzzy search will be performed
d <title>
delete a draft post
D <title>
delete a published post
u <title>
unpublish a published post (move to drafts)
t <tag>
rename tag (you will be asked for a new tag name)
l d|p|t
list all: drafts|published posts|tags
r
rebuild the docroot, for manual interventions only, \
not needed in a proper workflow
NOTE: if an argument (title or tag) has character(s) having special \
meaning in double quotes (e.g. '$'), quote the argument with single \
quotes. For example:
np n 'Where is my \$100?'
EOF_USAGE
exit
}
#################################
# #
# 3. GLOBAL VARIABLES #
# #
#################################
##
## PART 1. Static variables for files' and folders' names.
##
# Two main folders in the $project_dir, having similar internal structure
# (although $pub_dir has some additional files and sub-dirs).
# A new post goes to the 'drafts', and remains there until the author
# decides to publish it.
# When a post is published, its files are moved to 'published'.
readonly drafts_dir=$project_dir/drafts
readonly pub_dir=$project_dir/published
# Folder for permanent snippets of html and xml to build webpages.
readonly misc_dir=$project_dir/misc
# First half of the header for all html files
# Header is split to insert meta tags (title and keywords) in between
readonly header_1=$misc_dir/header_1.html
# Second half of the header for all html files
readonly header_2=$misc_dir/header_2.html
# Footer for all .html files
readonly footer=$misc_dir/footer.html
# Header for the syndication feed
readonly atom_header=$misc_dir/atom-header.xml
# These files are (re)created by rebuild()
# 1. Right column is where the tags and the posts history are shown.
readonly right_column=$pub_dir/right-column.html
# 2. Syndication feed file:
readonly atom=$pub_dir/docroot/feed.xml
# 3. Homepage:
readonly index=$pub_dir/docroot/index.html
# 4. All the tags' pages are stored here:
readonly tags_dir=$pub_dir/docroot/tags
# 5. Sitemap:
readonly sitemap=$pub_dir/docroot/sitemap.xml
# For the homepage and Atom feeds, the posts are truncated by an optional
# '//readmore' line in the source file. The following two folders
# contain ready converted files, prepared for each post by
# build_index_and_atom_entries().
# The purpose of this is to have faster rebuild().
# 1. .html files for building index.html
readonly truncated_parts_dir=$pub_dir/truncated-parts
# 2. .xml atom entries
readonly atom_entries_dir=$pub_dir/atom-entries
##
##
## PART 2. Dynamic variables, representing dir- and file-names.
#
# They get their values depending on:
# A. Whether drafts or published posts are processed, which is
# determined by $working_dir (whether it equals $drafts_dir or $pub_dir);
# B. The specific post being processed, determined by $filename, which
# gets its value from $title via filename_by_title().
#
# See define_files_and_dirs() for additional details, although that is
# not the only function that sets those variables.
# Defines whether drafts or published posts are being worked on.
declare working_dir
# Where source files are located
declare adoc_dir
# Where the HTML files, converted from the source files, are located.
# Those files are not the full webpages, only the post's content, without
# navigation menu, right column, footer etc. Those files are created for each
# post individually in the time of its editing, and kept for the faster
# rebuild(), because the AsciiDoc->HTML converter is relatively slow.
declare main_part_dir
# Where the full webpages are located. Those are the files to be sent to the
# webserver.
declare docroot
# Processed post's filename (without path and without extension), derived
# from the post's $title. Source files and HTML file names of the given
# post are made by adding the path and the extension ('.adoc' or
# '.html', respectively) to the $filename.
declare filename
# Source file of the given post.
declare adoc_file
# Source file converted to HTML (see the comment for $main_part_dir).
declare main_part
# Full webpage of the given post (incl. header, footer etc.)
# Note: Drafts' full pages are created using the published right column,
# as a preview of how the post will look once published; but since
# the right column is prepared from the published
# posts only, the draft post will not appear in the Posts History
# and its tags will not be added to the Tags pages.
declare full_html
# For the source files having '//readmore' line, there are truncated
# versions of their HTML-converted content, to be included in the homepage.
# If there is no '//readmore' line, those files will be equal to the
# corresponding $full_html files. As with $main_part, those are created
# and kept for faster rebuild(), but for the published posts only.
declare truncated_html
# Same as $truncated_html, but in the XML Atom form to be included in
# the syndication feed.
declare atom_entry
# Array gathering all the files of any given post (3 files for drafts,
# namely $adoc_file, $main_part and $full_html; 5 files for published
# posts, namely the same 3 as for drafts + $truncated_html and $atom_entry).
# This array is useful for renaming (in case of changes in post's title)
# or for publishing a draft (by moving ${parts[@]} from 'drafts' to 'published').
# Used by finalize() via move_parts().
declare -a parts
# Before moving, the current parts are assigned to old_parts, see finalize().
declare -a old_parts
##
## Part 3. Other global variables.
##
# The title of the current post. Can be read from source file metadata by
# get_title(). The $filename is derived from $title by filename_by_title().
declare title
# Those two used for globally renaming a tag. See main() and rename_tag().
declare current_tag_name new_tag_name
# The tags of a post are kept in the source file in the comma-separated line.
# The function make_tags_array() creates this array where each element is
# a separated tag of the current post.
declare -a tags_array
# The following two variables are used by execution_time() to show for each
# step of rebuild() how much time it took. See rebuild().
declare t_start t_end
# The current tag being processed. The filename of the tag's webpage is
# derived from $tag by tag_file_html().
declare tag
# Time-sorted arrays created by wittily named make_time_sorted_arrays().
# Store posts' properties in time-descending order, so, for example,
# the title of the latest post will be ${array_titles[0]} and so on.
# Used in rebuild() and list_posts().
declare -a array_titles array_taglines array_modified array_filenames
# See tex_to_html()
declare mathml_only='false'
#################################
# #
# 4. PRINCIPAL FUNCTIONS: #
# #
# main() and called by main() #
# #
#################################
main() {
# Main function will be called at the end of the script,
# after all other function definitions.
# Parses the arguments provided to the script, sets the $working_dir
# accordingly, and calls the relevant function to continue.
#
# Arguments - see usage()
# Only "r" command doesn't require additional arguments
[[ ${1:-} == r ]] || [[ ${2:-} ]] || usage
# Set title for new|edit|delete|unpublish commands; for the other commands set
# 'dummy' title to allow calling define_files_and_dirs()
[[ "$1" =~ ^(n|e|E|d|D|u)$ ]] && title="${*:2}" || title='dummy'
case $1 in
n) work_on_draft
if [[ -f $adoc_file ]]; then
ask "Draft '$adoc_file' exists, want to edit it?" && edit_adoc
else
echo -e "//title: $title\n//tags: untagged" \
"\n//published: draft\n//modified: $(get_now_datetime)\n\n" > $adoc_file
edit_adoc
fi
;;
e) work_on_draft
find_post_and_edit
;;
E) work_on_published
find_post_and_edit
;;
d) work_on_draft
delete_post
;;
D) work_on_published
delete_post
;;
u) unpublish_post
;;
t) current_tag_name="${*:2}"
read -p "enter new tag name: " new_tag_name
# first rename tags in 'published' to rebuild the right column
work_on_published
rename_tag
work_on_draft
rename_tag
;;
l) case $2 in
d) work_on_draft
list_posts
;;
p) work_on_published
list_posts
;;
t) work_on_draft
echo "tags from drafts:"
list_tags
work_on_published
echo "tags from published:"
list_tags
;;
*) usage
;;
esac
;;
r) work_on_published
rebuild
;;
*) usage
;;
esac
}
work_on_draft() {
# Sets the $working_dir to $drafts_dir,
# then calls define_files_and_dirs() to set the global variables
# related to folders and files names so that other functions would
# process the draft posts. See define_files_and_dirs() for details.
working_dir=$drafts_dir
define_files_and_dirs
}
work_on_published() {
# Sets the $working_dir to $pub_dir,
# then calls define_files_and_dirs() to set the global variables
# related to folders and files names so that other functions would
# process the published posts. See define_files_and_dirs() for details.
working_dir=$pub_dir
define_files_and_dirs
}
edit_adoc() {
# Creates an interactive editing session with a live preview.
#
# Opens the source .adoc file in editor(), and
# launches browser() to view the resulting .html webpage.
#
# 'inotifywait' watches for changes in the source file (detected whenever
# the author hits 'save' in editor), then calls adoc_to_html() to
# rebuild the edited post's webpage. Browser detects the change in .html
# file and auto-refreshes.
#
# To end the session, the author sends CTRL+C to the terminal where the
# script runs. The signal terminates editor() and browser(), but
# 'trap' (see below) catches the signal, and calls finalize() to properly
# finish the editing session.
echo "editing $adoc_file"
echo "to finish editing, press CTRL+C in here"
# Get the proper case, since the command line argument is case-insensitive.
title=$(get_title)
adoc_to_html
editor &
browser &
local changed_file
inotifywait --monitor \
--event moved_to \
--event modify \
--format "%w%f" "${adoc_dir}" | \
while read changed_file; do
[[ "$changed_file" == $adoc_file ]] && adoc_to_html
done
}
# Catch CTRL+C in the terminal to proceed from edit_adoc() to finalize().
trap finalize SIGINT
adoc_to_html() {
# Converts a source file ($adoc_file) to html webpage:
#
# 1. Calls make_tags_array() to create array of tags
#
# 2. Joins adoc_header(), which provides parsed metadata
# (post's title, tags, 'published' and 'last modified' dates) and
# the source file; sends all this to asciidoc_converter(), which converts it
# to HTML; the result is kept in $main_part, which is not yet a full
# webpage, only the post's html, without navigation, posts history etc.,
# and created separately to allow fast rebuild()
#
# 3. Builds the full webpage by concatenating header, main_part,
# right column, footer etc. That is the file which will go to the docroot.
make_tags_array
cat <(adoc_header) $adoc_file | asciidoc_converter > $main_part
# Inserts two SEO-related <meta> tags into the html header, then
# concatenates all the needed building blocks to create a full webpage.
{ cat $header_1
echo '<title>'$site_name' - '$title'</title>'
echo '<meta name="keywords" content="'$(get_tagline)'">'
cat $header_2
cat $main_part
back_to_top
cat $right_column $footer
} > $full_html
}
find_post_and_edit() {
# Called in the wake of the 'edit' command.
# Tries to find an existing post by the given title.
# If no exact match found, prints a list of titles having at least
# one word from the provided title and lets the author choose one to edit.
# Exits if not even a partial match is found.
[[ -f $adoc_file ]] && edit_adoc
# Exact match not found, try partial matches (ignore case)
local all_titles=$(list_posts)
# create array of partially matched titles
declare -a matched_titles=()
readarray -t matched_titles < <( \
for word in $title; do
# '|| :' is to prevent exit if grep doesn't find a match
grep -i $word <<<"$all_titles" || :
# 'sort unique' since same title may be matched by several words.
done | sort -u
)
[[ ${matched_titles[@]} ]] || fail "no fully or partially matched titles found"
echo "Found partially matched titles, choose number to edit, CTRL+D to exit:"
select title in "${matched_titles[@]}"; do
[[ $title ]] || continue
define_files_and_dirs
edit_adoc
break
done
}
delete_post() {
# Deletes the post (all its 'parts').
# When called on 'published', also rebuilds the website, see rebuild().
local part
for part in "${parts[@]}"; do
[[ -f $part ]] || fail "no such file '$part'"
done
ask "Going to delete '$title' from ${working_dir##*/}, ARE YOU SURE?" || fail "aborting"
for part in "${parts[@]}"; do
rm $part || fail "cannot remove '$part'"
done
[[ $working_dir == $pub_dir ]] && rebuild
}
unpublish_post() {
# Moves post from 'published' to 'draft':
#
# 1. Verifies that post exists in $pub_dir
# 2. Deletes the last two parts[] ($truncated_html and $atom_entry)
# which aren't needed for a draft post
# 3. move_parts()
# 4. changes 'published date/time' to draft
# 5. rebuild()
work_on_published
local part
for part in "${parts[@]}"; do
[[ -f $part ]] || fail "no such file '$part'"
done
ask "Going to unpublish '$title' (move to drafts), ARE YOU SURE?" || fail "aborting"
echo "removing redundant files..."
for i in 4 3; do
rm ${parts[$i]} || fail "failed to remove '${parts[$i]}'"
unset -v parts[$i]
done
echo "moving to drafts..."
old_parts=("${parts[@]}")
work_on_draft
move_parts
# change 'published' to draft
sed -i "3s|.*|//published: draft|" $adoc_file
work_on_published
rebuild
}
rename_tag() {
# Called in the wake of the 'rename tag' command.
# Renames tags by looping and 'sed'ing over source (.adoc) files.
# main() calls this function first on the 'published' dir, then on 'drafts'.
# When called on 'published', also rebuilds the website, see rebuild().
local found=''
for adoc_file in $adoc_dir/*.adoc; do
local regex='(^|,)\s*'$current_tag_name'\s*(,|$)'
if [[ $(get_tagline) =~ $regex ]]; then
found='y'
local sed_regex='2s/(tags:|,)\s*'$current_tag_name'\s*(, |$)/\1 '$new_tag_name'\2/'
sed -i -r "$sed_regex" $adoc_file
title=$(get_title)
define_files_and_dirs
adoc_to_html
[[ $working_dir == $pub_dir ]] && build_index_and_atom_entries
fi
done
if [[ $found ]]; then
echo "found and replaced in ${working_dir##*/}"
found=''
[[ $working_dir == $pub_dir ]] && rebuild
else
echo "not found in ${working_dir##*/}"
fi
}
list_posts() {
# Outputs a list of all posts' titles in the $working_dir, sorted
# by time, recent first.
make_time_sorted_arrays
printf '%s\n' "${array_titles[@]}"
}
list_tags() {
# Extracts tags from all source files, prints one tag per line,
# then removes duplicates by 'sort unique'.
sort -u <(for adoc_file in ${adoc_dir}/*.adoc; do
make_tags_array
printf '%s\n' "${tags_array[@]}"
done)
}
rebuild() {
# Called by several functions whenever the changes made require
# rebuilding the website. It can also be called manually, but
# that's not needed in a routine workflow; it serves only as a debugging
# tool, or for manually fixing some unexpected mess.
#
# Does several things:
# - Recreates the right column (tags list and posts history).
# - Rebuilds the posts' webpages to embed the updated right column.
# - Recreates tags' webpages.
# - Out of most recent posts, builds:
# -- The homepage,
# -- The syndication (Atom) feed.
# - Recreates the sitemap.xml
# This should never happen, yet checking to be on the safe side...
[[ $working_dir == $pub_dir ]] || fail "'rebuild' should be called only on '$pub_dir'"
ask "Rebuild the Published Docroot? (if in doubt, agree)" || { echo "skipping rebuild..."; return; }
# Since this function might take quite some time to execute, for the sake of
# possible optimization considerations, there is a time measurement feature,
# showing what time it takes for each block of code. First, the $t_start
# gets the current time (in nanoseconds):
t_start=$(date +%s%N)
# then execution_time() is called after each block of code, showing
# the execution time (in milliseconds) of the said block.
# finally, total execution time is shown, both raw and per post.
t_total=0
# make arrays of posts and their metadata, sorted by time
# in decreasing order, that is, latest post has '0' index.
make_time_sorted_arrays
# show how much time it took
execution_time 'time_sorted_arrays'
local i
# associative array (indices are tags) to hold list of links (in html format) to posts having a given tag
declare -A this_tag_posts
# for each post, add html links to the post to each of its tags (hold in this_tag_posts[$tag])
for i in "${!array_titles[@]}"; do
# post's tagline (comma-separated) processed by read -d ','
# which also removes all leading and trailing whitespace characters (by default).
#
# the last value after the delimiter is not processed, hence || [[ -n $tag ]]
# (see https://fabianlee.org/2021/03/26/bash-while-statement-with-read-not-processing-last-value/)
while read -d ',' tag || [[ -n $tag ]]; do
this_tag_posts[$tag]+=$'<div class="paragraph"><p>\n<a href="/'
this_tag_posts[$tag]+=${array_filenames[i]}.html'">'${array_titles[i]}$'</a>\n</p></div>\n'
done <<<${array_taglines[i]}
done
# make array of all tags (sorted)
#
# instead of listing all tags of all posts and sort unique,
# list all indices of this_tag_posts, separated by newline (IFS=$'\n'), then sort and feed into all_tags
IFS=$'\n' all_tags=($(sort <<<"${!this_tag_posts[*]}")); unset IFS
# associative array (indices are tags) to hold filenames (tag-<tag>.html) of each tag
declare -A tags_filenames_html
for tag in "${all_tags[@]}"; do
tags_filenames_html[$tag]=tag-$(string_to_filename "$tag").html
done
execution_time 'all_tags'
# Rebuild the right column:
{
# 1. Build Tags list
echo -e "</div>\n<div class=\"rightcolumn\">\n<h3>Tags</h3>\n<ul>"
for tag in "${all_tags[@]}"; do
echo '<li><a href="/tags/'${tags_filenames_html[$tag]}'">'$tag'</a></li>'
done
# 2. Build posts history, grouping by months.
echo -e "</ul>\n<h3>Posts</h3>\n<ul>"
local this_post_month this_post_year last_post_month=0 last_post_year=0
# Use time-sorted array to build posts history, from newer to older posts.
# Whenever there is a shift in 'last modified' month or year,
# print appropriate header
for i in "${!array_titles[@]}"; do
this_post_year=$( date +%Y -d ${array_modified[i]} )
this_post_month=$( date +%B -d ${array_modified[i]} )
if [[ $this_post_year != $last_post_year ]]; then
echo -e "</ul>\n<h4>$this_post_year</h4>\n<h5>$this_post_month</h5>\n<ul>"
elif [[ $this_post_month != $last_post_month ]]; then
echo -e "</ul>\n<h5>$this_post_month</h5>\n<ul>"
fi
last_post_year=$this_post_year
last_post_month=$this_post_month
echo '<li><a href="/'${array_filenames[i]}'.html">'${array_titles[i]}'</a></li>'
done
} > $right_column
execution_time 'right column'
# Rebuild all pub/<post>.html files (since right column has been changed).
# The sequence inside the loop is similar to the second part of adoc_to_html()
for i in "${!array_titles[@]}"; do
{ cat $header_1
echo '<title>'$site_name' - '${array_titles[i]}'</title>'
echo '<meta name="keywords" content="'${array_taglines[i]}'">'
cat $header_2
cat $working_dir/main-part/${array_filenames[i]}.html
back_to_top
cat $right_column $footer
} > $working_dir/docroot/${array_filenames[i]}.html
done
execution_time 'posts'
# (Re)build tag's webpages (tag-<tag>.html files)
#
# First, clean the $tags_dir folder to get rid of unused tag-<tag>.html files.
# Such unused files may be found if a certain tag was globally renamed (by 'rename tag' command)
# or manually deleted/renamed in every post initially having it.
rm $tags_dir/*.html
for tag in "${all_tags[@]}"; do
{ cat $header_1
echo '<title>'$site_name' - '$tag'</title>'
echo '<meta name="keywords" content="'$tag'">'
cat $header_2
echo '<div id="header">'
echo '<h1>Posts with <span class="aqua">'$tag'</span> tag:</h1>'
echo '</div><div id="content">'
echo ${this_tag_posts[$tag]}
echo '</div>'
cat $right_column $footer
} > ${tags_dir}/${tags_filenames_html[$tag]}
done
execution_time 'building tag files'
# Build homepage ($index) and Atom Syndication feed ($atom)
# out of MIN($num_of_recent_posts, $num_of_all_posts) recent posts
local num_of_all_posts=${#array_titles[@]}
[[ $num_of_recent_posts -gt $num_of_all_posts ]] && num_of_recent_posts=$num_of_all_posts
{ cat $header_1
echo '<title>'$site_name'</title>'
echo '<meta name="keywords" content="'$default_keywords'">'
cat $header_2
for ((i=0; i<num_of_recent_posts; i++)); do
cat $truncated_parts_dir/${array_filenames[i]}.html
done
back_to_top
cat $right_column $footer
} > $index
{ cat $atom_header
echo '<updated>'${array_modified[0]}'</updated>'
for ((i=0; i<num_of_recent_posts; i++)); do
cat $atom_entries_dir/${array_filenames[i]}.xml
done
echo '</feed>'
} > $atom
execution_time 'index + atom'
# Make the Sitemap
{
echo '<?xml version="1.0" encoding="UTF-8"?>'
echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
local i
for i in "${!array_filenames[@]}"; do
echo ' <url>'
echo ' <loc>'${site_url}'/'${array_filenames[i]}'.html</loc>'
echo ' <lastmod>'${array_modified[i]}'</lastmod>'
echo ' </url>'
done
for tag in "${all_tags[@]}"; do
echo ' <url>'
echo ' <loc>'${site_url}'/tags/'${tags_filenames_html[$tag]}'</loc>'
echo ' </url>'
done
echo '</urlset>'
} > $sitemap
execution_time 'sitemap'
echo "total time $t_total ms; $(( t_total / num_of_all_posts )) ms per post"
}
###########################
# #
# 5. HELPER FUNCTIONS #
# #
###########################
define_files_and_dirs() {
# Most of the functions here use the global namespace. Therefore this function
# defines the current working sub-directories (by using $working_dir,
# which is set either to $drafts_dir or to $pub_dir by the caller function),
# and the current filename,
# by getting it from the $title by calling filename_by_title().
# Finally, the parts[] array is created, which gathers all the files
# which are parts of any given post. This array is useful for renaming
# (in case of changes in post's title) or for publishing a draft
# (by moving all the parts from 'drafts' to 'published').
adoc_dir=$working_dir/adoc
main_part_dir=$working_dir/main-part
docroot=$working_dir/docroot
filename=$(filename_by_title)
adoc_file=$adoc_dir/${filename}.adoc
main_part=$main_part_dir/${filename}.html
full_html=$docroot/${filename}.html
parts=( $adoc_file $main_part $full_html )
# the following is for the published posts only
if [[ $working_dir == $pub_dir ]]; then
truncated_html=$truncated_parts_dir/${filename}.html
atom_entry=$atom_entries_dir/${filename}.xml
parts=( "${parts[@]}" $truncated_html $atom_entry )
fi
}
finalize() {
# Called by 'trap finalize SIGINT' when the author finishes post editing.
# 1. Checks whether a title was substantially changed, if so, renames files
# accordingly by move_parts() from old filenames to the new ones.
# 2. Asks whether the changes were significant enough
# to update the 'Last modified' datetime.
# 3. For the Drafts, asks whether the author wishes to publish the post,
# if yes, move_parts() from 'drafts' to 'published'.
# 4. For the published posts, calls build_index_and_atom_entries() and rebuild().
echo
# need to call this again since previous calls were executed in a subshell
make_tags_array
# Rename files if the title has been changed, but exclude changes that
# don't affect the filenames (such as changes in case, punctuation
# marks etc.)
local original_title=$title
title=$(get_title)
if [[ $(basename $adoc_file .adoc) != $(filename_by_title) ]]; then
echo "Title changed, original title: '$original_title', new title: '$title'." \
'Renaming files appropriately...'
old_parts=("${parts[@]}")
define_files_and_dirs
move_parts
fi
local just_published=''
if [[ $working_dir == $drafts_dir ]]; then
if ask "Save as Draft? 'n' means 'publish this post' "; then
echo "Saving as Draft..."
else
old_parts=("${parts[@]}")
work_on_published
echo "publishing draft..."
move_parts
# set 'published' datetime
sed -i "3s/draft/$(get_now_datetime)/" $adoc_file
just_published='y'
fi
fi
if [[ $just_published ]] || ask "Update 'Last modified' datetime? (say 'n' for minor changes)"; then
echo 'updating "last modified"...'
sed -i -r "4s/ .*$/ $(get_now_datetime)/" $adoc_file
adoc_to_html
fi
[[ $working_dir == $pub_dir ]] || exit
build_index_and_atom_entries
rebuild
exit
}
asciidoc_converter() {
# Customized converter from AsciiDoc code to HTML
#
# Before sending to asciidoctor, process KaTeX code (for displaying math formulae), see tex_preprocess()
tex_preprocess | \
# asciidoctor with some (mostly style-related) options, see my_asciidoctor() in Settings Part
my_asciidoctor | \
# This script does not use asciidoctor-produced header and footer,
# (there is '-s' command line option, but it produces slightly different result from what is needed here)
# so 'sed' extracts the relevant part (between <body...> and <...footer>, NOT including):
# 1. '-n' disables automatic printing, so sed only prints when explicitly told to (via 'p');
# 2. When the range /<body...>,<div...>/ is matched, the content of curly brackets is executed;
# 3. The empty regular expression ‘//’ repeats the last regex match, which for
# range will be its boundaries;
# 4. Appending the ! character to the end of it (before 'p') negates the match.
# That is, only lines which do not match, will be printed out,
# namely all the lines of the range, except the first and the last.
sed -n '/<body class="article">/,/<div id="footer">/{//!p}';
}
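# A quick illustration of the range idiom above:
#   printf '<body class="article">\nONE\nTWO\n<div id="footer">\n' | \
#     sed -n '/<body class="article">/,/<div id="footer">/{//!p}'
# prints only ONE and TWO, i.e. the range without its boundary lines.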
build_index_and_atom_entries() {
# Called by finalize() and rename_tag() for published posts.
# Prepares chunks of html and xml for the homepage and for the
# syndication feed, respectively.
# The purpose of this function is to prepare those chunks individually
# for each post, when it has been edited, to allow faster rebuild()
local filename_html=$(filename_by_title).html
local link_to_post=$site_url/$filename_html
# If .adoc has '//readmore' line, replace it with 2 newlines + "Read more..."
# link (in AsciiDoc format), and discard the rest (sed 'q' = quit)
local truncated_adoc=$(sed "\|^//readmore| {
s||\n\nlink:/$filename_html[Read more...]|
q
}" $adoc_file)
# Convert title to link to its post page, convert AsciiDoc to HTML, convert local anchors.
{ adoc_header | sed -r "1s|^= (.*)$|= link:$(filename_by_title).html[\1]|"
echo "$truncated_adoc"
} | asciidoc_converter | sed -e "s| href=\"#| href=\"/$filename_html#|g" > $truncated_html
# For Atom entry, convert truncated source file to HTML.
# If there is embedded TeX code, convert to MathML, see tex_to_html()
# After that, set mathml_only back to its default value ('false')
mathml_only='true'
local atom_content=$(asciidoc_converter <<<$truncated_adoc)
mathml_only='false'
# If in the source file the 'Table of Contents' macro (toc::[]) exists before "readmore",
# the ToC in the chunks will be truncated, as not all headers will be in $truncated_adoc,
# so Asciidoctor will build the ToC from what it sees.
# Therefore, we need to replace the truncated ToC with the full ToC from $full_html
if sed '\|//readmore|q' $adoc_file | grep 'toc::\[\]' >/dev/null ; then
# regex to match the Toc (will match both full and truncated)
local toc_regex='\|^<div id="toc" class="toc">$|,\|^</div>$|'
# 1. extract Toc from $full_html
# 2. replace local anchors with links to post page (local anchors wouldn't work
# since some headers are missing in the truncated chunk)
# 3. change newline with '\n' for sed to be able to use it for replace (see next lines)
local full_toc=$(sed -n "$toc_regex p" $full_html | \
sed -e "s| href=\"#| href=\"/$filename_html#|g" | \
sed -z 's|\n|\\n|g')
# replace (sed 'c' command = change) the truncated ToC with the full ToC in $truncated_html
sed -i "$toc_regex c $full_toc" $truncated_html
# replace the ToC as above for Atom
atom_content=$(sed "$toc_regex c $full_toc" <<<$atom_content)
fi
# Create Atom entry
{ echo '<entry><title>'$title'</title>'
echo '<link href="'$link_to_post'" />'
echo '<id>'$link_to_post'</id>'
echo '<updated>'$(get_modified)'</updated>'
for tag in "${tags_array[@]}"; do
echo '<category term="'$tag'" />'
done
echo '<content type="html">'
# 1. Replace intra-site links (/...) and intra-post anchors (#...)
# with full external links, so a Feed Reader will be able to follow;
# First replace anchors with </this-post.html/#>, then the second
# command will prepend the site URL.
# 2. Let <pre> content be wrapped in a feed reader.
# 3. According to Atom protocol, escape special characters (such as '<', quotes...)
echo "$atom_content" | \
sed -e "s| href=\"#| href=\"/${filename_html}#|g" \
-e "s| href=\"/| href=\"${site_url}/|g" \
-e 's|<pre class="content">|<pre class="content" style="white-space: pre-wrap; word-break: keep-all">|g' | \
recode --force utf8..html
echo '</content></entry>'
} > $atom_entry
}
adoc_header() {
# Parses the comments at the head of the source file which
# provide title, tags, and the created/modified datetimes, and converts
# them to the .adoc format to be prepended to the source file.
echo "= $(get_title)"
echo
echo 'icon:tags[role="aqua"]'
for tag in "${tags_array[@]}"; do
echo -n " link:/tags/$(tag_file_html)[${tag}]"
done
# Converts datetime to date only.
local pub_date
if [[ $working_dir == $drafts_dir ]]; then
pub_date='draft'
else
pub_date=$(get_published | xargs date -I -d)
fi
local mod_date=$(get_modified | xargs date -I -d)
echo -e "\n\n[small]#Published: $pub_date Last modified: $mod_date#\n\n'''\n\n"
}
move_parts() {
# Moves a post's files. Used by finalize() to rename files or to publish
# a post, and by unpublish_post() to unpublish one.
# Before moving, validates that the destination files don't already exist.
local part
for part in "${parts[@]}"; do
[[ -f $part ]] && fail "Can't be done: '$part' already exists"
done
for i in "${!old_parts[@]}"; do
mv ${old_parts[$i]} ${parts[$i]}
done
}
get_feature() {
# Parses the 'front matter' of a source file, where the metadata
# (title, tags, and datetimes) are kept in the form of
# AsciiDoc comments (prepended by '//').
#
# Used in the next four functions, such as get_title().
#
# sed tries to remove '//<keyword>: ' (by 's') and to print the remainder,
# (by 'p'). If it succeeds, sed skips the rest of its commands (by 't'). If
# 's' fails, 't' isn't executed, and sed exits with error (by 'q 1'),
# triggering fail()
#
# argument $1 - line number to look at
# argument $2 - feature's keyword
sed -n "${1} { s|//${2}: ||p; t; q 1; }" $adoc_file || \
fail "'$adoc_file': '$2' line missing or misformatted"
}
# The following 4 'get_...' functions use get_feature() to provide the
# specific feature.
get_title() { get_feature 1 title ;}
get_tagline() { get_feature 2 tags ;}
get_published() { get_feature 3 published ;}
get_modified() { get_feature 4 modified ;}
make_tags_array() {
IFS=, read -a tags_array < <(get_tagline)
tags_array=("${tags_array[@]/#+( )/}")
tags_array=("${tags_array[@]/%+( )/}")
}
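# For example, a '//tags: bash, static site , web' line yields
# tags_array=('bash' 'static site' 'web'). The +( ) trimming patterns
# above require extglob (assumed to be enabled in the Settings part).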
make_time_sorted_arrays() {
# Sorts posts by '//modified:' field in the source file.
# Not using filesystem 'last modification time' attribute,
# as in 'ls -t', since in this script there is an option to retain
# 'last modified' field for typos and minor changes. It still could be
# done with FS attribute by 'touch -m', but files' attributes may
# be changed accidentally by some unrelated system fiddling,
# whereas the '//modified:' field is much less prone to such random changes.
#
# Not using get_title() and similar functions for the sake of faster processing.
# This function is crafted in a way to maximally eliminate file reading
# and subshell creating.
#
# The 'for' loop over source files echoes for each file:
# 1. Four first lines of the file (where the metadata reside),
# 2. Basename of the file (incl. extension which will be removed later),
# with attached NULL delimiter.
# These 5-line blocks are then sorted by 'sort -r', which sorts the
# whole blocks (using '-z', since the blocks are NUL-delimited),
# while the key to sort by is defined as fourth field (-k 4), whereas
# fields are defined as newline-separated (-t $'\n'), thus the blocks
# are sorted by '//modified: ...' line (which is 4th line in metadata).
# The sorted blocks are fed into a 'while read' loop which creates
# the time-sorted arrays so that index=0 is the most recent post.
local title_line tags_line pub_line mod_line fn_line
while IFS=$'\n' read -d '' title_line tags_line pub_line mod_line fn_line; do
array_titles+=( "${title_line#//title: }" )
array_taglines+=( "${tags_line#//tags: }" )
array_modified+=( "${mod_line#//modified: }" )
array_filenames+=( "${fn_line%.adoc}" )
done < <( \
for adoc_file in $adoc_dir/*.adoc; do
sed 4q $adoc_file
echo ${adoc_file##*/}
echo -ne "\0"
done | sort -rz -k 4 -t $'\n' )
}
string_to_filename() {
# Converts a line of text to SEO-friendly filename.
# Used to convert posts' titles and tags to their webpages' filenames.
# The conversion steps:
# 1. convert to lowercase
# 2. convert non-alphanumeric characters to dashes
# 3. reduce adjacent dashes to a single dash
# 4. trim leading dash
# 5. trim trailing dash
#
# somewhat uglier than a possible sed one-liner, but much faster
#
# argument $1 - string to process
local string=${1,,}
string=${string//[^a-z0-9]/-}
string=${string//+(-)/-}
string=${string#-}
echo ${string%-}
}
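# For example: string_to_filename 'Hello, World! (2nd try)' outputs
# 'hello-world-2nd-try'. Note that the +(-) pattern requires extglob
# (assumed to be enabled in the Settings part).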
filename_by_title() {
# Converts post's title to its file's basename (w/o extension).
string_to_filename "$title"
}
tag_file_html() {
# Converts $tag to tag's webpage file name, including .html extension.
echo tag-$(string_to_filename "$tag").html
}
tex_preprocess() {
# Used by asciidoc_converter() before sending AsciiDoc code to asciidoctor.
#
# looks for 'tex#<anything here interpreted as KaTeX code>#' pattern
# (customizable by modifying tex_regex value in Settings part),
# e.g. tex#\frac{a}{b}# and replaces it with HTML implementation of the KaTeX code,
# enclosed in +++ (AsciiDoc inline passthrough, so asciidoctor converter will pass it as is).
#
# Uses GNU sed 'e' command which considers the content of the replacement part
# (echo ...) as a shell command, executes it, and replaces the matched pattern with the output.
#
# For this to work, /bin/sh should point to bash, seems that sed uses /bin/sh.
# For example, /bin/sh -> dash could be fixed by 'sudo dpkg-reconfigure dash' -> 'No'.
# this variable determines whether the full HTML/CSS KaTeX implementation will be taken,
# (used in creation of .html files), or MathML part only (for .xml syndication files).
# Need to be exported since sed 'e' command opens a new shell.
export mathml_only
# run the commands included in curly brackets only if there is KaTeX code in the line
sed -r "/$tex_regex/ {
# set label to allow iteration in case there are several KaTeX code inclusions in the line
:start
# See COMMENT 1 below (cannot place here comments with quotes)
s/'/'\"'\"'/g
# See COMMENT 2
s/(.*)$tex_regex(.*)/echo -n '\1+++'; echo '\2' | tex_to_html; echo -n '+++\3'/e
# See COMMENT 3
/$tex_regex/b start
}"
# COMMENT 1
# Variables in the replacement part of the next command (which is considered
# a shell command by 'e') are enclosed in single quotes, e.g. '\1'. If the processed
# string has a single quote, it leads to a problem, e.g. echo 'it's strange'.
# Hence a need to replace single quotes so it becomes echo 'it'"'"'s strange'.
# Ugly but seems to work.
# COMMENT 2
# Two strange things regarding the 'e' command (maybe it is just my poor understanding of it):
# (1) It does not work as expected with 'g' modifier
# (2) Need to match the entire line, otherwise unmatched parts of the string somehow get added
# to the command to execute (because they are part of the pattern space?)
# Therefore, an iterative process is applied. KaTeX code ('\2', see $tex_regex in Settings Part)
# is converted to HTML/MathML by tex_to_html() which is built on katex,
# while the parts before and after the KaTeX regex (can be empty strings)
# ('\1' and '\3') are echoed back, with +++ added to enclose the katex-produced HTML.
# COMMENT 3
# If the processed line has more than one KaTeX inclusion, the previous 's' command would
# process only one of them (the last one, since the match part starts with (.*) and sed is greedy).
# So we need to check whether there is still unprocessed KaTeX regex, and if there is,
# jump ('b') to the label 'start' (in the beginning of this set of commands).
}
tex_to_html() {
# Converts KaTeX code to HTML/MathML (katex produces both, but the latter is hidden).
# send the KaTeX code to katex, strip newlines
katex | tr -d '\n' |
if $mathml_only; then
# if $mathml_only is "true", extract only the MathML part (used for Atom entries)
sed -r 's|.*(<math>.*</math>).*|\1|'
else
# otherwise pass it as is
cat
fi
}
# export the function since sed 'e' opens a new shell
export -f tex_to_html
fail() {
# In case of a problem, prints an error message to stderr and exits.
# argument $1 - error message
echo "$1" >&2
exit 1
}
get_now_datetime() {
# Gets the current UTC datetime in ISO 8601 format (YYYY-MM-DDThh:mm:ss+00:00)
date --utc --iso-8601=seconds
}
back_to_top() {
# Provides font-awesome icon with "back to top" link
echo '<a class="back-to-top" href="#top"><i class="back-to-top fa fa-chevron-circle-up"></i></a>'
}
ask() {
# Gets user's (dis)agreement: 'n' reply means 'no', any other
# reply, incl. Enter, means 'yes'.
# Reads from '</dev/tty' instead of stdin to be able to work even in
# 'while read' loops.
# Argument(s) are concatenated and used as a prompt phrase.
local reply
read -n 1 -p "$* [Y/n] " reply </dev/tty
echo
[[ $reply != n ]]
}
execution_time() {
# Shows time (in ms) elapsed from the previous invocation of this function,
# or from the previous assigning of $t_start (the later of the two);
# accumulates total execution time.
# Used in rebuild() to analyze which part(s) of it take(s) the most time
# to execute.
# $t_start and $t_end values are in nanoseconds.
#
# Argument $1 - the name of the code block for which the execution time is measured.
t_end=$(date +%s%N)
local elapsed=$(( (t_end - t_start)/1000000 ))
t_total=$(( t_total + elapsed ))
echo $1: $elapsed ms
t_start=$t_end
}
#####################
# #
# 6. LET'S GO !!! #
# #
#####################
main "$@"
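Before moving on to setup, one implementation detail is worth playing with in isolation: the NUL-delimited block sort used by make_time_sorted_arrays(). A minimal standalone sketch with fake metadata blocks (GNU sort assumed):

```shell
# Two fake 5-line metadata blocks, each terminated by a NUL byte,
# sorted as whole blocks by their 4th newline-separated line
# (the '//modified: ...' one), newest first.
blocks() {
  printf '%s\n' '//title: Old post' '//tags: a' \
    '//published: 2020-01-01T10:00:00+00:00' \
    '//modified: 2020-01-01T10:00:00+00:00' 'old-post.adoc'
  printf '\0'
  printf '%s\n' '//title: New post' '//tags: b' \
    '//published: 2021-01-01T10:00:00+00:00' \
    '//modified: 2021-01-01T10:00:00+00:00' 'new-post.adoc'
  printf '\0'
}
# The first line of the first sorted block is the newest post's title
blocks | sort -rz -k 4 -t $'\n' | tr '\0' '\n' | head -n 1
```

Note how -t $'\n' makes each line of a block a sort field, while -z keeps the blocks intact as records.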
6. Prepare to use
6.1. Make folders
Choose the project directory, e.g. ~/np, then:
~$ mkdir ~/np
~$ cd ~/np
~$ for d in {adoc,docroot,main-part}; do mkdir -p drafts/$d published/$d; done
~$ mkdir -p misc published/atom-entries published/truncated-parts published/docroot/tags published/docroot/includes/css
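As a quick sanity check (a throwaway sketch, not part of the setup), the same commands can be replayed in a temp dir and the resulting tree listed:

```shell
# Replay the layout commands in a throwaway directory and list the tree
tmp=$(mktemp -d) && cd "$tmp"
for d in {adoc,docroot,main-part}; do mkdir -p drafts/$d published/$d; done
mkdir -p misc published/atom-entries published/truncated-parts \
  published/docroot/tags published/docroot/includes/css
find . -type d | sort
```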
6.3. Prepare the script
- Place the script file np.sh somewhere, e.g. in ~/bin/.
- Change the values in the script's "Settings" section according to your needs.
- Make it executable:
~$ chmod u+x ~/bin/np.sh
- Make an alias for easier invocation:
~$ echo "alias np='/home/user/bin/np.sh'" >> ~/.bashrc
~$ . ~/.bashrc
- Try it:
~$ np
You should see the help message.
6.4. Prepare files
6.4.1. Create template files in misc/
- header_1.html → View the Page Source of this post, or any other page of this site (in most browsers, the command is CTRL+U). Copy the lines from the beginning up to and including the <link rel="icon"… line to ~/np/misc/header_1.html.
- header_2.html → In the same Page Source, copy the lines from </head> (inclusive) up to and including <div class="leftcolumn"> to ~/np/misc/header_2.html.
- footer.html → In the same Page Source, copy the lines from the last </ul> (inclusive) to the end to ~/np/misc/footer.html.
- atom-header.xml → View the feed syndication page. Copy the lines from the beginning up to and including the first </id> to ~/np/misc/atom-header.xml.
You obviously need to edit these files according to your website parameters (such as site name, copyright, etc.).
6.4.2. Right-column

Prepare an initial right-column.html. It will be overwritten by the script upon the first published post.

~$ echo '</div><div class="rightcolumn"><h3>Tags</h3><h3>Posts</h3><ul>' > ~/np/published/right-column.html
6.4.3. robots.txt

Since the script creates a sitemap, create a minimal robots.txt in the docroot:

Sitemap: https://borsh.ch/sitemap.xml
User-agent: *
Disallow: /includes/
Change the link to your website address.
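If you prefer to script this step, a heredoc works. This is just a sketch using the example docroot path and site URL from this post — substitute your own:

```shell
# Create the minimal robots.txt in the docroot; path and URL are examples.
docroot=~/np/published/docroot
mkdir -p "$docroot"
cat > "$docroot/robots.txt" <<'EOF'
Sitemap: https://borsh.ch/sitemap.xml
User-agent: *
Disallow: /includes/
EOF
```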
6.4.4. CSS, fonts, and media

CSS

- Copy asciidoctor's default CSS file to the docroot:

  ~$ cp /usr/share/ruby-asciidoctor/stylesheets/asciidoctor-default.css /home/user/np/published/docroot/includes/css/

- Learn about AsciiDoctor styling and add other CSS files to the same folder according to your needs.

- Alternatively, you may want to put all the CSS files in some folder outside of docroot, and use cleancss (sudo apt install cleancss) to merge and minify them into a single file, as I do:

  ~$ cleancss ~/np/misc/css/*.css > ~/np/published/docroot/includes/css/clean.css
Fonts

If you wish to serve font-awesome locally, install it:

~$ sudo apt install fonts-font-awesome

Then prepare folders in docroot:

~$ mkdir -p ~/np/published/docroot/includes/font-awesome/css

and copy the relevant files:

~$ cp /usr/share/fonts-font-awesome/css/font-awesome.min.css ~/np/published/docroot/includes/font-awesome/css
~$ cp -r /usr/share/fonts-font-awesome/fonts/ ~/np/published/docroot/includes/font-awesome
Media
Optionally make a folder for images and other media files:
~$ mkdir ~/np/published/docroot/includes/images
6.5. TeX support (optional)
6.5.1. How it works
Unfortunately, AsciiDoctor’s support for embedding mathematical or scientific notation in HTML pages is based either on MathJax (which needs JS) or on image mode (which I didn’t like). Another option would be MathML, but it is not well supported by browsers. Hence I needed a custom solution. I found that KaTeX comes with a command-line interface which can render mathematical notation from TeX-based markup to HTML/CSS with no need for client-side JS. Consequently, I added custom (pre)processing of the AsciiDoc source based on KaTeX.
From the author’s point of view, to embed KaTeX code in NoPress, one wraps it as follows: tex#<KaTeX code># (see the asciidoc_converter() function in the script for the implementation mechanism).
For example, if an author writes

See the power of KaTeX: tex#\displaystyle\frac{d}{dx}\left( \int_{0}^{x} f(u)\,du\right)=f(x)#

it will be rendered with the formula typeset in pure HTML/CSS.
The tex#<KaTeX code># template will not work if the KaTeX code itself needs to include the # character. In such a case, there is an option to change the template in the Settings part of the script.
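For illustration only — the script's real logic lives in asciidoc_converter(), which is not reproduced here — extracting the payload of a single tex#…# span could look like this (line and payload are made-up names):

```shell
# Pull the TeX payload out of a tex#...# span with sed; the payload could
# then be handed to the katex CLI for rendering (katex not invoked here).
line='See the power of KaTeX: tex#\displaystyle\frac{d}{dx}\left( \int_{0}^{x} f(u)\,du\right)=f(x)#'
payload=$(printf '%s' "$line" | sed -E 's/.*tex#([^#]*)#.*/\1/')
printf '%s\n' "$payload"
```

The `[^#]*` in the pattern is exactly why a `#` inside the TeX code breaks the template: the capture stops at the first closing `#` it meets.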
6.5.2. Preparations

- Install katex as mentioned here;

- Verify that /bin/sh points to bash:

  ~$ ls -l /bin/sh
  <snip> /bin/sh -> bash

  If you see, e.g., /bin/sh -> dash, fix it with sudo dpkg-reconfigure dash and choose 'No'.

- Prepare CSS and fonts:

  ~$ mkdir -p ~/np/published/docroot/includes/katex/fonts/
  ~$ cp /usr/share/javascript/katex/katex.min.css ~/np/published/docroot/includes/katex/
  ~$ cp /usr/share/fonts/truetype/katex/* ~/np/published/docroot/includes/katex/fonts/
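The /bin/sh check in the step above can also be scripted. A small sketch — the messages are mine, not the script's:

```shell
# Report where /bin/sh resolves to and warn if it is not bash.
target=$(readlink -f /bin/sh)
echo "/bin/sh -> $target"
case "$target" in
    *bash) echo "OK" ;;
    *)     echo "Fix with: sudo dpkg-reconfigure dash (and choose 'No')" ;;
esac
```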
6.6. Start using
Write some posts with tags, publish some of them.
6.7. Set up local website preview

The live-preview-while-editing functionality shown in Step 2: Editing a post with live preview is based on opening an .html file in the browser via the file URI scheme with an absolute path (file:///…). It therefore cannot follow internal links, which are relative to the docroot.

To preview the website locally, including following internal links, an additional measure is required:
- Install a lightweight HTTP server:

  ~$ sudo apt install webfs

- Configure it: change web_root in /etc/webfsd.conf to the path of your docroot folder, e.g. /home/user/np/published/docroot/;

- Link the includes folder so webfs will be able to resolve the /includes/… path:

  ~$ sudo ln -s /home/user/np/published/docroot/includes/ /

  (yeah, it’s weird to link a random folder into /…)

- Restart webfsd as root:

  root@host:# killall webfsd
  root@host:# service webfs stop
  root@host:# service webfs start

- In Firefox (or any other browser) open http://127.0.0.1:8000/index.html
6.8. Backups
The script doesn’t provide any backup functionality since you have already set up a proper 3-2-1 backup for your precious files, right? Right?
6.9. Upload (sync) to the Web Server

Once you are satisfied with the website you have built and wish to share it with the world, upload the local docroot to the server’s docroot, e.g.:

~$ rsync -azv --delete ~/np/published/docroot/ user@server:/var/www/html/

Of course, you need to re-sync each time you publish a new post or edit an existing one (make an alias for the command to make life easier).
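For instance, the alias could look like this — npup is a made-up name, and the paths and user@server placeholder should be replaced with your own:

```shell
# ~/.bashrc — hypothetical convenience alias for the sync command
alias npup='rsync -azv --delete ~/np/published/docroot/ user@server:/var/www/html/'
```

Note the trailing slash on the source path: it tells rsync to copy the directory's contents rather than the directory itself.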