A few weeks ago I migrated my whole blog from WordPress to Octopress. In the meantime I discovered Pelican,
which is the Pythonic alternative to Octopress. To be honest, the main reason I'm using Pelican instead of Octopress is its ability to import/include IPython notebooks.
After setting up my blog with Octopress, all I had was a bunch of Markdown
files. So let's get started.
Generate metadata for content
%%bash
cp ~/work/blog/octopress/source/_posts/*.markdown ~/work/blog/pelican/content/markdown/
%%bash
ls -c ~/work/blog/pelican/content/markdown/ | head -n 10
AWK script for doing the main job
%%writefile ~/work/blog/pelican/content/transform.awk
{
    # Date
    if ($0 ~ /^date(.*)$/) {
        # Extract the date (YYYY-MM-DD); the three-argument match() requires gawk
        match($0, /^date: ([0-9]{4}-[0-9]{2}-[0-9]{2}).*$/, ary)
        print "Date: " ary[1];
    # Title
    } else if ($0 ~ /^title(.*)$/) {
        match($0, /^title: (.*)$/, ary)
        # Remove single/double quotes
        gsub(/'/, "", ary[1]);
        gsub(/"/, "", ary[1]);
        print "Title: " ary[1];
    # Author
    } else if ($0 ~ /^author(.*)$/) {
        match($0, /^author: (.*)$/, ary)
        print "Author: " ary[1];
    # Handle available categories as tags
    } else if ($0 ~ /^categories:.*$/) {
        printf "Tags: "
        # Array index
        i = 0;
        # Read the following lines until a new meta tag is found
        while ((getline line) > 0) {
            # A line without "-" or the closing "---" ends the category list
            if ((line !~ /^.*-.*$/) || (line ~ /^---$/)) {
                output_string = ""
                # Print the categories and then exit the loop
                for (j = 0; j < i; j++)
                    # Not the last category: append a separator
                    if (j+1 < i)
                        output_string = output_string tolower(categories[j]) ", "
                    # Last category
                    else
                        output_string = output_string tolower(categories[j])
                # Print the tag list and add the default category
                printf "%s\n", output_string
                printf "Category: blog\n\n"
                break
            }
            # Extract category
            match(line, /^.*- (.*)$/, ary)
            categories[i] = ary[1]
            # Increase index
            i++;
        }
    }
}
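The script relies on gawk's three-argument match(). If only a POSIX awk is at hand, the same idea can be sketched roughly as follows. This is a minimal sketch, not a drop-in replacement: front_matter_to_pelican is a hypothetical helper name, it assumes the categories are a YAML list with one "- name" per line, and unlike the script it stops reading at the closing --- so that body lines are never mistaken for metadata.

```shell
# Hypothetical helper: convert Octopress front matter to Pelican metadata.
# Reads the named files, or stdin when no file is given.
front_matter_to_pelican() {
  awk '
    # Collect "- name" lines while inside the categories list
    in_cats && /^ *- / {
      c = $0
      sub(/^ *- /, "", c)
      cats[n++] = tolower(c)
      next
    }
    # Any other line ends the list: print tags plus the default category
    in_cats {
      out = ""
      for (j = 0; j < n; j++) {
        sep = (j > 0) ? ", " : ""
        out = out sep cats[j]
      }
      print "Tags: " out
      print "Category: blog"
      print ""
      in_cats = 0
    }
    # Stop at the closing delimiter of the front matter
    /^---$/ { if (++fences == 2) exit; next }
    # "date: " is 6 characters long, keep the 10 characters of YYYY-MM-DD
    /^date: /   { print "Date: " substr($0, 7, 10); next }
    # Strip single (\047) and double quotes from the title
    /^title: /  { t = substr($0, 8); gsub(/["\047]/, "", t); print "Title: " t; next }
    /^author: / { print "Author: " substr($0, 9); next }
    /^categories:/ { in_cats = 1; n = 0; next }
  ' "$@"
}
```

Called as `front_matter_to_pelican some-post.markdown`, it prints the Title, Date, Author, Tags and Category lines in the order the keys appear in the post.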
Extract the metadata and generate the meta files
%%bash
cd ~/work/blog/pelican/
for i in content/markdown/*.markdown; do
  gawk -f content/transform.awk "$i" > "$i-meta"
done
ls -c content/markdown/*-meta | head -n 10
Sample metadata output
%%bash
find ~/work/blog/pelican/content/markdown/*-meta -name "2014-05-*" -exec cat {} \;
Delete old metadata
%%bash
cd ~/work/blog/pelican/content/markdown
for i in *.markdown; do
  sed '/^---$/,/^---$/d' "$i" > "$i-sed"
done
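One thing to watch out for: a sed address range starts again whenever its first pattern matches later in the file, so a post that uses --- as a horizontal rule in its body loses everything between two such rules as well. If your posts contain such lines, a minimal awk sketch that removes only the leading front matter block could look like this (strip_front_matter is a hypothetical helper name):

```shell
# Hypothetical helper: drop only the front matter at the top of a post.
# Reads the named files, or stdin when no file is given.
strip_front_matter() {
  awk '
    FNR == 1 && /^---$/ { in_fm = 1; next }   # opening delimiter on line 1
    in_fm && /^---$/    { in_fm = 0; next }   # closing delimiter
    !in_fm                                    # print everything outside the block
  ' "$@"
}
```

It would be used in the loop in the same way, for example `strip_front_matter "$i" > "$i-sed"`.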
Insert new metadata into file
%%bash
cd ~/work/blog/pelican/content/markdown
for i in *.markdown; do
  cat "$i-meta" "$i-sed" > "$i-final"
done
Sample final output
%%bash
cd ~/work/blog/pelican/content/markdown
find . -name "2014-05*.markdown-final" -exec head -n 10 {} \;
Replace some strings
%%bash
cd ~/work/blog/pelican/content/markdown
# Insert some dummy text for empty alt attributes in img tags
for i in *.markdown-final; do
  sed -i 's/alt=""/alt="image description"/g' "$i"
done
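To see what the substitution does before changing files in place, the same expression can be tried on stdin first. A small sketch, where fill_empty_alt is a hypothetical helper name; it writes to stdout and modifies nothing. Also note that `sed -i` without a suffix argument is GNU sed syntax; BSD/macOS sed expects `-i ''`.

```shell
# Hypothetical helper: preview the alt-text substitution without editing files.
fill_empty_alt() {
  sed 's/alt=""/alt="image description"/g' "$@"
}

# Example: pipe a single line through it
printf '%s\n' '<img src="a.png" alt="">' | fill_empty_alt
```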
Rename files and delete intermediate files
%%bash
cd ~/work/blog/pelican/content/markdown
rm *.markdown-sed
rm *.markdown-meta
rm *.markdown
for i in *.markdown-final; do
  mv "$i" "${i%.markdown-final}.markdown"
done
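Since this step deletes the copied originals, it can be worth checking the target names before moving anything. A minimal dry-run sketch, with final_to_markdown_name as a hypothetical helper that only prints the name a file would get:

```shell
# Hypothetical helper: print the target name for a *.markdown-final file.
final_to_markdown_name() {
  # ${1%.markdown-final} strips the suffix from the end of the name
  printf '%s\n' "${1%.markdown-final}.markdown"
}

# Dry run: show the planned renames without touching any file
for i in *.markdown-final; do
  [ -e "$i" ] || continue   # skip the literal pattern if nothing matches
  printf '%s -> %s\n' "$i" "$(final_to_markdown_name "$i")"
done
```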
Generate new content
Assuming you have already set up your Pelican blog, you can now run:
%%bash
cd ~/work/blog/pelican/
source env/bin/activate
make html
The End
I hope you have enjoyed this one. If you have any questions regarding the process, don't hesitate to leave a comment.