e42.uk

sed and grep

Some of my one-liners and probably more as time pushes on.

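# Pull the href of every link that wraps an <img>, prefix it with https:
# and hand the resulting URLs to wget on stdin.
grep -o -e 'href="[^"]*" [^>]*><img src=' images.html \
| sed -n 's/href="\([^"]*\)".*/https:\1/p' \
| wget --random-wait --wait 60 --no-verbose --input-file=-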

Special prize for anyone who can tell me what website I wrote this one-liner for.
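
To see what the grep and sed stages pass along, here is a small sketch run against a made-up fragment. The example.com markup and the extract_links name are my assumptions for illustration, not the real site, and wget is left out so nothing gets downloaded.

```shell
# extract_links: hypothetical wrapper around the grep and sed stages.
# Reads HTML on stdin, prints one https: URL per image link.
extract_links() {
  grep -o -e 'href="[^"]*" [^>]*><img src=' \
  | sed -n 's/href="\([^"]*\)".*/https:\1/p'
}

# Made-up sample: one link wrapping an image, one plain text link.
sample='<a href="//example.com/full/1.jpg" class="thumb"><img src="//example.com/thumb/1.jpg"></a>
<a href="//example.com/about" class="nav">About</a>'

# Only the image link survives.
printf '%s\n' "$sample" | extract_links
# prints: https://example.com/full/1.jpg
```

Note that the pattern wants a space after the closing quote of the href, so a link with no further attributes before the > would be skipped.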

Inefficient File List Creation on Cygwin

This file list is for 7z.exe on Windows 7, so you will notice that I have replaced every / with \.

# By the last stage every / is already a \, so .metadata is matched with a backslash.
find . -type f | sed '/\.class$/d' | sed '/\/\.git\//d' \
| sed 's|^\./||' | sed 's|/|\\|g' | sed '/^workspace-.*\.7z$/d' \
| sed '/Debug\\/d' | sed '/^\.metadata\\/d' > filelist.txt

In order, this:

  • drops the .class files,
  • drops anything inside a .git directory,
  • strips the leading ./ from every line,
  • replaces / with \,
  • drops workspace-*.7z,
  • drops everything in a Debug directory,
  • drops the .metadata directory.

More could be done, and it could be more efficient, but this works for my workspace backup. You could say I don't need it, or that I should just back up the .git directories, and you would probably be correct. But I try to keep all my important files in workspace and some of them, I hate to admit, are not under version control.
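
As one possible take on "more efficient", the seven filters can be folded into a single sed process with one -e per step. This is only a sketch: filter_filelist and sample_paths are names I made up, the sample paths are invented, and the .class and .metadata patterns are anchored to match the list above (the .metadata test uses a backslash because the separators have already been swapped by then).

```shell
# filter_filelist: hypothetical single-process version of the pipeline.
# Reads "find . -type f" style paths on stdin, writes 7z-friendly paths.
filter_filelist() {
  sed -e '/\.class$/d' \
      -e '/\/\.git\//d' \
      -e 's|^\./||' \
      -e 's|/|\\|g' \
      -e '/^workspace-.*\.7z$/d' \
      -e '/Debug\\/d' \
      -e '/^\.metadata\\/d'
}

# Invented sample input standing in for the output of find.
sample_paths() {
  printf '%s\n' \
    './src/Main.java' \
    './bin/Main.class' \
    './.git/config' \
    './workspace-2013.7z' \
    './proj/Debug/app.exe' \
    './.metadata/.log' \
    './docs/readme.txt'
}

sample_paths | filter_filelist
# prints:
# src\Main.java
# docs\readme.txt
```

In real use the input would come from find, as in: find . -type f | filter_filelist > filelist.txt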

Converting a mixed \r\n and \n file to \n

sed 's/\r$//'

Yes, the $ is the end of the line ;-)
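
A quick way to convince yourself, sketched with a made-up two-line input. The to_unix name is mine, and the \r escape in the pattern assumes GNU sed; other seds may not understand it.

```shell
# to_unix: hypothetical wrapper; strips one trailing carriage return per line.
# Lines that already end in a bare \n pass through untouched.
to_unix() {
  sed 's/\r$//'
}

# First line ends in \r\n, second in \n only; od -c makes any leftover \r visible.
printf 'one\r\ntwo\n' | to_unix | od -c
```

With GNU sed the same expression can be applied in place using -i on a file of your choosing.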

Hacking the HTML for LCTHW

I am going on holiday and wanted to read through Learn C The Hard Way. It seems to have been a work in progress since 2012, with no recent updates and, at the time of writing, no active Git repo, so I resorted to chopping it up with sed.

# First, download it
wget -r -l 1 -p http://c.learncodethehardway.org/book/
# Then move the html to this dir...
mv c.learncodethehardway.org/book/* .
rm -r c.learncodethehardway.org
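# Now hack up the files (needs GNU sed and tac).
for f in *.html; do
  # Delete lines 2-7 and 36-54, then strip the conditional-comment closer.
  # [ and ] are escaped so that [endif] is matched literally.
  sed -i -e '2,+5d' -e '36,54d' -e 's/ <!--<!\[endif\]-->//' "$f"
  # Reverse the file so the footer can be deleted by counting from the end...
  tac "$f" | sed -e '3,+47d' > "$f.bak"
  # ...then reverse it back into place.
  tac "$f.bak" > "$f"
done
rm *.bak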

You can now put this on your ebook reader for simple offline reading on an aircraft (or at your girlfriend's house, on a train, on a hike up a hill in Yorkshire; you get the idea).
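
The tac and sed combination used above is worth a second look: sed can only count lines from the top, so reversing the file first lets you delete a block counted from the bottom. A minimal sketch on a ten-line input, assuming GNU sed (for the addr,+N form) and GNU tac; drop_from_end is a name I made up.

```shell
# drop_from_end START COUNT: delete the START-th line counted from the end
# of stdin, plus the COUNT lines above it in the original order.
drop_from_end() {
  tac | sed "${1},+${2}d" | tac
}

# Reversed, lines 2-4 are 9, 8 and 7, so those three disappear.
seq 1 10 | drop_from_end 2 2
# prints 1 to 6, then 10, one number per line
```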

I know I removed the copyright notice; it was mixed up with some Google page-tracking JavaScript. To be clear, this book is not my own but the property of Zed A. Shaw.

As it turned out, I needed to do quite a bit more fiddling to get this to work on my Sony eReader PRS-T1. For those who are interested, I have converted the HTML into XHTML and created an epub file. As this is a freely available and unfinished work, I don't think Zed will mind. Download the epub file here: lcthw.epub.

To create the .epub file I used a Java project I found on GitHub: automated_digital_publishing. That project now seems to have moved to GitLab: automated_digital_publishing.
