Going Paperless with Ubuntu and Google Docs



I hate paper. It’s never where I need it, lacks a search function and in general takes up space. So getting rid of it has been a long-standing quest on my list. I thought of burning it, but in the end settled for a more civilized option.


Unfortunately, as soon as you put a document through the shredder, you will get a call asking for exactly that information. So I needed a workflow to digitize my documents before going all psycho on them.


I don’t always have my scanner set up; I lack a nice flat surface where it could always be available to me. As I want to get rid of stuff ASAP, I set up the next best thing: a parking spot.


Once every month (or two, depending on how much I can postpone stuff), I collect everything from the parking spot and prepare for scanning. So I get my trusty ADF-equipped HP printer. While you could get something like a Snapscan, I already spent all my money on video games and this is available.



For the scanning I use gscan2pdf. It’s a nice tool that allows you to do a lot of scanning in a row. It has options for OCR, but it only adds the result to notes, and that’s not good enough for me.

After a lot of testing I found that Lineart at 300 DPI gives good results and doesn’t generate huge files.

Of course, if you got full color holiday cards sent by your mother, I would keep those separate and scan them later using a color setting.

I love computers that are working for me, so I just fill the ADF with documents and switch on scanning. I get some coffee and stand next to it, going, “Hmm, yeah, HP did you file your TPS reports last week?”

Once this is done, you should have a long list of pages in gscan2pdf. Now all I do is Save and pick PNG, then select a directory and name to generate a nice directory full of files.

A good tip is to re-order the files in gscan2pdf, as it’s much easier than doing it by hand later.

Adding OCR and creating PDF files

So, now that we have the .png files, we need to generate PDFs from them. I created a small bash script to do this. It will call scripts to OCR a single PNG file or a group of them and generate a nice PDF ready to upload to Google Docs.


Right now this still runs from the prompt, but I would like to add it to Nautilus scripts so you can just select a group of files and say “OCR and PDF”.
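I haven’t built the Nautilus part yet, but here is a rough sketch of how it might look. Nautilus hands the selected files to a script in the NAUTILUS_SCRIPT_SELECTED_FILE_PATHS variable, one path per line. Everything else is an assumption on my side: the function names selected_pngs and ocr_selection are made up, the language is hard-coded to English, and the PDF is simply named after the first selected file.

```shell
#!/bin/sh
# Hypothetical Nautilus script, e.g. saved as "OCR and PDF" in the scripts folder.

# Print the selected paths that end in .png (any case), one per line.
selected_pngs() {
  printf '%s\n' "${NAUTILUS_SCRIPT_SELECTED_FILE_PATHS:-}" |
  while IFS= read -r path; do
    case "$path" in
      *.[Pp][Nn][Gg]) printf '%s\n' "$path" ;;
    esac
  done
}

# Hand the selected PNG files to png2ocrpdf in one call.
ocr_selection() {
  # Split the list on newlines only, with globbing off, into "$@".
  old_ifs=$IFS
  IFS='
'
  set -f
  set -- $(selected_pngs)
  set +f
  IFS=$old_ifs

  # Nothing to do when no PNG files were selected.
  [ "$#" -gt 0 ] || return 0

  # Assumption: name the PDF after the first selected file, minus extension.
  title=$(basename "$1")
  title=${title%.*}

  png2ocrpdf -l 'eng' -t "$title" -a "${USER:-unknown}" "$@"
}

# Do nothing unless Nautilus supplied a selection.
if [ -n "${NAUTILUS_SCRIPT_SELECTED_FILE_PATHS:-}" ]; then
  ocr_selection
fi
```

On recent Nautilus versions the scripts folder is ~/.local/share/nautilus/scripts (older releases used ~/.gnome2/nautilus-scripts), and the file needs to be executable.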

For now, usage is as follows:

png2ocrpdf -l 'eng' -t 'Hello World' -a 'Me' Hello_01.png Hello_02.png Hello_03.png

This will use English OCR and create a file ‘Hello World.pdf’ with 3 pages.

The script doesn’t support spaces in the input filenames yet.
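The script itself isn’t listed here, so what follows is not png2ocrpdf but a minimal sketch of the same idea. It assumes a tesseract version that can write searchable PDFs (the pdf output option) and pdfunite from poppler-utils to glue the pages together. The function name ocr_pngs_to_pdf is made up, and the sketch skips the author field entirely.

```shell
#!/bin/sh
# Sketch only: OCR each PNG into a one-page searchable PDF, then merge them.
# Assumes tesseract (with PDF output support) and pdfunite are installed.

# Usage: ocr_pngs_to_pdf LANG TITLE FILE.png [FILE.png ...]
ocr_pngs_to_pdf() {
  if [ "$#" -lt 3 ]; then
    echo "usage: ocr_pngs_to_pdf LANG TITLE FILE.png [FILE.png ...]" >&2
    return 2
  fi
  lang=$1
  title=$2
  shift 2

  tmpdir=$(mktemp -d) || return 1
  i=0
  for png in "$@"; do
    i=$((i + 1))
    # Zero-padded names keep the pages in order when globbed later.
    base=$(printf '%s/page_%04d' "$tmpdir" "$i")
    # tesseract appends ".pdf" to the output base itself.
    tesseract "$png" "$base" -l "$lang" pdf || { rm -rf "$tmpdir"; return 1; }
  done

  if [ "$i" -eq 1 ]; then
    # A single page needs no merging.
    cp "$tmpdir/page_0001.pdf" "$title.pdf"
  else
    pdfunite "$tmpdir"/page_*.pdf "$title.pdf"
  fi
  status=$?

  rm -rf "$tmpdir"
  return "$status"
}
```

Calling ocr_pngs_to_pdf eng 'Hello World' Hello_01.png Hello_02.png Hello_03.png would then write ‘Hello World.pdf’ to the current directory. Because every variable is quoted, this version should also cope with spaces in the file names.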

Final step

Once the .pdf files are generated, you can upload them to Google Docs or Evernote to find later. I use Google Docs and switch off all conversion options. This results in a nice readable PDF file online that’s fully searchable and available on any laptop or my phone, so I can access it anywhere and anytime.







Using a reverse SSH tunnel to access GitHub from a limited datacenter


Sometimes life doesn’t work out: you want to push and pull your code to GitHub, but the machine you are working on doesn’t have the internet access to do so. In my case the receiving GitHub server is internal, so even with internet access it would not have worked.

You can fix this with a reverse tunnel. Add the following to your local ~/.ssh/config:

Host example-host
  RemoteForward 12222 github.com:22

And on example-host, add this to ~/.ssh/config:

Host github.com
  hostname localhost
  port 12222

Now, I’m assuming you already have a straight line to the host you are working on; if not, using ProxyCommand might solve that. I’m also assuming you have agent forwarding (ForwardAgent) enabled, because unprotected ssh keys are bad, m’kay.

Now, once this is set up, you ssh to the host and should be able to use it without voodoo on the prompt:

git clone git@github.com:username/project
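If you would rather not remap github.com in ~/.ssh/config on example-host, a rough alternative is to point git straight at the forwarded port with an ssh:// URL. The sketch below assumes the tunnel from above is listening on port 12222 on example-host; tunnel_url and TUNNEL_PORT are names I made up for illustration.

```shell
#!/bin/sh
# One-off equivalent of the RemoteForward line, run from your local machine:
#   ssh -R 12222:github.com:22 example-host

# Assumed tunnel port, matching the RemoteForward setting above.
TUNNEL_PORT=12222

# Rewrite git@github.com:owner/repo into an ssh:// URL aimed at the tunnel.
# Anything that is not a github.com scp-style address is printed unchanged.
tunnel_url() {
  case "$1" in
    git@github.com:*)
      printf 'ssh://git@localhost:%s/%s\n' "$TUNNEL_PORT" "${1#git@github.com:}"
      ;;
    *)
      printf '%s\n' "$1"
      ;;
  esac
}

# Example, run on example-host while the tunnel is up:
#   git clone "$(tunnel_url git@github.com:username/project)"
```

The downside is that every remote URL has to carry the localhost address, which is why I prefer the ~/.ssh/config approach: existing clones keep working unchanged.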