This site functions as an archive of Conner's Blog, which was a blog from 2006-2014 located at Images and links are likely to be broken.

Scanning the Paper Away

I bought a multifunction printer last week, the Brother MFC 240. I didn't buy it for the printer; I bought it because it was on sale and its scanner has a 10-page automatic document feeder. I'm working toward a paperless life and decided that getting a scanner and moving all of my documents to pdf was the way to go.

Installing the printer on Ubuntu 8.04 was fairly painless. I followed these instructions for the most part, though I didn't attempt to set up the scan button. It's nice that Brother supports Linux, and that support made it easy to get the printer and scanner up and running.

To scan I use SANE. It isn't pretty, but it's a solid program that does what you need it to do. I set it up to scan each page and save the image to the appropriate folder in my home directory, so I can just put the documents in the feeder and walk away. I scan to the pnm format with the resolution set at 300x300. I then use ImageMagick's convert program to shrink the images and convert them to pdf. Here is the bash command I use to convert a folder of images to pdf.

find . -iname '*.pnm' -print0 | while IFS= read -r -d '' i; do convert "$i" -resize 50% -compress Zip "$i.pdf"; done

This searches the current directory and anything below it for files ending in .pnm, ignoring case. It then runs convert on each result, resizing the image to 50%, converting it to pdf, and compressing it with Zip compression. Each output file keeps the original name with .pdf appended, so page.pnm becomes page.pnm.pdf. The result is a reasonably sized pdf at about 2-3 MB per page. There are probably better ways to do this, and unfortunately the text doesn't end up searchable, but for my purposes it does the job. I'm sure I will refine my method as time goes on, and I may play with OCR soon.
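
If the one-liner gets run often, it could be wrapped in a small script. The following is only a rough sketch of that idea, not something from my actual setup: the function names pdf_name_for and convert_scans are made up for illustration, it assumes bash and that ImageMagick's convert is on the PATH, and unlike the command above it drops the .pnm extension from the output name and skips pages that already have a pdf.

```shell
#!/usr/bin/env bash
# Sketch only: convert every .pnm scan under a directory to a pdf beside it.
# Assumes ImageMagick's `convert` is installed; function names are hypothetical.

# Map "page.pnm" (extension in any letter case) to "page.pdf".
pdf_name_for() {
    printf '%s.pdf\n' "${1%.*}"
}

# Convert each .pnm file under the given directory (default: current one).
convert_scans() {
    local dir="${1:-.}" src dst
    # -print0 with read -d '' keeps file names containing spaces intact.
    find "$dir" -iname '*.pnm' -print0 | while IFS= read -r -d '' src; do
        dst="$(pdf_name_for "$src")"
        # Skip pages that were already converted on an earlier run.
        [ -e "$dst" ] && continue
        # Same ImageMagick options as the one-liner above.
        convert "$src" -resize 50% -compress Zip "$dst"
    done
}
```

After sourcing the file, something like convert_scans ~/scans (the folder name is just an example) would process a whole folder. Skipping existing pdfs means the script can be re-run after each scanning session without redoing old pages.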