The following shell script finds duplicate files (two or more files with identical MD5 checksums) and writes a new shell script containing commented-out rm statements for deleting them.
You then have to edit that file and uncomment the rm lines for the copies you want to delete; the script can’t safely choose which files to keep automatically!
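# Usage: pass the directories to scan as arguments (GNU find defaults to ".")
OUTF=rem-duplicates.sh
echo "#! /bin/sh" > "$OUTF"
# Hash every file, sort so equal hashes are adjacent, keep only repeated hashes,
# then strip the hash, backslash-escape unusual characters and prefix "#rm "
find "$@" -type f -print0 |
xargs -0 -n1 md5sum |
sort --key=1,32 | uniq -w 32 -d --all-repeated=separate |
sed -r 's/^[0-9a-f]*( )*//;s/([^a-zA-Z0-9./_-])/\\\1/g;s/(.+)/#rm \1/' >> "$OUTF"
chmod a+x "$OUTF"; ls -l "$OUTF"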
Example output (rem-duplicates.sh)
#! /bin/sh
#rm ./gdc2001/113-1303_IMG.JPG
#rm ./reppulilta/gdc2001/113-1303_IMG.JPG

#rm ./lissabon/01-01-2001/108-0883_IMG.JPG
#rm ./kuvat\ reppulilta/lissabon/01-01-2001/108-0883_IMG.JPG

#rm ./gdc2001/113-1328_IMG.JPG
#rm ./kuvat\ reppulilta/gdc2001/113-1328_IMG.JPG
Explanation
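Read stage by stage, the pipeline hashes every file, sorts the result so that identical hashes end up next to each other, keeps only the lines whose first 32 characters (the MD5 hash) repeat, and finally strips the hash so only the path remains. The following is a minimal sketch of those stages, not the original script: the function name list_duplicates and the scratch files are made up for illustration, and it assumes the GNU versions of find, xargs, md5sum, uniq and sed.

```shell
#!/bin/sh
# Hypothetical helper: print duplicate files under the given directories,
# one blank-line-separated group per set of identical files.
list_duplicates() {
    find "$@" -type f -print0 |            # every regular file, NUL-terminated
    xargs -0 -n1 md5sum |                  # "<32 hex digits>  <path>" per file
    sort |                                 # identical hashes become adjacent
    uniq -w 32 --all-repeated=separate |   # keep lines whose first 32 chars repeat
    sed -E 's/^[0-9a-f]* *//'              # strip the hash, leaving the path
}

# Demo on a scratch directory (file names are made up for illustration)
demo=$(mktemp -d)
printf 'same\n'  > "$demo/a.txt"
printf 'same\n'  > "$demo/b.txt"
printf 'other\n' > "$demo/c.txt"
list_duplicates "$demo"    # lists a.txt and b.txt, but not c.txt
rm -rf "$demo"
```

Unlike the script above, this sketch only lists the paths; it does not escape them or write any rm statements.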
Tags: bash, find, JPG, md5, md5sum, sed, sort, uniq, xargs
Posted by Hans-Henry Jakobsen
To guard against corrupt files, you can use the md5sum command.
Windows
md5sum -b *.JPG > checksum.md5
Linux
Then copy this md5 file to the right directory on Linux and check that the files are identical using MD5Sums, a graphical Windows program.
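The same check can also be done from the command line with md5sum itself. This is a sketch under assumptions, not part of the original post: the helper names make_checksums and verify_checksums and the scratch files are hypothetical, and it relies on the -c and --quiet options of GNU md5sum.

```shell
#!/bin/sh
# Hypothetical helpers; checksum.md5 is the file name used in the post.

# Record checksums for every .JPG in directory $1
make_checksums() {
    ( cd "$1" && md5sum -b -- *.JPG > checksum.md5 )
}

# Re-read the files in directory $1 and compare them against checksum.md5.
# The exit status is 0 only if every listed file still matches.
verify_checksums() {
    ( cd "$1" && md5sum -c --quiet checksum.md5 )
}

# Demo on a scratch directory with made-up file names
demo=$(mktemp -d)
printf 'first\n'  > "$demo/a.JPG"
printf 'second\n' > "$demo/b.JPG"
make_checksums "$demo"
verify_checksums "$demo" && echo "all files intact"
printf 'damaged\n' > "$demo/b.JPG"      # simulate a corrupted copy
verify_checksums "$demo" || echo "at least one file changed"
rm -rf "$demo"
```

Running the verification in the directory that holds both checksum.md5 and the copied files matters, because the checksum file stores the names relative to where it was created.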
Posted by Hans-Henry Jakobsen
A script using bash and md5sum to keep track of file integrity.
#!/bin/bash
# Change the separator to allow for filenames containing spaces
# (the default is " \t\n", which confuses the for loop)
IFS=$'\n'
# With IFS set to newline only, spaces in paths need no extra escaping
FOLDERS=$(find "/Volumes/disk 1/Pictures/Photos" -type d)
for FOLDER in $FOLDERS; do
  # mind you, this will only work with absolute pathnames
  if [ -d "$FOLDER" ]; then
    echo "$0: INFO: Processing $FOLDER"
    cd "$FOLDER" || continue
    for FILE in *.[jJ][pP][gG]; do
      # Skip the literal pattern when the folder has no JPEG files
      [ -e "$FILE" ] || continue
      echo "$0: INFO: Checking $FILE"
      if ! djpeg -outfile /dev/null "$FILE"; then
        echo "$0: ERROR: $FOLDER/$FILE is unreadable as JPEG"
      fi
    done
    if [ -e MD5SUMS ]; then
      # Redirect stdout first, then stderr, so that both are discarded
      if ! md5sum -c MD5SUMS > /dev/null 2>&1; then
        echo "$0: ERROR: in $FOLDER:"
        md5sum -c MD5SUMS 2>&1 | grep FAILED
      fi
    else
      echo "$0: WARNING: no MD5SUMS in $FOLDER, creating..."
      md5sum -b *.* > MD5SUMS
      # The obvious bit, in retrospect
      chown username:groupname MD5SUMS
    fi
  fi
done
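One detail worth spelling out when silencing md5sum -c is that the order of redirections matters: 2>&1 > /dev/null still lets stderr through, whereas > /dev/null 2>&1 discards both streams. The sketch below illustrates this with a hypothetical stand-in function, noisy, instead of md5sum, so it runs anywhere.

```shell
#!/bin/sh
# Hypothetical stand-in for a command that writes to both streams,
# as md5sum -c does when a file fails verification.
noisy() {
    echo "report on stdout"
    echo "warning on stderr" >&2
}

# 2>&1 first: stderr is pointed at the current stdout (here, the capture),
# and only afterwards is stdout sent to /dev/null. The warning leaks through.
leaky=$(noisy 2>&1 > /dev/null)

# > /dev/null first: stdout is discarded, then stderr follows it there.
silent=$(noisy > /dev/null 2>&1)

echo "2>&1 > /dev/null captured: [$leaky]"
echo "> /dev/null 2>&1 captured: [$silent]"
```

The first line printed shows [warning on stderr], the second shows [], because redirections are processed from left to right.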
Posted by Hans-Henry Jakobsen
1. Download photos from the camera and sort them into folders by date.
2. Remove possible duplicates if I did not erase the camera images since the last download.
3. Convert RAW/NEF images to a usable format.
All this in one single click!
#!/bin/bash
# Change this to where to store Photos
target=/home/multimedia/Images
camera="USB PTP Class Camera"
date=$(date --iso-8601)
mkdir -p "$target/$date/tmp"
cd "$target/$date/tmp" || exit 1
# Get all photos from camera
gphoto2 --quiet --camera "$camera" --port usb: -P
# Do not replace photos that were already uploaded this same day
cp -u "$target/$date/tmp"/* "$target/$date"
rm -rf "$target/$date/tmp"
cd "$target/$date" || exit 1
# auto-rotate using exif info
exifautotran *.JPG
# If photos were not erased from camera since last upload, remove duplicates
for i in *.{JPG,NEF}; do
  # Skip the literal pattern when no file of that type exists
  [ -e "$i" ] || continue
  for f in $(find "$target" -name "$i" ! -samefile "$target/$date/$i"); do
    # Strip the directory from the checksum line so that --check compares
    # against today's copy (standard format: hash, two spaces, file name)
    if md5sum "$f" | sed -e "s,  .*/,  ," | md5sum --check; then
      rm -f "$i"
    fi
  done
done
# decode RAW images if not already done ?
# for i in *.NEF; do if [ ! -e $(basename $i .NEF).ppm ]; then dcraw -w $i; fi; done
# Show them!
gimv -d $target/$date
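The duplicate test in the script works by computing the checksum of an older file, removing the directory part from the checksum line, and feeding the result to md5sum --check, which then compares it with the file of the same name in the current directory. Here is a minimal sketch of that step in isolation. The function name same_as_local_copy and the scratch files are hypothetical, and it assumes GNU md5sum (the --status option suppresses the per-file output).

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file "$1" (given with a directory part)
# has the same MD5 as the file with the same base name in the current directory.
same_as_local_copy() {
    # "hash  /some/dir/name" becomes "hash  name" before being checked
    md5sum "$1" | sed -e 's,  .*/,  ,' | md5sum --check --status
}

# Demo with made-up file names in a scratch directory
work=$(mktemp -d)
mkdir "$work/old" "$work/new"
printf 'pixels\n' > "$work/old/IMG_1.JPG"
cp "$work/old/IMG_1.JPG" "$work/new/IMG_1.JPG"
(
    cd "$work/new" || exit 1
    if same_as_local_copy "$work/old/IMG_1.JPG"; then
        echo "IMG_1.JPG is a duplicate of the older copy"
    fi
)
rm -rf "$work"
```

The helper depends on the current directory, which is why the demo changes into the folder holding the newer copy inside a subshell.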
Tags: bash, dcraw, EXIF, exifautotran, gimv, gphoto2, JPG, md5sum, NEF, sed
Posted by Hans-Henry Jakobsen