The following shell script finds duplicate (2 or more identical) files and outputs a new shell script containing commented-out rm statements for deleting them.
You then have to edit the file to select which files to keep – the script can’t safely do it automatically!
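OUTF=rem-duplicates.sh
# Start the output script with its interpreter line
echo "#! /bin/sh" > "$OUTF"
# Hash every regular file; NUL separators keep odd filenames intact
find "$@" -type f -print0 |
xargs -0 -n1 md5sum |
# The hash is the first 32 characters, so a plain sort groups identical files
sort | uniq -w 32 -d --all-repeated=separate |
# Strip the hash, backslash-escape unusual characters, prefix with "#rm "
sed -r 's/^[0-9a-f]*( )*//;s/([^a-zA-Z0-9./_-])/\\\1/g;s/(.+)/#rm \1/' >> "$OUTF"
chmod a+x "$OUTF"; ls -l "$OUTF"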
Example output (rem-duplicates.sh)
#! /bin/sh
#rm ./gdc2001/113-1303_IMG.JPG
#rm ./reppulilta/gdc2001/113-1303_IMG.JPG

#rm ./lissabon/01-01-2001/108-0883_IMG.JPG
#rm ./kuvat\ reppulilta/lissabon/01-01-2001/108-0883_IMG.JPG

#rm ./gdc2001/113-1328_IMG.JPG
#rm ./kuvat\ reppulilta/gdc2001/113-1328_IMG.JPG
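Editing the generated file means removing the leading # from the rm lines for the copies you want to delete. If you have already decided that everything under one directory is the redundant copy, sed can do the uncommenting for you. This is only a sketch: the sample file content and the choice of ./kuvat\ reppulilta/ as the directory to clean are assumptions for illustration, and the result should still be read through before it is run.

```shell
#!/bin/sh
# Hypothetical sample of a generated file, written to a scratch location.
SCRIPT=$(mktemp)
cat > "$SCRIPT" <<'EOF'
#! /bin/sh
#rm ./lissabon/01-01-2001/108-0883_IMG.JPG
#rm ./kuvat\ reppulilta/lissabon/01-01-2001/108-0883_IMG.JPG
EOF

# Uncomment only the rm lines whose path starts with the chosen directory.
# The pattern matches the escaped form "kuvat\ reppulilta" as it appears
# in the file; every other line is left commented out.
EDITED=$(sed 's|^#\(rm \./kuvat\\ reppulilta/\)|\1|' "$SCRIPT")
printf '%s\n' "$EDITED"
rm "$SCRIPT"
```

Only the third line loses its #, so the copy under ./lissabon/ is kept.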
Explanation
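The stages of the pipeline are easiest to follow by running them one at a time on a throwaway directory. The following is a minimal sketch of that, assuming GNU find, xargs, md5sum, uniq and sed; the scratch directory and the file names in it are made up for the demonstration.

```shell
#!/bin/sh
# Hypothetical demo data: two identical files and one unique file.
DEMO=$(mktemp -d)
printf 'same\n' > "$DEMO/a.txt"
printf 'same\n' > "$DEMO/b copy.txt"
printf 'other\n' > "$DEMO/c.txt"

# Stage 1: list regular files, NUL-separated so spaces in names survive.
# Stage 2: md5sum prints "<32 hex digits>  <path>" for each file.
# Stage 3: sort puts identical hashes next to each other; uniq compares
#          only the first 32 characters and prints every member of a
#          repeated group, with a blank line between groups.
DUPES=$(cd "$DEMO" && find . -type f -print0 | xargs -0 -n1 md5sum |
        sort | uniq -w 32 --all-repeated=separate)
printf '%s\n' "$DUPES"

# Stage 4: drop the hash, backslash-escape unusual characters, and
#          prefix each path with a commented-out rm.
COMMANDS=$(printf '%s\n' "$DUPES" |
        sed -r 's/^[0-9a-f]*( )*//;s/([^a-zA-Z0-9./_-])/\\\1/g;s/(.+)/#rm \1/')
printf '%s\n' "$COMMANDS"
rm -r "$DEMO"
```

The last two lines printed are #rm ./a.txt and #rm ./b\ copy.txt. The unique file c.txt never appears, because uniq only passes through hashes that occur more than once.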
Posted by Hans-Henry Jakobsen
To guard against corrupted files, you can use the md5sum command.
Windows
md5sum -b *.JPG > checksum.md5
Linux
Then copy this md5 file to the right directory in Linux and check that the files are identical using MD5Sums, a graphical Windows program.
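Where GNU md5sum is available, the same check can also be done from the command line with md5sum -c, which re-reads every file listed in checksum.md5 and reports any that no longer match. A small sketch, using made-up stand-in files in a scratch directory rather than real photos:

```shell
#!/bin/sh
# Scratch directory with hypothetical stand-in files (not real JPEGs).
WORK=$(mktemp -d)
cd "$WORK" || exit 1
printf 'picture one\n' > one.JPG
printf 'picture two\n' > two.JPG

# Record the checksums, using the same command as in the post.
md5sum -b *.JPG > checksum.md5

# Verify: md5sum -c prints one status line per file and exits
# non-zero if any file fails the check.
if md5sum -c checksum.md5; then BEFORE=ok; else BEFORE=failed; fi

# Simulate a corrupted copy of one file and verify again.
printf 'damaged\n' > two.JPG
if md5sum -c checksum.md5; then AFTER=ok; else AFTER=failed; fi

echo "before=$BEFORE after=$AFTER"
cd / && rm -r "$WORK"
```

The first check passes and the second one fails, with two.JPG reported as FAILED.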
Posted by Hans-Henry Jakobsen
A script using bash and md5sum to keep track of file integrity.
#!/bin/bash
# Change the separator to allow for filenames containing spaces
# (the default is " \t\n", which confuses the for loop)
IFS=$'\n'
FOLDERS=`find /Volumes/disk\ 1/Pictures/Photos -type d`
for FOLDER in $FOLDERS; do
    # mind you, this will only work with absolute pathnames
    if [ -d "$FOLDER" ]; then
        echo "$0: INFO: Processing $FOLDER"
        cd "$FOLDER" || continue
        # match only names that end in .jpg (any case)
        for FILE in `ls -1 | grep -i '\.jpg$'`; do
            echo "$0: INFO: Checking $FILE"
            djpeg -outfile /dev/null "$FILE"
            if [ $? -ne 0 ]; then
                echo "$0: ERROR: $FOLDER/$FILE is unreadable as JPEG"
            fi
        done
        if [ -e MD5SUMS ]; then
            # redirect stdout first, then stderr, so both are discarded
            md5sum -b -c MD5SUMS > /dev/null 2>&1
            if [ $? -ne 0 ]; then
                echo "$0: ERROR: in $FOLDER:"
                md5sum -c MD5SUMS | grep FAILED
            fi
        else
            echo "$0: WARNING: no MD5SUMS in $FOLDER, creating..."
            md5sum -b *.* > MD5SUMS
            # The obvious bit, in retrospect
            chown username:groupname MD5SUMS
        fi
    fi
done
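The IFS change at the top of the script is what lets the for loop cope with folder names containing spaces. The effect is easy to see by counting how many items the loop receives with and without it. This is a sketch only: the scratch tree and the count_items helper are made up for the demonstration, the newline is written as a literal line break so that it also runs under plain sh, and it assumes the scratch path itself contains no whitespace.

```shell
#!/bin/sh
# Scratch tree with a hypothetical folder name containing a space.
ROOT=$(mktemp -d)
mkdir -p "$ROOT/disk 1/Photos"

# Hypothetical helper: count how many items the for loop sees
# when the find output is split under the current IFS.
count_items() {
    n=0
    for ITEM in $(find "$ROOT" -type d); do
        n=$((n + 1))
    done
    echo "$n"
}

# Default IFS (space, tab, newline): "disk 1" is split in two.
DEFAULT_COUNT=$(count_items)

# Newline-only IFS: each line of find output stays one item.
OLD_IFS=$IFS
IFS='
'
NEWLINE_COUNT=$(count_items)
IFS=$OLD_IFS

echo "default IFS: $DEFAULT_COUNT items, newline IFS: $NEWLINE_COUNT items"
rm -r "$ROOT"
```

There are three directories in the tree. With the default separator the loop sees five fragments, and with the newline-only separator it sees the three real paths.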
Posted by Hans-Henry Jakobsen