Selective Site Mirroring with wget
Since Cloudera doesn’t seem to offer an rsync server to mirror against, I had to resort to wget to mirror their CDH distribution. To save some time for anyone attempting something similar, here’s a wget one-liner that grabs only the RPMs (plus any XML files, which the accept list also matches) while preserving the directory structure:
wget -N -r -nH -np --cut-dirs=3 -A rpm,xml http://archive.cloudera.com/redhat/cdh/3/
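For anyone who finds the short flags cryptic, here is the same command sketched with wget's long option names and a comment per flag. The `DRY_RUN` switch and the `mirror_rpms` function name are my own additions for illustration, not part of the original one-liner; by default the script only prints the command it would run.

```shell
#!/bin/sh
# Hypothetical safety switch (not in the original one-liner):
# print the command by default, run it for real only when DRY_RUN=0.
if [ "${DRY_RUN:-1}" = "1" ]; then run="echo"; else run=""; fi

mirror_rpms() {
  # --timestamping (-N):         re-download a file only when the remote copy is newer
  # --recursive (-r):            follow links below the start URL
  # --no-host-directories (-nH): do not create an archive.cloudera.com/ directory
  # --no-parent (-np):           never ascend above /redhat/cdh/3/
  # --cut-dirs=3:                drop the leading redhat/cdh/3 components from local paths
  # --accept rpm,xml (-A):       keep only files whose names end in rpm or xml
  $run wget \
    --timestamping \
    --recursive \
    --no-host-directories \
    --no-parent \
    --cut-dirs=3 \
    --accept rpm,xml \
    http://archive.cloudera.com/redhat/cdh/3/
}

mirror_rpms
```

With `-nH` and `--cut-dirs=3` together, a file served at a (made-up) path such as `/redhat/cdh/3/some/dir/pkg.rpm` should land in `./some/dir/pkg.rpm` under the current directory. Run it as `DRY_RUN=0 sh mirror.sh` once the printed command looks right.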
You may also want to look into reposync, which is part of the yum-utils package:
reposync -c http://archive.cloudera.com/redhat/cdh/cloudera-cdh3.repo --source -r cloudera-cdh3
That would have worked as well. I’ve had problems with reposync in the past, so I completely discounted it, but it would certainly do the trick in this situation!
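If you go the reposync route, a slightly more explicit variant might look like the sketch below. It assumes you would rather download the .repo file first and point reposync at the local copy, and it sends the packages to a directory of your choosing with `-p`. The `./mirror` destination, the `DRY_RUN` switch, and the `sync_cdh3` function name are placeholders of mine, not anything Cloudera documents; by default the script only prints the commands.

```shell
#!/bin/sh
# Hypothetical safety switch: print the commands by default,
# run them for real only when DRY_RUN=0.
if [ "${DRY_RUN:-1}" = "1" ]; then run="echo"; else run=""; fi

REPO_FILE="cloudera-cdh3.repo"   # local copy of the repo definition
DEST_DIR="./mirror"              # placeholder download directory

sync_cdh3() {
  # Fetch the repo definition so reposync can read it from disk
  $run wget -O "$REPO_FILE" http://archive.cloudera.com/redhat/cdh/cloudera-cdh3.repo
  # -c: config file, -r: repo id to sync, -p: download path,
  # --source: also fetch source RPMs
  $run reposync -c "$REPO_FILE" -r cloudera-cdh3 -p "$DEST_DIR" --source
}

sync_cdh3
```

I haven't compared the two approaches side by side, so treat this as a starting point rather than a tested recipe.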