
Thursday, December 15, 2005

Some tips on using regular expressions to download files

It is quite common to want to download a bunch of files listed on a web page. Of course, you can right-click each link with your mouse and save it to your hard disk. However, if there are many links, doing that quickly becomes tedious. For a lazy man like me, I would do the following:

  1. Save the source code of the web page
  2. Use perl to filter out the relevant links like this: perl -ne '@files = split /target=/; foreach (@files) { if (/(http\S*?\.mp3)/) { print $1, "\n" } }' source_code > abc
  3. Use wget to download the files: wget -i abc
In the example, I have assumed we are going to download mp3 files and that the links are separated by 'target=' in the code. We first split each line into a list so that each element contains at most one mp3 URL, then we use a regular expression to pull it out. In this way, we can download as many files as we like.
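If the page does not use 'target=' as a separator, the split trick becomes fragile, so a small standalone script that simply grabs every mp3 URL on each line may be easier to adapt. Here is a minimal sketch; the file names extract_mp3.pl, saved_page.html and mp3_list.txt are just placeholders, not anything from the example above:

#!/usr/bin/perl
# extract_mp3.pl -- pull every .mp3 URL out of a saved page
# Usage: perl extract_mp3.pl saved_page.html > mp3_list.txt
use strict;
use warnings;

my %seen;   # remember URLs already printed so the list has no duplicates

while (my $line = <>) {
    # find every http...mp3 URL on the line, not just the first one
    while ($line =~ /(http\S+?\.mp3)/g) {
        print "$1\n" unless $seen{$1}++;
    }
}

The resulting list is then fed to wget just as in step 3: wget -i mp3_list.txt.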
