18  Matching HTML Tags

In this chapter we review an example that deals with some basic handling of HTML tags. The data for this practical application is the webpage for the R mailing lists: http://www.r-project.org/mail.html (see screenshot below).


If you visit the previous webpage you will see that there are five general mailing lists devoted to R:

Additionally, there are several specific Special Interest Group (SIG) mailing lists. Here’s a screenshot with some of the special groups:

18.1 Attributes href

As a simple example, suppose we wanted to get the href attributes of all the SIG links. For instance, the href attribute of the R-SIG-Mac link is:

https://stat.ethz.ch/mailman/listinfo/r-sig-mac

In turn, the href attribute of the R-sig-DB link is:

https://stat.ethz.ch/mailman/listinfo/r-sig-db

If we take a peek at the HTML source code of the webpage, we’ll see that each link appears on a line like this one (a single line in the source, wrapped here for display):

"<li><p><a href=\"https://stat.ethz.ch/mailman/listinfo/r-sig-mac\">
<code>R-SIG-Mac</code></a>: R Special Interest Group on Mac ports of R</p></li>"
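
To make the idea concrete, here is a minimal sketch of how the href values could be pulled out of lines like the one above using base R regular expressions. The vector `html_lines` and the names `href_pattern`, `href_attrs`, and `sig_hrefs` are made up for this illustration. The second element is an abbreviated line built from the R-sig-DB URL shown earlier, and the third is an invented line with no link. This is one possible approach, and it assumes each line contains at most one double-quoted href attribute.

```r
# Illustrative input: in practice these lines would come from the page source.
# The second and third elements are constructed for this sketch.
html_lines <- c(
  "<li><p><a href=\"https://stat.ethz.ch/mailman/listinfo/r-sig-mac\"><code>R-SIG-Mac</code></a>: R Special Interest Group on Mac ports of R</p></li>",
  "<li><p><a href=\"https://stat.ethz.ch/mailman/listinfo/r-sig-db\"><code>R-sig-DB</code></a></p></li>",
  "<p>A line without any link</p>"
)

# Match href=" followed by any run of non-quote characters and a closing quote
href_pattern <- "href=\"[^\"]*\""

# regexpr() finds the first match in each element (-1 when there is none);
# regmatches() keeps only the substrings that actually matched
href_attrs <- regmatches(html_lines, regexpr(href_pattern, html_lines))

# Strip the leading href=" and the trailing quote to leave the bare URL
sig_hrefs <- gsub("^href=\"|\"$", "", href_attrs)
sig_hrefs
```

The line without a link is silently dropped because `regmatches()` returns only the elements where `regexpr()` found a match. For pages with several links per line, `gregexpr()` would be the natural replacement for `regexpr()`.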