18 Matching HTML Tags

18.1 Introduction

In this example we will do some basic handling of HTML tags. The data for this practical application is the webpage for the R mailing lists: http://www.r-project.org/mail.html (see screenshot below).

If you visit that webpage, you will see that there are five general mailing lists devoted to R:

  • R-announce is where major announcements about the development of R and the availability of new code are made.
  • R-help is the main R mailing list for discussion about problems and solutions using R.
  • R-package-devel is a list for getting help with package development in R.
  • R-devel is a list intended for questions and discussion about code development in R.
  • R-packages is a list of announcements on the availability of new or enhanced contributed packages.

Additionally, there are several specific (SIG) mailing lists. Here’s a screenshot with some of the special groups:

18.2 Attributes href

As a simple example, suppose we wanted to get the href attributes of all the SIG links. For instance, the href attribute of the R-SIG-Mac link is: https://stat.ethz.ch/mailman/listinfo/r-sig-mac

In turn, the href attribute of the R-sig-DB link is: https://stat.ethz.ch/mailman/listinfo/r-sig-db

If we take a peek at the HTML source code of the webpage, we’ll see that all the links can be found on lines like this one:

"<li><p><a href=\"https://stat.ethz.ch/mailman/listinfo/r-sig-mac\"><code>R-SIG-Mac</code></a>: R Special Interest Group on Mac ports of R</p></li>"