13 Matching HTML Tags
In this example we will deal with some basic handling of HTML tags. The data for this practical application is the webpage for the R mailing lists: http://www.r-project.org/mail.html (see screenshot below).
If you visit that webpage, you will see that there are five general mailing lists devoted to R:
- R-announce is where major announcements about the development of R and the availability of new code are made.
- R-help is the main R mailing list for discussion about problems and solutions using R.
- R-package-devel is a list for getting help with package development in R.
- R-devel is a list intended for questions and discussion about code development in R.
- R-packages is a list of announcements on the availability of new or enhanced contributed packages.
Additionally, there are several specific (SIG) mailing lists. Here’s a screenshot with some of the special groups:
As a simple example, suppose we wanted to get the href attributes of all the SIG links. For instance, the href attribute of the R-SIG-Mac link is:
https://stat.ethz.ch/mailman/listinfo/r-sig-mac
In turn, the href attribute of the R-sig-DB link is:
If we take a peek at the HTML source code of the webpage, we’ll see that all the links can be found on lines like this one:
"<li><p><a href=\"https://stat.ethz.ch/mailman/listinfo/r-sig-mac\"><code>R-SIG-Mac</code></a>: R Special Interest Group on Mac ports of R</p></li>"