Scraping web data

Lab 3E

Directions: Follow along with the slides, completing the questions in blue on your computer, and answering the questions in red in your journal.

The web as a data source

Our first web scraper

https://ids-labs.idsucla.org/extras/webdata/mountains.html

HTML

<TABLE>
  <TR>
    <TH>peak</TH>
    <TH>range</TH>
    <TH>state</TH>
    <TH>long</TH>
    <TH>lat</TH>
    <TH>elev_ft</TH>
    <TH>elev_m</TH>
    <TH>prominence_ft</TH>
    <TH>prominence_m</TH>
    <TH>rank</TH>
  </TR>
  <TR>
    <TD>Denali (Mount McKinley)</TD>
    <TD>Alaska Range</TD>
    <TD>Alaska</TD>
    <TD>-151.0063</TD>
    <TD>63.0690</TD>
    <TD>20236</TD>
    <TD>6168</TD>
    <TD>20174</TD>
    <TD>6149</TD>
    <TD>1</TD>
  </TR>
</TABLE>

Get to scraping!

tables <- readHTMLTable(____)
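To see what kind of object readHTMLTable() hands back, here is a minimal sketch that runs on a small HTML string instead of the lab's web page. It assumes readHTMLTable() is the function from the XML package, and the names html_text, doc and mountains_demo are made up for this illustration. It is not the answer to the blank above.

```r
# Assumes the XML package is installed: install.packages("XML")
library(XML)

# A tiny, hand-written table (hypothetical stand-in for a real web page).
html_text <- "<table>
  <tr><th>peak</th><th>state</th><th>elev_ft</th></tr>
  <tr><td>Denali (Mount McKinley)</td><td>Alaska</td><td>20236</td></tr>
</table>"

# Parse the string as HTML, then pull out every <table> it contains.
doc <- htmlParse(html_text, asText = TRUE)
tables <- readHTMLTable(doc, stringsAsFactors = FALSE)

# The result is a list with one data frame per table found.
length(tables)

# Pick a table by position; scraped values arrive as text,
# so convert numeric columns before doing any math with them.
mountains_demo <- tables[[1]]
mountains_demo$elev_ft <- as.numeric(mountains_demo$elev_ft)
str(mountains_demo)
```

Because the result is a list, a page with several tables gives several data frames, and you would inspect each one (for example with head() or str()) to find the table you want.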

Find our data

Saving tables

Check, save and use!

save(____, file = "____.Rda")
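As a rough illustration of how save() pairs with load(), the sketch below saves a small made-up data frame to a temporary file and reads it back. The object name mountains_demo and the file name mountains_demo.Rda are placeholders chosen for this example, not the names the lab expects in the blanks.

```r
# A small hypothetical data frame standing in for a scraped table.
mountains_demo <- data.frame(
  peak = "Denali (Mount McKinley)",
  elev_ft = 20236,
  stringsAsFactors = FALSE
)

# Write to a temporary folder so the example leaves no files behind.
rda_path <- file.path(tempdir(), "mountains_demo.Rda")
save(mountains_demo, file = rda_path)

# Remove the object, then restore it from the file.
rm(mountains_demo)
loaded_names <- load(rda_path)

# load() returns the names of the objects it restored.
loaded_names
str(mountains_demo)
```

Note that load() brings the object back under the name it had when it was saved, so the file name and the object name do not have to match.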