Scraping html tables into R data frames using the XML package(使用 XML 包将 html 表抓取到 R 数据帧中)
问题描述
How do I scrape html tables using the XML package?
Take, for example, this wikipedia page on the Brazilian soccer team. I would like to read it in R and get the "list of all matches Brazil have played against FIFA recognised teams" table as a data.frame. How can I do this?
…or a shorter try:
library(XML)
library(RCurl)
library(rlist)
theurl <- getURL("https://en.wikipedia.org/wiki/Brazil_national_football_team",.opts = list(ssl.verifypeer = FALSE) )
tables <- readHTMLTable(theurl)
tables <- list.clean(tables, fun = is.null, recursive = FALSE)
n.rows <- unlist(lapply(tables, function(t) dim(t)[1]))
the picked table is the longest one on the page
tables[[which.max(n.rows)]]
这篇关于使用 XML 包将 html 表抓取到 R 数据帧中的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持编程学习网!
本文标题为:使用 XML 包将 html 表抓取到 R 数据帧中


基础教程推荐
- Bootstrap 模态出现在背景下 2022-01-01
- 检查 HTML5 拖放文件类型 2022-01-01
- fetch 是否支持原生多文件上传? 2022-01-01
- 原生拖动事件后如何获取 mouseup 事件? 2022-01-01
- npm start 错误与 create-react-app 2022-01-01
- Bokeh Div文本对齐 2022-01-01
- 即使用户允许,Gmail 也会隐藏外部电子邮件图片 2022-01-01
- 在 contenteditable 中精确拖放 2022-01-01
- 如何添加到目前为止的天数? 2022-01-01
- Fabric JS绘制具有活动形状的多边形 2022-01-01