I have an HTML file containing thousands of <div class='date'></div><ul>…</ul> blocks, like this:
<!DOCTYPE html>
<html>
<head>
</head>
<body>
<div class="date">Wed May 23 2018</div>
<ul>
<li>
Do laundry
<ul>
<li>
Get coins
</li>
</ul>
</li>
<li>
Wash the dishes
</li>
</ul>
<div class='date'>Thu May 24 2018</div>
<ul>
<li>
Solve the world's hunger problem
<ul>
<li>
Don't tell anyone
</li>
</ul>
</li>
<li>
Get something to wear
</li>
</ul>
<div class='date'>Fri May 25 2018</div>
<ul>
<li>
Modify the website according to GDPR
</li>
<li>
Watch YouTube
</li>
</ul>
</body>
</html>
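Each <div> and its corresponding <ul> element belong to a specific date. The <div class='date'></div><ul>…</ul> blocks are sorted in ascending order, i.e. newer dates are at the bottom of the file. I want to arrange them in descending order so that newer dates are at the top of the file, like this: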
<!DOCTYPE html>
<html>
<head>
</head>
<body>
<div class='date'>Fri May 25 2018</div>
<ul>
<li>
Modify the website according to GDPR
</li>
<li>
Watch YouTube
</li>
</ul>
<div class='date'>Thu May 24 2018</div>
<ul>
<li>
Solve the world's hunger problem
<ul>
<li>
Don't tell anyone
</li>
</ul>
</li>
<li>
Get something to wear
</li>
</ul>
<div class="date">Wed May 23 2018</div>
<ul>
<li>
Do laundry
<ul>
<li>
Get coins
</li>
</ul>
</li>
<li>
Wash the dishes
</li>
</ul>
</body>
</html>
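I'm not sure what the right tool is. Is it a shell script? awk? Python? Or is there something else that would be faster and more convenient?
Solution:
An extended Python solution:
The sort_html_by_date.py script: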
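from bs4 import BeautifulSoup
from datetime import datetime

with open('input.html') as html_doc:  # replace with your actual html file name
    soup = BeautifulSoup(html_doc, 'lxml')  # the 'lxml' parser requires the lxml package

# Map each parsed date to the markup of its <div> plus the <ul> that follows it.
# Note: blocks that share the same date would overwrite each other here.
divs = {}
for div in soup.find_all('div', 'date'):
    # %b matches abbreviated month names (Jan, Feb, ...); use %B if months are spelled out
    divs[datetime.strptime(div.string.strip(), '%a %b %d %Y')] = \
        str(div) + '\n' + div.find_next_sibling('ul').prettify()

# Empty <body>, then re-add the blocks from newest to oldest.
soup.body.clear()
for el in sorted(divs, reverse=True):
    soup.body.append(divs[el])

# formatter=None keeps the re-added markup strings from being escaped
print(soup.prettify(formatter=None))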
Usage:
python sort_html_by_date.py
Output:
<!DOCTYPE html>
<html>
<head>
</head>
<body>
<div class="date">Fri May 25 2018</div>
<ul>
<li>
Modify the website according to GDPR
</li>
<li>
Watch YouTube
</li>
</ul>
<div class="date">Thu May 24 2018</div>
<ul>
<li>
Solve the world's hunger problem
<ul>
<li>
Don't tell anyone
</li>
</ul>
</li>
<li>
Get something to wear
</li>
</ul>
<div class="date">Wed May 23 2018</div>
<ul>
<li>
Do laundry
<ul>
<li>
Get coins
</li>
</ul>
</li>
<li>
Wash the dishes
</li>
</ul>
</body>
</html>
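The sort key in the script comes from datetime.strptime, so the format string is worth checking against the date labels on its own. The following is a minimal sketch under stated assumptions: the names DATE_FORMAT and parse_date are hypothetical, the labels use abbreviated English day and month names (%a and %b), and the default C locale is in effect. %B expects full month names and accepts "May" only because its full and abbreviated forms coincide.

```python
from datetime import datetime

# Assumption: labels look like "Wed May 23 2018" with abbreviated names.
DATE_FORMAT = "%a %b %d %Y"


def parse_date(text: str) -> datetime:
    """Parse one date label; raises ValueError if the label does not match."""
    return datetime.strptime(text.strip(), DATE_FORMAT)


labels = ["Thu May 24 2018", "Fri May 25 2018", "Wed May 23 2018"]

# Sort the labels newest first by their parsed value, not alphabetically.
newest_first = sorted(labels, key=parse_date, reverse=True)
print(newest_first)
```

Running this over every label in the real file before sorting would surface any entry that does not fit the assumed format.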
Modules used:
beautifulsoup – https://www.crummy.com/software/BeautifulSoup/bs4/doc/
datetime – https://docs.python.org/3.3/library/datetime.html#module-datetime
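Because the question states that the blocks are already in ascending order, reversing them is enough and no date parsing is strictly needed. Below is a rough standard-library sketch of that idea, not a general HTML parser. The names DATE_DIV and reverse_date_blocks are hypothetical, and it assumes a bare <body> tag, that every block starts at the beginning of a line with <div class='date'> (either quote style), and that no such div is nested inside a list.

```python
import re

# Assumption: each date block begins at the start of a line with this tag.
DATE_DIV = re.compile(r"""^<div class=['"]date['"]>""", re.MULTILINE)


def reverse_date_blocks(html: str) -> str:
    """Reverse the order of the date blocks found between <body> and </body>."""
    # str.index raises ValueError if the document has no plain <body> tag.
    body_start = html.index("<body>") + len("<body>")
    body_end = html.index("</body>")
    body = html[body_start:body_end]

    # Offsets inside the body where each date block begins.
    starts = [m.start() for m in DATE_DIV.finditer(body)]
    if not starts:
        return html  # nothing to reorder

    prefix = body[:starts[0]]  # whatever precedes the first block
    ends = starts[1:] + [len(body)]
    blocks = [body[s:e] for s, e in zip(starts, ends)]
    # Make sure blocks stay on separate lines after reordering.
    blocks = [b if b.endswith("\n") else b + "\n" for b in blocks]

    return html[:body_start] + prefix + "".join(reversed(blocks)) + html[body_end:]


sample = (
    "<body>\n"
    "<div class='date'>Wed May 23 2018</div>\n"
    "<ul>\n<li>\nDo laundry\n</li>\n</ul>\n"
    "<div class='date'>Thu May 24 2018</div>\n"
    "<ul>\n<li>\nGet something to wear\n</li>\n</ul>\n"
    "</body>\n"
)
reversed_sample = reverse_date_blocks(sample)
print(reversed_sample)
```

This avoids third-party packages and leaves the markup inside each block byte-for-byte unchanged, but it relies entirely on the line layout shown in the question. If the file is not reliably sorted or formatted that way, the BeautifulSoup approach above is the safer choice.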