How to get the whole content between two XML tags in Python?


Question

I am trying to get the whole content between an opening XML tag and its closing counterpart.

Getting the content in straightforward cases like title below is easy, but how can I get the whole content between the tags if mixed content is used and I want to preserve the inner tags?



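<review>
  <title>Some testing stuff</title>
  <text>Some text with <extradata>data</extradata> in it.
  It spans <sometag>multiple lines: <tag>one</tag>, <tag>two</tag>
  or more</sometag>.</text>
</review>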

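What I want is the content between the two text tags, including any inner tags: Some text with <extradata>data</extradata> in it. It spans <sometag>multiple lines: <tag>one</tag>, <tag>two</tag> or more</sometag>.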

For now I use regular expressions, but it gets kind of messy and I don't like this approach. I lean towards an XML-parser-based solution. I looked over minidom, etree, lxml and BeautifulSoup but couldn't find a solution for this case (whole content, including inner tags).

Recommended answer

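from lxml import etree

t = etree.XML(
"""
<review>
  <title>Some testing stuff</title>
  <text>Some text with <extradata>data</extradata> in it.</text>
</review>
"""
)
# t.text is the text before the first child; tostring() serializes each child plus its tail.
# encoding='unicode' makes tostring() return str instead of bytes on Python 3.
(t.text + ''.join(etree.tostring(child, encoding='unicode') for child in t)).strip()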

The trick here is that t is iterable and, when iterated, yields all of its child elements. Because etree has no separate text nodes (text is stored in the .text and .tail attributes of elements), you also need to recover the text before the first child tag with t.text.
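To make the role of t.text concrete, here is a minimal sketch of where mixed-content text ends up. It assumes the standard library's xml.etree.ElementTree, which uses the same .text/.tail model as lxml; the element names text and extradata are only illustrative.

```python
import xml.etree.ElementTree as ET

elem = ET.fromstring(
    "<text>Some text with <extradata>data</extradata> in it.</text>"
)

# Text before the first child is stored on the parent element.
print(repr(elem.text))    # 'Some text with '

child = elem[0]
# Text inside the child is the child's own .text ...
print(repr(child.text))   # 'data'
# ... and text after the child's closing tag is the child's .tail.
print(repr(child.tail))   # ' in it.'
```

Serializing a child with tostring() includes its tail by default, which is why joining the serialized children after the parent's .text reproduces the full inner content.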

In [50]: (t.text + ''.join(etree.tostring(child, encoding='unicode') for child in t)).strip()
Out[50]: '<title>Some testing stuff</title>\n  <text>Some text with <extradata>data</extradata> in it.</text>'

Or:

In [6]: e = t.xpath('//text')[0]
In [7]: (e.text + ''.join(etree.tostring(child, encoding='unicode') for child in e)).strip()
Out[7]: 'Some text with <extradata>data</extradata> in it.'
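The same idea can be wrapped in a small reusable helper. The sketch below is one possible way to do it, not the answer's own code: the function name inner_xml is hypothetical, the sample document (root review, inner tag extradata) is illustrative, and it uses the standard library's xml.etree.ElementTree rather than lxml so that it runs without extra packages. It also guards against .text being None, which happens when an element starts directly with a child tag.

```python
import xml.etree.ElementTree as ET


def inner_xml(elem):
    """Return everything between elem's opening and closing tags as a string."""
    # .text is None when the element has no text before its first child.
    parts = [elem.text or ""]
    for child in elem:
        # tostring() serializes the child, its subtree, and its tail text.
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


doc = ET.fromstring(
    "<review>"
    "<title>Some testing stuff</title>"
    "<text>Some text with <extradata>data</extradata> in it.</text>"
    "</review>"
)

print(inner_xml(doc.find("text")))
# Some text with <extradata>data</extradata> in it.
```

Note that the result is re-serialized XML, so special characters come back escaped (for example & as &amp;) and the original formatting inside tags is not guaranteed to be preserved byte for byte.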