Problem description
I'm trying to get the whole content between an opening XML tag and its closing counterpart.
Getting the content in straightforward cases like title below is easy, but how can I get the whole content between the tags if mixed content is used and I want to preserve the inner tags?
some testing stuff some text with data in it. it spansmultiple lines: .one ,two or more
What I want is the content between the two text tags, including any inner tags: some text with
For now I use regular expressions, but it gets kind of messy and I don't like this approach. I lean towards an XML-parser-based solution. I looked over minidom, etree, lxml and BeautifulSoup but couldn't find a solution for this case (whole content, including inner tags).
Recommended answer
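from lxml import etree

# Parse the XML document from the question (it goes inside the triple quotes).
t = etree.XML("""...""")

# On Python 3, encoding='unicode' makes tostring return str instead of bytes.
(t.text + ''.join(etree.tostring(child, encoding='unicode') for child in t)).strip()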
The trick here is that t is iterable and, when iterated, yields its child elements. Because etree does not represent text as separate nodes, you also need to recover the text before the first child tag with t.text. The text that follows each child is stored as that child's tail, and etree.tostring includes the tail in the child's serialization by default.
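To make the text/tail model concrete, here is a minimal sketch. It uses the standard library's xml.etree.ElementTree rather than lxml (the two share the text, tail and iteration behaviour shown here), and the fragment with its `<em>` tag is a hypothetical stand-in for the question's document.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; the <em> tag name is an assumption for illustration.
element = ET.fromstring("<text>Some text with <em>data</em> in it.</text>")

# Text before the first child lives on the parent element itself.
leading_text = element.text

# Iterating the element yields child elements only. Text that follows a
# child's closing tag is stored on that child as its tail.
children = [(child.tag, child.text, child.tail) for child in element]

print(repr(leading_text))  # 'Some text with '
print(children)            # [('em', 'data', ' in it.')]
```

This is why joining only the serialized children would lose the leading text: it belongs to the parent, not to any child.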
In [50]: (t.text + ''.join(etree.tostring(child, encoding='unicode') for child in t)).strip()
Out[50]: 'some testing stuff some text with data in it.'
Or:
In [6]: e = t.xpath('//text')[0]
In [7]: (e.text + ''.join(etree.tostring(child, encoding='unicode') for child in e)).strip()
Out[7]: 'some text withdata in it.'
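The same idea can be wrapped in a small helper. The sketch below is one possible way to do it, not the answer's exact code: `inner_xml` is a hypothetical name, the sample document and its `<em>` and `<code>` tags are assumptions, and it uses the standard library's xml.etree.ElementTree with `find` in place of lxml's `xpath`. It also guards against `text` being None, which happens when an element starts directly with a child tag.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the inner tag names are assumptions.
SAMPLE = """<review>
  <title>Some testing stuff</title>
  <text>Some text with <em>data</em> in it. <code>one</code>, two or more</text>
</review>"""


def inner_xml(element):
    """Return the content between an element's tags, inner tags included."""
    # element.text is None when the element starts with a child tag.
    parts = [element.text or ""]
    # Each serialized child carries its own tail text along with it.
    parts.extend(ET.tostring(child, encoding="unicode") for child in element)
    return "".join(parts).strip()


root = ET.fromstring(SAMPLE)
text_content = inner_xml(root.find("text"))
title_content = inner_xml(root.find("title"))

print(text_content)   # Some text with <em>data</em> in it. <code>one</code>, two or more
print(title_content)  # Some testing stuff
```

With lxml, the same helper should work after swapping the import for `from lxml import etree as ET`, since `fromstring`, `find` and `tostring(..., encoding="unicode")` exist there as well.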