Comments on The Wallace Line: Pipelines (chris wallace)

Comment by M. David Peterson, 2007-11-29T21:12:00.000+00:00

Hey Chris,

"Although many of the steps in a Yahoo pipeline can be handled within a single XSLT script, some of the processing I want to demonstrate involves processing HTML pages which are not XHTML, so I needed a tidy service too, and to be able to pipeline them together."

So here's a fun one,

http://personplacething.info/service/proxy/return-xml-from-html/?uri=http://www.xml.com//html:html/html:body//html:p[contains(.,'M.%20David%20Peterson')]

Live dynamic searching of the (X)HTML web for pipelining into whatever you might want. This uses an XSLT 2.0 extension function written in C# that opens an SgmlReader on the URI given in the uri query string parameter and then returns the result of the XPath expression given at the end of that value. The delimiter between the URI and the XPath expression is // (the second / represents the root of the document).

Code is @ http://nuxleus.com/dev/browser/trunk/nuxleus/Web/Development/transform/controller/proxy/base.xslt which is driven by http://nuxleus.com/dev/browser/trunk/nuxleus/Web/Development/service/proxy/return-xml-from-html/service.op

Too bad you're not using .NET! :D ;-) Of course this same thing could be replicated using a servlet and John Cowan's TagSoup HTML-to-XHTML processor.
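
To make the // delimiter convention concrete, here is a minimal Python sketch of how the uri parameter might be split into a target URI and an XPath expression. It is not the C#/XSLT code linked above. The function names `split_uri_and_xpath` and `parse_service_url` are hypothetical, and the rule "split at the first // after the scheme" is an assumption based only on the description in the comment.

```python
from urllib.parse import urlsplit, parse_qs


def split_uri_and_xpath(combined: str) -> tuple[str, str]:
    """Split '<uri>//<xpath>' into (uri, xpath). Hypothetical helper."""
    # Skip the "//" that belongs to the scheme separator, e.g. "http://".
    scheme_end = combined.find("://")
    search_from = scheme_end + 3 if scheme_end != -1 else 0
    # Assumption: the first "//" after the scheme is the delimiter.
    cut = combined.find("//", search_from)
    if cut == -1:
        raise ValueError("no '//' delimiter between URI and XPath")
    # Keep the second slash: it is the root step of the XPath expression.
    return combined[:cut], combined[cut + 1:]


def parse_service_url(service_url: str) -> tuple[str, str]:
    """Extract the 'uri' query parameter and split it. Hypothetical helper."""
    query = urlsplit(service_url).query
    # parse_qs percent-decodes the value, so %20 becomes a space.
    values = parse_qs(query).get("uri")
    if not values:
        raise ValueError("missing 'uri' query parameter")
    return split_uri_and_xpath(values[0])


if __name__ == "__main__":
    example = (
        "http://personplacething.info/service/proxy/return-xml-from-html/"
        "?uri=http://www.xml.com//html:html/html:body"
        "//html:p[contains(.,'M.%20David%20Peterson')]"
    )
    target, xpath = parse_service_url(example)
    print(target)
    print(xpath)
```

One limitation of this reading: a target URI whose own path contains // would be split too early, so a real service would need a stricter rule or an escaped delimiter.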