Parsing HTML Data on Windows Phone 7

Source: developer submission, 2019-03-28


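In my previous article I covered decoding gb2312 on Windows Phone 7:

http://www.2cto.com/kf/201111/112551.html

That solved the garbled-text problem with downloaded HTML. In this article I'll show how to parse HTML data on Windows Phone 7 so that we can extract the data we want.

First, let me introduce a library, HtmlAgilityPack (the previous article also used it for the decoding). I'll provide the library's dll file together with the demo.

I'll use Sina News as the example for parsing.

First, take a look at the web version of a Sina News page:

http://news.sina.com.cn/w/sd/2011-11-27/070023531646.shtml

Then look at its source. The news content is structured like this: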

<div class="blkContainerSblk">
    <h1 id="artibodyTitle" pid="1" tid="1" did="23531646" fid="1666">title</h1>
    <div class="artInfo"><span id="art_source"><a href="http://www.sina.com.cn">http://www.sina.com.cn</a></span> <span id="pub_date">pub_date</span> <span id="media_name"><a href="">media_name</a> <a href=""></a> </span></div>
</div>

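Most of these elements have an ID attribute, which makes them much easier to parse.

Now let's start parsing.

Step 1: Add a reference to HtmlAgilityPack.dll.

Step 2: Download the HTML page with the WebClient or WebRequest class and turn it into a string.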

    // Required namespaces (the members below live inside a class):
    using System;
    using System.IO;
    using System.Net;
    using System.Threading;
    using System.Windows;

    // DownloadEventArgs is the author's own type (its definition is not shown in this article);
    // it exposes the _DownloadStream and _DownloadString members used below.
    public delegate void CallbackEvent(object sender, DownloadEventArgs e);

    public event CallbackEvent DownloadCallbackEvent;

    public void HttpWebRequestDownloadGet(string url)
    {
        Thread _thread = new Thread(delegate()
        {
            Uri _uri = new Uri(url, UriKind.RelativeOrAbsolute);
            HttpWebRequest _httpWebRequest = (HttpWebRequest)WebRequest.Create(_uri);
            _httpWebRequest.Method = "GET";
            _httpWebRequest.BeginGetResponse(new AsyncCallback(delegate(IAsyncResult result)
            {
                // The request object was passed in as the async state.
                HttpWebRequest _httpWebRequestCallback = (HttpWebRequest)result.AsyncState;
                // EndGetResponse throws a WebException if the request failed.
                HttpWebResponse _httpWebResponseCallback = (HttpWebResponse)_httpWebRequestCallback.EndGetResponse(result);
                Stream _streamCallback = _httpWebResponseCallback.GetResponseStream();
                // Decode the response as gb2312 (see the previous article).
                StreamReader _streamReader = new StreamReader(_streamCallback, new HtmlAgilityPack.Gb2312Encoding());
                string _stringCallback = _streamReader.ReadToEnd();

                // Raise the event on the UI thread so handlers can touch controls.
                Deployment.Current.Dispatcher.BeginInvoke(new Action(() =>
                {
                    if (DownloadCallbackEvent != null)
                    {
                        DownloadEventArgs _downloadEventArgs = new DownloadEventArgs();
                        // Note: this stream has already been read to the end above.
                        _downloadEventArgs._DownloadStream = _streamCallback;
                        _downloadEventArgs._DownloadString = _stringCallback;
                        DownloadCallbackEvent(this, _downloadEventArgs);
                    }
                }));
            }), _httpWebRequest);
        });
        _thread.Start();
    }

O(∩_∩)O! My version is fairly complicated; all that matters is that we end up with the HTML data.

Here is a simpler way to download it:

    WebClient webClient = new WebClient();
    webClient.Encoding = new HtmlAgilityPack.Gb2312Encoding(); // add this line to set the encoding
    // Subscribe before starting the download so the completion event cannot be missed.
    webClient.DownloadStringCompleted += new DownloadStringCompletedEventHandler(webClenet_DownloadStringCompleted);
    webClient.DownloadStringAsync(new Uri("http://news.sina.com.cn/s/2011-11-25/120923524756.shtml", UriKind.RelativeOrAbsolute));
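The handler registered above is not shown in the article. A minimal sketch of what it might look like follows; the containing class name `MainPage` and the helper `ParseNews` are assumptions made for illustration, not names from the author's demo.

```csharp
using System;
using System.Net;

public partial class MainPage
{
    // Sketch of the handler wired up to DownloadStringCompleted.
    private void webClenet_DownloadStringCompleted(object sender, DownloadStringCompletedEventArgs e)
    {
        // Reading e.Result throws if the download failed or was cancelled,
        // so check both conditions first.
        if (e.Cancelled || e.Error != null)
        {
            return;
        }

        string html = e.Result; // the page, decoded with the WebClient's Encoding
        ParseNews(html);
    }

    // Hypothetical helper: the parsing step described in the article would go here.
    private void ParseNews(string html)
    {
    }
}
```

Checking `e.Error` before touching `e.Result` keeps a network failure from surfacing as an exception inside the handler.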

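Now process the downloaded HTML in the callback (e.Result when using WebClient, e._DownloadString when using the custom event above):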

    // Requires: using System; using HtmlAgilityPack;
    string _result = e._DownloadString;
    HtmlDocument _doc = new HtmlDocument(); // instantiate an HtmlAgilityPack.HtmlDocument
    _doc.LoadHtml(_result); // load the HTML

    HtmlNode _htmlNode01 = _doc.GetElementbyId("artibodyTitle"); // the news title element (an <h1>)
    string _title = _htmlNode01.InnerText;

    HtmlNode _htmlNode02 = _doc.GetElementbyId("artibody"); // the div holding the article body
    string _content = _htmlNode02.InnerText;
    // Drop everything from the " .blkComment" text onward.
    // IndexOf returns -1 when the marker is absent, so guard before calling Substring.
    int _divIndex = _content.IndexOf(" .blkComment");
    if (_divIndex >= 0)
    {
        _content = _content.Substring(0, _divIndex);
    }

    #region Sina link
    HtmlNode _htmlNode03 = _doc.GetElementbyId("art_source");
    string _www = _htmlNode03.FirstChild.InnerText;
    // Read the link target by name instead of relying on attribute order.
    string _wwwInt = _htmlNode03.FirstChild.GetAttributeValue("href", string.Empty);
    #endregion

    #region Publication time
    HtmlNode _htmlNode04 = _doc.GetElementbyId("pub_date");
    string _pub_date = _htmlNode04.InnerText;
    #endregion

    #region Source media info
    HtmlNode _htmlNode05 = _doc.GetElementbyId("media_name");
    string _media_name = _htmlNode05.FirstChild.InnerText;
    string _media_source = _htmlNode05.FirstChild.GetAttributeValue("href", string.Empty);
    #endregion

    // Push the extracted values into the page's controls.
    Media_nameHyperlinkButton.Content = _pub_date + " " + _media_name;
    Media_nameHyperlinkButton.NavigateUri = new Uri(_media_source, UriKind.RelativeOrAbsolute);
    TitleTextBlock.Text = _title;
    ContentTextBlock.Text = _content;
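One caveat for the parsing code above: `GetElementbyId` returns null when no element has the requested ID, so reading `InnerText` directly fails on a page with a different layout. A small guard could look like the sketch below; `HtmlNodeHelpers` and `GetInnerTextById` are hypothetical names, not part of HtmlAgilityPack or the author's demo.

```csharp
using HtmlAgilityPack;

public static class HtmlNodeHelpers
{
    // Hypothetical helper: returns the trimmed inner text of the element
    // with the given ID, or an empty string when the element is missing.
    public static string GetInnerTextById(HtmlDocument doc, string id)
    {
        HtmlNode node = doc.GetElementbyId(id);
        return node == null ? string.Empty : node.InnerText.Trim();
    }
}
```

With that in place, a line such as `TitleTextBlock.Text = HtmlNodeHelpers.GetInnerTextById(_doc, "artibodyTitle");` shows an empty title instead of throwing when the page structure changes.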

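The result is shown in the figure below:

On most web pages, the majority of tags have no ID attribute. Fortunately, HtmlAgilityPack supports XPath.

In that case, use the XPath language to find and match the nodes you need.

XPath tutorial: http://www.w3school.com.cn/xpath/index.asp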
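As a rough illustration of the XPath approach, the sketch below selects nodes from the Sina markup shown earlier by class and nesting rather than by ID. It assumes the HtmlAgilityPack build you reference exposes `SelectSingleNode` and `SelectNodes`; the class `XPathSketch` and its method names are made up for this example.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class XPathSketch
{
    // Finds the <h1> inside the container div, matching on the div's class.
    public static string GetTitle(string html)
    {
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode title = doc.DocumentNode.SelectSingleNode("//div[@class='blkContainerSblk']/h1");
        // SelectSingleNode returns null when nothing matches.
        return title == null ? string.Empty : title.InnerText;
    }

    // Collects the href of every link inside the "artInfo" div.
    public static List<string> GetInfoLinks(string html)
    {
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);

        List<string> hrefs = new List<string>();
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//div[@class='artInfo']//a[@href]");
        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (links != null)
        {
            foreach (HtmlNode link in links)
            {
                hrefs.Add(link.GetAttributeValue("href", string.Empty));
            }
        }
        return hrefs;
    }
}
```

Both methods check for null because the selection calls do not return an empty result when the expression matches nothing.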

Sample download:

http://115.com/file/dn87dl2d#

MyFramework_Test.zip

Author: 青瓷
