Here's an example in C# that extracts the paragraph text from a Wikipedia page using the HtmlAgilityPack library:
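    using HtmlAgilityPack;
    using System;
    using System.Net;
    using System.Text;

    // Define the URL of the Wikipedia page
    string url = "https://en.wikipedia.org/wiki/Artificial_intelligence";

    // Create an instance of the WebClient class to download the page
    string htmlString;
    using (WebClient webClient = new WebClient())
    {
        webClient.Encoding = Encoding.UTF8;
        // Identify the client; the User-Agent value here is only a placeholder
        webClient.Headers.Add("User-Agent", "WikiTextExtractor/1.0");
        htmlString = webClient.DownloadString(url);
    }

    // Create an instance of the HtmlDocument class to parse the HTML
    HtmlDocument htmlDoc = new HtmlDocument();
    htmlDoc.LoadHtml(htmlString);

    // Define a string to store the extracted text
    string extractedText = "";

    // SelectNodes returns null when no element matches, so check before looping
    HtmlNodeCollection paragraphs = htmlDoc.DocumentNode.SelectNodes("//p");
    if (paragraphs != null)
    {
        // Loop through each paragraph element in the HTML document
        foreach (HtmlNode node in paragraphs)
        {
            // Extract the text from the current paragraph and decode entities such as &amp;
            extractedText += HtmlEntity.DeEntitize(node.InnerText) + Environment.NewLine;
        }
    }

    // Display the extracted text
    Console.WriteLine(extractedText);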
This code uses the WebClient class to download the HTML content of the Wikipedia page and the HtmlDocument class from the HtmlAgilityPack library to parse it. It then loops through each paragraph element (<p>) in the document and reads the InnerText property to extract that paragraph's text. The extracted text is accumulated in a string variable and then displayed using the Console.WriteLine() method. Note that SelectNodes returns null rather than an empty collection when no element matches, so the result should be checked before looping. Also, WebClient is marked obsolete in .NET 6 and later in favor of HttpClient.
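The paragraph loop described above can also be tried without any network access. The following is a minimal sketch rather than a definitive implementation: ExtractParagraphs is a hypothetical helper name, and the sample markup is made up to stand in for a downloaded page. It assumes the HtmlAgilityPack NuGet package is installed and a C# version that supports top-level statements.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper: returns the text of each non-empty <p> element.
static List<string> ExtractParagraphs(string html)
{
    var doc = new HtmlDocument();
    doc.LoadHtml(html);

    var result = new List<string>();

    // SelectNodes returns null (not an empty collection) when nothing matches.
    HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//p");
    if (nodes == null)
    {
        return result;
    }

    foreach (HtmlNode node in nodes)
    {
        // Decode entities such as &amp; and trim surrounding whitespace.
        string text = HtmlEntity.DeEntitize(node.InnerText).Trim();
        if (text.Length > 0)
        {
            result.Add(text);
        }
    }

    return result;
}

// Made-up sample markup standing in for a downloaded page.
string sample =
    "<html><body><p>Cats &amp; dogs</p><div>skip me</div><p> <b>Bold</b> text </p></body></html>";

foreach (string paragraph in ExtractParagraphs(sample))
{
    Console.WriteLine(paragraph);
}
```

Returning a list instead of one concatenated string keeps paragraph boundaries intact, which makes later steps such as filtering or counting paragraphs easier.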