.NET code to extract data from a web page

Here is example C# code that extracts the paragraph text from a Wikipedia page using the HtmlAgilityPack library:

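using HtmlAgilityPack;
using System;
using System.Net;
using System.Text;

// Define the URL of the Wikipedia page
string url = "https://en.wikipedia.org/wiki/Artificial_intelligence";

// Create an instance of the WebClient class to download the page;
// the using block disposes it once the download has finished
string htmlString;
using (WebClient webClient = new WebClient())
{
    // Decode the response as UTF-8 so non-ASCII characters are preserved
    webClient.Encoding = Encoding.UTF8;
    // Send a User-Agent header; the value below is only a placeholder
    webClient.Headers.Add("user-agent", "ExampleScraper/1.0");
    htmlString = webClient.DownloadString(url);
}

// Create an instance of the HtmlDocument class to parse the HTML
HtmlDocument htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(htmlString);

// Collect the extracted text without repeated string concatenation
StringBuilder textBuilder = new StringBuilder();

// SelectNodes returns null when no <p> element matches, so check before looping
HtmlNodeCollection paragraphs = htmlDoc.DocumentNode.SelectNodes("//p");
if (paragraphs != null)
{
    // Loop through each paragraph element in the HTML document
    foreach (HtmlNode node in paragraphs)
    {
        // Extract the paragraph text and decode HTML entities such as &amp;
        textBuilder.AppendLine(HtmlEntity.DeEntitize(node.InnerText));
    }
}

// Store the extracted text in a string and display it
string extractedText = textBuilder.ToString();
Console.WriteLine(extractedText);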

This code uses the WebClient class to download the HTML content of the Wikipedia page and the HtmlDocument class from the HtmlAgilityPack library to parse it. It then loops through each paragraph element (<p>) in the document and reads the InnerText property to extract the text of each paragraph. The extracted text is collected in a string variable and printed with the Console.WriteLine() method. Note that SelectNodes returns null rather than an empty collection when no element matches the XPath expression, so the result should be checked before it is iterated.
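
On .NET 6 and later, WebClient is marked obsolete in favor of HttpClient. The following is a minimal sketch of the same approach with HttpClient, not a definitive implementation. It assumes the HtmlAgilityPack package is referenced. The class name ParagraphExtractor, both method names, and the User-Agent value are hypothetical and chosen only for illustration. Parsing is kept separate from downloading so that it can be tried on any HTML string without a network connection.

```csharp
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;
using HtmlAgilityPack;

public static class ParagraphExtractor
{
    // Hypothetical helper: returns the text of every <p> element, one per line.
    public static string ExtractParagraphText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection paragraphs = doc.DocumentNode.SelectNodes("//p");
        if (paragraphs == null)
        {
            return string.Empty;
        }

        var builder = new StringBuilder();
        foreach (HtmlNode node in paragraphs)
        {
            // InnerText keeps entities such as &amp;, so decode them first.
            builder.AppendLine(HtmlEntity.DeEntitize(node.InnerText).Trim());
        }
        return builder.ToString();
    }

    // Hypothetical helper: downloads a page with HttpClient and extracts its paragraphs.
    public static async Task<string> DownloadParagraphTextAsync(string url)
    {
        using (var client = new HttpClient())
        {
            // Placeholder User-Agent value (an assumption); use one that identifies your application.
            client.DefaultRequestHeaders.UserAgent.ParseAdd("ExampleScraper/1.0");
            string html = await client.GetStringAsync(url);
            return ExtractParagraphText(html);
        }
    }
}
```

From an async method, the page text can then be obtained with await ParagraphExtractor.DownloadParagraphTextAsync(url) and printed with Console.WriteLine() as before. In a long-running application it is usually better to share one HttpClient instance than to create a new one for every request.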

© 2023 Pamai Tech