How to extract text from a PDF in C#
Use ExtractTextAsync() for the whole document, ExtractTextAsync(pageNumber) for a
single page, or the overloads with TextExtractionOptions when you want plain text output.
The main decision is whether you want extracted segments back or one plain text string. ZingPDF supports both shapes.
Extract text from the whole document
If you want the extracted text segments from every page, call the parameterless overload.
using ZingPDF;
using var pdf = Pdf.Load(File.OpenRead("input.pdf"));
var items = await pdf.ExtractTextAsync();
foreach (var item in items)
{
    Console.WriteLine(item.Text);
}
Extract text from one page
If you only need one page, pass the 1-based page number.
using System.Linq;
using ZingPDF;
using var pdf = Pdf.Load(File.OpenRead("input.pdf"));
var firstPageItems = await pdf.ExtractTextAsync(1);
var firstPageText = string.Join("\n", firstPageItems.Select(x => x.Text));
That limits extraction to the page you care about instead of iterating through the whole file.
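If you need a handful of specific pages rather than one, the same single-page call can be repeated per page. The following is a minimal sketch, not part of ZingPDF: ExtractPagesAsync and the extractPage delegate are hypothetical names, and a stand-in delegate is used so the example runs without a PDF. The comment shows how the single-page overload from this guide might be plugged in.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Stand-in extractor so the sketch runs without a PDF file.
// With ZingPDF you would pass something like (assumption, based on the
// single-page call shown in this guide):
//   async page => (await pdf.ExtractTextAsync(page)).Select(x => x.Text)
Func<int, Task<IEnumerable<string>>> extractPage = page =>
    Task.FromResult<IEnumerable<string>>(new[] { $"Heading on page {page}", "Body text" });

// Extract only pages 1 and 3 (1-based), keyed by page number.
var textByPage = await ExtractPagesAsync(new[] { 1, 3 }, extractPage);
Console.WriteLine(textByPage[3]);

// Hypothetical helper: calls the extractor once per requested page and
// joins that page's segments into a single newline-separated string.
static async Task<Dictionary<int, string>> ExtractPagesAsync(
    IEnumerable<int> pageNumbers,
    Func<int, Task<IEnumerable<string>>> extractPage)
{
    var result = new Dictionary<int, string>();
    foreach (var pageNumber in pageNumbers)
    {
        var segments = await extractPage(pageNumber);
        result[pageNumber] = string.Join("\n", segments);
    }
    return result;
}
```

Passing the extractor as a delegate keeps the helper independent of the PDF library, which also makes it easy to test with fake page content.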
Get plain text output
If you want one plain text string instead of extracted segments, use
TextExtractionOptions and request PlainText.
using ZingPDF;
using ZingPDF.Elements.Drawing.Text.Extraction;
using var pdf = Pdf.Load(File.OpenRead("input.pdf"));
var result = await pdf.ExtractTextAsync(1, new TextExtractionOptions
{
    OutputKind = TextExtractionOutputKind.PlainText
});
Console.WriteLine(result.PlainText);
Choose the output shape up front
If your next step is indexing, search, or full-text comparison, plain text is usually the easier shape to work with.
If you need to keep the extracted pieces separate or inspect them one by one, use the segment-returning overloads instead.
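To illustrate why a single string is convenient for indexing, here is a minimal sketch that builds a word-frequency table from plain text. CountWords and the sample string are hypothetical and not part of ZingPDF. In practice the input would be the plain text string returned by the extraction call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Stand-in for the plain text string you would get from extraction.
string plainText = "Invoice 42\nTotal due: 42 EUR";

Dictionary<string, int> wordCounts = CountWords(plainText);

// Print the most frequent words first, ties in ordinal order.
foreach (var pair in wordCounts
    .OrderByDescending(p => p.Value)
    .ThenBy(p => p.Key, StringComparer.Ordinal))
{
    Console.WriteLine($"{pair.Key}: {pair.Value}");
}

// Hypothetical helper: lower-cases the text, treats each run of letters
// or digits as a word, and counts how often each word occurs.
static Dictionary<string, int> CountWords(string text)
{
    var counts = new Dictionary<string, int>(StringComparer.Ordinal);
    foreach (Match match in Regex.Matches(text.ToLowerInvariant(), @"[\p{L}\p{N}]+"))
    {
        counts.TryGetValue(match.Value, out var current);
        counts[match.Value] = current + 1;
    }
    return counts;
}
```

Working from separate segments would require joining them first, which is the step the plain text output saves you.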
Need the file metadata as well?
Text extraction and metadata updates often sit in the same import or archival path.