
How to extract text from a PDF in C#

Use ExtractTextAsync() for the whole document, ExtractTextAsync(pageNumber) for a single page, or the overloads with TextExtractionOptions when you want plain text output.

The main decision is whether you want extracted segments back or one plain text string. ZingPDF supports both shapes.

Extract text from the whole document

If you want the extracted text segments from every page, call the parameterless overload.

using ZingPDF;

using var pdf = Pdf.Load(File.OpenRead("input.pdf"));

var items = await pdf.ExtractTextAsync();

foreach (var item in items)
{
    Console.WriteLine(item.Text);
}

Extract text from one page

If you only need one page, pass the 1-based page number.

using System.Linq;
using ZingPDF;

using var pdf = Pdf.Load(File.OpenRead("input.pdf"));

var firstPageItems = await pdf.ExtractTextAsync(1);
var firstPageText = string.Join("\n", firstPageItems.Select(x => x.Text));

That keeps the read path focused on the page you care about instead of iterating through the whole file.

Get plain text output

If you want one plain text string instead of extracted segments, pass a TextExtractionOptions with OutputKind set to TextExtractionOutputKind.PlainText. The example below requests plain text for page 1.

using ZingPDF;
using ZingPDF.Elements.Drawing.Text.Extraction;

using var pdf = Pdf.Load(File.OpenRead("input.pdf"));

var result = await pdf.ExtractTextAsync(1, new TextExtractionOptions
{
    OutputKind = TextExtractionOutputKind.PlainText
});

Console.WriteLine(result.PlainText);

Choose the output shape up front

If your next step is indexing, search, or full-text comparison, plain text is usually the easier shape to work with.

If you need to keep the extracted pieces separate or inspect them one by one, use the segment-returning overloads instead.
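As a rough illustration of why plain text suits search, here is a minimal sketch of a keyword lookup over a single string. FindMatchingLines is a hypothetical helper, not part of ZingPDF, and the sample string stands in for result.PlainText. Whether the real plain text output contains line breaks in useful places is an assumption to check against your own files.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of ZingPDF): returns the 1-based numbers of
// the lines in a plain text string that contain the search term, ignoring case.
static List<int> FindMatchingLines(string plainText, string term)
{
    var matches = new List<int>();
    var lines = plainText.Split('\n');
    for (var i = 0; i < lines.Length; i++)
    {
        if (lines[i].Contains(term, StringComparison.OrdinalIgnoreCase))
        {
            matches.Add(i + 1);
        }
    }
    return matches;
}

// Stand-in for result.PlainText; the content here is made up for the example.
var sampleText = "Invoice 1042\nTotal due: 90.00\nThank you";

// Prints the matching line numbers, comma separated.
Console.WriteLine(string.Join(", ", FindMatchingLines(sampleText, "total")));
```

With segments you would first have to decide how to join or order the pieces before a lookup like this makes sense, which is the extra work the plain text shape avoids.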

Scanned PDFs need OCR instead

The extraction methods on Pdf read only the PDF text layer. If the file is a scan and each page is really just an image, there is no text layer for them to read, so use ZingPDF.OCR instead of expecting normal extraction to find text that is not there.
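One way to decide when to fall back to OCR is to check whether normal extraction produced any visible text. The sketch below is a heuristic, not a ZingPDF feature: HasNoTextLayer is a hypothetical helper, and it assumes that extraction on an image-only file yields no segments or only whitespace. The sample arrays stand in for the Text values of extracted segments.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical heuristic (not part of ZingPDF): treat the input as
// "probably scanned" when extraction produced no non-whitespace text at all.
static bool HasNoTextLayer(IEnumerable<string> segmentTexts) =>
    !segmentTexts.Any(text => !string.IsNullOrWhiteSpace(text));

// In a real import path the input would come from the extraction call, e.g.
//   var items = await pdf.ExtractTextAsync();
//   var needsOcr = HasNoTextLayer(items.Select(x => x.Text));
var fromScan = new string[] { };                  // nothing extracted
var fromTextPdf = new[] { "Invoice 1042", " " };  // at least one real segment

Console.WriteLine(HasNoTextLayer(fromScan));      // True
Console.WriteLine(HasNoTextLayer(fromTextPdf));   // False
```

A scan that already carries an OCR text layer will pass this check, so the result only tells you whether extraction found something, not whether the text is trustworthy.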

Read the guide "How to extract text from a scanned PDF in C#".

Need the file metadata as well?

Text extraction and metadata updates often sit in the same import or archival path.

Read the guide "How to update PDF metadata in C#".
