
Read a PDF file and write its content into text files


Below is a program that reads a PDF file and writes its content into text files using the iText 5.3.5 library. Each page of the PDF is written to a separate text file: the first page goes to the first text file, the second page to the second text file, and so on. First download the iText library, then add its JAR to your project. Here is the source code. Enjoy programming!

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;

/**
 * This class reads an existing PDF file using the iText jar
 * and writes the text of each page to its own text file.
 * Requires Java 7 or later (try-with-resources, StandardCharsets).
 * @author javawithease
 */
public class PDFReadExample {
    public static void main(String[] args) {
        PdfReader pdfReader = null;
        try {
            // Create a PdfReader instance for the input PDF.
            pdfReader = new PdfReader("test.pdf");

            // Get the number of pages in the PDF.
            int pages = pdfReader.getNumberOfPages();

            // Iterate through the pages (iText page numbers start at 1).
            for (int i = 1; i <= pages; i++) {
                // Extract the page content using PdfTextExtractor.
                String pageContent =
                    PdfTextExtractor.getTextFromPage(pdfReader, i);

                // Print the page content on the console.
                System.out.println("Content on Page "
                              + i + ": " + pageContent);

                // Page 1 goes to newfile1.txt, page 2 to newfile2.txt, and so on.
                File file = new File("newfile" + i + ".txt");
                if (file.exists()) {
                    System.out.println(file.getName() + " already exists and will be overwritten.");
                } else {
                    System.out.println("Creating " + file.getName());
                }

                // try-with-resources flushes and closes the writer even if
                // writing fails; UTF-8 avoids depending on the platform charset.
                try (Writer writer = new OutputStreamWriter(
                        new FileOutputStream(file), StandardCharsets.UTF_8)) {
                    writer.write(pageContent);
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // Close the PdfReader to release the underlying file.
            if (pdfReader != null) {
                pdfReader.close();
            }
        }
    }
}
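The one-file-per-page naming scheme used in the program (newfile1.txt, newfile2.txt, ...) can be tried out without iText at all. The sketch below is a stdlib-only illustration, assuming Java 7 or later; the class `PageFileWriter` and its `writePages` method are hypothetical names, and the page texts are hard-coded stand-ins for what `PdfTextExtractor.getTextFromPage` would return.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PageFileWriter {

    // Hypothetical helper: writes pageTexts.get(0) to <prefix>1.txt,
    // pageTexts.get(1) to <prefix>2.txt, and so on, inside outputDir.
    public static List<Path> writePages(List<String> pageTexts, Path outputDir, String prefix)
            throws IOException {
        Files.createDirectories(outputDir);
        List<Path> written = new ArrayList<>();
        for (int i = 0; i < pageTexts.size(); i++) {
            // PDF pages are numbered from 1, so list index i maps to file number i + 1.
            Path file = outputDir.resolve(prefix + (i + 1) + ".txt");
            // Files.write creates or truncates the file, writes the bytes and closes it.
            Files.write(file, pageTexts.get(i).getBytes(StandardCharsets.UTF_8));
            written.add(file);
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in page texts; in the real program these would come from
        // PdfTextExtractor.getTextFromPage(pdfReader, i).
        List<String> pages = Arrays.asList("Text of page one", "Text of page two");
        Path outDir = Files.createTempDirectory("pdfpages");
        for (Path p : writePages(pages, outDir, "newfile")) {
            System.out.println(p.getFileName() + ": "
                    + new String(Files.readAllBytes(p), StandardCharsets.UTF_8));
        }
    }
}
```

Writing through `Files.write` keeps each file open only for the duration of a single call, which sidesteps the unclosed-writer problem that is easy to hit when a `FileWriter` is created inside a loop.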
