
Sunday, July 12, 2020

Reading Office documents, Word (.docx) etc. in C#

If you need to manipulate Office documents programmatically in a platform-independent way, i.e. without the Office interop assemblies, NPOI is one of the best choices available.

  


NPOI is the .NET port of the ever-popular Apache POI project. Like POI, it is an open-source library for reading and writing Microsoft Office file formats such as .xls, .xlsx and .docx. If you have ever used Apache POI in Java, NPOI will feel straightforward.

For example, if you want to print all the text in a .docx file, here is an example function:


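        using System;
        using System.IO;
        using NPOI.XWPF.UserModel;

        public static void PrintAllText(string path)
        {
            // Open the file read-only; "using" disposes the stream when done
            using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read))
            {
                var doc = new XWPFDocument(stream);

                // BodyElements holds paragraphs and tables in document order
                foreach (IBodyElement element in doc.BodyElements)
                {
                    // A table is not an XWPFParagraph, so a plain "as" cast would return null
                    if (element is XWPFParagraph paragraph)
                    {
                        Console.WriteLine(paragraph.Text);
                    }
                    else if (element is XWPFTable table)
                    {
                        Console.WriteLine(table.Text);
                    }
                }
            }
        }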

As another example, if you want to read a Word document and list all the images in a .docx file, here is one such sample:

        using System;
        using System.IO;
        using NPOI.XWPF.UserModel;

        public static void ReadImages(string filePath)
        {
            // Open the .docx read-only and dispose the stream when finished
            using (var stream = new FileStream(filePath, FileMode.Open, FileAccess.Read))
            {
                var doc = new XWPFDocument(stream);

                // AllPictures returns every picture part embedded in the document
                foreach (XWPFPictureData data in doc.AllPictures)
                {
                    Console.WriteLine("Name: " + data.FileName);
                    Console.WriteLine("Details: " + data.ToString());
                }
            }
        }
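Under the hood, a .docx file is an Office Open XML package: a ZIP archive whose main body usually lives in word/document.xml, with paragraphs stored as w:p elements and their text in w:t elements. The sketch below does not use NPOI at all; it relies only on System.IO.Compression and System.Xml.Linq to pull paragraph text out of that part, which helps show what XWPFDocument is parsing for you. It assumes the main part is at word/document.xml (the usual location, although strictly the package relationships define it), and the class DocxPeek and method ReadParagraphs are hypothetical names chosen for illustration. It does no special handling of tabs, line breaks, fields or text boxes, so for real work NPOI remains the better tool.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper class, not part of NPOI
public static class DocxPeek
{
    // WordprocessingML main namespace used by the w:p and w:t elements
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the text of every w:p paragraph in word/document.xml, in document order.
    // Assumption: the main part lives at that path; _rels/.rels is not consulted.
    public static List<string> ReadParagraphs(Stream docx)
    {
        // leaveOpen: true so the caller keeps ownership of the stream
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml")
                ?? throw new InvalidDataException("word/document.xml not found");

            using (Stream xml = entry.Open())
            {
                // Preserve whitespace so runs such as <w:t xml:space="preserve"> </w:t> survive
                XDocument doc = XDocument.Load(xml, LoadOptions.PreserveWhitespace);

                // A paragraph's text is the concatenation of its w:t elements;
                // paragraphs inside table cells are included because Descendants()
                // walks the whole tree
                return doc.Descendants(W + "p")
                          .Select(p => string.Concat(p.Descendants(W + "t").Select(t => t.Value)))
                          .ToList();
            }
        }
    }
}
```

Usage follows the same shape as PrintAllText: open the file with File.OpenRead and pass the stream to ReadParagraphs, then print each returned string.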


Here are some important links for NPOI:

  1. NPOI Home Page
  2. Git Page
