This article introduces Python-Docx, a library for creating and manipulating Microsoft Word documents with Python.
What is Python-Docx?
Python-Docx is a Python library that allows you to create, modify and extract content from Microsoft Word documents. With Python-Docx you can programmatically generate professional-quality reports, resumes, invoices and other types of documents.
In this article we are going to cover the basics of using Python-Docx, including how to create a new Word document, add content to it and modify its formatting.
Installing Python-Docx
To install Python-Docx, you can use pip, the package manager for Python. Open your terminal or command prompt and run the following command:
    pip install python-docx
This will download and install the Python-Docx library and its dependencies. After the installation is complete, you can start using the library in your Python code.
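If you want to confirm the installation from Python itself, a minimal sketch along these lines may help. It relies only on the standard library; the helper name installed_version is hypothetical and chosen just for this illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name="python-docx"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints a version string if python-docx is installed, otherwise None
print(installed_version())
```

Note that the package is installed under the name python-docx but imported as docx.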
Creating a New Word Document
To create a new Word document with Python-Docx, create an instance of the Document class:
    from docx import Document

    document = Document()
This will create a new, empty Word document. You can now add content to the document by calling the various methods of the Document class.
Adding Content to a Document
Python-Docx allows you to add many kinds of content to your Word documents, including paragraphs, headings, tables and images. This is an example of how to add a paragraph to a document:
    from docx import Document

    document = Document()
    paragraph = document.add_paragraph('This is a paragraph.')
This will add a new paragraph to the document with the text “This is a paragraph.”
You can also add headings to your document using the add_heading method:
    from docx import Document

    document = Document()
    heading = document.add_heading('This is a heading', level=1)
Modifying Formatting
Python-Docx allows you to modify the formatting of your document’s content in different ways. For example you can change the font, size and color of text, and you can apply styles to paragraphs and headings.
Here’s an example of how to change the font of a paragraph:
    from docx import Document
    from docx.shared import Pt, RGBColor

    document = Document()
    paragraph = document.add_paragraph('This is a paragraph.')
    # These settings are applied to the paragraph's style object
    paragraph.style.font.name = 'Times New Roman'
    paragraph.style.font.size = Pt(12)
    # Font colors are RGB values; WD_COLOR_INDEX is for highlight colors
    paragraph.style.font.color.rgb = RGBColor(0x00, 0x00, 0x00)
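This will change the font to Times New Roman, set the font size to 12 points, and set the font color to black. Because the settings are applied through paragraph.style, they change the style itself, so every paragraph that uses the same style is affected.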
You can also apply styles to paragraphs and headings using the style property:
    from docx import Document

    document = Document()
    paragraph = document.add_paragraph('This is a paragraph.')
    paragraph.style = 'Heading 1'
This will change the style of the paragraph to “Heading 1.”
Saving and Opening a Document
After you have finished creating and modifying your Word document, you can save it to a file using the save method:
    from docx import Document

    document = Document()
    paragraph = document.add_paragraph('This is a paragraph.')
    document.save('example.docx')
This will save the document to a file called example.docx in the current working directory.
What Are the Alternatives to Python-Docx?
If you want another Python library for working with Microsoft Word documents, these are some options:
- python-docx2txt: A lightweight library for quickly extracting plain text from Microsoft Word .docx files.
- PyWin32: A Python extension for Windows that provides access to the Microsoft Win32 API. It can be used to automate Microsoft Word and other Office applications, but it requires more knowledge of the Win32 API and is less user friendly than Python-Docx.
- XlsxWriter: A library for creating Excel files in Python. It doesn’t work with Word documents, but it can be used to create spreadsheets that serve a similar purpose to Word tables.
- Pandoc: A command-line tool for converting between document formats, including Microsoft Word. It can be used to convert Word documents to other formats such as Markdown or HTML.
- Unoconv: A command-line tool for converting between document formats, including Microsoft Word. It can be used to convert Word documents to other formats such as PDF or HTML.
Each of these tools has its own strengths and weaknesses, and the best option for you will depend on your specific use case.
Final Thoughts
Python-Docx is a powerful and flexible library for working with Microsoft Word documents in Python. With Python-Docx you can automate the creation and modification of documents, saving you time and allowing you to focus on other tasks.
In this article we have covered the basics of using Python-Docx, including how to create a new Word document, add content to it, modify its formatting and save and open it.
If you’re interested in learning more about Python-Docx, you can check the official documentation, which provides detailed information on all the features and methods of the library.