##### You said:
I've got paperless ngx, but are there any documents that I need to legally physically keep? Are there any documents that would require to be physical other than things like receipts?
###### ChatGPT said:
Great question—and it's a smart move to use Paperless-ngx to digitize and organize your documents. However, **yes**, there are some documents you **should or must keep in physical form** depending on legal, governmental, or institutional requirements.
Here’s a breakdown of **documents you should keep physically** even if digitized:
---
### ✅ **Must Keep Physical (Legal/Official Originals)**
These are documents where the physical original may be required for legal purposes, identity verification, or critical processes:
| Document Type | Why Keep Physically |
| --- | --- |
| **Birth certificate** | Often needed for passports, ID, or government processes |
| **Social Security card (U.S.)** | Identity and tax purposes |
| **Marriage/divorce certificates** | Required for legal name changes, benefits |
| **Passports** | Official travel document |
| **Naturalization or immigration papers** | Immigration status proof |
| **Vehicle titles** | Required for selling or transferring ownership |
| **Property deeds** | Legal ownership proof |
| **Wills and trusts** | Often need original for probate |
| **Military discharge papers (e.g., DD-214)** | Benefits, veteran status proof |
| **Court judgments or legal settlements** | Sometimes required in original for enforcement |
| **Adoption papers** | Identity/legal proof |
| **Death certificates (of relatives)** | Settling estates, insurance, benefits |
| **Power of attorney documents** | Some institutions require the original |
---
### ⚠️ **Strongly Recommended to Keep (Backed Digitally)**
These might not *always* be required in physical form, but the original often speeds up or secures important processes:
| Document Type | Notes |
| --- | --- |
| **Loan agreements (e.g., mortgages, personal loans)** | Original signatures often preferred in disputes |
| **Stock/bond certificates (if physical)** | If not held electronically |
| **Insurance policies (home, life, etc.)** | Some older policies may only exist physically |
| **Receipts for major purchases (appliances, jewelry, electronics)** | Needed for warranty/insurance claims |
---
### ✅ **Can Usually Be Digital Only (Backups Recommended)**
| Document Type | Notes |
| --- | --- |
| Utility bills, bank statements, tax forms (if accepted digitally) | IRS and many institutions accept scanned versions if legible and authentic |
| Medical records | Can be scanned or pulled from provider portals |
| Pay stubs | Usually safe to store digitally unless needed for mortgage or other loan apps |
| General receipts | Often sufficient as scanned copies unless IRS audit requires originals for deductions (rare but possible) |
---
### 🛑 Receipts: When to Keep Originals
The **IRS recommends** keeping original **receipts for tax deductions** for up to **3–7 years** in case of an audit. However, **scans are usually sufficient** if they're clear and show all details, unless the IRS questions authenticity.
---
### 🔐 Tips for Long-Term Safety
- Store originals in a **fireproof, waterproof safe**.
- Keep **digital backups encrypted** and **stored off-site** (e.g., cloud and offline).
- Consider notarizing key documents where possible to preserve legal weight of digital versions.
---
Would you like a checklist PDF of these document types for your Paperless-ngx folders?
##### You said:
How do I retry the OCR content part on a document? It kind of sucked on one of my handwritten ones
###### ChatGPT said:
To retry the **OCR (Optical Character Recognition)** on a document in **Paperless-ngx**, especially if the original attempt didn't yield good results (e.g., for handwriting), you can re-run it manually using the following steps:
---
## 🧾 Step-by-Step: Retry OCR on a Document in Paperless-ngx
### Option 1: **Via the Web Interface (Re-consume)**
Paperless-ngx allows you to re-process a document by treating it as if it were just imported:
1. **Download the PDF** of the poorly OCR’d document.
2. **Delete the document** from Paperless-ngx (optional, if you don’t want duplicates).
3. **Drop the downloaded PDF into your `consume` folder** again.
- Located at something like: `/path/to/paperless/consume/`
4. Let Paperless-ngx re-process it.
- You can improve results by:
- Naming the file smartly (e.g., include keywords or dates in the filename).
- Enhancing the PDF using a tool like [OCRmyPDF](https://github.com/ocrmypdf/OCRmyPDF) beforehand (see below).
---
### Option 2: **Using `document_retag.py` (Advanced CLI Method)**
If you don’t want to reimport, you can **force OCR retry** via command line.
If you have access to the container or backend:
```
docker exec -it paperless bash
# Inside the container:
python3 manage.py document_retag <document_id>
```
> Replace `<document_id>` with the ID number of the document. You can find this in the URL when viewing the document in the web interface.
---
### Option 3: **Use OCRmyPDF Before Reimporting**
Paperless-ngx uses OCRmyPDF under the hood. You can preprocess it yourself for better results, especially for handwritten text:
```
ocrmypdf --rotate-pages --deskew --clean --optimize 3 input.pdf output.pdf
```
Then re-import `output.pdf` into Paperless-ngx.
- You can add options like `--tesseract-config` for better recognition.
- If you want to try a **handwriting-friendly OCR model**, you can experiment with:
- `--ocr-engine-mode 1` (for Tesseract legacy engine).
- Or external tools like Transkribus (more complex, academic-level).
---
## 🛠 Tips for Better Handwriting OCR
- Increase contrast before importing.
- Use a scanner instead of a phone, if possible.
- Try writing more clearly with printed (not cursive) letters.
- Tesseract OCR (used by Paperless) struggles with messy handwriting.
---
Want me to help generate a shell script to retry OCR or enhance images automatically before Paperless imports them?