How to fix the SAF-T UTF-8 encoding error in Portugal
Why AT silently rejects SAF-T XML files declared as UTF-8, and the one-line fix that makes them go through.
The error you can't see
You upload your monthly SAF-T XML to AT, the portal flashes "ficheiro inválido", and there is nothing else. No line number. No reason. The standard advice is to re-export from your ERP and resubmit. You do. It still fails.
If your ERP is Xero, QuickBooks, DATEV, or a generic SAP export, the cause is almost certainly an encoding mismatch: your file is encoded in UTF-8, but AT only accepts Windows-1252.
Why AT silently rejects UTF-8
The AT SAF-T (PT) specification, current schema 1.04_01, mandates Windows-1252 (CP-1252) as the file encoding. The AT validator does not return a useful error when the encoding is wrong — the file is simply rejected at intake. The XML is otherwise well-formed; the schema checks pass. But the byte-level encoding of accented characters (ç, ã, é, ó) does not match what AT's downstream tooling expects.
Foreign ERPs default to UTF-8 because UTF-8 is the modern standard everywhere except this specific Portuguese tax pipeline. They write the XML prolog like this:
<?xml version="1.0" encoding="UTF-8"?>
AT expects:
<?xml version="1.0" encoding="Windows-1252"?>
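To see which declaration a given export carries, a small sketch like the one below reads the encoding name from the prolog. The function name declaredEncoding is made up for this illustration and is not part of any AT tooling; it inspects only the declaration, not the actual bytes of the file.

```php
<?php
// Hypothetical helper: returns the encoding named in the XML prolog,
// or null if the prolog declares none.
function declaredEncoding(string $xml): ?string
{
    // Look only at the start of the document, tolerating a leading UTF-8 BOM.
    $pattern = '/^(?:\xEF\xBB\xBF)?<\?xml[^>]*\bencoding=["\']([^"\']+)["\']/i';
    if (preg_match($pattern, $xml, $m) === 1) {
        return $m[1];
    }
    return null;
}

echo declaredEncoding('<?xml version="1.0" encoding="UTF-8"?><AuditFile/>'), "\n";
```

For a real file, pass the result of file_get_contents('saft.xml') instead of the inline sample string.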
Why declaration-only edits don't work
The first instinct is to open the file and replace UTF-8 with Windows-1252 in the prolog, save it, and resubmit. That makes things worse. The declaration now claims Windows-1252 but the actual bytes are still UTF-8. Any character outside ASCII turns into garbage when AT or anyone else reads the file with the declared encoding: ç (bytes 0xC3 0xA7 in UTF-8) shows up as Ã§.
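The effect is easy to reproduce. The sketch below takes the UTF-8 bytes of "São" and decodes them the way a reader trusting a Windows-1252 declaration would. The variable names are illustrative only.

```php
<?php
// "São" as UTF-8 bytes: the ã is the two-byte sequence 0xC3 0xA3.
$utf8Bytes = "S\xC3\xA3o";

// Simulate a reader that trusts a Windows-1252 declaration: each byte is
// taken as one Windows-1252 character, then re-encoded as UTF-8 for display.
$misread = mb_convert_encoding($utf8Bytes, 'UTF-8', 'Windows-1252');

echo $misread, "\n"; // prints "SÃ£o"
```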
The fix that actually works
You need to do two things in the right order:
1. Transcode the body bytes from UTF-8 to Windows-1252. Each Á (0xC3 0x81 in UTF-8) becomes a single byte 0xC1. Each ç (0xC3 0xA7) becomes 0xE7. And so on.
2. Rewrite the encoding declaration to match: encoding="Windows-1252".
If you skip step 1, you get garbage. If you skip step 2, AT still rejects the file.
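The byte mapping in the first step can be checked directly. This is a minimal sketch using mb_convert_encoding on the two characters mentioned above and printing the bytes before and after in hex.

```php
<?php
// Transcode single characters (Á and ç) and print the bytes in hex.
foreach (["\u{C1}", "\u{E7}"] as $char) {
    $cp1252 = mb_convert_encoding($char, 'Windows-1252', 'UTF-8');
    printf("%s -> %s\n", bin2hex($char), bin2hex($cp1252));
}
// prints:
// c381 -> c1
// c3a7 -> e7
```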
Doing it in PHP
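<?php
// Read the UTF-8 export; stop early if the file cannot be read.
$xml = file_get_contents('saft.xml');
if ($xml === false) {
    exit("Cannot read saft.xml\n");
}
// Drop a leading UTF-8 BOM, which would otherwise be transcoded into a stray "?".
$xml = preg_replace('/^\xEF\xBB\xBF/', '', $xml);
// Step 1: transcode the bytes. Characters with no Windows-1252 equivalent become "?".
$converted = mb_convert_encoding($xml, 'Windows-1252', 'UTF-8');
// Step 2: rewrite the encoding declaration in the prolog (first match only).
$converted = preg_replace(
    '/(<\?xml[^>]*encoding=)["\'][^"\']+["\']/i',
    '$1"Windows-1252"',
    $converted,
    1
);
file_put_contents('saft.fixed.xml', $converted);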
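One caveat with mb_convert_encoding: with default settings, a character that has no Windows-1252 equivalent is silently replaced by a substitute character rather than raising an error. A pre-flight check along these lines can list such characters before you convert. The function name unrepresentableChars is hypothetical, and mb_str_split assumes PHP 7.4 or later.

```php
<?php
// Hypothetical helper: lists the distinct characters in a UTF-8 string
// that do not survive a round trip through Windows-1252.
function unrepresentableChars(string $utf8): array
{
    $lost = [];
    foreach (array_unique(mb_str_split($utf8, 1, 'UTF-8')) as $char) {
        $there = mb_convert_encoding($char, 'Windows-1252', 'UTF-8');
        $back  = mb_convert_encoding($there, 'UTF-8', 'Windows-1252');
        if ($back !== $char) {
            $lost[] = $char;
        }
    }
    return $lost;
}

// "Descrição" followed by a right arrow (U+2192) and a check mark (U+2713).
$sample = "Descri\u{E7}\u{E3}o \u{2192} nota \u{2713}";
print_r(unrepresentableChars($sample)); // lists the arrow and the check mark
```

If the list is not empty, those characters need to be replaced in the source data (or in the ERP export) before the file can be represented in Windows-1252 without loss.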
Doing it with one click
SAFTCheck does exactly this transformation, plus it strips any UTF-8 BOM bytes (another silent rejection cause), validates the fixed file against the AT 1.04_01 schema, and gives you a downloadable clean XML. One fix is free; the paid version, with full report and PDF, costs €7.
How to confirm the fix took
After conversion, run file saft.fixed.xml in a Unix terminal. It should report "ISO-8859 text" or "Non-ISO extended-ASCII text", not anything mentioning UTF-8 (the exact wording depends on your version of file). Open the file in a hex editor: any accented Portuguese character should be a single byte in the 0xA0–0xFF range, not a two-byte sequence starting with 0xC2 or 0xC3.
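The same checks can be scripted. The sketch below is a heuristic, not a validator: the function name checkConvertedFile is hypothetical, and the UTF-8 test relies on the fact that Windows-1252 text containing accented characters is almost never valid UTF-8.

```php
<?php
// Hypothetical helper: heuristic checks on the converted bytes.
// Returns a list of problems; an empty list means nothing suspicious was found.
function checkConvertedFile(string $bytes): array
{
    $problems = [];
    if (strncmp($bytes, "\xEF\xBB\xBF", 3) === 0) {
        $problems[] = 'starts with a UTF-8 BOM';
    }
    if (preg_match('/^<\?xml[^>]*encoding=["\']Windows-1252["\']/i', $bytes) !== 1) {
        $problems[] = 'prolog does not declare Windows-1252';
    }
    // A file with non-ASCII bytes that still validates as UTF-8
    // was probably never transcoded.
    $hasNonAscii = preg_match('/[\x80-\xFF]/', $bytes) === 1;
    if ($hasNonAscii && mb_check_encoding($bytes, 'UTF-8')) {
        $problems[] = 'body still looks like UTF-8';
    }
    return $problems;
}

// A declaration-only edit: prolog says Windows-1252, body is still UTF-8.
$declarationOnly = '<?xml version="1.0" encoding="Windows-1252"?>' . "<Name>S\xC3\xA3o</Name>";
print_r(checkConvertedFile($declarationOnly)); // reports that the body still looks like UTF-8
```

For a real file, pass the contents of saft.fixed.xml instead of the inline sample.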
Submit the file to AT again. The "ficheiro inválido" should disappear. If it doesn't, the encoding wasn't the only problem — but it usually is.
Related rules SAFTCheck catches at the same time
- BOM at file start. AT rejects UTF-8 BOM (0xEF 0xBB 0xBF) and UTF-16 BOMs.
- NIF check digit. Mod-11 algorithm; one wrong digit and the file is invalid.
- ATCUD format. Mandatory since 2023, must match ^[A-Z0-9]{8}-[1-9]\d{0,9}$.
- Header dates. StartDate before EndDate, FiscalYear matches StartDate's year, EndDate not in the future.
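Two of these rules are simple enough to sketch in a few lines. The code below is an illustration, not SAFTCheck's actual implementation: the function names are hypothetical, the NIF check is the usual mod-11 scheme for 9-digit NIFs (weights 9 down to 2 on the first eight digits), and the ATCUD pattern is taken as quoted in the list above.

```php
<?php
// Mod-11 check for a 9-digit Portuguese NIF: the first eight digits are
// weighted 9 down to 2, and the ninth digit is the check digit.
function isValidNif(string $nif): bool
{
    if (preg_match('/^\d{9}$/D', $nif) !== 1) {
        return false;
    }
    $sum = 0;
    for ($i = 0; $i < 8; $i++) {
        $sum += (int) $nif[$i] * (9 - $i);
    }
    $remainder = $sum % 11;
    // Remainders 0 and 1 map to check digit 0; otherwise it is 11 - remainder.
    $expected = $remainder < 2 ? 0 : 11 - $remainder;
    return (int) $nif[8] === $expected;
}

// ATCUD pattern as quoted in the list above (D stops "$" matching before a trailing newline).
function isValidAtcud(string $atcud): bool
{
    return preg_match('/^[A-Z0-9]{8}-[1-9]\d{0,9}$/D', $atcud) === 1;
}

var_dump(isValidNif('123456789'));    // bool(true)
var_dump(isValidNif('123456780'));    // bool(false)
var_dump(isValidAtcud('ABCD1234-1')); // bool(true)
```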