# External Syntax Corpus

This corpus normalizes two imported Universal Dependencies (UD) Arabic treebanks into TSV files, plus a JSON summary, for syntax experiments.

## Inputs

- UD Arabic-PADT
- UD Arabic-PUD

## Outputs

- Tokens: `external_syntax_tokens.tsv`
- Sentences: `external_syntax_sentences.tsv`
- Summary: `external_syntax_summary.json`
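
As a rough illustration of how the three outputs might be loaded, here is a minimal Python sketch. It assumes that both TSV files are UTF-8 with a header row and no quoting, and that all three files sit in one directory; none of this is stated above. The helper names `load_tsv` and `load_corpus` are hypothetical and not part of the corpus tooling.

```python
import csv
import json
from pathlib import Path


def load_tsv(path):
    """Read a TSV file with a header row into a list of dicts (assumed layout)."""
    with open(path, encoding="utf-8", newline="") as handle:
        # QUOTE_NONE keeps quote characters in the text as literal data.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return list(reader)


def load_corpus(directory):
    """Load tokens, sentences, and summary from `directory` (assumed location)."""
    base = Path(directory)
    tokens = load_tsv(base / "external_syntax_tokens.tsv")
    sentences = load_tsv(base / "external_syntax_sentences.tsv")
    summary_text = (base / "external_syntax_summary.json").read_text(encoding="utf-8")
    summary = json.loads(summary_text)
    return tokens, sentences, summary
```

Disabling quote handling is a deliberate choice here: treebank text can contain stray quotation marks, and the default CSV dialect would try to interpret them.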

## Notes

- Tokens preserve the UD fields LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, and MISC.
- Sentence IDs are stable and encode dataset and split.
- PUD is reserved as held-out test material.
- PADT provides train/dev/test material for syntax experiments.
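
The notes above can be sketched in Python as follows. The sentence ID format `dataset-split-index` is purely hypothetical, since the actual format is not documented here. The names `SPLITS`, `make_sentence_id`, `parse_sentence_id`, and `parse_token_line` are also illustrative assumptions. The field tuple follows the ten-column CoNLL-U layout, which adds ID and FORM to the eight fields listed above.

```python
# The ten CoNLL-U columns, in file order.
CONLLU_FIELDS = (
    "ID", "FORM", "LEMMA", "UPOS", "XPOS",
    "FEATS", "HEAD", "DEPREL", "DEPS", "MISC",
)

# Splits each dataset contributes, per the notes above (keys are hypothetical).
SPLITS = {"padt": ("train", "dev", "test"), "pud": ("test",)}


def make_sentence_id(dataset, split, index):
    """Build a hypothetical ID that encodes dataset and split."""
    if split not in SPLITS.get(dataset, ()):
        raise ValueError(f"unexpected split {split!r} for dataset {dataset!r}")
    return f"{dataset}-{split}-{index:06d}"


def parse_sentence_id(sentence_id):
    """Recover (dataset, split, index) from the hypothetical ID format."""
    dataset, split, index = sentence_id.split("-")
    return dataset, split, int(index)


def parse_token_line(line):
    """Split one CoNLL-U token line into a dict keyed by field name."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(CONLLU_FIELDS):
        raise ValueError(f"expected {len(CONLLU_FIELDS)} columns, got {len(values)}")
    return dict(zip(CONLLU_FIELDS, values))
```

Under this scheme, `make_sentence_id("padt", "train", 1)` yields `"padt-train-000001"`, while requesting a `train` split for `pud` raises `ValueError`, which mirrors PUD being test-only.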