# Functional Syntax Publication Summary

This document condenses the current project state into a paper-facing summary.

## Core Claim

A hybrid Arabic functional syntax system that combines rule-based structural analysis with learned semantic, syntactic, and pragmatic labeling.

## Reference Layer

| Metric | Value |
|---|---:|
| Sentence category accuracy | 1.0 |
| Token accuracy | 1.0 |
| Slot accuracy | 1.0 |
| Semantic accuracy | 1.0 |
| Syntactic accuracy | 1.0 |
| Pragmatic accuracy | 1.0 |
| Structural accuracy | 1.0 |

## Generated Corpus

- Sentences: 9200
- Patterns: 23
- Final-category rate: 0.885
- Keep: 8401
- Review: 399
- Discard: 400
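The Keep, Review, and Discard counts add up to the 9,200 generated sentences, so they partition the corpus. The snippet below is a small arithmetic check that makes this explicit and reports each bucket's share. It only restates the numbers above; it does not reproduce the final-category rate, which is reported as a separate metric.

```python
# Triage counts as reported for the generated corpus.
triage = {"keep": 8401, "review": 399, "discard": 400}
total_sentences = 9200


def triage_shares(counts, total):
    """Check that the buckets partition the corpus and return each bucket's share."""
    bucket_sum = sum(counts.values())
    if bucket_sum != total:
        raise ValueError(f"buckets sum to {bucket_sum}, expected {total}")
    return {name: n / total for name, n in counts.items()}


shares = triage_shares(triage, total_sentences)
for name, share in shares.items():
    print(f"{name:>8}: {share:.4f}")
```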

## Sentence-Level Baseline

- Train accuracy: 0.9999
- Dev accuracy: 0.9226
- Test accuracy: 0.9155
- Test macro F1: 0.916
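Macro F1 averages the per-category F1 scores without weighting by category frequency, so rare sentence categories count as much as common ones. The sketch below implements the standard accuracy and macro-F1 definitions from scratch. The category names are hypothetical placeholders, and the project's training script may compute these metrics with a library such as scikit-learn instead.

```python
from collections import Counter


def accuracy(gold, pred):
    """Fraction of items whose predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over every label seen in gold or pred."""
    labels = set(gold) | set(pred)
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label gets a false positive
            fn[g] += 1  # gold label gets a false negative
    f1_scores = []
    for label in labels:
        prec_den = tp[label] + fp[label]
        rec_den = tp[label] + fn[label]
        precision = tp[label] / prec_den if prec_den else 0.0
        recall = tp[label] / rec_den if rec_den else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy example with hypothetical sentence categories.
gold = ["nominal", "verbal", "verbal", "conditional"]
pred = ["nominal", "verbal", "nominal", "conditional"]
print(f"accuracy={accuracy(gold, pred):.4f} macro_f1={macro_f1(gold, pred):.4f}")
```

In the toy example one verbal sentence is misread as nominal, which lowers both the nominal precision and the verbal recall, so macro F1 ends up slightly above plain accuracy.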

## Token-Level Baseline

- Train semantic accuracy: 0.9971
- Dev semantic accuracy: 0.9192
- Test semantic accuracy: 0.9214
- Test syntactic accuracy: 0.9208
- Test pragmatic accuracy: 0.9469
- Test joint sequence accuracy: 0.7738
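Joint sequence accuracy (0.7738) sits well below the per-layer token accuracies. Assuming it counts a sentence as correct only when every token's semantic, syntactic, and pragmatic labels all match the reference, the gap is expected: a single wrong label anywhere in a sentence makes the whole sentence wrong. The sketch below illustrates that assumed definition; the label triples and layer indices are hypothetical.

```python
from typing import List, Tuple

# Hypothetical layer indices for a (semantic, syntactic, pragmatic) label triple.
SEMANTIC, SYNTACTIC, PRAGMATIC = 0, 1, 2

Token = Tuple[str, str, str]
Sentence = List[Token]


def layer_token_accuracy(gold: List[Sentence], pred: List[Sentence], layer: int) -> float:
    """Per-token accuracy on one label layer, pooled over all sentences."""
    correct = total = 0
    for gold_sent, pred_sent in zip(gold, pred):
        # Assumes gold and predicted sentences share the same tokenization.
        for gold_tok, pred_tok in zip(gold_sent, pred_sent):
            correct += gold_tok[layer] == pred_tok[layer]
            total += 1
    return correct / total


def joint_sequence_accuracy(gold: List[Sentence], pred: List[Sentence]) -> float:
    """Share of sentences in which every token matches on all three layers."""
    exact = sum(gold_sent == pred_sent for gold_sent, pred_sent in zip(gold, pred))
    return exact / len(gold)


# Toy data with hypothetical labels: one pragmatic error in the second sentence.
gold = [
    [("agent", "subject", "topic"), ("action", "predicate", "focus")],
    [("action", "predicate", "neutral"), ("patient", "object", "neutral")],
]
pred = [
    [("agent", "subject", "topic"), ("action", "predicate", "focus")],
    [("action", "predicate", "neutral"), ("patient", "object", "focus")],
]

for name, layer in [("semantic", SEMANTIC), ("syntactic", SYNTACTIC), ("pragmatic", PRAGMATIC)]:
    print(name, layer_token_accuracy(gold, pred, layer))
print("joint", joint_sequence_accuracy(gold, pred))
```

Here a single pragmatic error leaves semantic and syntactic accuracy at 1.0 and pragmatic accuracy at 0.75, yet halves joint sequence accuracy.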

## Training Layout

- Train sentences: 6514
- Dev sentences: 814
- Test sentences: 814
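The three splits total 8,142 sentences, which corresponds to roughly an 80/10/10 train/dev/test partition. A minimal sketch of a seeded split that yields exactly these sizes for 8,142 items follows. The shuffle-then-slice approach, the function name, and the seed value are assumptions; `split_generated_reference_train_ready.py` may instead stratify by pattern or category.

```python
import random


def split_80_10_10(items, seed=13):
    """Shuffle deterministically, then slice into train/dev/test (about 80/10/10)."""
    items = list(items)
    random.Random(seed).shuffle(items)  # private RNG keeps the split reproducible
    n = len(items)
    n_dev = round(0.10 * n)
    n_test = round(0.10 * n)
    n_train = n - n_dev - n_test  # remainder goes to train
    train = items[:n_train]
    dev = items[n_train:n_train + n_dev]
    test = items[n_train + n_dev:]
    return train, dev, test


train_ids, dev_ids, test_ids = split_80_10_10(range(8142))
print(len(train_ids), len(dev_ids), len(test_ids))
```

Using a dedicated `random.Random` instance rather than the module-level generator keeps the split stable even if other code draws random numbers first.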

## Publication Notes

- Book corpus is the reference layer.
- Generated corpus is weak-label augmentation.
- Structural layer remains rule-derived.
- Token model is trained on analyzer-derived weak supervision.
- The live unseen bank is frozen at 179 sentences; exact category match on it is 1.0.

## Reproducibility

- `functional_syntax/scripts/build_generated_reference_train_ready.py`
- `functional_syntax/scripts/split_generated_reference_train_ready.py`
- `functional_syntax/scripts/build_generated_reference_token_corpus.py`
- `functional_syntax/scripts/train_generated_reference_classifier.py`
- `functional_syntax/scripts/train_generated_reference_token_bilstm_crf.py`
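The scripts can be chained into a single driver. The sketch below assumes they are meant to run in the order listed, that each is invoked from the repository root with no command-line arguments, and that each step finds the previous step's outputs at its default location; this summary does not confirm any of that. The driver prints the commands by default and only executes them when passed `--run`.

```python
import subprocess
import sys
from pathlib import Path

SCRIPTS_DIR = Path("functional_syntax/scripts")

# Assumed execution order (the order the scripts are listed in).
PIPELINE = [
    "build_generated_reference_train_ready.py",
    "split_generated_reference_train_ready.py",
    "build_generated_reference_token_corpus.py",
    "train_generated_reference_classifier.py",
    "train_generated_reference_token_bilstm_crf.py",
]


def pipeline_commands(python=sys.executable):
    """Build one argv list per step; no script arguments are passed (an assumption)."""
    return [[python, str(SCRIPTS_DIR / name)] for name in PIPELINE]


def run_pipeline(dry_run=True):
    for cmd in pipeline_commands():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raise and stop at the first failing step


if __name__ == "__main__":
    run_pipeline(dry_run="--run" not in sys.argv)
```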

## Supporting Docs

- `functional_syntax/docs/paper_publication_bundle.md`
- `functional_syntax/docs/generated_reference_workflow.md`
- `functional_syntax/docs/generated_reference_token_bilstm_crf.md`
