Benchmark

A comparison of Tasreef against the grouped book corpus, the frozen unseen bank, and the closest external references.

Book corpus

Exact

Reference coverage is stable.

Live bank

179

Frozen unseen bank with an exact category match of 0.989.
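
As a rough illustration of the exact category match figure, the sketch below assumes the metric is the share of sentences whose predicted category equals the gold category. The function name and the category labels are hypothetical. The page does not state how many of the 179 sentences mismatch; 177 matches is used here because it rounds to the reported value.

```python
def exact_category_match(gold, predicted):
    """Fraction of sentences whose predicted category equals the gold one."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot score an empty bank")
    hits = sum(g == p for g, p in zip(gold, predicted))
    return hits / len(gold)


# Hypothetical labels: 179 sentences, 2 of them assigned the wrong category.
gold = ["nominal"] * 179
predicted = ["nominal"] * 177 + ["verbal"] * 2

score = exact_category_match(gold, predicted)
print(round(score, 3))  # 0.989
```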

Generated corpus

9200

The main source of weak supervision.

Weak categories

Categories that still score below 1.0 on one or more metrics.
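
The weak-category rule can be sketched as a simple filter. This is a minimal sketch, assuming per-category scores are available as a mapping from category name to metric values; the category names, metric names, and numbers below are hypothetical and are not taken from Tasreef.

```python
def weak_categories(scores, threshold=1.0):
    """Return categories with at least one metric below the threshold."""
    return sorted(
        category
        for category, metrics in scores.items()
        if any(value < threshold for value in metrics.values())
    )


# Hypothetical per-category scores, for illustration only.
scores = {
    "category_a": {"token": 1.0, "slot": 1.0, "pragmatic": 1.0},
    "category_b": {"token": 1.0, "slot": 0.97, "pragmatic": 1.0},
    "category_c": {"token": 0.99, "slot": 1.0, "pragmatic": 0.95},
}

weak = weak_categories(scores)
print(weak)  # ['category_b', 'category_c']
```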

Morphology-first comparison

This block keeps the external comparison honest: CAMeL-style tools are useful for morphology / lemma / POS, but they do not cover Tasreef’s functional sentence roles or pragmatic placement.

Tasreef

Functional syntax

Book-grounded sentence roles, slot structure, pragmatic layer, structural layer, and a rule-first reference analyzer.

CAMeL-style tools

Morphology / lemma / POS

Strong for token-level morphology and surface analysis, especially for closed-class words and lemma reliability, but not a functional-syntax system.

UD Arabic-PADT

External syntax training

Useful syntax comparison target and training source for external generalization, but it is dependency-oriented rather than book-grounded role-oriented.

UD Arabic-PUD

External hold-out

Held-out comparison set for unseen syntax generalization and regression checking.

Benchmark comparison

Target	Role	Status	Metric	Note
Grouped book corpus	Reference layer	Exact	1.0 across category/token/slot/semantic/syntactic/pragmatic/structural	Stable source of truth for the analyzer.
Frozen live unseen bank	Unseen regression set	Frozen	179 sentences; exact category match 0.989	Held out for regression, not training.
Generated reference corpus	Weak supervision source	Audited	9200 sentences; keep 8401; review 399; discard 400	Used to train sentence and token baselines.
Sentence-family baseline	Learned comparison	Trained	test accuracy 0.9155; macro F1 0.916	Measures weak-label learnability at sentence level.
Token-level BiLSTM-CRF	Learned comparison	Trained	semantic 0.9214; syntactic 0.9208; pragmatic 0.9469	Measures the token stack over weak supervision.
UD Arabic-PADT	External syntax comparison	External	Public dependency treebank for morphology and syntax comparison.	Benchmark target for transfer and external validation.
UD Arabic-PUD	External unseen comparison	External	Held-out-style Arabic dependency corpus for generalization comparison.	Useful for broader unseen-text sanity checks.
CAMeL-style morphology tools	Morphology-first baseline	External	Reference point for lemma / POS / morphology, not functional syntax.	Good comparison for the lemmatizer surface only.
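
The audit figures for the generated reference corpus can be sanity-checked with a few lines. The counts are those in the table; the variable names are illustrative, and the shares are derived here rather than reported by the page.

```python
# Audit counts for the generated reference corpus, as listed in the table.
audit = {"keep": 8401, "review": 399, "discard": 400}
total = 9200

# The three buckets should account for every generated sentence.
assert sum(audit.values()) == total

# Share of the corpus in each bucket.
shares = {bucket: count / total for bucket, count in audit.items()}
for bucket, share in shares.items():
    print(f"{bucket}: {share:.3f}")
# keep: 0.913
# review: 0.043
# discard: 0.043
```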

Functional distinctiveness

Functional sentence roles

Tasreef: Yes
UD-PADT: No
UD-PUD: No
CAMeL-style: Partial

Slot-based structure

Tasreef: Yes
UD-PADT: No
UD-PUD: No
CAMeL-style: No

Pragmatic layer

Tasreef: Yes
UD-PADT: No
UD-PUD: No
CAMeL-style: No

Book-grounded reference layer

Tasreef: Yes
UD-PADT: No
UD-PUD: No
CAMeL-style: No

Morphology / lemma focus

Tasreef: Yes
UD-PADT: Yes
UD-PUD: Yes
CAMeL-style: Yes

Tasreef differs from the external comparisons in that it maintains a book-grounded layer, with slots and an explicit pragmatic reading. The external comparisons here serve benchmarking; they do not redefine the metric.
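
One way to read the feature cards is as a small support matrix. The sketch below encodes the Yes / No / Partial values exactly as listed on this page and picks out the features that only one system fully supports; the data structure and function name are hypothetical, not part of Tasreef.

```python
SYSTEMS = ("Tasreef", "UD-PADT", "UD-PUD", "CAMeL-style")

# Values copied from the feature cards on this page, in SYSTEMS order.
SUPPORT = {
    "Functional sentence roles": ("Yes", "No", "No", "Partial"),
    "Slot-based structure": ("Yes", "No", "No", "No"),
    "Pragmatic layer": ("Yes", "No", "No", "No"),
    "Book-grounded reference layer": ("Yes", "No", "No", "No"),
    "Morphology / lemma focus": ("Yes", "Yes", "Yes", "Yes"),
}


def fully_supported_only_by(system):
    """Features marked 'Yes' for `system` and not 'Yes' for any other system."""
    idx = SYSTEMS.index(system)
    result = []
    for feature, row in SUPPORT.items():
        others = [value for i, value in enumerate(row) if i != idx]
        if row[idx] == "Yes" and "Yes" not in others:
            result.append(feature)
    return result


distinct = fully_supported_only_by("Tasreef")
print(distinct)
```

With the values above, this returns the four functional features and leaves out morphology / lemma focus, which every listed system covers.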

Measurement summary

Sentence accuracy

0.758

Grouped book corpus.

Token accuracy

0.121

Token layer.

Slot accuracy

0.838

Structural placement.

Pragmatic accuracy

0.174

Pragmatic layer.
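
The page does not define how these accuracies are computed. As a minimal sketch under stated assumptions, token-level accuracy is taken here as the share of tokens with the correct label, and sentence-level accuracy as the share of sentences in which every token label is correct. The function names and role labels are hypothetical.

```python
def token_accuracy(gold, predicted):
    """Share of tokens whose predicted label equals the gold label."""
    pairs = [
        (g, p)
        for gold_sent, pred_sent in zip(gold, predicted)
        for g, p in zip(gold_sent, pred_sent)
    ]
    return sum(g == p for g, p in pairs) / len(pairs)


def sentence_accuracy(gold, predicted):
    """Share of sentences whose full label sequence matches the gold one."""
    matches = sum(g == p for g, p in zip(gold, predicted))
    return matches / len(gold)


# Hypothetical role labels for two short sentences.
gold = [["subject", "predicate"], ["predicate", "object", "adverbial"]]
predicted = [["subject", "predicate"], ["predicate", "subject", "adverbial"]]

tok_acc = token_accuracy(gold, predicted)
sent_acc = sentence_accuracy(gold, predicted)
print(tok_acc, sent_acc)  # 0.8 0.5
```

Under these assumed definitions a single wrong token fails its whole sentence, so sentence-level accuracy can never exceed token-level accuracy on the same data.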