Learned baseline

This page provides the downloadable summary of the sentence-family baseline and the token-level BiLSTM-CRF, both built on weak supervision from the current analyzer.


Metric	Value	Note
Sentence baseline	0.9155	test accuracy · macro F1 0.916
Token baseline	0.9214	semantic / syntactic / pragmatic
Token joint	0.7738	sequence-level exactness
Train split	6514/814/814	train / dev / test

Downloads: JSON summary · sentence report · token report · publication summary

Sentence-family baseline

A classifier trained on the generated weak labels. It serves as the sentence-level learned comparison, not as the final analyzer.

Field	Value	Note
Model	`/Users/al-hmouz/Documents/Arabic_Tasreef/functional_syntax/models/generated_reference_sentence_classifier.pt`	saved checkpoint
Train rows	6719	split summary
Dev rows	840	split summary
Test rows	840	split summary
Train accuracy	0.9999	overfit check
Dev accuracy	0.9226	selection metric
Test accuracy	0.9155	published baseline
Test macro F1	0.916	balanced view
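
The "balanced view" note on macro F1 can be made concrete with a small sketch. Macro F1 is the unweighted mean of per-class F1, so a rare sentence family counts as much as a frequent one, whereas accuracy is dominated by the frequent classes. The label names and predictions below are hypothetical and are not taken from the report; the function is a plain-Python illustration, not the evaluation code behind the published numbers.

```python
from collections import Counter


def accuracy(gold, pred):
    """Fraction of items whose predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all classes seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[g] += 1  # gold class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical sentence-family labels, for illustration only.
gold = ["family_a", "family_a", "family_a", "family_b", "family_b", "family_c"]
pred = ["family_a", "family_a", "family_b", "family_b", "family_b", "family_a"]

acc = accuracy(gold, pred)
f1 = macro_f1(gold, pred)
print(round(acc, 4), round(f1, 4))
```

In this toy case the single `family_c` item is missed, which pulls macro F1 well below accuracy. On the reported test set the two figures are nearly equal (0.9155 and 0.916), which is what one would expect when no class is being systematically sacrificed.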

Training history

Epoch	Train loss	Train acc	Dev loss	Dev acc	Dev macro F1
1	0.9376	0.7254	0.4107	0.8524	0.8564
2	0.2192	0.8954	0.3516	0.8655	0.863
3	0.0876	0.9495	0.3035	0.8964	0.9066
4	0.0337	0.9772	0.2801	0.9226	0.9209
5	0.0145	0.9905	0.2878	0.9262	0.9267
6	0.0079	0.9957	0.3064	0.925	0.9238
7	0.0066	0.997	0.3034	0.9238	0.9232
8	0.0029	0.9993	0.3017	0.9226	0.924
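
A short sketch of how the history table can be read programmatically. The rows are copied from the table above; the variable names are illustrative. The page does not state which criterion selected the saved checkpoint, so the snippet only shows where each dev metric peaks and how large the train/dev gap is at the last epoch.

```python
# Rows copied from the training-history table:
# (epoch, train_loss, train_acc, dev_loss, dev_acc, dev_macro_f1)
history = [
    (1, 0.9376, 0.7254, 0.4107, 0.8524, 0.8564),
    (2, 0.2192, 0.8954, 0.3516, 0.8655, 0.8630),
    (3, 0.0876, 0.9495, 0.3035, 0.8964, 0.9066),
    (4, 0.0337, 0.9772, 0.2801, 0.9226, 0.9209),
    (5, 0.0145, 0.9905, 0.2878, 0.9262, 0.9267),
    (6, 0.0079, 0.9957, 0.3064, 0.9250, 0.9238),
    (7, 0.0066, 0.9970, 0.3034, 0.9238, 0.9232),
    (8, 0.0029, 0.9993, 0.3017, 0.9226, 0.9240),
]

# Epoch at which each dev metric is best.
best_by_dev_acc = max(history, key=lambda row: row[4])[0]
best_by_dev_f1 = max(history, key=lambda row: row[5])[0]
best_by_dev_loss = min(history, key=lambda row: row[3])[0]

# Gap between train and dev accuracy at the final epoch (overfit check).
final = history[-1]
final_gap = final[2] - final[4]

print(best_by_dev_acc, best_by_dev_f1, best_by_dev_loss)  # 5 5 4
print(round(final_gap, 4))  # 0.0767
```

Dev loss bottoms out at epoch 4 while dev accuracy and macro F1 peak at epoch 5; after that, train loss keeps falling with no further dev gain, which is the pattern the "overfit check" row refers to.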

Token BiLSTM-CRF baseline

A token-level model trained with weak supervision. Structural labels remain rule-derived; the learned layer covers the semantic, syntactic, and pragmatic tags.

Field	Value	Note
Model	`/Users/al-hmouz/Documents/Arabic_Tasreef/functional_syntax/models/generated_reference_token_bilstm_crf.pt`	saved checkpoint
Train sentences	6719	split summary
Dev sentences	840	split summary
Test sentences	840	split summary
Train tokens	26590	token corpus
Dev tokens	3352	token corpus
Test tokens	3332	token corpus
Vocab size	3328	token vocabulary
Test semantic accuracy	0.9214	published baseline
Test syntactic accuracy	0.9208	published baseline
Test pragmatic accuracy	0.9469	published baseline
Test joint accuracy	0.7738	sequence-level exactness
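
The gap between the per-tag accuracies (about 0.92 to 0.95) and the joint figure (0.7738) follows from how strict the joint metric is. The sketch below assumes that "sequence-level exactness" means a sentence counts as correct only when every token has all three tags right; the report does not spell out the definition, so treat this as one plausible reading. The tag values and the `evaluate` helper are hypothetical.

```python
TASKS = ("semantic", "syntactic", "pragmatic")


def evaluate(gold_sents, pred_sents, tasks=TASKS):
    """Per-task token accuracy plus sentence-level exact match.

    Each sentence is a list of tokens; each token is a dict mapping
    a task name to its tag.
    """
    correct = {task: 0 for task in tasks}
    total_tokens = 0
    exact_sentences = 0
    for gold, pred in zip(gold_sents, pred_sents):
        sentence_ok = True
        for gold_tok, pred_tok in zip(gold, pred):
            total_tokens += 1
            for task in tasks:
                if gold_tok[task] == pred_tok[task]:
                    correct[task] += 1
                else:
                    sentence_ok = False  # one wrong tag fails the whole sentence
        exact_sentences += sentence_ok
    metrics = {task: correct[task] / total_tokens for task in tasks}
    metrics["joint"] = exact_sentences / len(gold_sents)
    return metrics


def tok(sem, syn, prag):
    return {"semantic": sem, "syntactic": syn, "pragmatic": prag}


# Two hypothetical two-token sentences; the second has one wrong semantic tag.
gold_sents = [
    [tok("S1", "Y1", "P1"), tok("S2", "Y2", "P1")],
    [tok("S1", "Y1", "P2"), tok("S3", "Y2", "P2")],
]
pred_sents = [
    [tok("S1", "Y1", "P1"), tok("S2", "Y2", "P1")],
    [tok("S1", "Y1", "P2"), tok("S2", "Y2", "P2")],
]

metrics = evaluate(gold_sents, pred_sents)
print(metrics)
```

Here a single wrong tag out of twelve leaves the per-task accuracies at 0.75, 1.0, and 1.0 but halves the joint score to 0.5. With roughly four tokens per test sentence (3332 tokens over 840 sentences) and three tags per token, small per-tag error rates compound in the same way.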

Best-state history

Epoch	Train loss	Dev loss	Dev sem	Dev syn	Dev prag	Dev joint
1	2.759	1.3413	0.8699	0.8699	0.9045	0.575
2	1.1177	1.0195	0.8965	0.8974	0.9266	0.6738
3	0.7277	0.9254	0.9081	0.9078	0.9382	0.7131
4	0.4847	0.9633	0.9126	0.9132	0.9382	0.7238
5	0.3237	0.9584	0.9147	0.915	0.9394	0.7429
6	0.2203	1.0079	0.9192	0.9186	0.9436	0.7512
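
For readers unfamiliar with the CRF half of a BiLSTM-CRF, the following is a minimal sketch of Viterbi decoding, the standard way a linear-chain CRF picks the best tag sequence from per-token scores and tag-to-tag transition scores. It does not reproduce the saved checkpoint: the scores are made up, start and end transitions are omitted, and the page does not say how the three tag sets share or split CRF layers, so the sketch decodes a single tag set.

```python
def viterbi(emissions, transitions):
    """Return (best_path, best_score) for a linear-chain model.

    emissions[t][k]: score of tag k at position t (e.g. produced by a BiLSTM).
    transitions[j][k]: score of moving from tag j to tag k.
    """
    n_tags = len(transitions)
    score = list(emissions[0])  # best score of any path ending in each tag
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for k in range(n_tags):
            # Best previous tag j for landing on tag k at this position.
            best_j = max(range(n_tags), key=lambda j: score[j] + transitions[j][k])
            new_score.append(score[best_j] + transitions[best_j][k] + emit[k])
            pointers.append(best_j)
        score = new_score
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag.
    last = max(range(n_tags), key=lambda k: score[k])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]


# Hypothetical scores: two tags, three tokens. Staying on a tag is rewarded,
# switching is penalised.
emissions = [[3.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
transitions = [[1.0, -2.0], [-2.0, 1.0]]

greedy = [max(range(2), key=lambda k: row[k]) for row in emissions]
path, best = viterbi(emissions, transitions)
print(greedy)      # [0, 1, 1]
print(path, best)  # [0, 0, 0] 5.0
```

Per-token argmax would switch tags after the first token, while the sequence-level decoder keeps tag 0 throughout because the transition penalty outweighs the small emission gain. That sequence-level scoring is what distinguishes a CRF output layer from independent per-token classification.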

Positioning

The learned baselines are supporting evidence, not the product core. The reference layer stays book-grounded and rule-first. The learned models show that the weak labels are learnable at sentence and token level.
