Exam Bank
| # | P1 | P2 | ACTION | RESULT |
|---|---|---|---|---|
| 1 | 1 | 3 | Move P1 (1 < 3) | ∅ |
| 2 | 4 | 3 | Move P2 (4 > 3) | ∅ |
| 3 | 4 | 4 | Match -> Add | Match: 4 |
| 4 | 5 | 6 | Move P1 (5 < 6) | ∅ |
| 5 | 7 | 6 | Move P2 (7 > 6) | ∅ |
| 6 | 7 | 7 | Match -> Add | Match: 7 |
| 7 | 9 | 10 | Move P1 (9 < 10) | ∅ |
| 8 | 11 | 10 | Move P2 (11 > 10) | ∅ |
| 9 | 11 | 13 | Move P1 (11 < 13) | ∅ |
| 10 | 13 | 13 | Match -> Add | Match: 13 |
| 11 | 15 | 14 | End | End |
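The trace above is the standard two-pointer merge of two sorted postings lists. A minimal sketch, with the list contents read off the table (P1 = 1, 4, 5, 7, 9, 11, 13, 15 and P2 = 3, 4, 6, 7, 10, 13, 14):

```python
def intersect(p1, p2):
    """Two-pointer intersection of two sorted docID lists."""
    answer = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:       # Match -> Add, advance both pointers
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:      # Move P1
            i += 1
        else:                    # Move P2
            j += 1
    return answer

print(intersect([1, 4, 5, 7, 9, 11, 13, 15], [3, 4, 6, 7, 10, 13, 14]))
# -> [4, 7, 13], the three matches found in the trace
```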
| Term | Postings List (DocID: Positions) |
|---|---|
| self | {doc1: [2]} {doc2: [1, 5]} {doc3: [12]} {doc4: [5]} |
| driving | {doc1: [3]} {doc2: [6]} {doc3: [10, 13]} {doc4: [6]} |
| car | {doc1: [4]} {doc2: [8, 10]} {doc3: [8, 11, 14]} {doc4: [8]} |
| is | {doc1: [5]} {doc2: [3]} {doc3: [5, 15]} {doc4: [1, 2, 9]} |
| research | {doc1: [6]} {doc2: [13]} {doc3: [7]} {doc4: [3, 7]} |
| DocID | Sequence | Status |
|---|---|---|
| doc1 | 2, 3, 4 | YES |
| doc2 | 5, 6, 8 | NO |
| doc3 | 12, 13, 14 | YES |
| doc4 | 5, 6, 8 | NO |
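The phrase check above ("self driving car") can be sketched directly from the positional index: a document matches when some position p of the first term has p+1 for the second term and p+2 for the third. A minimal sketch with the postings from the table:

```python
# Positional index from the table above: term -> {docID: [positions]}
index = {
    "self":    {"doc1": [2], "doc2": [1, 5], "doc3": [12], "doc4": [5]},
    "driving": {"doc1": [3], "doc2": [6], "doc3": [10, 13], "doc4": [6]},
    "car":     {"doc1": [4], "doc2": [8, 10], "doc3": [8, 11, 14], "doc4": [8]},
}

def phrase_match(terms, index):
    """Return docIDs containing the terms at consecutive positions."""
    # Only documents containing every term can match
    docs = set.intersection(*(set(index[t]) for t in terms))
    hits = []
    for doc in sorted(docs):
        starts = index[terms[0]][doc]
        # Check positions p, p+1, p+2, ... for each candidate start p
        if any(all(p + k in index[t][doc] for k, t in enumerate(terms))
               for p in starts):
            hits.append(doc)
    return hits

print(phrase_match(["self", "driving", "car"], index))
# -> ['doc1', 'doc3']  (doc2 and doc4 fail: "car" is not adjacent)
```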
1. What is the result of the operation (t1 AND t2)?
2. What are the optimal skip paths?
3. Which documents were skipped?
Matches: [2, 40].
Skips: P1(12->19) skips [17, 18], P2(20->30) skips [22, 25].
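The exercise's actual postings lists are not reproduced above, but the skip-pointer traversal it tests can be sketched generically: skips are placed every ⌊√L⌋ positions (the common textbook heuristic), and a skip is taken only when its target does not overshoot the other list's current value.

```python
import math

def intersect_with_skips(p1, p2):
    """Intersect two sorted docID lists, using skip pointers every isqrt(L)."""
    skip1 = max(1, math.isqrt(len(p1)))
    skip2 = max(1, math.isqrt(len(p2)))
    answer, i, j = [], 0, 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            # Take the skip only if its target is still <= the other value
            if i + skip1 < len(p1) and p1[i + skip1] <= p2[j]:
                i += skip1
            else:
                i += 1
        else:
            if j + skip2 < len(p2) and p2[j + skip2] <= p1[i]:
                j += skip2
            else:
                j += 1
    return answer
```

Skipping is safe because all entries between the pointer and the skip target are strictly smaller than the target, so no potential match is jumped over.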
Correct: B. Start with the smallest estimated size.
(kaleidoscope OR eyes) = 87,009 + 213,312 = 300,321 (Smallest)
(tangerine OR trees) = 46,653 + 316,812 = 363,465
(marmalade OR skies) = 107,913 + 271,658 = 379,571
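The heuristic above can be sketched as a one-line sort: estimate each OR group's result size by the sum of its terms' document frequencies, then process the smallest estimate first (the df values are those used in the worked answer):

```python
# Document frequencies for each term (from the exercise)
df = {"kaleidoscope": 87_009, "eyes": 213_312,
      "tangerine": 46_653, "trees": 316_812,
      "marmalade": 107_913, "skies": 271_658}

groups = [("tangerine", "trees"), ("marmalade", "skies"), ("kaleidoscope", "eyes")]

# Upper bound on |t1 OR t2| is df(t1) + df(t2); process smallest first
order = sorted(groups, key=lambda g: sum(df[t] for t in g))
print(order[0])  # -> ('kaleidoscope', 'eyes'), estimated 300,321
```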
| # | Statement | Answer |
|---|---|---|
| 1 | In the extended Boolean retrieval model, both positional indexes and biword indexes can be used for proximity queries. | |
| 2 | The total number of 1s in a term-document incidence matrix represents the total number of occurrences of all terms in the document collection. | |
| 3 | In an IR system, a "collection of documents" refers to the total set of documents being indexed and searched, which may include different document types like scientific papers, news articles, and social media posts, emails, and HTML files. | |
| 4 | In Information Retrieval systems, both stemming and/or lemmatization are typically applied to improve query matching and document retrieval. | |
| 5 | If the term t1 has a term frequency of 0.1% in document D1, and the document contains 90,000 terms, the number of positional postings for t1 in the positional index would be 900. | |
| 6 | In Westlaw-style proximity queries, the operator /p ensures that the specified terms must appear in the same sentence. | |
| Term | Document D1 | Document D2 | Document D3 |
|---|---|---|---|
| apple | 1 | 4 | 10 |
| Document | log-frequency weight for the term "apple" |
|---|---|
| D1 | |
| D2 | |
D1: $1 + \log_{10}(1) = 1$
D2: $1 + \log_{10}(4) \approx 1.60$
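The log-frequency weighting used above, as a small helper (w = 1 + log₁₀(tf) when tf > 0, and 0 otherwise):

```python
import math

def log_tf_weight(tf):
    """Log-frequency weight: 1 + log10(tf) for tf > 0, else 0."""
    return 1 + math.log10(tf) if tf > 0 else 0.0

print(log_tf_weight(1))   # -> 1.0   (D1)
print(log_tf_weight(4))   # -> ~1.60 (D2)
print(log_tf_weight(10))  # -> 2.0   (D3, tf = 10 from the table)
```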
| Term | Document Frequency (df) |
|---|---|
| Data | 200 |
| science | 30 |
| Term | IDF formula filled |
|---|---|
| Data | |
| science | |
Data: $\log_{10}(\frac{1000}{200}) = \log_{10}(5) \approx 0.7$
Science: $\log_{10}(\frac{1000}{30}) \approx 1.52$
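The idf computation above as code, assuming a collection of N = 1000 documents (the value implied by the worked answers):

```python
import math

def idf(N, df):
    """Inverse document frequency: log10(N / df)."""
    return math.log10(N / df)

print(round(idf(1000, 200), 2))  # -> 0.7  ("Data")
print(round(idf(1000, 30), 2))   # -> 1.52 ("science")
```

The rarer term ("science", df = 30) gets the higher idf, so it contributes more to ranking.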
| Term | Query Weight (qi) | Document Weight (di) |
|---|---|---|
| Machine | 2 | 3 |
| Learning | 3 | 4 |
| Data | 1 | 5 |
$Cos(q,d) = \frac{(2 \times 3) + (3 \times 4) + (1 \times 5)}{\sqrt{2^2 + 3^2 + 1^2} \times \sqrt{3^2 + 4^2 + 5^2}}$
$= \frac{6 + 12 + 5}{\sqrt{14} \times \sqrt{50}}$
$= \frac{23}{\sqrt{700}} \approx 0.87$
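The cosine computation above, for the (unnormalized) query and document weight vectors from the table:

```python
import math

def cosine(q, d):
    """Cosine similarity between two equal-length weight vectors."""
    dot = sum(qi * di for qi, di in zip(q, d))
    norm_q = math.sqrt(sum(x * x for x in q))
    norm_d = math.sqrt(sum(x * x for x in d))
    return dot / (norm_q * norm_d)

# Query weights (Machine, Learning, Data) = (2, 3, 1); document = (3, 4, 5)
print(round(cosine([2, 3, 1], [3, 4, 5]), 2))  # -> 0.87
```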
| Rank | Retrieved Doc | Relevance Grade |
|---|---|---|
| 1 | D2 | 3 |
| 2 | D4 | 2 |
| 3 | D3 | 3 |
| 4 | D1 | 0 |
| 5 | D5 | 1 |
DCG@5: $3 + \frac{2}{\log_2 2} + \frac{3}{\log_2 3} + \frac{0}{\log_2 4} + \frac{1}{\log_2 5} \approx 7.32$
IDCG@5 (ideal ranking: 3, 3, 2, 1, 0): $3 + \frac{3}{\log_2 2} + \frac{2}{\log_2 3} + \frac{1}{\log_2 4} + \frac{0}{\log_2 5} \approx 7.76$
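The DCG variant used above takes the rank-1 gain as-is and discounts later ranks by log₂(rank); IDCG is the same sum over the relevance grades sorted in decreasing order. A minimal sketch:

```python
import math

def dcg(rels):
    """DCG: rels[0] + sum(rel_i / log2(i)) for ranks i >= 2."""
    return rels[0] + sum(r / math.log2(i)
                         for i, r in enumerate(rels[1:], start=2))

ranking = [3, 2, 3, 0, 1]                  # grades in retrieved order
ideal = sorted(ranking, reverse=True)      # [3, 3, 2, 1, 0]

print(round(dcg(ranking), 2))              # -> 7.32 (DCG@5)
print(round(dcg(ideal), 2))                # -> 7.76 (IDCG@5)
print(round(dcg(ranking) / dcg(ideal), 2)) # -> 0.94 (NDCG@5)
```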
| K | Ranked list for Q1 | Precision@K | Recall@K |
|---|---|---|---|
| 1 | D2 | ||
| 2 | D3 | ||
| 3 | D1 | ||
| 4 | D8 | ||
| 5 | D4 | ||
| 6 | D5 | ||
| 7 | D6 | ||
| 8 | D7 |
| K | Doc | Precision@K | Recall@K |
|---|---|---|---|
| 1 | D2 | 1/1 = 1.0 | 1/4 = 0.25 |
| 2 | D3 | ||
| 3 | D1 | 2/3 ≈ 0.67 | 2/4 = 0.50 |
| 4 | D8 | ||
| 5 | D4 | ||
| 6 | D5 | 3/6 = 0.50 | 3/4 = 0.75 |
| 7 | D6 | ||
| 8 | D7 | 4/8 = 0.50 | 4/4 = 1.0 |
Part 2 (Average Precision):
Relevant at ranks: 1, 3, 6, 8.
$AP = \frac{1 + 0.67 + 0.50 + 0.50}{4} = \frac{2.67}{4} \approx 0.67$
Part 3 (RR): First relevant doc (D2) is at rank 1.
$RR = \frac{1}{1} = 1$
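The three parts above can be checked with a short script, given the ranked list for Q1 and its relevant set {D2, D1, D5, D7} (the four relevant documents implied by the answer table):

```python
ranked = ["D2", "D3", "D1", "D8", "D4", "D5", "D6", "D7"]
relevant = {"D2", "D1", "D5", "D7"}

def precision_at(k):
    """Fraction of the top-k results that are relevant."""
    return sum(d in relevant for d in ranked[:k]) / k

def recall_at(k):
    """Fraction of all relevant documents found in the top k."""
    return sum(d in relevant for d in ranked[:k]) / len(relevant)

# Average precision: mean of P@K at the ranks of relevant documents
ap = sum(precision_at(i) for i, d in enumerate(ranked, 1)
         if d in relevant) / len(relevant)

# Reciprocal rank: 1 / rank of the first relevant document
rr = 1 / next(i for i, d in enumerate(ranked, 1) if d in relevant)

print(round(ap, 2))  # -> 0.67
print(rr)            # -> 1.0
```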
| Query (Q) | Document (D) | Relevancy (R) |
|---|---|---|
| Q1 | D1 | 1 |
| Q1 | D2 | 0 |
| Q1 | D2 | 1 |
| Q2 | D1 | 0 |
| Q2 | D2 | 0 |
| Q2 | D1 | 1 |
| Q3 | D3 | 1 |
| Q3 | D3 | 1 |
| Q3 | D5 | 1 |
| Q1 | D1 | 1 |
| Q1 | D2 | 0 |
| Q2 | D2 | 1 |
| Q3 | D3 | 0 |
| Q1 | D1 | 1 |
| Q2 | D2 | 0 |
| Probability | Show Calculation & Answer |
|---|---|
| P(R=1 \| Q1, D1) = | 3/3 = 1.0 |
| P(R=1 \| Q1, D2) = | 1/3 ≈ 0.33 |
| P(R=1 \| Q2, D1) = | 1/2 = 0.50 |
| P(R=1 \| Q2, D2) = | 1/3 ≈ 0.33 |
| P(R=1 \| Q3, D3) = | 2/3 ≈ 0.67 |
| P(R=1 \| Q3, D5) = | 1/1 = 1.0 |
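Each probability is the maximum-likelihood estimate over the judgment table: the number of judgments with R = 1 for a (query, document) pair divided by the total number of judgments for that pair. A sketch using the rows above:

```python
# Judgment table from above: (query, document, relevancy)
judgments = [
    ("Q1", "D1", 1), ("Q1", "D2", 0), ("Q1", "D2", 1), ("Q2", "D1", 0),
    ("Q2", "D2", 0), ("Q2", "D1", 1), ("Q3", "D3", 1), ("Q3", "D3", 1),
    ("Q3", "D5", 1), ("Q1", "D1", 1), ("Q1", "D2", 0), ("Q2", "D2", 1),
    ("Q3", "D3", 0), ("Q1", "D1", 1), ("Q2", "D2", 0),
]

def p_relevant(q, d):
    """P(R=1 | q, d): fraction of judgments for (q, d) that have R = 1."""
    rows = [r for qq, dd, r in judgments if (qq, dd) == (q, d)]
    return sum(rows) / len(rows)

print(p_relevant("Q1", "D1"))           # -> 1.0 (3 of 3 judgments)
print(round(p_relevant("Q3", "D3"), 2)) # -> 0.67 (2 of 3 judgments)
```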
1. In the bag-of-words model used for document representation:
2. Which of the following terms would have the highest IDF score in a collection of 1 million documents?
3. What is the effect of IDF on ranking when the query contains only one term?
4. Why is Euclidean distance not ideal for comparing query and document vectors?
5. In SMART notation, the query always uses the same weighting scheme as the document.
6. In SMART notation for weighting, the 't' refers to Term frequency.
7. In a unigram language model, the word order in a query affects its probability.
8. The probability p(R=1|d,q) is used directly in the Query Likelihood Model.
1. D (Word order ignored)
2. C (Rare terms have highest IDF)
3. C (Constant factor for all docs)
4. C (Penalizes long docs)
5. B (False) (Can be different, e.g. lnc.ltc)
6. B (False) (In the SMART weighting triple, 't' denotes idf (the document-frequency component), not term frequency)
7. B (False) (Unigram ignores order)
8. B (False) (QLM uses P(q|d))