Context Free Languages and Grammar

Contents

**Recall Derivation Trees or Parse Trees** 4

**Recall the procedure to Eliminate: Null Productions and Unit Productions** 12

**Recall the procedure to find the Reduced Grammar for a given CFG** 12

**Define Chomsky Normal Form (CNF)** 12

**Recall the procedure to find CNF equivalent to the given CFG** 12

**Define Greibach Normal Form (GNF)** 12

**Recall the procedure to find GNF equivalent to the given CFG** 12

**Recall Closure properties of CFLs such as Union, Concatenation, Closure etc.** 12

**Recall decision properties of CFLs** 12

**Recall CYK (Cocke-Younger-Kasami) Algorithm** 12

**Recall Pumping Lemma for CFLs** 12

**Apply Pumping Lemma for CFLs** 12

**Recall Context Free Grammar**

A context-free grammar (CFG) is a formal grammar that describes a formal language in terms of production rules. It is widely used in computer science, particularly in the field of formal language theory and compiler design.

A CFG consists of four main components:

- Terminals: These are the basic symbols or tokens of the language. Terminals are the smallest units that cannot be further broken down. For example, in a programming language, terminals could be keywords, operators, punctuation marks, or literals.
- Non-terminals: These are symbols that can be replaced by a group of terminals and non-terminals according to the production rules. Non-terminals represent categories or types of phrases or expressions in the language. Non-terminals are often denoted by uppercase letters.
- Production rules: These rules specify how symbols (terminals or non-terminals) can be replaced or expanded. Each rule consists of a non-terminal on the left-hand side and a sequence of terminals and non-terminals on the right-hand side. It defines the transformation or expansion of a non-terminal into a sequence of symbols. Production rules are typically written in the form: Non-terminal → Sequence of symbols.
- Start symbol: This is a special non-terminal symbol that represents the entire language or the initial symbol from which the derivation of the language begins.

Using the production rules, a CFG generates strings in the language by repeatedly applying the rules to transform the start symbol into a sequence of terminals and non-terminals. This process is known as derivation or parsing.

Context-free grammars are used to define the syntax of programming languages, formalize natural languages, and analyze the structure of languages. They provide a foundation for various parsing algorithms and play a crucial role in the design and implementation of programming language compilers and interpreters.

Let’s consider a simple example of a context-free grammar that describes a language of arithmetic expressions involving addition and multiplication.

Here is the context-free grammar:

- Terminals:
- Digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
- Addition operator: ‘+’
- Multiplication operator: ‘*’
- Parentheses: ‘(‘ and ‘)’

- Non-terminals:
- Expr: Represents an arithmetic expression

- Production rules:
- Expr → Expr + Expr (Addition rule)
- Expr → Expr * Expr (Multiplication rule)
- Expr → (Expr) (Parentheses rule)
- Expr → Digit (Digit rule)

- Start symbol:
- Expr

These rules specify how an Expr can be expanded or derived. The rules allow for the addition and multiplication of two expressions, the use of parentheses to group expressions, and the use of single digits as valid expressions.

Using this grammar, we can generate valid arithmetic expressions. For example, let’s generate the expression “2 + (3 * 4)”:

- Start with the start symbol Expr.
- Apply the Addition rule: Expr → Expr + Expr
- Expr + Expr

- Apply the Digit rule: Expr → Digit
- Digit + Expr
- 2 + Expr

- Apply the Parentheses rule: Expr → (Expr)
- 2 + (Expr)
- 2 + (Expr * Expr)

- Apply the Digit rule: Expr → Digit
- 2 + (Digit * Expr)
- 2 + (3 * Expr)

- Apply the Digit rule: Expr → Digit
- 2 + (3 * Digit)
- 2 + (3 * 4)

The derivation process demonstrates how the production rules are applied to generate a valid arithmetic expression according to the given context-free grammar.

Note that this is a simplified example, and real-world programming languages have much more complex context-free grammars to define their syntax and structure.

Applications of Context Free Grammar:

Context-free grammars (CFGs) have several applications in various fields. Here are some common applications:

- Programming Languages: CFGs are extensively used to define the syntax of programming languages. Programming language compilers and interpreters employ CFGs to parse and validate the structure of source code. Tools such as Lex and Yacc/Bison utilize CFGs to generate lexical analyzers and parsers for programming languages.
- Natural Language Processing (NLP): CFGs play a significant role in natural language processing tasks such as syntax analysis, parsing, and generation of sentences. CFG-based parsers can analyze the grammatical structure of sentences and generate parse trees that capture the syntactic relationships between words.
- Compiler Design: CFGs are fundamental in compiler design for various phases of the compilation process. They are employed in lexical analysis, syntax analysis, and semantic analysis to validate and transform source code into machine-readable code.
- Syntax Highlighting and Code Editors: CFGs are used to define syntax highlighting rules in code editors and Integrated Development Environments (IDEs). By utilizing the grammar rules, the editor can highlight different elements of the code, such as keywords, variables, and comments, providing visual cues to the developers.
- Natural Language Generation: CFGs can be employed to generate natural language sentences based on predefined grammar rules. Applications include chatbots, automatic text generation, and machine translation systems, where CFGs help ensure the generated output adheres to grammatical rules.
- Pattern Recognition and Image Processing: CFGs find applications in pattern recognition and image processing tasks, such as analyzing and recognizing shapes, structures, and textures. CFG-based models can capture the grammar of visual patterns and generate models for classification and recognition.
- DNA Sequence Analysis: CFGs have been used in bioinformatics to model and analyze DNA and RNA sequences. CFGs can represent the structure and properties of DNA sequences and aid in tasks such as sequence alignment, gene prediction, and secondary structure prediction.

These are just a few examples of the wide range of applications of context-free grammars. The versatility and expressiveness of CFGs make them valuable tools in various domains where structured analysis, generation, or recognition of patterns and languages are required.

**Recall Derivation Trees or Parse Trees**

A derivation tree, also known as a parse tree or syntax tree, is a graphical representation of the derivation or parsing process of a string in a context-free grammar. It shows how the production rules of the grammar are applied to generate the string from the start symbol.

A derivation tree visually represents the hierarchical structure of a string according to the grammar’s rules. It consists of nodes and edges, where nodes represent symbols (terminals or non-terminals) and edges represent the application of production rules.

Here are the key components and features of a derivation tree:

- Root: The topmost node of the tree represents the start symbol of the grammar.
- Internal nodes: These nodes represent non-terminals. Each internal node is labeled with the non-terminal it represents.
- Leaf nodes: These nodes represent terminals. Each leaf node is labeled with the terminal it represents.
- Edges: The edges of the tree connect nodes and represent the application of production rules. Each edge is labeled with the production rule that was applied.
- Branches: The branches of the tree represent alternative choices made during the derivation process. When there are multiple production rules that can be applied, each choice leads to a different branch in the tree.

Derivation trees provide a visual representation of how a string can be generated or parsed according to a context-free grammar. They help in understanding the structure of the language and the role of each symbol in the derivation process. Derivation trees are often used in the analysis and implementation of compilers, interpreters, and syntax analyzers.

By examining a derivation tree, you can trace the steps of the derivation process, identify the non-terminals and terminals used, and understand the hierarchical relationship between different parts of the generated string.

Let’s consider the same context-free grammar for arithmetic expressions that we discussed earlier. We will generate a parse tree for the expression “2 + (3 * 4)”.

Here is the parse tree for the expression using grammar E->E+E | E * E |digit:

In this parse tree:

- The root node represents the start symbol Expr.
- The left subtree represents the left operand of the addition operation, which is the digit 2.
- The right subtree represents the right operand of the addition operation, which is the expression (3 * 4).
- The right subtree has its own structure, with the left subtree representing the left operand of the multiplication operation (digit 3) and the right subtree representing the right operand (digit 4).

By following the tree from the root to the leaves, you can reconstruct the original expression “2 + (3 * 4)”.

Parse trees provide a visual representation of the syntactic structure of a string according to a given grammar. They can be used to verify the correctness of a parsing algorithm, analyze the structure of the parsed expression, and facilitate further processing or interpretation of the input.

There are different types of derivation trees based on the derivation process and the order of applying production rules. The main types are leftmost derivation trees, rightmost derivation trees, and ambiguous derivation trees. Let’s explore each type with examples:

- Leftmost Derivation Tree:

A leftmost derivation tree is constructed by always expanding the leftmost non-terminal in each step of the derivation. It represents the leftmost derivation process.

- Rightmost Derivation Tree:

A rightmost derivation tree is constructed by always expanding the rightmost non-terminal in each step of the derivation. It represents the rightmost derivation process.

Let’s consider the following grammar and string:

Grammar:

- Start symbol: S
- Production rules:
- S → aSb
- S → ab

String: “aaabbb”

Leftmost Derivation:

- Leftmost derivation steps:
- S → aSb (Applying rule 1)
- S → aaSbb (Applying rule 1)
- S → aaaSbbb (Applying rule 1)
- S → aaabbb (Applying rule 2)

Leftmost derivation:

S → aSb → aaSbb → aaaSbbb → aaabbb

Leftmost derivation/parse tree:

Rightmost Derivation:

- Rightmost derivation steps:
- S → aSb (Applying rule 1)
- S → aaSbb (Applying rule 1)
- S → aaabbb (Applying rule 2)

Rightmost derivation:

S → aSb → aaSbb → aaabbb

Rightmost derivation/parse tree:

Both the leftmost and rightmost derivation parse trees and derivations correctly demonstrate the derivation process for the given grammar and string “aaabbb”.

Example 1: Draw derivation tree for the given grammar and input string

- E = E + E
- E = E * E
- E = a | b | c

Input string: a * b + c

Step 1:

Step 2:

Step 2:

Step 4:

Step 5:

Example 2: Draw a derivation tree for the string “bbabb” from the CFG given by

- S → bSb | a | b

Solution: Now, the derivation tree for the string “bbabb” is as follows:

The above tree is a derivation tree drawn for deriving a string bbabb. By simply reading the leaf nodes, we can obtain the desired string. The same tree can also be denoted by,

Example 3: Construct a derivation tree for the string “aabbabba” for the CFG given by,

- S → aB | bA
- A → a | aS | bAA
- B → b | bS | aBB

Solution: To draw a tree, we will first try to obtain derivation for the string aabbabba

Now, the derivation tree is as follows:

Example 4: Show the derivation tree for string “aabbbb” with the following grammar.

- S → AB | ε
- A → aB
- B → Sb

Solution: To draw a tree we will first try to obtain derivation for the string aabbbb

Now, the derivation tree for the string “aabbbb” is as follows:

**Determine Ambiguity in CFG**

To determine ambiguity in a context-free grammar (CFG), we need to check if there exist multiple parse trees for at least one string generated by the grammar. If multiple parse trees exist, it indicates ambiguity in the grammar.

Here’s a step-by-step approach to determine ambiguity in a CFG:

- Identify a string generated by the grammar that you suspect might have multiple parse trees.
- Construct the parse trees for that string using different derivation strategies, such as leftmost derivation and rightmost derivation.
- If you obtain multiple distinct parse trees for the same string using different derivation strategies, it indicates ambiguity in the grammar.
- Analyze the production rules and grammar structure to identify the source of ambiguity. Look for overlapping or conflicting production rules, or situations where the same string can be derived in multiple ways.
- If you find that different parse trees lead to different interpretations or meanings for the same string, it further confirms ambiguity in the grammar.
- It’s also essential to consider the language generated by the CFG. Some languages inherently have ambiguity, while others may have ambiguity due to specific grammar rules.

By following this process, you can determine if a CFG is ambiguous by identifying multiple parse trees for a particular string generated by the grammar.

Example 1: Check whether the given grammar G is ambiguous or not.

- E → E + E
- E → E – E
- E → id

Solution: From the above grammar String “id + id – id” can be derived in 2 ways:

First Leftmost derivation

- E → E + E
- → id + E
- → id + E – E
- → id + id – E
- → id + id- id

Second Leftmost derivation

- E → E – E
- → E + E – E
- → id + E – E
- → id + id – E
- → id + id – id

Since there are two leftmost derivation for a single string “id + id – id”, the grammar G is ambiguous.

Example 2: Check whether the given grammar G is ambiguous or not.

- S → aSb | SS
- S → ε

Solution: For the string “aabb” the above grammar can generate two parse trees

Since there are two parse trees for a single string “aabb”, the grammar G is ambiguous.

Example 3: Check whether the given grammar G is ambiguous or not.

- A → AA
- A → (A)
- A → a

Solution: For the string “a(a)aa” the above grammar can generate two parse trees:

Since there are two parse trees for a single string “a(a)aa”, the grammar G is ambiguous.

**Converting Ambiguous Grammar Into Unambiguous Grammar:**

To convert an ambiguous grammar into an unambiguous grammar, it is necessary to address causes such as left recursion and common prefixes, as they contribute to the ambiguity. By removing these causes, the grammar can be made unambiguous, although it is not always possible to do so.

There are several methods to remove ambiguity:

- Fixing the grammar: By carefully redefining the production rules and eliminating left recursion or common prefixes, the grammar can be made unambiguous.
- Adding grouping rules: By introducing additional rules to enforce grouping or precedence of certain expressions, ambiguity can be resolved.
- Using semantics: By considering the meaning or context of the language being described by the grammar, it is possible to choose the parse that makes the most sense and eliminate ambiguity.
- Adding precedence and associativity rules: By incorporating rules that define the precedence and associativity of operators, ambiguity can be resolved. Precedence rules establish the priority of operators based on the level of the production, while associativity rules determine whether an operator is left or right associative.

Implementing precedence and associativity constraints involves the following rules:

- R1: Precedence constraint: The level at which a production is defined determines the operator’s priority. Higher-level productions have lower priority, while lower-level productions have higher priority.
- R2: Associativity constraint: To enforce left associativity, induce left recursion in the production of the operator. To enforce right associativity, induce right recursion in the production.

By applying these techniques and rules, it is possible to convert an ambiguous grammar into an unambiguous one, improving clarity and eliminating multiple interpretations. However, it is important to note that complete conversion may not always be achievable for every ambiguous grammar.

Here are three examples that demonstrate the conversion of an ambiguous grammar to an unambiguous grammar:

Example 1: Ambiguous Grammar

Grammar:

- Start symbol: S
- Production rules:
- S → aSb
- S → SaaS
- S → ε (epsilon)

To convert this grammar to an unambiguous grammar, we can modify it as follows:

Modified Unambiguous Grammar:

- Start symbol: S
- Production rules:
- S → aSb
- S → X
- X → SaaS
- X → ε (epsilon)

In this modified grammar, we introduce a new non-terminal symbol X to handle the ambiguity caused by the recursive rule S → SaaS. By separating the recursive part into a new production rule, we ensure that the grammar becomes unambiguous.

Example 2: Ambiguous Grammar

Grammar:

- Start symbol: E
- Production rules:
- E → E + E
- E → E * E
- E → (E)
- E → id

To convert this grammar to an unambiguous grammar, we can introduce precedence and associativity rules:

Modified Unambiguous Grammar:

- Start symbol: E
- Production rules:
- E → E + T
- E → T
- T → T * F
- T → F
- F → (E)
- F → id

In this modified grammar, we introduce non-terminal symbols T and F to represent different levels of precedence. The addition operation has a higher precedence than multiplication, ensuring unambiguous parsing.

Example 3: Ambiguous Grammar

Grammar:

- Start symbol: S
- Production rules:
- S → aSa
- S → bSb
- S → ε (epsilon)

To convert this grammar to an unambiguous grammar, we can add grouping rules using parentheses:

Modified Unambiguous Grammar:

- Start symbol: S
- Production rules:
- S → a(S)a
- S → b(S)b
- S → ε (epsilon)

In this modified grammar, we explicitly enforce the grouping of the terminals a and b by enclosing the non-terminal S in parentheses. This ensures unambiguous parsing by disallowing multiple interpretations of the input string.

These examples illustrate how to convert ambiguous grammars into unambiguous grammars by introducing new non-terminal symbols, precedence and associativity rules, or grouping rules. By addressing the causes of ambiguity and providing clear rules for parsing, the resulting unambiguous grammars eliminate any potential for multiple interpretations.

**Recall the procedure to Eliminate: Null Productions and Unit Productions**

To eliminate null productions and unit productions from a context-free grammar, the following procedures can be applied:

- Eliminating Null Productions:

Null productions are productions that derive the empty string (ε). To eliminate null productions, the following steps can be followed:

a. Identify the null productions in the grammar. These are the productions where ε appears on the right-hand side.

b. For each production that contains a null production, generate all possible combinations by removing the null production. This means creating new productions where the null production is omitted.

c. Repeat step b until no new null productions are generated.

d. Remove all the null productions from the grammar.

- Eliminating Unit Productions:

Unit productions are productions where a single non-terminal symbol appears on the right-hand side. To eliminate unit productions, the following steps can be followed:

a. Identify the unit productions in the grammar.

b. For each unit production A → B, where A and B are non-terminal symbols, replace every occurrence of A with B in other productions.

c. Repeat step b until no new unit productions are generated.

d. Remove all the unit productions from the grammar.

After applying these procedures, the resulting grammar will have no null productions (productions deriving ε) or unit productions (productions with a single non-terminal symbol on the right-hand side). This process helps simplify the grammar and allows for more straightforward parsing and analysis.

It’s important to note that the elimination of null and unit productions may result in changes to the language generated by the grammar. Care should be taken to ensure that the transformed grammar still produces the desired language.

Here are two examples for each procedure: eliminating null productions and eliminating unit productions.

- Examples of Eliminating Null Productions:

Example 1:

Grammar:

- S → AB
- A → ε
- B → CD
- C → ε
- D → E

Procedure:

a. Identify the null productions: A → ε and C → ε.

b. Generate new productions without the null productions:

- S → AB | B
- A → ε
- B → CD
- C → ε
- D → E

c. Repeat step b:

- S → AB | B | CD | D | E
- A → ε
- B → CD | D | E
- C → ε
- D → E

d. Remove null productions:

- S → AB | B | CD | D | E
- B → CD | D | E
- D → E

The resulting grammar has eliminated the null productions.

Example 2:

Grammar:

- S → A | B
- A → ε
- B → CD
- C → ε
- D → ε

Procedure:

a. Identify the null productions: A → ε, C → ε, and D → ε.

b. Generate new productions without the null productions:

- S → A | B
- B → CD

c. Repeat step b:

- S → A | B | CD
- B → CD

d. Remove null productions:

- S → A | B | CD
- B → CD

The resulting grammar has eliminated the null productions.

Example 1 to eliminate Unit productions.

Grammar:

- S → A
- A → B
- B → a

Procedure:

a. Identify the unit productions: S → A, A → B.

b. Replace each unit production with the productions of the non-unit symbol:

- S → B
- A → a

c. Repeat step b until no new unit productions are generated.

The resulting grammar after eliminating unit productions:

- S → a
- A → a
- B → a

In this example, the grammar initially contains the unit productions S → A and A → B. By replacing the unit productions with the productions of the non-unit symbols, we obtain the resulting grammar where each production drives a terminal symbol.

Example 2 to Eliminate unit productions.

Grammar:

- S → A
- A → B
- B → C
- C → D
- D → a

Procedure:

a. Identify the unit productions: S → A, A → B, B → C, C → D.

b. Replace each unit production with the productions of the non-unit symbol:

- S → D
- A → D
- B → D
- C → a
- D → a

c. Repeat step b until no new unit productions are generated.

The resulting grammar after eliminating unit productions:

- S → a
- A → a
- B → a
- C → a
- D → a

In this example, the grammar initially contains the unit productions S → A, A → B, B → C, and C → D. By replacing the unit productions with the productions of the non-unit symbols, we obtain the resulting grammar where each production drives a terminal symbol.

**Recall the procedure to find the Reduced Grammar for a given CFG**

To find the reduced grammar for a given Context-Free Grammar (CFG), you can follow the following procedure:

- Remove unreachable symbols: Start by identifying all the non-terminals and terminals that are not reachable from the start symbol. Remove any productions involving these symbols.
- Remove useless symbols: Identify all the non-terminals and terminals that do not derive any string of terminals. Remove any productions involving these symbols.
- Remove ε-productions: Identify any productions of the form A → ε, where A is a non-terminal. Remove these ε-productions.
- Remove unit productions: Identify any unit productions of the form A → B, where both A and B are non-terminals. Replace each unit production with the productions of the non-unit symbol.
- Eliminate left-recursion: If the grammar contains left-recursive productions, eliminate them using appropriate techniques such as left-recursion elimination or left-factoring.
- Simplify the grammar: Perform any additional simplifications or transformations to make the grammar more concise and readable, if desired.

By following these steps, you can obtain the reduced grammar for a given CFG. Each step aims to eliminate certain types of productions or symbols to simplify and refine the grammar.

Please note that some steps may not be applicable to every grammar, and the specific techniques used for each step may vary depending on the requirements and constraints of the grammar.

Here are two examples to illustrate the procedure of finding the reduced grammar:

Example 1:

Consider the following grammar:

Grammar:

- S → Aa
- A → Aa | ε
- B → b

Procedure:

- Remove unreachable symbols: None in this example.
- Remove useless symbols: None in this example.
- Remove ε-productions: A → ε
- Remove unit productions: None in this example.
- Eliminate left-recursion: None in this example.

The resulting reduced grammar:

- S → Aa
- A → Aa
- A → a
- B → b

Example 2:

Consider the following grammar:

Grammar:

- S → A | B | ε
- A → aA | ε
- B → bB | ε
- C → cC

Procedure:

- Remove unreachable symbols: None in this example.
- Remove useless symbols: None in this example.
- Remove ε-productions: S → ε, A → ε, B → ε
- Remove unit productions: None in this example.
- Eliminate left-recursion: None in this example.

The resulting reduced grammar:

- S → A | B
- A → aA | a
- B → bB | b
- C → cC

In both examples, we follow the procedure step-by-step to find the reduced grammar by eliminating unreachable symbols, useless symbols, ε-productions, and left-recursion (if applicable). The resulting reduced grammars are simpler and more concise, representing the same language as the original grammars.

**Define Chomsky Normal Form (CNF)**

Chomsky Normal Form (CNF) is a standard form used to represent context-free grammars (CFGs). In CNF, every production rule in the grammar has one of two forms:

- A → BC: Here, A, B, and C are non-terminal symbols, and BC represents a combination of two non-terminals.
- A → a: Here, A is a non-terminal symbol, and a is a terminal symbol.

In other words, each production rule in CNF either produces two non-terminals or a single terminal. CNF has the following additional properties:

- The start symbol of the grammar cannot appear on the right side of any production rule.
- The empty string ε is not allowed in CNF, except for the case where the start symbol is allowed to derive ε.

The purpose of converting a CFG into Chomsky Normal Form is to simplify and analyze the grammar. CNF makes it easier to reason about the language generated by the grammar and enables more efficient parsing algorithms.

It is important to note that not all CFGs can be directly converted into CNF. Some grammars may require additional transformations or modifications before they can be represented in CNF.

**Recall the procedure to find CNF equivalent to the given CFG**

To find the Chomsky Normal Form (CNF) equivalent of a given context-free grammar (CFG), you can follow the following procedure:

- Remove ε-productions:
- Identify and remove any ε-productions (productions that derive the empty string).
- Adjust other productions accordingly by removing occurrences of the non-terminal symbol associated with the ε-production.

- Remove unit productions:
- Identify and remove any unit productions (productions of the form A → B, where A and B are non-terminal symbols).
- Adjust other productions accordingly by replacing occurrences of the non-terminal symbol associated with the unit production.

- Eliminate non-terminal symbols on the right side with more than two symbols:
- For each production with a right side containing more than two symbols, introduce new non-terminal symbols and break the production into smaller productions.
- Replace the original production with the new productions.

- Convert non-terminal symbols on the right side into individual non-terminals:
- For each production with non-terminal symbols on the right side, replace them with new non-terminal symbols.
- Introduce new productions to represent the original relationships.

- The resulting grammar is in CNF:
- All productions are either of the form A → BC (where A, B, and C are non-terminals) or A → a (where A is a non-terminal and a is a terminal).

It’s important to note that the above procedure assumes the given CFG is in a proper form without any syntax errors. Additionally, converting a CFG into CNF may result in an increase in the number of productions.

By following this procedure, you can obtain the CNF equivalent of a given CFG, which is useful for various applications such as parsing and analysis of context-free languages.

Example 1: Convert the given CFG to CNF. Consider the given grammar:

- S → a | aA | B
- A → aBB | ε
- B → Aa | b

Solution:

Step 1: We will create a new production S1 → S, as the start symbol S appears on the RHS. The grammar will be:

- S1 → S
- S → a | aA | B
- A → aBB | ε
- B → Aa | b

Step 2: As grammar G1 contains A → ε null production, its removal from the grammar yields:

- S1 → S
- S → a | aA | B
- A → aBB
- B → Aa | b | a

Now, as grammar contains Unit production S → B, its removal yield:

- S1 → S
- S → a | aA | Aa | b
- A → aBB
- B → Aa | b | a

Also remove the unit production S1 → S, its removal from the grammar yields:

- S0 → a | aA | Aa | b
- S → a | aA | Aa | b
- A → aBB
- B → Aa | b | a

Step 3: In the production rule S0 → aA | Aa, S → aA | Aa, A → aBB and B → Aa, terminal a exists on RHS with non-terminals. So we will replace terminal a with X:

- S0 → a | XA | AX | b
- S → a | XA | AX | b
- A → XBB
- B → AX | b | a
- X → a

Step 4: In the production rule A → XBB, RHS has more than two symbols, removing it from grammar yield:

- S0 → a | XA | AX | b
- S → a | XA | AX | b
- A → RB
- B → AX | b | a
- X → a
- R → XB

Hence, for the given grammar, this is the required CNF.

**Define Greibach Normal Form (GNF)**

Greibach Normal Form (GNF) is a specific form of context-free grammars (CFGs) named after Sheila Greibach. In GNF, all productions have the form A → aα, where A is a non-terminal symbol, a is a terminal symbol, and α is a possibly empty string of non-terminals.

More formally, a CFG is in Greibach Normal Form if every production is of the following form:

A → aα

where A is a non-terminal symbol, a is a terminal symbol, and α is a possibly empty string of non-terminals.

In GNF, the right-hand side of each production starts with a terminal symbol, followed by zero or more non-terminal symbols. This form provides a restricted yet powerful representation of context-free languages.

GNF has the following advantages:

- Deterministic Parsing: GNF allows for deterministic parsing since there is only one non-terminal symbol at the beginning of each production.
- Leftmost Derivation: GNF allows for efficient leftmost derivations, making it easier to analyze the structure of the language.

It’s worth noting that not all context-free languages can be represented in GNF. However, for those that can, GNF provides a concise and unambiguous representation of the language.

**Recall the procedure to find GNF equivalent to the given CFG**

To find the Greibach Normal Form (GNF) equivalent of a given context-free grammar (CFG), you can follow the following procedure:

- Remove ε-productions:
- Identify and remove any ε-productions (productions that derive the empty string).
- Adjust other productions accordingly by removing occurrences of the non-terminal symbol associated with the ε-production.

- Remove unit productions:
- Identify and remove any unit productions (productions of the form A → B, where A and B are non-terminal symbols).
- Adjust other productions accordingly by replacing occurrences of the non-terminal symbol associated with the unit production.

- Eliminate non-terminal symbols on the right side with more than one symbol:
- For each production with a right side containing more than one symbol, introduce new non-terminal symbols and break the production into smaller productions.
- Replace the original production with the new productions.

- Convert non-terminal symbols on the right side into individual non-terminals:
- For each production with non-terminal symbols on the right side, replace them with new non-terminal symbols.
- Introduce new productions to represent the original relationships.

- Ensure productions are in GNF form:
- Ensure that each production satisfies the GNF requirements, which state that the right side of each production consists of either a single terminal symbol or starts with a single terminal symbol followed by zero or more non-terminal symbols.

- The resulting grammar is in GNF.

It’s important to note that the above procedure assumes the given CFG is in a proper form without any syntax errors. Additionally, converting a CFG into GNF may result in an increase in the number of productions.

By following this procedure, you can obtain the GNF equivalent of a given CFG, which provides a structured and unambiguous representation of the grammar and is useful for various parsing and analysis applications.

Example 1: Convert the following CFG into GNF.

- S → XB | AA
- A → a | SA
- B → b
- X → a

Solution:

As the given grammar G is already in CNF and there is no left recursion, so we can skip step 1 and step 2 and directly go to step 3.

The production rule A → SA is not in GNF, so we substitute S → XB | AA in the production rule A → SA as:

- S → XB | AA
- A → a | XBA | AAA
- B → b
- X → a

The production rule S → XB and B → XBA is not in GNF, so we substitute X → a in the production rule S → XB and B → XBA as:

- S → aB | AA
- A → a | aBA | AAA
- B → b
- X → a

Now we will remove left recursion (A → AAA), we get:

- S → aB | AA
- A → aC | aBAC
- C → AAC | ε
- B → b
- X → a

Now we will remove null production C → ε, we get:

- S → aB | AA
- A → aC | aBAC | a | aBA
- C → AAC | AA
- B → b
- X → a

The production rule S → AA is not in GNF, so we substitute A → aC | aBAC | a | aBA in production rule S → AA as:

- S → aB | aCA | aBACA | aA | aBAA
- A → aC | aBAC | a | aBA
- C → AAC
- C → aCA | aBACA | aA | aBAA
- B → b
- X → a

The production rule C → AAC is not in GNF, so we substitute A → aC | aBAC | a | aBA in production rule C → AAC as:

- S → aB | aCA | aBACA | aA | aBAA
- A → aC | aBAC | a | aBA
- C → aCAC | aBACAC | aAC | aBAAC
- C → aCA | aBACA | aA | aBAA
- B → b
- X → a

Hence, this is the GNF form for the grammar G.

**Recall Closure properties of CFLs such as Union, Concatenation, Closure etc.**

Here are the closure properties of context-free languages (CFLs) along with examples for each:

- Union:
- Property: The union of two CFLs is also a CFL.
- Example: Let’s consider two CFLs, L1 = {a^n b^n | n ≥ 0} and L2 = {a^n c^n | n ≥ 0}. Both languages contain strings with an equal number of ‘a’ followed by an equal number of ‘b’ or ‘c’. The union of L1 and L2, denoted as L1 ∪ L2, is a CFL that consists of strings in either language. For example, L1 ∪ L2 contains strings like “ab”, “aabb”, “ac”, “aacc”, etc.

- Concatenation:
- Property: The concatenation of two CFLs is also a CFL.
- Example: Let’s consider two CFLs, L1 = {a^n b^n | n ≥ 0} and L2 = {c^m d^m | m ≥ 0}. The concatenation of L1 and L2, denoted as L1L2, is a CFL that consists of strings where an ‘a’ followed by a ‘b’ is concatenated with a ‘c’ followed by a ‘d’. For example, L1L2 contains strings like “abcd”, “aabbccdd”, “aabbbbccdddd”, etc.

- Kleene Closure (Star):
- Property: The Kleene closure of a CFL is also a CFL.
- Example: Let’s consider a CFL, L = {a^n | n ≥ 0}. The Kleene closure of L, denoted as L*, is a CFL that consists of strings formed by concatenating zero or more strings from L. For example, L* contains strings like “a”, “aa”, “aaa”, “aaaa”, etc., as well as the empty string ε.

- Intersection:
- Property: The intersection of two CFLs is not necessarily a CFL.
- Example: Let’s consider two CFLs, L1 = {a^n b^n | n ≥ 0} and L2 = {a^n c^n | n ≥ 0}. The intersection of L1 and L2, denoted as L1 ∩ L2, is not a CFL. The language L1 ∩ L2 contains strings that are both in L1 and L2, which violates the closure property for intersection.

- Complement:
- Property: The complement of a CFL is not necessarily a CFL.
- Example: Let’s consider a CFL, L = {a^n b^n | n ≥ 0}. The complement of L, denoted as L’, is not a CFL. The language L’ contains strings that are not in L, which includes strings with an unequal number of ‘a’s and ‘b’s, as well as strings that do not contain ‘a’ or ‘b’.

It’s important to note that while union, concatenation, and Kleene closure preserve the context-free property, intersection and complement do not necessarily result in a CFL.

**Recall decision properties of CFLs**

Here are some decision properties of context-free languages (CFLs) along with examples for each:

- Emptiness:
- Property: Determines whether a CFL is empty (contains no strings).
- Example: Consider a CFL L = {a^n b^n | n ≥ 0}. The emptiness problem for L asks whether L contains any strings. In this case, L is not empty because it contains strings like “ab”, “aabb”, “aaabbb”, etc.

- Membership:
- Property: Determines whether a given string belongs to a CFL.
- Example: Consider a CFL L = {a^n b^n | n ≥ 0}. The membership problem for L asks whether a given string, such as “aabb”, belongs to L. In this case, “aabb” belongs to L because it follows the pattern of having an equal number of ‘a’s and ‘b’s.

- Equality:
- Property: Determines whether two CFLs are equal, i.e., contain exactly the same set of strings.
- Example: Consider two CFLs, L1 = {a^n b^n | n ≥ 0} and L2 = {w | w is a palindrome}. The equality problem for L1 and L2 asks whether L1 and L2 are the same language. In this case, L1 and L2 are not equal because they contain different sets of strings. L1 only contains strings with equal numbers of ‘a’s and ‘b’s, while L2 contains strings that are palindromes.

- Universality:
- Property: Determines whether a CFL contains all possible strings over its alphabet.
- Example: Consider a CFL L = {a^n b^n | n ≥ 0}. The universality problem for L asks whether L contains all possible strings over the alphabet {a, b}. In this case, L is not universal because it only contains strings with an equal number of ‘a’s and ‘b’s.

- Intersection with Regular Language:
- Property: Determines whether the intersection of a CFL and a regular language is non-empty.
- Example: Consider a CFL L = {a^n b^n | n ≥ 0} and a regular language R = (ab)*. The intersection problem for L and R asks whether the intersection of L and R is non-empty. In this case, the intersection of L and R is non-empty because both languages contain the string “ab”.

These decision properties are used to analyze and classify context-free languages based on their properties and relationships with other languages.

**Recall CYK (Cocke-Younger-Kasami) Algorithm**

The CYK (Cocke-Younger-Kasami) algorithm is a parsing algorithm used to determine if a given string can be generated by a given context-free grammar (CFG). The algorithm is based on dynamic programming and operates in a bottom-up manner.

Here’s an overview of the CYK algorithm:

- Input:
- A CFG in Chomsky Normal Form (CNF) consisting of a set of non-terminal symbols, a set of terminal symbols, a set of production rules, and a start symbol.
- A string to be parsed.

- Initialization:
- Create an n x n matrix, where n is the length of the input string.
- Initialize all cells of the matrix as empty.

- Filling the matrix:
- For each cell (i, j) in the matrix, where i represents the starting position and j represents the ending position:
- If the substring from position i to j (inclusive) is a terminal symbol, populate the cell with non-terminals that produce that terminal.
- For each value of k from i to j-1:
- For each production rule A -> BC, where B and C are non-terminals:
- If the cell (i, k) contains non-terminal B and the cell (k+1, j) contains non-terminal C, populate the cell (i, j) with non-terminal A.

- For each production rule A -> BC, where B and C are non-terminals:

- For each cell (i, j) in the matrix, where i represents the starting position and j represents the ending position:
- Parsing result:
- If the start symbol of the grammar is present in the cell (0, n-1), where n is the length of the input string, then the string can be generated by the grammar. Otherwise, it cannot.

The CYK algorithm uses the concept of subproblems and overlapping subproblems to efficiently determine the parseability of a string. By filling the matrix from smaller substrings to larger substrings, it systematically checks all possible ways to derive the string from the grammar. If the start symbol is present in the final cell of the matrix, it indicates that the string can be generated by the grammar.

The CYK algorithm has a time complexity of O(n^3 * |G|), where n is the length of the input string and |G| is the size of the CFG. It is widely used in natural language processing and syntax analysis to determine the grammaticality of sentences.

**Apply CYK Algorithm**

The CYK algorithm can be used to decide the membership of a given string in a context-free grammar (CFG). The algorithm constructs a triangular table where each row represents a specific length of substrings. The bottommost row corresponds to substrings of length 1, the second row from the bottom corresponds to substrings of length 2, and so on. The topmost row represents the given string itself, which has a length of n. By populating the table with non-terminals that can generate the corresponding substrings, we can determine if the given string can be derived from the CFG.

Example: For a given string “x” of length 4 units, triangular table looks like

We will fill each box of x_{ij} with its V_{ij}. These notations are discussed below.

Where,

x_{ij} represents a sub string of “x” starting from location ‘i’ and has length ‘j’.

Example-

Consider x = abcd is a string, then

Number of sub strings possible = n(n+1)/2 = 4 x (4+1) / 2 = 10

We have-

- x
_{11}= a - x
_{21}= b - x
_{31}= c - x
_{41}= d - x
_{12}= ab - x
_{22}= bc - x
_{32}= cd - x
_{13}= abc - x
_{23}= bcd - x
_{14}= abcd

and_{j}

V_{ij} represents a set of variables in the grammar which can derive the sub string x_{ij.}

If the set of variables consists of the start symbol, then it becomes sure-

- Sub-string x
_{ij}can be derived from the given grammar. - Sub-string x
_{ij}is a member of the language of the given grammar.

Example: For the given grammar, check the acceptance of string w = baaba using CYK Algorithm.

S → AB / BC

A → BA / a

B → CC / b

C → AB / a

Explanation:

First, let us draw the triangular table.

- The given string is x = baaba.
- Length of given string = |x| = 5.
- So, Number of sub strings possible = (5 x 6) / 2 = 15.

So, triangular table looks like

Now, let us find the value of V_{ij} for each cell.

For first row:

Evaluate V_{11}

- V
_{11}represents the set of variables deriving x_{11}. - x
_{11}= b. - Only variable B derives string “b” in the given grammar.
- Thus, V
_{11}= { B }

Evaluate V_{21}

- V
_{21}represents the set of variables deriving x_{21}. - x
_{21}= a. - Variables A and C derive string “a” in the given grammar.
- Thus, V
_{21}= { A , C }

Evaluate V_{31}

- V
_{31}represents the set of variables deriving x_{31}. - x
_{31}= a. - Variables A and C derive string “b” in the given grammar.
- Thus, V
_{31}= { A , C }

Evaluate V_{41}

- V
_{41}represents the set of variables deriving x_{41}. - x
_{41}= b. - Only variable B derives string “b” in the given grammar.
- Thus, V
_{41}= { B }

Evaluate V_{51}

- V
_{51}represents the set of variables deriving x_{51}. - x
_{51}= a. - Variables A and C derives string “a” in the given grammar.
- Thus, V
_{51}= { A , C }

For Second row

As per the algorithm, to find the value of V_{ij} from 2^{nd} row on wards,

we use the formula-

V_{ij} = V_{ik} V_{(i+k)(j-k)}

where k varies from 1 to j-1

Evaluate V_{12}

We have i = 1 , j = 2 , k = 1

Substituting values in the formula, we get-

V_{12} = V_{11}. V_{21}

V_{12} = { B } { A , C }

V_{12} = { BA , BC }

∴ V_{12} = { A , S }

Evaluate V_{22}

We have i = 2 , j = 2 , k = 1

Substituting values in the formula, we get-

V_{22} = V_{21}. V_{31}

V_{22} = { A , C } { A , C }

V_{22} = { AA , AC , CA , CC }

Since AA , AC and CA do not exist, so we have-

V_{22} = { CC }

∴ V_{22} = { B }

Evaluate V_{32}

We have i = 3 , j = 2 , k = 1

Substituting values in the formula, we get-

V_{32} = V_{31}. V_{41}

V_{32} = { A , C } { B }

V_{32} = { AB , CB }

Since CB does not exist, so we have-

V_{32} = { AB }

∴ V_{32} = { S , C }

Evaluate V_{42}

We have i = 4 , j = 2 , k = 1

Substituting values in the formula, we get-

V_{42} = V_{41}. V_{51}

V_{42} = { B } { A , C }

V_{42} = { BA , BC }

∴ V_{42} = { A , S }

For third row:

Evaluate V_{13}

We have i = 1 , j = 3 , k = 1 to (3-1) = 1,2

Substituting values in the formula, we get-

V_{13} = V_{11}. V_{22} ∪ V_{12}. V_{31}

V_{13} = { B } { B } ∪ { A , S } { A , C }

V_{13} = { BB } ∪ { AA , AC , SA , SC }

Since BB , AA , AC , SA and SC do not exist, so we have-

V_{13} = ϕ ∪ ϕ

∴ V_{13} = ϕ

Evaluate V_{23}

We have i = 2 , j = 3 , k = 1 to (3-1) = 1,2

Substituting values in the formula, we get-

V_{23} = V_{21}. V_{32} ∪ V_{22}. V_{41}

V_{23} = { A , C } { S , C } ∪ { B } { B }

V_{23} = { AS , AC , CS , CC } ∪ { BB }

Since AS , AC , CS and BB do not exist, so we have-

V_{23} = { CC }

∴ V_{23} = B

Evaluate V_{33}

We have i = 3 , j = 3 , k = 1 to (3-1) = 1,2

Substituting values in the formula, we get-

V_{33} = V_{31}. V_{42} ∪ V_{32}. V_{51}

V_{33} = { A , C } { A , S } ∪ { S , C } { A , C }

V_{33} = { AA , AS , CA , CS } ∪ { SA , SC , CA , CC }

Since AA , AS , CA , CS , SA , SC and CA do not exist, so we have-

V_{33} = ϕ ∪ { CC }

V_{33} = ϕ ∪ { B }

∴ V_{33} = { B }

For fourth row onward:

Evaluate V_{14}

We have i = 1 , j = 4 , k = 1 to (4-1) = 1,2,3

Substituting values in the formula, we get-

V_{14} = V_{11}. V_{23} ∪ V_{12}. V_{32} ∪ V_{13}. V_{41}

V_{14} = { B } { B } ∪ { A , S } { S , C } ∪ { ϕ , B }

V_{14} = { BB } ∪ { AS , AC , SS , SC } ∪ { B }

Since BB , AS , AC , SS , SC and B do not exist, so we have-

V_{14} = ϕ ∪ ϕ ∪ ϕ

∴ V_{14} = ϕ

Evaluate V_{24}

We have i = 2 , j = 4 , k = 1 to (4-1) = 1,2,3

Substituting values in the formula, we get-

V_{24} = V_{21}. V_{33} ∪ V_{22}. V_{42} ∪ V_{23}. V_{51}

V_{24} = { A , C } { B } ∪ { B } { A , S } ∪ { B } { A , C }

V_{24} = { AB , CB } ∪ { BA , BS } ∪ { BA , BC }

Since CB does not exist, so we have-

V_{24} = { AB } ∪ { BA , BS } ∪ { BA , BC }

V_{24} = { S , C } ∪ { A } ∪ { A , S }

∴ V_{24} = { S , C , A }

For fifth row:

Evaluate V_{15}

We have i = 1 , j = 5 , k = 1 to (5-1) = 1,2,3,4

Substituting values in the formula, we get-

V_{15} = V_{11}. V_{24} ∪ V_{12}. V_{33} ∪ V_{13}. V_{42} ∪ V_{14}. V_{51}

V_{15} = { B } { S , C , A } ∪ { A , S } { B } ∪ { ϕ } { A , S } ∪ { ϕ } { A , C }

V_{15} = { BS , BC , BA } ∪ { AB , SB } ∪ { A , S } ∪ { A , C }

Since BS , SB , A , S and C do not exist, so we have-

V_{15} = { BC , BA } ∪ { AB } ∪ ϕ ∪ ϕ

V_{15} = { S , A } ∪ { S , C } ∪ ϕ ∪ ϕ

∴ V_{15} = { S , A , C }

Now,

- The value of V
_{ij}is computed for each cell. - We observe V
_{15}contains the start symbol S. - Thus, string x
_{15}= baaba is a member of the language of given grammar.

After filling the triangular table, it looks like-