TargetingIdeaService currently has an NGRAM_GROUP AttributeType that returns the longest matching substring of the given keywords that appears at least n times (n is defined by the AdWords system).
We’ve phased out the NGRAM_GROUP AttributeType starting with the v201302 release. If you need to calculate n-gram groups yourself, you can do so using the steps below.
1. Tokenize keyword ideas.
First, normalize the list of keyword ideas into lists of tokens and remove the low-quality tokens: tokens identical to the one immediately before them, and single-character tokens. (e.g. the keyword ideas ["foo foo bar a baz", "foo bar"] become [["foo", "bar", "baz"], ["foo", "bar"]].)
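The tokenizing step can be sketched in Python as below. This is a minimal illustration, not the code behind NGRAM_GROUP: the function name `tokenize` is hypothetical, and splitting on whitespace is an assumption, since the post does not say how keyword ideas are normalized beyond the example.

```python
def tokenize(keyword_ideas):
    """Split each keyword idea into tokens and drop low-quality tokens.

    Assumption: tokens are separated by whitespace. A token is dropped if it
    is a single character or identical to the previously kept token.
    """
    token_lists = []
    for idea in keyword_ideas:
        tokens = []
        for token in idea.split():
            if len(token) <= 1:
                continue  # drop single-character tokens
            if tokens and tokens[-1] == token:
                continue  # drop a repeat of the previously kept token
            tokens.append(token)
        token_lists.append(tokens)
    return token_lists


print(tokenize(["foo foo bar a baz", "foo bar"]))
```

For the example input this prints `[['foo', 'bar', 'baz'], ['foo', 'bar']]`.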
2. Generate all the possible n-gram candidates.
Then, generate all possible n-gram candidates from the lists of separated tokens. (e.g. from the tokens [["foo", "bar", "baz"], ["baz", "foo"]], the n-gram candidates are ["foo bar baz", "foo bar", "bar baz", "baz foo"].)
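One way to enumerate the candidates is sketched below. The function name `generate_candidates` and the `min_len` parameter are hypothetical. The minimum length of two tokens is an assumption inferred from the example, which lists no single-token candidates.

```python
def generate_candidates(token_lists, min_len=2):
    """Return every contiguous run of at least min_len tokens, without duplicates.

    Assumption: candidates must contain at least two tokens, as in the
    example from the post. Longer candidates from each list come first.
    """
    candidates = []
    seen = set()
    for tokens in token_lists:
        # Walk from the longest possible run down to the shortest allowed one.
        for size in range(len(tokens), min_len - 1, -1):
            for start in range(len(tokens) - size + 1):
                candidate = " ".join(tokens[start:start + size])
                if candidate not in seen:
                    seen.add(candidate)
                    candidates.append(candidate)
    return candidates


print(generate_candidates([["foo", "bar", "baz"], ["baz", "foo"]]))
```

For the example tokens this prints `['foo bar baz', 'foo bar', 'bar baz', 'baz foo']`.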
3. Apply the n-gram groups to the unassigned keyword ideas.
For each keyword idea, we assign the longest n-gram candidate that appears at least n times across all the keyword ideas. (e.g. suppose the n-gram candidates are ["foo bar baz", "foo bar", "bar baz", "baz foo"], the keyword ideas are ["foo bar baz", "foo bar"], and n is 2. The candidate "foo bar" appears 2 times in the keyword ideas, so it is assigned to both "foo bar baz" and "foo bar".)
With these steps, you get the longest matching substring that appears at least n times.
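The assignment step might look like the following sketch. The function name `assign_ngram_groups` is hypothetical, and two details are assumptions the post does not spell out: a candidate's count is taken to be the number of keyword ideas that contain it as a run of whole tokens, and "longest" is measured in tokens.

```python
def assign_ngram_groups(keyword_ideas, candidates, n):
    """Map each keyword idea to its longest candidate seen in at least n ideas.

    Assumptions: matching is done on whitespace-separated tokens, and an idea
    with no qualifying candidate maps to None.
    """
    def contains(idea, candidate):
        idea_tokens = idea.split()
        cand_tokens = candidate.split()
        size = len(cand_tokens)
        return any(
            idea_tokens[i:i + size] == cand_tokens
            for i in range(len(idea_tokens) - size + 1)
        )

    # Keep only candidates that occur in at least n keyword ideas.
    frequent = [
        c for c in candidates
        if sum(contains(idea, c) for idea in keyword_ideas) >= n
    ]
    # Longest candidates first; the sort is stable, so ties keep input order.
    frequent.sort(key=lambda c: len(c.split()), reverse=True)

    return {
        idea: next((c for c in frequent if contains(idea, c)), None)
        for idea in keyword_ideas
    }


groups = assign_ngram_groups(
    ["foo bar baz", "foo bar"],
    ["foo bar baz", "foo bar", "bar baz", "baz foo"],
    2,
)
print(groups)
```

With the example values this prints `{'foo bar baz': 'foo bar', 'foo bar': 'foo bar'}`, matching the assignment described above.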
The complete code example is available here.
As always, please feel free to ask any questions regarding the client libraries or the AdWords API on our forum or during scheduled office hours. You can also follow the Google Ads Developer page for all Ads-related updates.
- Takeshi Hagikura, AdWords API Team