Evaluating content
Evaluating content is not a straightforward process, and it is a subjective one as well. What you might evaluate as OK, I might evaluate as great, and so on. The other challenge is that the process does not scale, partly for these very reasons: people need to read each piece, evaluate it against a set of criteria, and somehow summarize the results.
What can make this process easier is to structure it: define clear criteria, and check whether or not a piece of content achieves them. This can be the sweet spot where LLMs can help us: the problem is not so highly structured that it can be solved by a regex or standard text analysis, yet it is not so unstructured that it requires experts to issue a full report on the quality of the content.
The plan:
- Take an article
- Create a set of questions about the article where the answers can be either True or False
- Let the LLM evaluate and check whether the article achieves each of the criteria
- Repeat the same process with many articles with bulk prompting
- Create a summary report
There are many ways to evaluate content, and one quite important one is Google’s helpful content guidelines. These are a set of subjective but relatively structured questions that they use to evaluate content.
The report issued by evaluating and scoring according to these criteria is never going to be perfect, and small differences between articles won’t mean anything. If article A “achieves” 90% of the criteria, and article B achieves 89%, that’s not a meaningful comparison. We would be looking for large differences, and checking if certain articles miss, let’s say, half the criteria. Or we can check how our articles are doing overall according to a certain criterion, for example:

> Does the main heading or page title avoid exaggerating or being shocking in nature?

and see what percentage of our content seems to be exaggerated or shocking.
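The idea of looking only for large gaps can be sketched in a few lines. This is a minimal illustration, not part of the original workflow: the function names, the example URLs, and the 0.5 threshold are all hypothetical, and it assumes every question is phrased so that True means the criterion is met.

```python
def share_met(answers):
    """Fraction of criteria answered True for one article."""
    return sum(answers) / len(answers)


def flag_weak_articles(results, threshold=0.5):
    """Return URLs whose share of met criteria is at or below the threshold."""
    return [url for url, answers in results.items()
            if share_met(answers) <= threshold]


# Hypothetical answers for three articles, ten criteria each
results = {
    "https://example.com/a": [True] * 9 + [False] * 1,
    "https://example.com/b": [True] * 5 + [False] * 5,
    "https://example.com/c": [True] * 3 + [False] * 7,
}

# Only articles that miss at least half of the criteria are flagged
weak = flag_weak_articles(results)
print(weak)
```

A coarse threshold like this deliberately ignores the difference between, say, 90% and 89%.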
Here is a sample of the criteria:
Guidelines criteria and their categories
A sample of criteria/questions from each category. You can download the full helpful content criteria list if you are interested.
| | category | question |
---|---|---|
0 | Content and quality questions | Does the content provide original information, reporting, research, or analysis? |
1 | Content and quality questions | Does the content provide a substantial, complete, or comprehensive description of the topic? |
2 | Content and quality questions | Does the content provide insightful analysis or interesting information that is beyond the obvious? |
3 | Content and quality questions | If the content draws on other sources, does it avoid simply copying or rewriting those sources, and instead provide ... |
4 | Content and quality questions | Does the main heading or page title provide a descriptive, helpful summary of the content? |
12 | Expertise questions | Does the content present information in a way that makes you want to trust it, such as clear sourcing, evidence of t... |
13 | Expertise questions | If someone researched the site producing the content, would they come away with an impression that it is well-truste... |
14 | Expertise questions | Is this content written or reviewed by an expert or enthusiast who demonstrably knows the topic well? |
15 | Expertise questions | Does the content have any easily-verified factual errors? |
16 | Focus on people-first content | Do you have an existing or intended audience for your business or site that would find the content useful if they ca... |
17 | Focus on people-first content | Does your content clearly demonstrate first-hand expertise and a depth of knowledge (for example, expertise that com... |
18 | Focus on people-first content | Does your site have a primary purpose or focus? |
19 | Focus on people-first content | After reading your content, will someone leave feeling they've learned enough about a topic to help achieve their goal? |
20 | Focus on people-first content | Will someone reading your content leave feeling like they've had a satisfying experience? |
21 | Avoid creating search engine-first content | Is the content primarily made to attract visits from search engines? |
22 | Avoid creating search engine-first content | Are you producing lots of content on many different topics in hopes that some of it might perform well in search res... |
23 | Avoid creating search engine-first content | Are you using extensive automation to produce content on many topics? |
24 | Avoid creating search engine-first content | Are you mainly summarizing what others have to say without adding much value? |
25 | Avoid creating search engine-first content | Are you writing about things simply because they seem trending and not because you'd write about them otherwise for ... |
32 | Who (created the content) | Is it self-evident to your visitors who authored your content? |
33 | Who (created the content) | Do pages carry a byline, where one might be expected? |
34 | Who (created the content) | Do bylines lead to further information about the author or authors involved, giving background about them and the ar... |
35 | How (the content was created) | Is the use of automation, including AI-generation, self-evident to visitors through disclosures or in other ways? |
36 | How (the content was created) | Are you providing background about how automation or AI-generation was used to create content? |
37 | How (the content was created) | Are you explaining why automation or AI was seen as useful to produce content? |
Get content to evaluate
We now need a set of articles to evaluate, and we can easily get one by crawling a website and extracting the main content with custom extraction.
import advertools as adv
import pandas as pd

# Get the URLs from the sitemap(s) listed in robots.txt
sitemap = adv.sitemap_to_df('https://nbastats.pro/robots.txt')
# Random list of 50 URLs to crawl
url_list = sitemap['loc'].sample(50).tolist()
adv.crawl(
    url_list=url_list,
    output_file='nbastats_crawl.jl',
    custom_settings={
        'LOG_FILE': 'nbastats_crawl.log',
    },
    xpath_selectors={
        # Special XPath selector to extract the article text
        'player_description': '//div[@class="col-lg-10 col-md-9"]/text() | //div//span[@id="more"]/text()'
    })
# Load the crawl output for the evaluation steps that follow
crawldf = pd.read_json('nbastats_crawl.jl', lines=True)
Sample rows and columns of the crawl dataset
| | url | h1 | player_description |
---|---|---|---|
0 | https://nbastats.pro/player/John_Niemiera | John Niemiera Stats: NBA Career | Introducing John Niemiera: The Detroit Pistons' Sharpshooter@@When it comes to basketball, there are players who lea... |
1 | https://nbastats.pro/player/Maceo_Baston | Maceo Baston Stats: NBA Career | Maceo Baston: A Defensive Force on the Court@@When it comes to analyzing the impact of a basketball player, statisti... |
2 | https://nbastats.pro/player/Marko_Jaric | Marko Jaric Stats: NBA Career | Welcome to the profile page of Marko Jaric, a skilled and versatile NBA basketball player who has left his mark on t... |
3 | https://nbastats.pro/player/Elijah_Hughes | Elijah Hughes Stats: NBA Career | Elijah Hughes: Unveiling the Lesser-Known Talents of a Rising NBA Star@@In the world of professional basketball, the... |
4 | https://nbastats.pro/player/Lou_Tsioropoulos | Lou Tsioropoulos Stats: NBA Career | Meet Lou Tsioropoulos, a former NBA player who made his mark on the court during his time with the Boston Celtics. T... |
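The "@@" visible in player_description is the separator advertools uses to join multiple matches of the same selector into one string. A small helper can turn it back into readable text before it is sent for evaluation. This is only a sketch: the function name and the sample string are hypothetical.

```python
def clean_description(raw, separator="@@"):
    """Join the separated text fragments into one whitespace-normalized string."""
    parts = [part.strip() for part in raw.split(separator)]
    # Skip empty fragments left by consecutive separators
    return " ".join(part for part in parts if part)


# Hypothetical extracted text with a heading and two paragraphs
raw = "A Heading@@ First paragraph. @@@@Second paragraph."
print(clean_description(raw))
```

Compared with a plain replace of the separator, this avoids the double spaces that empty fragments would otherwise leave behind.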
Evaluate the content with OpenAI’s API
For each article and its main heading:
- Send the article and its heading, together with the set of questions to ask/check
- Combine all responses in one DataFrame
- Get averages and/or counts for each criterion
Send articles with questions
from openai import OpenAI

# Assumes the OPENAI_API_KEY environment variable is set
client = OpenAI()
# content_guidelines: the DataFrame of criteria shown above (columns: category, question)

responses = []
# Evaluate each article together with its main heading
for url, h1, description in crawldf[['url', 'h1', 'player_description']].values:
    print(h1)
    completion = client.chat.completions.create(
        model="gpt-4o",
        # Temperature of zero to minimize randomness (we want straightforward true/false answers)
        temperature=0,
        # Fixed seed so the same input reproduces the same output as closely as possible
        seed=123,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": f"""
Please answer the following questions about this article and its title.
Respond in JSON where questions are keys and answers are values.
Answers should be boolean only.
article_title: {h1}
article_text: {description.replace('@@', ' ')}
questions: {content_guidelines.head(12)['question'].tolist()}
"""}
        ])
    responses.append((url, h1, completion))
Combine responses
import json

response_dfs = []
for url, h1, response in responses:
    content = response.choices[0].message.content
    # The slice drops the Markdown code-fence wrapper around the JSON reply
    # (7 leading and 3 trailing characters)
    df = pd.DataFrame(json.loads(content[7:-3]).items())
    df['title'] = h1
    df['url'] = url
    response_dfs.append(df)
# Combine all responses into one DataFrame
evaluation = pd.concat(response_dfs, ignore_index=True).rename(columns={0: 'question', 1: 'answer'})
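Slicing a fixed number of characters off the reply only works while the model keeps wrapping its JSON in a code fence. A more tolerant parser might look like the sketch below. The function name is hypothetical, and dropping non-boolean answers is an assumption about how to treat replies that ignore the "boolean only" instruction.

```python
import json
import re


def parse_answers(reply_text):
    """Return the {question: bool} dict from a model reply, fenced or not."""
    text = reply_text.strip()
    # Remove an optional opening fence (three backticks plus a language tag)
    text = re.sub(r"^`{3}[a-zA-Z]*\s*", "", text)
    # Remove an optional closing fence
    text = re.sub(r"\s*`{3}$", "", text)
    answers = json.loads(text)
    # Keep only real booleans so stray strings do not skew the averages
    return {q: a for q, a in answers.items() if isinstance(a, bool)}


FENCE = "`" * 3
plain = '{"Is the title descriptive?": true, "Any factual errors?": false}'
fenced = f"{FENCE}json\n{plain}\n{FENCE}"

# Both forms of the reply parse to the same dictionary
print(parse_answers(fenced) == parse_answers(plain))
```

In the loop above, a helper like this could take the place of the fixed slice passed to json.loads.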
Evaluation samples
| | question | answer | title | url |
---|---|---|---|---|
0 | Does the content provide original information, reporting, research, or analysis? | True | John Niemiera Stats: NBA Career | https://nbastats.pro/player/John_Niemiera |
1 | Does the content provide a substantial, complete, or comprehensive description of the topic? | True | John Niemiera Stats: NBA Career | https://nbastats.pro/player/John_Niemiera |
2 | Does the content provide insightful analysis or interesting information that is beyond the obvious? | True | John Niemiera Stats: NBA Career | https://nbastats.pro/player/John_Niemiera |
12 | Does the content provide original information, reporting, research, or analysis? | True | Maceo Baston Stats: NBA Career | https://nbastats.pro/player/Maceo_Baston |
13 | Does the content provide a substantial, complete, or comprehensive description of the topic? | True | Maceo Baston Stats: NBA Career | https://nbastats.pro/player/Maceo_Baston |
14 | Does the content provide insightful analysis or interesting information that is beyond the obvious? | True | Maceo Baston Stats: NBA Career | https://nbastats.pro/player/Maceo_Baston |
24 | Does the content provide original information, reporting, research, or analysis? | False | Marko Jaric Stats: NBA Career | https://nbastats.pro/player/Marko_Jaric |
25 | Does the content provide a substantial, complete, or comprehensive description of the topic? | True | Marko Jaric Stats: NBA Career | https://nbastats.pro/player/Marko_Jaric |
26 | Does the content provide insightful analysis or interesting information that is beyond the obvious? | False | Marko Jaric Stats: NBA Career | https://nbastats.pro/player/Marko_Jaric |
Evaluation summary
Averages
| | criterion | evaluation |
---|---|---|
0 | Does the content have any spelling or stylistic issues? | 54% |
1 | Does the content provide a substantial, complete, or comprehensive description of the topic? | 84% |
2 | Does the content provide insightful analysis or interesting information that is beyond the obvious? | 54% |
3 | Does the content provide original information, reporting, research, or analysis? | 52% |
4 | Does the content provide substantial value when compared to other pages in search results? | 48% |
5 | Does the main heading or page title avoid exaggerating or being shocking in nature? | 100% |
6 | Does the main heading or page title provide a descriptive, helpful summary of the content? | 100% |
7 | If the content draws on other sources, does it avoid simply copying or rewriting those sources, and instead provide substantial additional value and originality? | 52% |
8 | Is the content mass-produced by or outsourced to a large number of creators, or spread across a large network of sites, so that individual pages or sites don't get as much attention or care? | 0% |
9 | Is the content produced well, or does it appear sloppy or hastily produced? | 26% |
10 | Is this the sort of page you'd want to bookmark, share with a friend, or recommend? | 48% |
11 | Would you expect to see this content in or referenced by a printed magazine, encyclopedia, or book? | 48% |
Counts
| | criterion | evaluation |
---|---|---|
0 | Does the content have any spelling or stylistic issues? | 27 |
1 | Does the content provide a substantial, complete, or comprehensive description of the topic? | 42 |
2 | Does the content provide insightful analysis or interesting information that is beyond the obvious? | 27 |
3 | Does the content provide original information, reporting, research, or analysis? | 26 |
4 | Does the content provide substantial value when compared to other pages in search results? | 24 |
5 | Does the main heading or page title avoid exaggerating or being shocking in nature? | 50 |
6 | Does the main heading or page title provide a descriptive, helpful summary of the content? | 50 |
7 | If the content draws on other sources, does it avoid simply copying or rewriting those sources, and instead provide substantial additional value and originality? | 26 |
8 | Is the content mass-produced by or outsourced to a large number of creators, or spread across a large network of sites, so that individual pages or sites don't get as much attention or care? | 0 |
9 | Is the content produced well, or does it appear sloppy or hastily produced? | 13 |
10 | Is this the sort of page you'd want to bookmark, share with a friend, or recommend? | 24 |
11 | Would you expect to see this content in or referenced by a printed magazine, encyclopedia, or book? | 24 |
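The two tables above boil down to grouping the answers by question and taking the count and share of True values. Here is a dependency-free sketch of that aggregation. The function name and the sample records are hypothetical, and this is not necessarily the code that produced the tables.

```python
from collections import defaultdict


def summarize(records):
    """Map each question to (count of True answers, share of True answers)."""
    answers_by_question = defaultdict(list)
    for question, answer in records:
        answers_by_question[question].append(answer)
    return {
        question: (sum(answers), sum(answers) / len(answers))
        for question, answers in answers_by_question.items()
    }


# Hypothetical (question, answer) pairs, one per article and criterion
records = [
    ("Does the title summarize the content?", True),
    ("Does the title summarize the content?", True),
    ("Is the content original?", True),
    ("Is the content original?", False),
]

for question, (count, share) in summarize(records).items():
    print(f"{question} | {count} | {share:.0%}")
```

With the evaluation DataFrame, evaluation.groupby('question')['answer'].agg(['mean', 'sum']) should give the same two columns, provided the answer column holds booleans.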
It’s quite interesting that zero articles were evaluated as mass-produced. Those articles were actually produced by ChatGPT itself. ChatGPT is transcending itself!
And now we have a structured report evaluating fifty sample articles, each according to twelve structured questions. We can easily filter and check according to any criterion we want.
There are actually thirty-eight guidelines to check, and we can do the same with the full set of questions and for all the URLs for a more detailed evaluation.
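One way to extend the run to all thirty-eight questions is to send them in batches of twelve, as in the sample above, rather than in one long prompt. Batching is an assumption here, not something the run above required, and the helper name and placeholder questions are hypothetical.

```python
def batches(items, size):
    """Yield consecutive slices of items with at most size elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Placeholders standing in for the thirty-eight guideline questions
questions = [f"question {i}" for i in range(38)]
question_batches = list(batches(questions, 12))

# Each batch would be sent in its own request for every article
print([len(batch) for batch in question_batches])
```

Each batch could replace content_guidelines.head(12)['question'].tolist() in the prompt, with the responses for one article merged afterwards.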