Sitemap as a guide for larger or more complex websites in Django


Overview

Sitemaps matter for large, complex websites such as EdTech and eCommerce platforms, because they help search engines find every page. A sitemap is an XML file that lists a site's URLs, and the Django sitemap framework generates this file automatically from information expressed in Python code. This article shows how sitemaps are set up and used in Django, which is useful for web developers who build websites with Python and the Django framework.

Importance of Sitemap

A search engine is a software system that lets users locate information on the Internet. It crawls and indexes a site’s URLs so that they can appear in its search results. A sitemap gives the crawler information about the pages, images, videos, and other files of the site. It not only lists the pages and files that are important, but can also state when each page was last updated and how often it changes.
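To make those fields concrete, here is a minimal sketch that hand-builds a one-entry sitemap with Python's standard library. It only illustrates the XML structure (loc, lastmod, changefreq, priority); it is not how Django generates the file. The helper name build_urlset and the sample values are assumptions made up for this illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_urlset(entries):
    """Return sitemap XML (as a string) for a list of entry dicts.

    Hypothetical helper for illustration only.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # <loc> is the page address; the other tags are optional hints.
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    return ET.tostring(urlset, encoding="unicode")


xml_text = build_urlset([
    {
        "loc": "https://www.edumple.com/courses/cbse-class-6/batches",
        "lastmod": "2022-06-24",
        "changefreq": "daily",
        "priority": 0.8,
    }
])
print(xml_text)
```

Each url element describes one page, which is the same shape the Django framework produces from a Sitemap class.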

If the site’s pages are properly linked, search engines can usually discover most of the site. Proper linking means that all pages deemed important can be reached through some form of navigation, be that the site’s menu or links placed on pages. However, a sitemap can improve the crawling of larger or more complex sites and of more specialized files. Django is often used to build such large and complex websites, which is why it is important to know how to set up a sitemap in a Django project so that search engines can crawl and index the site.

Installation

To install the sitemap app, follow these steps:
1. Make sure that the sites framework (django.contrib.sites) is already installed.
2. Add 'django.contrib.sitemaps' to the INSTALLED_APPS setting:

INSTALLED_APPS = [
    'django.contrib.sites',
    'django.contrib.sitemaps'
]

3. Make sure the TEMPLATES setting contains a DjangoTemplates backend whose APP_DIRS option is set to True. This is the default, so a change is only needed if that setting has been modified.

TEMPLATES = [
    {
        'BACKEND': 'django.template.backends.django.DjangoTemplates',
        'DIRS': [
            os.path.join(BASE_DIR, 'templates')
        ],
        'APP_DIRS': True
    },
]

Initialization

To generate a sitemap for the website, add the following to the URLconf:

from django.contrib.sitemaps import views
from django.urls import path

urlpatterns = [
    # views.sitemap renders a single sitemap containing every section.
    path('sitemap.xml', views.sitemap, {'sitemaps': sitemaps},
         name='django.contrib.sitemaps.views.sitemap')
]

This tells Django to build a sitemap when a client accesses /sitemap.xml.
The location of the sitemap file is important, not the name. Search engines will only index links in the sitemap for the current URL level and below. For instance, a sitemap at /sitemap.xml may reference any URL on the site, whereas a sitemap at /content/sitemap.xml may only reference URLs that begin with /content/.
The sitemap view takes an extra, required argument: {'sitemaps': sitemaps}. Here, sitemaps should be a dictionary that maps a short section label (e.g., course-batches or course-faqs) to its Sitemap class (e.g., CourseBatchesSitemap or CourseFaqsSitemap). It may also map to an instance of a Sitemap class (e.g., CourseBatchesSitemap()). The examples in this article generate the sitemap for an EdTech website named EduMple, which has batches and FAQs for each course. CourseBatchesSitemap and CourseFaqsSitemap are the two sitemap classes used.

sitemaps = {
    'course-batches': CourseBatchesSitemap,
    'course-faqs': CourseFaqsSitemap
}
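The URL-level rule mentioned above can be sketched as a small check: a page is in scope when it is on the same host and at or below the directory that holds the sitemap file. This is an illustrative helper using only the standard library; the function name in_sitemap_scope is an assumption and is not part of Django.

```python
import posixpath
from urllib.parse import urlsplit


def in_sitemap_scope(sitemap_url, page_url):
    """Return True if page_url is at or below the sitemap's URL level.

    Hypothetical helper for illustration only.
    """
    sitemap, page = urlsplit(sitemap_url), urlsplit(page_url)
    # The page must be served from the same scheme and host.
    if (sitemap.scheme, sitemap.netloc) != (page.scheme, page.netloc):
        return False
    # Directory holding the sitemap, e.g. "/content/" for "/content/sitemap.xml".
    base_dir = posixpath.dirname(sitemap.path)
    if not base_dir.endswith("/"):
        base_dir += "/"
    return page.path.startswith(base_dir)


root_sitemap = "https://www.edumple.com/sitemap.xml"
nested_sitemap = "https://www.edumple.com/content/sitemap.xml"
page = "https://www.edumple.com/courses/cbse-class-6/batches"

print(in_sitemap_scope(root_sitemap, page))
print(in_sitemap_scope(nested_sitemap, page))
```

Serving the sitemap from the site root, as in the URLconf above, keeps every page of the site in scope.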

Sitemap Classes

A Sitemap class is a Python class that represents a “section” of entries in the sitemap. For example, one Sitemap class (CourseBatchesSitemap) could represent all the batch entries.
In the simplest case, all these sections get lumped together into one sitemap.xml, but it’s also possible to use the framework to generate a sitemap index that references individual sitemap files, one per section.
Sitemap classes must subclass django.contrib.sitemaps.Sitemap. They can live anywhere in the codebase.

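from django.contrib.sitemaps import Sitemap
from django.urls import reverse

from edumple.models import Courses


class CourseBatchesSitemap(Sitemap):
    changefreq = "daily"
    priority = 0.8
    protocol = "https"

    def items(self):
        # Only courses that have not been soft-deleted.
        return Courses.objects.filter(deleted=0)

    def lastmod(self, obj):
        # Assumes the Courses model has a "modified" date/time field.
        return obj.modified

    def location(self, item):
        # Build the path from the named URL pattern and the course slug.
        return reverse('courseBatches', args=[item.slug])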

Sitemap Class Reference

A Sitemap class defines the following methods/attributes:
items:
Required. A method that returns a sequence or QuerySet of objects. The framework doesn’t care what type of objects they are; all that matters is that these objects get passed to the lastmod(), location(), changefreq(), and priority() methods.
location:
Optional. Either a method or attribute.
For a method, it should return the absolute path for a given object as returned by items().
For an attribute, its value should be a string representing an absolute path to use for every object returned by items().
In both cases, “absolute path” means a URL that doesn’t include the protocol or domain.
Examples:

  • Good: ‘/courses/cbse-class-6/batches’
  • Bad: ‘www.edumple.com/courses/cbse-class-6/batches’
  • Bad: ‘https://www.edumple.com/courses/cbse-class-6/batches’

If location isn’t provided, the framework will call the get_absolute_url() method on every object as returned by items().
For example, when the course slug is “cbse-class-6”, a location() method can build the path by reversing one of these named URL patterns:

urlpatterns = [
    path('courses/<slug:course_slug>/batches', views.courseBatches, name='courseBatches'),
    path('courses/<slug:course_slug>/faqs', views.courseFaqs, name='courseFaqs')
]

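With these patterns, reversing ‘courseBatches’ for that slug yields ‘/courses/cbse-class-6/batches’.
To emit URLs with a specific protocol such as ‘https’, set the protocol attribute.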
lastmod:
Optional. Either a method or attribute.
For a method, it should take one argument, an object as returned by items(), and return that object’s last-modified date/time as a datetime.
For an attribute, its value should be a datetime representing the last-modified date/time for every object returned by items().
changefreq:
Optional. Either a method or attribute.
For a method, it should take one argument – an object as returned by items() – and return that object’s change frequency as a string.
For an attribute, its value should be a string representing the change frequency of every object returned by items().
Possible values for changefreq, whether we use a method or attribute, are:

  • ‘always’
  • ‘hourly’
  • ‘daily’
  • ‘weekly’
  • ‘monthly’
  • ‘yearly’
  • ‘never’

priority:
Optional. Either a method or attribute.
For a method, it should take one argument, an object as returned by items(), and return that object’s priority as either a string or float.
For an attribute, its value should be either a string or float representing the priority of every object returned by items().
Example values are 0.4 and 1.0. The default priority of a page is 0.5.
protocol:
Optional. This attribute defines the protocol (‘https’ or ‘http’) of the URLs in the sitemap. If it isn’t set, the protocol with which the sitemap was requested is used. If the sitemap is built outside the context of a request, the default is ‘http’ in Django versions before 5.0 and ‘https’ from 5.0 onward.
Other attributes of the Sitemap class include:

  • limit
  • i18n
  • languages
  • alternates
  • x_default

Sitemap Indexing

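The sitemap framework can also create a sitemap index that references individual sitemap files, one per section defined in the sitemaps dictionary. For that, two views are used in the URLconf: django.contrib.sitemaps.views.index() serves the index, and django.contrib.sitemaps.views.sitemap() serves each section and takes a section keyword argument.

urlpatterns = [
    path('sitemap.xml', views.index, {'sitemaps': sitemaps}),
    path('<section>-sitemap.xml', views.sitemap, {'sitemaps': sitemaps},
         name='django.contrib.sitemaps.views.sitemap')
]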
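As a rough picture of what the index ends up listing, the sketch below computes the per-section sitemap URLs for the '<section>-sitemap.xml' pattern. It is plain Python, not Django's implementation. It assumes the framework's default limit of 50,000 URLs per sitemap file and that additional pages of a large section are addressed with a ?p=N query string; the helper name index_entries and the URL counts are made up for illustration.

```python
import math


def index_entries(base_url, section_counts, limit=50000):
    """List the sitemap URLs an index would reference.

    Hypothetical helper: section_counts maps a section label to the
    number of URLs in that section.
    """
    entries = []
    for section, count in section_counts.items():
        url = f"{base_url}/{section}-sitemap.xml"
        entries.append(url)
        # Sections larger than the limit are split into numbered pages.
        pages = max(1, math.ceil(count / limit))
        for page in range(2, pages + 1):
            entries.append(f"{url}?p={page}")
    return entries


entries = index_entries(
    "https://www.edumple.com",
    {"course-batches": 2, "course-faqs": 120000},
)
for entry in entries:
    print(entry)
```

Small sections produce a single file each, so for a site like the one in this article the index simply lists one URL per key of the sitemaps dictionary.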

Example

In edumple/settings.py:

INSTALLED_APPS = [
    "django.contrib.sites",
    "django.contrib.sitemaps"
]

TEMPLATES = [
    {
        "BACKEND": "django.template.backends.django.DjangoTemplates",
        "DIRS": [
            os.path.join(BASE_DIR, "templates")
        ],
        "APP_DIRS": True
    },
]

In edumple/models.py:

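from django.db import models

class Courses(models.Model):
    slug = models.CharField(max_length=255)
    deleted = models.IntegerField(default=0)  # 0 means the course is active
    # Last-modified timestamp, read by the sitemaps' lastmod() method.
    modified = models.DateTimeField(auto_now=True)

    class Meta:
        db_table = "courses"
        verbose_name_plural = "courses"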

In edumple/urls.py:

from django.contrib.sitemaps import Sitemap
from django.contrib.sitemaps import views as sitemap_views
from django.urls import path, reverse

# Assumed location of the courseBatches and courseFaqs views.
from edumple import views
from edumple.models import Courses

protocol = "https"


class CourseBatchesSitemap(Sitemap):
    changefreq = "daily"
    priority = 0.8
    protocol = protocol

    def items(self):
        # Only courses that have not been soft-deleted.
        return Courses.objects.filter(deleted=0)

    def lastmod(self, obj):
        return obj.modified

    def location(self, item):
        return reverse("courseBatches", args=[item.slug])


class CourseFaqsSitemap(Sitemap):
    changefreq = "weekly"
    priority = 0.5
    protocol = protocol

    def items(self):
        return Courses.objects.filter(deleted=0)

    def lastmod(self, obj):
        return obj.modified

    def location(self, item):
        return reverse("courseFaqs", args=[item.slug])


# Section label -> Sitemap class; each label becomes one sitemap file.
sitemaps = {
    "course-batches": CourseBatchesSitemap,
    "course-faqs": CourseFaqsSitemap
}

urlpatterns = [
    # Sitemap index, plus one sitemap file per section.
    path("sitemap.xml", sitemap_views.index, {"sitemaps": sitemaps}),
    path("<section>-sitemap.xml", sitemap_views.sitemap, {"sitemaps": sitemaps},
         name="django.contrib.sitemaps.views.sitemap"),
    path("courses/<slug:course_slug>/batches", views.courseBatches, name="courseBatches"),
    path("courses/<slug:course_slug>/faqs", views.courseFaqs, name="courseFaqs"),
]

Output

Requesting “/sitemap.xml” returns:

<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>https://www.edumple.com/course-batches-sitemap.xml</loc>
</sitemap>
<sitemap>
<loc>https://www.edumple.com/course-faqs-sitemap.xml</loc>
</sitemap>
</sitemapindex>

Requesting “/course-batches-sitemap.xml” returns:

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://www.edumple.com/courses/cbse-class-6/batches</loc>
<lastmod>2022-06-24</lastmod>
<changefreq>daily</changefreq>
<priority>0.8</priority>
</url>
<url>
<loc>https://www.edumple.com/courses/cbse-class-7/batches</loc>
<lastmod>2022-06-23</lastmod>
<changefreq>daily</changefreq>
<priority>0.8</priority>
</url>
</urlset>

Requesting “/course-faqs-sitemap.xml” returns:

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://www.edumple.com/courses/cbse-class-6/faqs</loc>
<lastmod>2022-06-24</lastmod>
<changefreq>weekly</changefreq>
<priority>0.5</priority>
</url>
<url>
<loc>https://www.edumple.com/courses/cbse-class-7/faqs</loc>
<lastmod>2022-06-23</lastmod>
<changefreq>weekly</changefreq>
<priority>0.5</priority>
</url>
</urlset>
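Output like the above can be checked with a few lines of standard-library Python. The sketch below parses sitemap XML and lists every loc value; it works for both a urlset and a sitemapindex document. The helper name list_locations is an assumption for this illustration, and the sample string is a shortened copy of the output shown above rather than a live response.

```python
import xml.etree.ElementTree as ET

# Namespace used by both <urlset> and <sitemapindex> documents.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def list_locations(xml_text):
    """Return the text of every <loc> element in a sitemap document.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", NS)]


sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://www.edumple.com/courses/cbse-class-6/faqs</loc>
<lastmod>2022-06-24</lastmod>
</url>
<url>
<loc>https://www.edumple.com/courses/cbse-class-7/faqs</loc>
<lastmod>2022-06-23</lastmod>
</url>
</urlset>"""

locations = list_locations(sample)
for location in locations:
    print(location)
```

A check like this also catches malformed output early, because ET.fromstring raises a ParseError when a tag is left unclosed.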

Conclusion

A well-structured sitemap helps search engines discover and index a website’s pages, which in turn helps users find relevant results when they search for keywords related to its content. The main advantage of adding Sitemap classes to a Django website is that the site can receive more focused organic traffic from search engines. This is why many EdTech and eCommerce websites use sitemaps to expose their pages to crawlers and gain more organic traffic.
