Google Sitemap Addon - Are You Lying?

I certainly am lying to the search engines, and you probably are too, and we could be penalized for it.

It is said to be best practice to regenerate sitemap.xml daily, or at least whenever new content has been added or old content has been deleted, so I have a cronjob set up to regenerate the sitemap every day. The problem with the current Google Sitemap Addon is that the <lastmod> tag is filled with the current date at the time of generation, and that is where the lie comes into play. I have hundreds of thousands of products and there is no way in hell that they are updated every day with something new.

I have been noticing that search engines haven't been crawling or indexing my sites very much and I have been trying to find a reason why. I don't know if this is the key reason, but it's the only thing I have run across that could be the issue, since everything else seems to be in order.

The fix doesn't look very simple since there are 5 different groups.

Homepage - There is nowhere to pull a date from so it would have to stay with current date which is okay with me because I do have content that changes pretty much on a daily basis.

Products - This is the most important. The date needs to be pulled from updated_timestamp, or else from timestamp, in the cscart_products table.

Categories - The date needs to be pulled from timestamp in the cscart_categories table. If new content is added, you would have to change the Creation date manually.

Pages - The date needs to be pulled from timestamp in the cscart_pages table. If new content is added, you would have to change the Creation date manually.

Brands/Extended Features - There is nowhere to pull a date from.
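
The five groups above boil down to one fallback order: use the update time if there is one, otherwise the creation time, otherwise the current date. A rough sketch of that order in plain PHP, assuming each row is an associative array read from the relevant table. The name pick_lastmod_timestamp and the $now parameter are made up for illustration and are not part of the addon:

```php
<?php
/**
 * Pick the best available "last modified" Unix timestamp for a sitemap entry.
 * Hypothetical helper, not part of CS-Cart.
 *
 * @param array $row Row data; may contain 'updated_timestamp' and/or 'timestamp'.
 * @param int   $now Fallback value (the addon uses the TIME constant here).
 * @return int
 */
function pick_lastmod_timestamp(array $row, $now)
{
    // Prefer the update time, then the creation time.
    foreach (array('updated_timestamp', 'timestamp') as $column) {
        if (!empty($row[$column]) && (int) $row[$column] > 0) {
            return (int) $row[$column];
        }
    }

    // Homepage, brands and extended features have no date column to read.
    return (int) $now;
}

// Product edited after creation: the update time wins.
echo pick_lastmod_timestamp(array('timestamp' => 1000, 'updated_timestamp' => 2000), 3000), "\n";
// Category or page: only the creation time exists.
echo pick_lastmod_timestamp(array('timestamp' => 1000), 3000), "\n";
// Homepage or brand: nothing stored, so fall back to the supplied current time.
echo pick_lastmod_timestamp(array(), 3000), "\n";
```

A product that has never been edited may carry an empty or zero updated_timestamp, which is why the helper treats zero as "not set" and moves on to timestamp.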

I was going to post this in the bug tracker but I imagine that all they would say is "Working as designed".

The date for lastmod looks like it is coming from this function in /app/addons/google_sitemap/func.php.

function fn_google_sitemap_get_content($map_page = 0)
{
    $sitemap_settings = Registry::get('addons.google_sitemap');
    $location = fn_get_storefront_url(fn_get_storefront_protocol());
    $lmod = date("Y-m-d", TIME);

Looking at the rest of the file, I am not sure of the best way to go about adding the timestamps for each group. If anyone has any ideas, please let me know. Otherwise I will keep working on it and update with any solutions.

Different objects store (or don't store) the last update timestamp, i.e. products now do, but pages don't (I don't think). So generally each object may or may not have an updated_timestamp column in the db. If they do, then that will be the last update. If not, then you might have to use either the create date (usually just called timestamp) or the current time (constant TIME or function time()).

To be correct, all objects should have a create timestamp and an update timestamp (but the update timestamp needs to only be set if data was actually changed, not just a save).
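
To illustrate the "only if data was actually changed" part, here is a small sketch in plain PHP. It assumes the stored row and the submitted data are flat associative arrays of scalar values; stamp_if_changed is a hypothetical name and nothing like it ships with CS-Cart as far as I know:

```php
<?php
/**
 * Return $new with 'updated_timestamp' bumped only when a submitted field
 * differs from the stored value. Hypothetical helper, scalar fields only.
 *
 * @param array $old Row as currently stored.
 * @param array $new Fields being saved.
 * @param int   $now Timestamp to use when something changed.
 * @return array
 */
function stamp_if_changed(array $old, array $new, $now)
{
    $previous = isset($old['updated_timestamp']) ? (int) $old['updated_timestamp'] : 0;

    // The timestamp itself must not count as a change.
    unset($old['updated_timestamp'], $new['updated_timestamp']);

    $changed = false;
    foreach ($new as $field => $value) {
        // Compare as strings because database values usually come back as strings.
        if (!array_key_exists($field, $old) || (string) $old[$field] !== (string) $value) {
            $changed = true;
            break;
        }
    }

    $new['updated_timestamp'] = $changed ? (int) $now : $previous;

    return $new;
}

$stored = array('product' => 'Mug', 'price' => '9.99', 'updated_timestamp' => 100);

// Saving identical data keeps the old timestamp.
$same = stamp_if_changed($stored, array('product' => 'Mug', 'price' => '9.99'), 500);
echo $same['updated_timestamp'], "\n";

// Changing the price bumps it.
$edited = stamp_if_changed($stored, array('price' => '10.50'), 500);
echo $edited['updated_timestamp'], "\n";
```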

Hey Tony,

Yeah, I listed all of that in the first post.

Something needs to be done because having every single product, category, page and feature showing that it has new content every day isn't going to cut it.

At least pulling the timestamp for products, pages and categories would be better than the current date. If you have any ideas how to implement it, I would appreciate it. I've tried a couple of things already without success.

The following is not tested and comes off the top of my balding head...

Create a function to retrieve the appropriate timestamp, i.e. something like:

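function my_mod_time($object, $object_id) {
    $field = 'timestamp';
    $table = "?:$object";
    // Default key column: drop the trailing "s" (products -> product_id, pages -> page_id).
    $condition_column = preg_replace('/s$/', '', $object) . '_id';
    switch ($object) {
        case 'products':
            $field = 'updated_timestamp';
            break;
        case 'categories':
            // Irregular plural: the key column is category_id, not categorie_id.
            $condition_column = 'category_id';
            break;
        case 'pages':
            // Categories and pages only have a timestamp, no updated_timestamp field.
            break;
        // Add more 'objects' as needed.
    }
    // ?i makes the db layer treat the ID as an integer.
    return db_get_field("SELECT $field FROM $table WHERE $condition_column = ?i", $object_id);
}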

Then replace TIME with my_mod_time('products', $product_id) or whatever the context is.

You can pass this on to CS-Cart if you want. Who knows, maybe while they're busy with 4.4.1 they might be able to sneak in updated_timestamp for all objects and then use it. We could then write some addons that would allow a merchant to know who changed what and when. They should add an updated_by field that would be the user_id of the admin doing the update.

Note that adding those columns to the objects and then updating them could all be done via hooks so it doesn't really require core changes. Just some tedious work.

Tony, please correct your function. For example, if you use the "products" object, it will try to get updated_timestamp by products_id and an error will be returned.

I didn't think of creating a function for it. Since $lmod is defined once at the beginning of the function and then reused for each group, I was thinking of using a series of if statements. Another thing I just thought of is that the timestamp needs to be converted to a date, maybe with fn_timestamp_to_date? Below is the complete default function.

function fn_google_sitemap_get_content($map_page = 0)
{
    $sitemap_settings = Registry::get('addons.google_sitemap');
    $location = fn_get_storefront_url(fn_get_storefront_protocol());
$lmod = date("Y-m-d", TIME);

// HEAD SECTION

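$simple_head = <<<HEAD
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

HEAD;

$simple_foot = <<<FOOT

</urlset>
FOOT;
$index_map_url = <<<HEAD
<url>
    <loc>$location/</loc>
    <lastmod>$lmod</lastmod>
    <changefreq>$sitemap_settings[site_change]</changefreq>
    <priority>$sitemap_settings[site_priority]</priority>
</url>\n

HEAD;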

// END HEAD SECTION

$parts = 0;
if ($sitemap_settings['include_categories'] == "Y") {
    $parts++;
    $get_categories = true;
}
if ($sitemap_settings['include_products'] == "Y") {
    $parts++;
    $get_products = true;
}
if ($sitemap_settings['include_pages'] == "Y") {
    $parts++;
    $get_pages = true;
}
if ($sitemap_settings['include_extended'] == "Y") {
    $parts++;
    $get_features = true;
}
if (fn_allowed_for('MULTIVENDOR') && $sitemap_settings['include_companies'] == 'Y') {
    $parts++;
    $get_companies = true;
}

fn_set_progress('parts', $parts);

// SITEMAP CONTENT
$link_counter = 1;
$file_counter = 1;

$sitemap_path = fn_get_files_dir_path(false) . 'google_sitemap/';
fn_rm($sitemap_path);
fn_mkdir($sitemap_path);

$file = fopen($sitemap_path . 'sitemap' . $file_counter . '.xml', "wb");
fwrite($file, $simple_head . $index_map_url);

$languages = db_get_hash_single_array("SELECT lang_code, name FROM ?:languages WHERE status = 'A'", array('lang_code', 'name'));

if (!empty($get_categories)) {
    $categories = db_get_fields("SELECT category_id FROM ?:categories WHERE FIND_IN_SET(?i, usergroup_ids) AND status = 'A' ?p", USERGROUP_ALL, fn_get_google_sitemap_company_condition('?:categories.company_id'));

    fn_set_progress('step_scale', count($categories));

    //Add the all active categories
    foreach ($categories as $category) {
        $links = fn_google_sitemap_generate_link('category', $category, $languages);
        $item = fn_google_sitemap_print_item_info($links, $lmod, $sitemap_settings['categories_change'], $sitemap_settings['categories_priority']);

        fn_google_sitemap_check_counter($file, $link_counter, $file_counter, $links, $simple_head, $simple_foot, 'categories');

        fwrite($file, $item);
    }

}

if (!empty($get_products)) {
    $total = ITEMS_PER_PAGE;
    $i = 0;

    $params = $_REQUEST;
    $params['custom_extend'] = array('categories');
    $params['sort_by'] = 'null';
    $params['only_short_fields'] = true; // NEEDED ONLY FOR NOT TO LOAD UNNECESSARY FIELDS FROM DB
    $params['area'] = 'C';

    $original_auth = Tygh::$app['session']['auth'];
    Tygh::$app['session']['auth'] = fn_fill_auth(array(), array(), false, 'C');

    fn_set_progress('step_scale', db_get_field("SELECT COUNT(*) FROM ?:products WHERE status = 'A'"));

    while ($params['pid'] = db_get_fields("SELECT product_id FROM ?:products WHERE status = 'A' ORDER BY product_id ASC LIMIT $i, $total")) {
        $i += $total;

        list($products) = fn_get_products($params, ITEMS_PER_PAGE);

        foreach ($products as $product) {
            $links = fn_google_sitemap_generate_link('product', $product['product_id'], $languages);
            $item = fn_google_sitemap_print_item_info($links, $lmod, $sitemap_settings['products_change'], $sitemap_settings['products_priority']);

            fn_google_sitemap_check_counter($file, $link_counter, $file_counter, $links, $simple_head, $simple_foot, 'products');

            fwrite($file, $item);
        }
    }
    unset($products);

    Tygh::$app['session']['auth'] = $original_auth;
}

if (!empty($get_pages)) {

    $page_types = fn_get_page_object_by_type();
    unset($page_types[PAGE_TYPE_LINK]);

    list($pages) = fn_get_pages(array(
        'simple' => true,
        'status' => 'A',
        'page_type' => array_keys($page_types)
    ));
    fn_set_progress('step_scale', count($pages));

    //Add the all active pages
    foreach ($pages as $page) {
        $links = fn_google_sitemap_generate_link('page', $page['page_id'], $languages, $page);
        $item = fn_google_sitemap_print_item_info($links, $lmod, $sitemap_settings['pages_change'], $sitemap_settings['pages_priority']);

        fn_google_sitemap_check_counter($file, $link_counter, $file_counter, $links, $simple_head, $simple_foot, 'pages');

        fwrite($file, $item);
    }
}

if (!empty($get_features)) {
    $vars = db_get_fields(
        "SELECT ?:product_feature_variants.variant_id FROM ?:product_feature_variants " .
        "LEFT JOIN ?:product_features ON (?:product_feature_variants.feature_id = ?:product_features.feature_id) " .
        "WHERE ?:product_features.feature_type = ?s AND ?:product_features.status = 'A'"
    , ProductFeatures::EXTENDED);
    fn_set_progress('step_scale', count($vars));

    //Add the all active extended features
    foreach ($vars as $var) {
        $links = fn_google_sitemap_generate_link('extended', $var, $languages);
        $item = fn_google_sitemap_print_item_info($links, $lmod, $sitemap_settings['extended_change'], $sitemap_settings['extended_priority']);

        fn_google_sitemap_check_counter($file, $link_counter, $file_counter, $links, $simple_head, $simple_foot, 'features');

        fwrite($file, $item);
    }
}

if (!empty($get_companies)) {
    $companies = db_get_fields("SELECT company_id FROM ?:companies WHERE status = 'A' ?p", fn_get_google_sitemap_company_condition('?:companies.company_id'));
    fn_set_progress('step_scale', count($companies));

    if (!empty($companies)) {
        foreach ($companies as $company_id) {
            $links = fn_google_sitemap_generate_link('companies', $company_id, $languages);
            $item = fn_google_sitemap_print_item_info($links, $lmod, $sitemap_settings['companies_change'], $sitemap_settings['companies_priority']);

            fn_google_sitemap_check_counter($file, $link_counter, $file_counter, $links, $simple_head, $simple_foot, 'companies');

            fwrite($file, $item);
        }
    }
}

fn_set_hook('sitemap_item', $sitemap_settings, $file, $lmod, $link_counter, $file_counter);

fwrite($file, $simple_foot);
fclose($file);

if ($file_counter == 1) {
    fn_rename($sitemap_path . 'sitemap' . $file_counter . '.xml', $sitemap_path . 'sitemap.xml');
} else {
    // Make a map index file

    $maps = '';
    $seo_enabled = Registry::get('addons.seo.status') == 'A' ? true : false;
    for ($i = 1; $i <= $file_counter; $i++) {
        if ($seo_enabled) {
            $name = $location . '/sitemap' . $i . '.xml';
        } else {
            $name = fn_url('xmlsitemap.view?page=' . $i, 'C', fn_get_storefront_protocol());
        }

        $name = htmlentities($name);
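        $maps .= <<<MAP
<sitemap>
    <loc>$name</loc>
    <lastmod>$lmod</lastmod>
</sitemap>\n

MAP;
}
$index_map = <<<HEAD
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

$maps
</sitemapindex>
HEAD;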

    $file = fopen($sitemap_path . 'sitemap.xml', "wb");
    fwrite($file, $index_map);
    fclose($file);
}
fn_set_notification('N', __('notice'), __('google_sitemap.map_generated'));
exit();

}
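
On the question of converting a stored timestamp to a date: the function above already formats TIME with PHP's built-in date(), so the same call should work for any Unix timestamp pulled from the database. A minimal sketch using only core PHP. The name format_lastmod is made up, and the timezone is pinned to UTC here only so the printed values are reproducible:

```php
<?php
// Fixed zone so the output below does not depend on server settings.
date_default_timezone_set('UTC');

/**
 * Format a Unix timestamp for a sitemap <lastmod> value.
 * Hypothetical helper, not part of the addon.
 *
 * @param int  $timestamp Unix timestamp, e.g. from updated_timestamp.
 * @param bool $with_time True for the full W3C datetime, false for date only.
 * @return string
 */
function format_lastmod($timestamp, $with_time = false)
{
    return date($with_time ? DATE_W3C : 'Y-m-d', (int) $timestamp);
}

echo format_lastmod(1451606400), "\n";       // 2016-01-01
echo format_lastmod(1451606400, true), "\n"; // 2016-01-01T00:00:00+00:00
```

The sitemap protocol accepts both forms for lastmod, so the date-only format the addon already uses is enough unless you want to expose the time of day as well.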

Tony, please correct your function. For example, if you use the "products" object, it will try to get updated_timestamp by products_id and an error will be returned.

I made an adjustment above to remove the plural. Again, this is untested.