fix: fetch full history for submodules to fix incorrect last-modified dates #783

Open
Aayushman-nvm wants to merge 3 commits into precice:master from Aayushman-nvm:issue404

Conversation

@Aayushman-nvm

Pages from submodules were showing incorrect last-modified dates because actions/checkout does a shallow clone of submodules even when fetch-depth: 0 is set. This caused all submodule pages to show the same wrong date.

Fix: added a step to unshallow all submodules after checkout so Jekyll can
correctly determine the last commit date for each file.
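
A minimal sketch of what such an unshallow step could look like, assuming the submodules have already been initialised by the checkout step. The function name `unshallow_submodules` is hypothetical, and the step actually added to deploy.yml is not shown in this thread. The shallow check is there because `git fetch --unshallow` fails on a repository that already has its full history.

```shell
#!/usr/bin/env bash
# Sketch only: deepen every shallow submodule so `git log` can see full history.
# `unshallow_submodules` is a hypothetical helper name, not part of the PR.

unshallow_submodules() {
  # $1: path to the superproject (defaults to the current directory)
  local repo="${1:-.}"
  git -C "$repo" submodule foreach --recursive '
    # Only fetch when the submodule really is shallow; running
    # --unshallow on a complete repository is an error.
    if [ "$(git rev-parse --is-shallow-repository)" = "true" ]; then
      git fetch --unshallow
    fi
  '
}
```

In a workflow this would run once, right after checkout and before the Jekyll build, for example as `unshallow_submodules "$GITHUB_WORKSPACE"`.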

Closes: Issue #404

Screenshots:
Before -
image

After -
image

@Aayushman-nvm
Author

Still working on it

@MakisH added the GSoC (Contributed in the context of the Google Summer of Code) and technical (Technical issues on the website) labels on Feb 23, 2026
@Aayushman-nvm
Author

@MakisH For this issue I might have to modify vendor files, hook.rb to be exact.
Path: /vendor/bundle/ruby/3.3.0/gems/jekyll-last-modified-at-1.3.2/lib/jekyll-last-modified-at/hook.rb

Is that fine?

@MakisH
Member

MakisH commented Feb 23, 2026

Thanks for the contribution! That's indeed an annoying bug...

> @MakisH For this issue i might've to modify vendor files... hook.rb to be exact
> path: /vendor/bundle/ruby/3.3.0/gems/jekyll-last-modified-at-1.3.2/lib/jekyll-last-modified-at/hook.rb
>
> is that fine?

Doesn't sound right: /vendor/ is auto-generated and not even in this repository. Modifying it could cause all kinds of conflicts.

@Aayushman-nvm
Author

Aayushman-nvm commented Feb 23, 2026

@MakisH Haha, true, but I think you might want to check these screenshots out:

deployed site:
image

My local site:
image

I made a slight change, just a single line, and now the dates are displayed as 14th Jan. If you could confirm whether 14th Jan is right, then I think I might have fixed the issue.

@Aayushman-nvm
Author

For context: I wrote a script (inject_dates.sh) that goes into each submodule, gets the real last-commit date for every markdown file using git, and injects it as last_modified_at into the frontmatter before Jekyll builds.

Then I changed = to ||= in hook.rb so the plugin respects existing frontmatter values instead of always overwriting them with its own git detection, which is broken for submodules.

I added two new steps in deploy.yml; one of them is not yet committed.

@Aayushman-nvm
Author

@MakisH Here's what I've done so far:

inject_dates.sh: a new file I'm adding to the repo root. This shell script runs before Jekyll builds, goes into each submodule, finds every markdown file, gets its real last git commit date, and writes it into the file's frontmatter as last_modified_at.
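
The two operations described above (look up the last commit date, write it into the frontmatter) can be sketched roughly as follows. This is not the actual inject_dates.sh, which is not shown in this thread. The function names `last_commit_date` and `set_last_modified` are hypothetical, and the sketch assumes frontmatter delimited by plain `---` lines.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers, not the real inject_dates.sh.

# Print the ISO 8601 committer date of the last commit touching a file,
# asking whichever repository contains the file's directory.
last_commit_date() {
  local file="$1"
  git -C "$(dirname "$file")" log -1 --format=%cI -- "$(basename "$file")"
}

# Insert or replace last_modified_at in a file's YAML frontmatter.
# Files that do not start with a '---' line are left untouched.
set_last_modified() {
  local file="$1" date="$2" tmp
  [ "$(head -n 1 "$file")" = "---" ] || return 0
  tmp="$(mktemp)"
  awk -v d="$date" '
    NR == 1 { print; in_fm = 1; next }        # opening ---
    in_fm && /^last_modified_at:/ { next }    # drop any previous value
    in_fm && /^---[[:space:]]*$/ {            # closing ---
      print "last_modified_at: " d
      in_fm = 0
    }
    { print }
  ' "$file" > "$tmp" && cat "$tmp" > "$file" && rm -f "$tmp"
}
```

A driver would then loop over the markdown files of each submodule and call `set_last_modified "$f" "$(last_commit_date "$f")"`, skipping files for which no date comes back.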

.github/workflows/deploy.yml: in this file I've added 3 new steps: unshallow the submodules so full git history is available, run inject_dates.sh, and patch the plugin's hook.rb to respect frontmatter instead of overwriting it.

The reason I changed = to ||= in hook.rb is that the plugin runs git log from the top-level git directory, which has no history for files inside submodules, so it falls back to the filesystem mtime (always today's date). With ||=, the plugin still works as before for all regular pages, but respects the last_modified_at we inject via inject_dates.sh for submodule files, where the plugin's git detection fails. Pages that don't have last_modified_at in their frontmatter are completely unaffected.

@Aayushman-nvm
Author

In case you're going to test and review this branch locally, you need to run the following commands in sequence:

  1. Run the date injection script:
     bash inject_dates.sh
  2. Patch the jekyll-last-modified-at hook:
     sed -i 's/item\.data\["last_modified_at"\] = Determinator/item.data["last_modified_at"] ||= Determinator/' \
       vendor/bundle/ruby/*/gems/jekyll-last-modified-at-*/lib/jekyll-last-modified-at/hook.rb
  3. Start the Jekyll server:
     bundle exec jekyll serve -l

Once the code is merged, we don't need to run these commands manually for deployment: they're already configured in deploy.yml.
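
One caveat with the patch step: sed exits successfully even when its pattern matches nothing, so a change in the gem's source (for example a different quote style) would make the patch silently do nothing. A hedged sketch of a guarded variant follows. `patch_hook` is a hypothetical helper name, and it assumes GNU sed, as found on Ubuntu runners.

```shell
#!/usr/bin/env bash
# Sketch only: apply the `=` -> `||=` substitution and report failure
# if the patched line is not present afterwards. Assumes GNU sed (-i).

patch_hook() {
  local hook="$1"
  sed -i 's/item\.data\["last_modified_at"\] = Determinator/item.data["last_modified_at"] ||= Determinator/' "$hook"
  # sed returns 0 even when nothing matched, so check the result explicitly.
  grep -qF '["last_modified_at"] ||= Determinator' "$hook"
}
```

The helper is safe to run twice: the second run changes nothing and the check still passes. It could be called as `patch_hook vendor/bundle/ruby/*/gems/jekyll-last-modified-at-*/lib/jekyll-last-modified-at/hook.rb || exit 1`, provided the glob expands to exactly one file.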

Copilot AI left a comment

Pull request overview

This PR fixes incorrect last-modified dates displayed on website pages sourced from Git submodules. The issue stems from GitHub Actions' actions/checkout performing shallow clones of submodules even when fetch-depth: 0 is specified, resulting in all submodule pages showing the same incorrect date. The fix involves three coordinated changes: unshallowing all submodules after checkout, extracting correct last-commit dates from full Git history and injecting them into YAML frontmatter, and patching the jekyll-last-modified-at gem to respect these frontmatter values instead of computing them from the (incomplete) shallow Git history.

Changes:

  • Added a new shell script (inject_dates.sh) that traverses all submodules, extracts the true last-commit date for each markdown file from Git history, and injects or updates the last_modified_at field in YAML frontmatter
  • Modified the deployment workflow to unshallow submodules, run the date injection script, and patch the jekyll-last-modified-at Ruby gem to respect frontmatter dates
  • Implemented a workaround for the actions/checkout shallow clone limitation by combining Git history extraction with runtime gem patching

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| inject_dates.sh | New bash script that extracts last-commit dates from Git history and injects them into YAML frontmatter of markdown files in all submodules |
| .github/workflows/deploy.yml | Adds workflow steps to unshallow submodules, run the date injection script, and patch the jekyll-last-modified-at gem to use frontmatter dates |

@fsimonis
Member

Hi, I have some questions:

  1. Why do you unshallow each submodule instead of replacing the checkout action with a standard deep git clone? At this point we just need a simple clone and disable all the features that the action does.
  2. Why doesn't jekyll-last-modified-at detect the correct times now? The information is in git.
  3. What is the point of patching files and jekyll-dependencies for a feature that will be removed as part of the GSoC project?

@Aayushman-nvm
Author

Hi @fsimonis, answering your questions:

1. The workflow already uses actions/checkout consistently for all three checkouts. Replacing just the website checkout with a plain git clone while keeping the action for the other two would be inconsistent. As a contributor, I wanted minimal changes to the existing, working CI rather than restructuring it. The unshallow step was the least invasive addition to make submodule history available. I'm happy to replace it with a plain deep clone if you prefer.

2. Even though the history is in git, the plugin architecturally can't reach it. Looking at determinator.rb, it runs git log with --git-dir set to git.top_level_directory, i.e. the parent repo's .git directory (this repo). From the parent repo's perspective, a submodule is just a commit pointer, not a folder with individual file histories. So when the plugin computes relative_path_from_git_dir for, say, imported/tutorials/quickstart/README.md, it looks for that path in the parent repo's git history, finds nothing, and falls back to mtime, which is always today in CI. The information is in git, but inside each submodule's own .git, which the plugin never looks into.
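
The difference can be checked by hand with a small comparison, sketched here under stated assumptions. `compare_history` is a hypothetical helper, and it mirrors the lookup described above only loosely, using plain `git log` rather than the plugin's exact invocation.

```shell
#!/usr/bin/env bash
# Sketch only: ask the superproject and the nested repository about the same file.
# `compare_history` is a hypothetical helper, not part of the plugin or the PR.

compare_history() {
  local top="$1" rel="$2"   # rel: file path relative to the superproject root
  local sub
  sub="$(dirname "$top/$rel")"
  # Lookup from the superproject, using the path relative to its root.
  echo "superproject: $(git -C "$top" log -1 --format=%cI -- "$rel")"
  # Lookup from inside the directory that holds the file.
  echo "submodule:    $(git -C "$sub" log -1 --format=%cI -- "$(basename "$rel")")"
}
```

Called as `compare_history . imported/tutorials/quickstart/README.md` from the website checkout, an empty value on the first line next to a date on the second would match the behaviour described in this comment.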

3. My thought was that the migration to Hugo will probably take some time, and the Jekyll site will remain in production that entire time. Hugo will be developed in a separate branch and will only replace Jekyll once it is a complete and tested replica, after which the Bootstrap and UI work still follows. Until then, all submodule pages will show wrong dates and users will see that, which is not great for a documentation site. The project starts at the end of May, with the main coding work beginning in May / June, so users would be seeing wrong dates for several months while the Jekyll site is still live.
I also looked into whether this approach carries over to Hugo, and it doesn't. Hugo with enableGitInfo = true handles submodule dates natively without any of these workarounds, so this fix is intentionally Jekyll-only and throwaway, meant only to cover the gap between now and when the Hugo migration goes live in production, keeping in mind what users see on the site. That said, the call is completely yours.

4 participants