Conversation

@cj-zhukov
Contributor

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@cj-zhukov
Contributor Author

High-Level Overview

This PR is an exploratory step to evaluate whether using a parser combinator library (nom) improves the clarity and robustness of the example documentation parsing logic.

In a previous PR (#19750), the parsing of subcommands and example metadata in the main.rs docs was implemented with ad-hoc string manipulation. That approach works, but this PR experiments with rewriting two of those functions using nom:

  • parse_subcommand_line
  • parse_metadata_line

Personally, I found the nom-based implementation easier to read, reason about, and maintain. Expressing the grammar declaratively with a parser tool feels more natural for this kind of structured input, and the intent of the parsing logic is clearer compared to manual string slicing and conditionals.
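To make the contrast with manual slicing concrete, here is a minimal std-only sketch of the combinator style. It does not use nom and is not the code in this PR: the helpers `tag` and `take_until` only imitate the shape of such combinators, and the line format ``//! - `name`: description`` is an assumption made purely for illustration.

```rust
/// Consume a literal prefix and return the remaining input.
/// (Hypothetical helper, loosely modelled on a combinator-style `tag`.)
fn tag<'a>(input: &'a str, lit: &str) -> Option<&'a str> {
    input.strip_prefix(lit)
}

/// Take everything up to (not including) `delim`.
/// Returns `(rest, taken)`, where `rest` still starts with `delim`.
fn take_until<'a>(input: &'a str, delim: &str) -> Option<(&'a str, &'a str)> {
    let idx = input.find(delim)?;
    Some((&input[idx..], &input[..idx]))
}

/// Parse an assumed line shape: "//! - `name`: description".
/// Each step reads like one piece of the grammar.
fn parse_subcommand_line(line: &str) -> Option<(&str, &str)> {
    let rest = tag(line.trim_start(), "//!")?;
    let rest = tag(rest.trim_start(), "- `")?;
    let (rest, name) = take_until(rest, "`")?;
    let rest = tag(rest, "`")?;
    let rest = tag(rest.trim_start(), ":")?;
    if name.is_empty() {
        return None;
    }
    Some((name, rest.trim()))
}

fn main() {
    let parsed = parse_subcommand_line("//! - `udf`: Run UDF examples");
    assert_eq!(parsed, Some(("udf", "Run UDF examples")));
    // Lines that do not match the assumed shape are rejected.
    assert_eq!(parse_subcommand_line("//! plain prose"), None);
}
```

Each step consumes one piece of the input and hands the remainder to the next, which is the property that makes the grammar easy to read off the code. A real nom parser would also report where and why parsing failed instead of returning a bare `None`.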

That said, this PR is intentionally limited in scope. nom is currently used only for these two parsing helpers, and introducing a new dependency for such a narrow use case may not be justified on its own. The main open question is whether DataFusion would benefit from using nom more broadly for similar parsing tasks in the future.

If the project sees value in adopting nom for other parsing needs, this PR could serve as a small, contained starting point. Otherwise, it may be reasonable to stick with the existing ad-hoc approach to avoid dependency overhead.

Feedback on whether this trade-off is worthwhile is very welcome.

@cj-zhukov
Contributor Author

I'd like to keep the parser simple for now. Currently, it can't handle extra symbols such as parentheses in an example's description. In practice, only one group, udf, has this case, so I updated its README to remove the parentheses.

I'm happy to improve the parser in the future to handle such cases more robustly if needed. For now, this keeps the code readable and avoids unnecessary complexity.
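The limitation can be illustrated with a small std-only sketch. The metadata shape `(file: NAME, desc: TEXT)`, the function names, and the file names are all assumptions for illustration and are not taken from the PR. The first function ends the description at the first `)`, so it rejects a description that itself contains parentheses. The second shows one possible relaxation: treat only the final `)` as the terminator.

```rust
/// Assumed metadata shape: "(file: NAME, desc: TEXT)".
/// Simple rule: the description ends at the first ')'.
fn parse_metadata_simple(line: &str) -> Option<(&str, &str)> {
    let rest = line.trim().strip_prefix("(file:")?;
    let (file, rest) = rest.split_once(", desc:")?;
    let (desc, tail) = rest.split_once(')')?;
    // Leftover text after the first ')' means the description contained
    // parentheses of its own, which this simple rule cannot handle.
    if !tail.is_empty() {
        return None;
    }
    Some((file.trim(), desc.trim()))
}

/// More permissive variant: only the last ')' closes the metadata.
fn parse_metadata_permissive(line: &str) -> Option<(&str, &str)> {
    let rest = line.trim().strip_prefix("(file:")?;
    let (file, rest) = rest.split_once(", desc:")?;
    let desc = rest.strip_suffix(')')?;
    Some((file.trim(), desc.trim()))
}

fn main() {
    let plain = "(file: example.rs, desc: Simple scalar function)";
    let nested = "(file: example.rs, desc: Aggregate function (UDAF) example)";

    assert_eq!(
        parse_metadata_simple(plain),
        Some(("example.rs", "Simple scalar function"))
    );
    // The simple rule stops at the ')' after "UDAF" and rejects the line.
    assert_eq!(parse_metadata_simple(nested), None);
    assert_eq!(
        parse_metadata_permissive(nested),
        Some(("example.rs", "Aggregate function (UDAF) example"))
    );
}
```

The permissive variant accepts more input, but it also accepts unbalanced parentheses inside the description. That is part of the trade-off between keeping the parser simple and making it robust.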

@cj-zhukov
Contributor Author

@Jefffrey since you helped with previous PRs related to example docs generation, it would be great if you could take a look at this one as well. Your feedback or any improvements would be much appreciated.
