Changelog#

Unreleased #

Added#

Changed#

Fixed#

Deprecated#

Removed#

Security#

[0.2.6] - 2026-5-20#

Added#

PR #97
- Docstrings and type checking added to headers where needed.

Changed#

PR #97
- All references to do_verbose were replaced with logger.info or logger.debug.
- Existing docstrings are compliant with numpydoc standards.

Removed#

PR #97
- Removed base.py. All functions were moved to other core libraries or to a new utility module, core_utils.py, with no storage dependencies.

[0.2.5] - 2026-3-31#

Removed#

PR #93
- Removed ML/BERT code, documentation, and tests.

[0.2.4] - 2025-12-15#

Added#

PR #92
- Update bibcat llm metrics to collect bibcodes for confusion matrix cells
PR #86
- github action for auto-release when git tag pushed and minor doc updates

Changed#

PR #80
- Refactored check_truematch for readability

Fixed#

Unpinned tensorflow-metal version.

[0.2.2] - 2025-09-29#

Added#

PR #82
- a field_validator to ensure that the LLM output classification is only one of the allowed keywords

Changed#

PR #82
- Updated Confusion matrix (CM) plot text annotation of the mission names.

Fixed#

PR #82
- The bug in grouped_df["llm_mission"] = mission_df["llm_mission"].str.upper() in prepare_output() was fixed. This bug caused KeyError:nan error due to mismatched index between mission_df and grouped_df when there were both the papertypes for the same bibcode in grouped_df.
- When there is no llm output for a bibcode, but human classification exists, the output still outputs human classification in summay_output.
- Updated metrics.py and stats.py to account human classifications but not to count when source paper is not found;

0.2.1 - 2025-09-26#

Changed#

PR #85
- Changed Sphinx theme to book and updated documentation and updated docs.

0.2.0 - 2025-09-22#

Removed#

PR #48
- Removed conda env file.
PR #9
- Deleted test_bibcat.py
- Deleted the same set of test gloabal variables assigned in multiple test scripts
PR #7
- Deleted all previous codes and files for a fresh start

Changed#

PR #78
- The combined dataset link is updated to use updated papertrack with flagship gold sample verdict
PR #77
- Reorganizing the bibcat CLI commands
  - All llm-based grouped under llm sub-command
  - Batch llm commands grouped under llm batch sub-command
  - All _ or - command names shortened, e.g. run-gpt to llm run, or audit_llm to llm audit
  - Added a new ml sub-command group and moved the NLP cli commands underneath
PR #69
- Expanding the list of keyword objects in parameters.py
- Fix a bug that falsely identify mission names used in kw_mission in user_prompt, in_text, and hallucinated_by_llm in summary_output due to uppercasing mission names and the relevant tests. Missions that we pass into identify_missions_in text() need to be original case so that paper processing correctly handles ambiguous keyword phrases.
- A minor update on user_prompt to spell out IUE
- Pip installation updates in README.md
PR #68 -Inconsistent_classifications.json was revised and separated from bibcat stats-llms
- Updated metrics_summary.json to include confusion matrix metrics
PR #66
- Moved _process_database_ambig, _extract_core_from_phrase, _streamline_phrase, and _check_truematch from base.py to paper.py
- Updated tests to read from Paper() object instead of Base() object
PR #64 Update ROC input and docs
PR #63 Refactored to use newer OpenAI Responses API and remove deprecated Assistants API.
PR #62 Update metrics.py and its pytest
PR #61 Update InfoModel response with enum
PR #56adds the MAST mission simple keyword text match to the user prompt
PR #54 Sanitizing keywords
PR #53 ROC curve bug fixes, add more evaluation metrics, etc
PR #47 New calculations for evaluation confidence values for multiple GPT runs
PR #46
- Grouping the BERT model method into the pretrained folder
- Created PRETRAINED_README.md and updated the main README.md
PR #29
- Refactored the ML classifier to allow for other tensorflow models, and for adding other libraries, e.g. pytorch, down the line.
PR #23
- Setting a new config for the directory of papers for operational classification with a fake JSON file
- Refactored fetch_paper.py
- Other relevant updates and minor updates
PR #22, PR #23
- The is_keyword method is replaced with the identify_keyword method.
PR # 21
- evaluate and classify are now separate CLI options.
PR #19 PR #20
- get_config() error fix
- _add_word() temporary fix
- merger erorr fix for config parameters
PR #18
- Fix ddict type errors
PR #16 # 17
- Consolidated all config into a single bibcat_config.yaml YAML file.
- Moved bibcat output to outside the package directory
- Added support for user custom configuration and settings
- Migrated code to use new config object, a dottable dictionary to retain the old config syntax
PR #14
- fixed various type annotation errors while refactoring classify_papers.py and other related modules such as performance.py or operator.py.
- all output results will be saved under a subdirectory of the given model run in the output directory.
- classify_papers.py will produce both evaluation results and classification results per method, rather than combined results of both the RB and ML methods. This way will allow users to choose a classification method using CLI once CLI is enabled.
PR #13
- Enabling build_model.py to be both a module and a main script.
PR #12
- Refactoring build_model.py has started, the first part includes to
  - extract generate_direcotry_TVT() from core/classifiers/textdata.py to create a stand-alone module, split_dataset.py
  - modify to store the training, validation, and test (TVT) data set directories under the data/partitioned_datasets directory
- The second part in refactoring required some relevant changes to implement the new modules and updating build_module.py accordingly.
  - build_model.py,base.py, operator.py, config.py, etc.
PR #11
- Renamed create_model.py to build_model.py
- Updated README.md
- Updated config.py to create variables to support the new script, build_dataset.py
PR #10
- Renamed test_core to core
- Renamed test_data to data
PR #9
- The test global variables are called directly in the script rather than using redundantly reassigned to other variables.
- Moved test Keyword-object lookup variables to parameters.py
PR #8
- Refactored classes.py into several individual class files below, which were moved to the new folder names, core and core/classifiers.
  - core: base.py, grammar.py, keyword.py, operator.py, paper.py, performance.py
  - core/classifiers:
    - _Classfier(): ClassifierBase() in textdata.py,
    - Classifier_ML: MachineLearningClassifier() in ml.py,
    - Classifier_Rules: RuleBasedClassifier() in rules.py
- Continued formatting and styling these class files using Ruff and the line length set to 120
- Updated module updates according to the refactoring
- Updated CHANGELOG.MD and pyproject.toml
PR #7
- Updated the main README file
- updated formatting and styling

Added#

PR #83
- added .readthedoc.yaml for the readthedoc documentation pages
PR #81
- Added support for chunk planning/submission for large batches to OpenAI Batch API
PR #79
- Adding new batch cli commands for submitting a batch job using OpenAI’s Batch API
- Added new bibcat llm batch submit and bibcat llm batch retreive for submitting and retrieving batch jobs
PR #74 Add a bash script to run bibcat multiple batch input files serially
PR #68 Add a new CLI, audit-llms to create a json file to store failure modes stats and the breakdown information for failed bibcodes.
PR #48
- Set up Sphinx autodoc build
PR #44
- Updated LLM prompt to include its rationale and reasoning in the output
- Switch to OpenAI Structured Response output, using pydantic models to control output
PR #43
- pre-commit-hook setup
- GitHub CI/CD action pipeline for linting/formatting and pytests
PR #40
- Add stats-llm.py to output statistics results from the evaluation summary output and the operational gpt results
- pytests (test_stats_llm.py) and llm README.md updated
PR #38
- Add option to run gpt-batch multiple times
PR #35
- Implement performance evaluation metrics and plots
PR #34
- Added a summary output code for evaluation
PR #32
- Added unit test for build_dataset.py
PR #31
- Implemented ChatGPT agent prompt engineering approach to classify papers
- Added a basic classification output
PR #27
- Added a CLI option to build the combined dataset from the papertrack data and papertext (from ADS) data and refactored build_dataset.py.
- Enabled dynamic version control
- Readme update: clarify the workflow in Quick Start; the use of fetching papers using the do_evaluation keyword when bibcat classify and bibcat evaluate
PR #18
- Added new click cli for bibcat
PR #14
- Refactored classify_papers.py and created a few modules, which are called in classify_papers.py. These modules could be executed based on CLI options once they are employed.
  - fetch_papers.py : fetching papers from the dir_test data directory to the bibcat pipeline. This needs an update to fetch operational data using the dir_datasets argument in this module.
  - operate_classifier.py: the main purpose of this module is to use only one method, classify the input papers, and output classification results as a JSON file for operation.
  - evaluate_basic_performance.py : this module employes two performance functions to evaluate test paper classification and produce relevant files and a confusion matrix if a ML method is used.
- created fakedata.txt in /bibcat/data/operational_data/ to test operational classification with simple ascii text
- created fake_testdata.json, which has paper classification with its associated simple text, for testing and performance evaluation.
- included additional VS code ruff setting to pyproject.toml
PR #12
- The second part of refactoring build_model.py includes
  - create a new module, model_settings.py to set up various model related variables. This eventually will relocating other model related variables in config.py to this module in the near future.
  - Created streamline_dataset.py to streamline the source data equipped to be ML input dataset. It does load the source dataset and streamline the dataset.
  - Created partition_dataset.py to split the streamlined dataset into the train, validation, and test dataset for DL models.
PR # 11
- Create a new script to build the input dataset. It’s called build_dataset.py
- Added some information about the data folder in README.rst
- Added init.py
PR #9
- test_bibcat.py was refactored into several sub test scripts.
  - tests/test_core/test_base.py
  - tests/test_core/test_grammar.py
  - tests/test_core/test_keyword.py
  - tests/test_core/test_operator.py
  - tests/test_core/test_paper.py
  - tests/test_data/test_dataset.py
PR #8
- Created a new folder named core to store all refactored class scripts
- Added more description to each class script and other main scripts.
PR #7
- Started with open astronomy cookiecutter template for bibcat
- Re-organized the file structure (e.g., bibcat/bibcat/) and modified the file names
  - bibcat_classes.py to classes.py
  - bibcat_config.py to config.py
  - bibcat_parameters.py to parameters.py
  - bibcat_tests.py to test_bibcat.py
- Refactor classes.py into several individual class script under the core directory
- Created two main scripts
  - create_model.py : this script can be run to create a new training model
  - classify_papers.py : this script will fetch input papers, classify them into the designated paper categories, and produce performance evaluation materials such as confusion matrix and plots
- Created CHANGELOG.md

0.1.0 - 2024-01-29#

Initial tag to preserve code before refactor

Changelog

Contents

Changelog#

Unreleased#

Added#

Changed#

Fixed#

Deprecated#

Removed#

Security#

[0.2.6] - 2026-5-20#

Added#

Changed#

Removed#

[0.2.5] - 2026-3-31#

Removed#

[0.2.4] - 2025-12-15#

Added#

Changed#

Fixed#

[0.2.2] - 2025-09-29#

Added#

Changed#

Fixed#

0.2.1 - 2025-09-26#

Changed#

0.2.0 - 2025-09-22#

Removed#

Changed#

Added#

0.1.0 - 2024-01-29#

Unreleased #