Changelog

Unreleased

Added

Changed

Fixed

Deprecated

Removed

Security

[0.2.6] - 2026-05-20

Added

  • PR #97

    • Added docstrings and type annotations to function headers where needed.

Changed

  • PR #97

    • All references to do_verbose were replaced with logger.info or logger.debug.

    • Updated existing docstrings to comply with the numpydoc standard.

Removed

  • PR #97

    • Removed base.py. Its functions were moved to other core modules or to a new utility module, core_utils.py, which has no storage dependencies.

[0.2.5] - 2026-03-31

Removed

  • PR #93

    • Removed ML/BERT code, documentation, and tests.

[0.2.4] - 2025-12-15

Added

  • PR #92

    • Updated bibcat llm metrics to collect the bibcodes for each confusion matrix cell.

  • PR #86

    • Added a GitHub Action that automatically creates a release when a git tag is pushed, plus minor documentation updates.

Changed

  • PR #80

    • Refactored check_truematch for readability

Fixed

  • Unpinned tensorflow-metal version.

[0.2.2] - 2025-09-29

Added

  • PR #82

    • Added a field_validator to ensure that the LLM output classification is one of the allowed keywords.

Changed

  • PR #82

    • Updated the text annotations of the mission names in the confusion matrix (CM) plot.

Fixed

  • PR #82

    • Fixed a bug in grouped_df["llm_mission"] = mission_df["llm_mission"].str.upper() in prepare_output(). When grouped_df contained both papertypes for the same bibcode, the indices of mission_df and grouped_df did not match, which raised KeyError: nan.

    • When a bibcode has no LLM output but a human classification exists, the human classification is still written to summary_output.

    • Updated metrics.py and stats.py to account for human classifications, but not to count them when the source paper is not found.

[0.2.1] - 2025-09-26

Changed

  • PR #85

    • Changed the Sphinx theme to book and updated the documentation.

[0.2.0] - 2025-09-22

Removed

  • PR #48

    • Removed conda env file.

  • PR #9

    • Deleted test_bibcat.py

    • Deleted the duplicate set of test global variables assigned in multiple test scripts

  • PR #7

    • Deleted all previous code and files for a fresh start

Changed

  • PR #78

    • Updated the combined dataset link to use the updated papertrack data with the flagship gold sample verdict

  • PR #77

    • Reorganized the bibcat CLI commands:

      • All LLM-based commands are grouped under the llm sub-command

      • Batch LLM commands are grouped under the llm batch sub-command

      • Command names containing _ or - were shortened, e.g. run-gpt became llm run and audit_llm became llm audit

      • Added a new ml sub-command group and moved the NLP CLI commands under it

  • PR #69

    • Expanded the list of keyword objects in parameters.py

    • Fixed a bug, and updated the relevant tests, in which uppercasing mission names caused missions to be falsely identified in kw_mission in user_prompt, and in in_text and hallucinated_by_llm in summary_output. Missions passed into identify_missions_in_text() must keep their original case so that paper processing correctly handles ambiguous keyword phrases.

    • Minor update to user_prompt to spell out IUE

    • Updated the pip installation instructions in README.md

  • PR #68

    • Inconsistent_classifications.json was revised and separated from bibcat stats-llms

    • Updated metrics_summary.json to include confusion matrix metrics

  • PR #66

    • Moved _process_database_ambig, _extract_core_from_phrase, _streamline_phrase, and _check_truematch from base.py to paper.py

    • Updated tests to read from the Paper() object instead of the Base() object

  • PR #64 Update ROC input and docs

  • PR #63 Refactored to use the newer OpenAI Responses API and removed the deprecated Assistants API.

  • PR #62 Update metrics.py and its pytest

  • PR #61 Update InfoModel response with enum

  • PR #56 Added the simple MAST mission keyword text match to the user prompt

  • PR #54 Sanitizing keywords

  • PR #53 ROC curve bug fixes, add more evaluation metrics, etc

  • PR #47 New calculations for evaluation confidence values for multiple GPT runs

  • PR #46

    • Grouped the BERT model method into the pretrained folder

    • Created PRETRAINED_README.md and updated the main README.md

  • PR #29

    • Refactored the ML classifier to allow for other tensorflow models, and for adding other libraries, e.g. pytorch, down the line.

  • PR #23

    • Set a new config for the directory of papers used in operational classification, with a fake JSON file

    • Refactored fetch_paper.py

    • Other relevant updates and minor updates

  • PR #22, PR #23

    • The is_keyword method is replaced with the identify_keyword method.

  • PR #21

    • evaluate and classify are now separate CLI options.

  • PR #19, PR #20

    • get_config() error fix

    • _add_word() temporary fix

    • Merge error fix for config parameters

  • PR #18

    • Fix ddict type errors

  • PR #16, PR #17

    • Consolidated all configuration into a single YAML file, bibcat_config.yaml.

    • Moved bibcat output outside the package directory

    • Added support for user custom configuration and settings

    • Migrated code to use the new config object, a dottable dictionary that retains the old config syntax

  • PR #14

    • Fixed various type annotation errors while refactoring classify_papers.py and related modules such as performance.py and operator.py.

    • All output results are saved under a subdirectory for the given model run in the output directory.

    • classify_papers.py produces evaluation results and classification results per method, rather than combined results of both the RB and ML methods. This allows users to choose a classification method through the CLI once the CLI is enabled.

  • PR #13

    • Enabled build_model.py to work as both a module and a main script.

  • PR #12

    • Started refactoring build_model.py. The first part includes:

      • Extracting generate_direcotry_TVT() from core/classifiers/textdata.py into a stand-alone module, split_dataset.py

      • Storing the training, validation, and test (TVT) dataset directories under the data/partitioned_datasets directory

    • The second part of the refactoring required related changes to implement the new modules and to update build_model.py accordingly, in:

      • build_model.py, base.py, operator.py, config.py, etc.

  • PR #11

    • Renamed create_model.py to build_model.py

    • Updated README.md

    • Updated config.py to create variables to support the new script, build_dataset.py

  • PR #10

    • Renamed test_core to core

    • Renamed test_data to data

  • PR #9

    • The test global variables are used directly in the scripts rather than being redundantly reassigned to other variables.

    • Moved test Keyword-object lookup variables to parameters.py

  • PR #8

    • Refactored classes.py into several individual class files, which were moved to the new folders core and core/classifiers.

    • Continued formatting and styling these class files using Ruff, with the line length set to 120

    • Updated modules according to the refactoring

    • Updated CHANGELOG.md and pyproject.toml

  • PR #7

    • Updated the main README file

    • Updated formatting and styling

Added

  • PR #83

    • Added .readthedocs.yaml for the Read the Docs documentation pages

  • PR #81

    • Added support for chunk planning/submission for large batches to OpenAI Batch API

  • PR #79

    • Added new batch CLI commands for submitting batch jobs using OpenAI’s Batch API

    • Added new bibcat llm batch submit and bibcat llm batch retreive for submitting and retrieving batch jobs

  • PR #74 Added a bash script to run bibcat on multiple batch input files serially

  • PR #68 Added a new CLI command, audit-llms, that creates a JSON file storing failure-mode statistics and breakdown information for failed bibcodes.

  • PR #48

    • Set up Sphinx autodoc build

  • PR #44

    • Updated LLM prompt to include its rationale and reasoning in the output

    • Switched to OpenAI Structured Outputs, using pydantic models to control the output

  • PR #43

    • Set up pre-commit hooks

    • GitHub CI/CD action pipeline for linting/formatting and pytests

  • PR #40

    • Added stats-llm.py to output statistics from the evaluation summary output and the operational GPT results

    • Updated the pytests (test_stats_llm.py) and the llm README.md

  • PR #38

    • Added an option to run gpt-batch multiple times

  • PR #35

    • Implemented performance evaluation metrics and plots

  • PR #34

    • Added a summary output code for evaluation

  • PR #32

    • Added unit test for build_dataset.py

  • PR #31

    • Implemented ChatGPT agent prompt engineering approach to classify papers

    • Added a basic classification output

  • PR #27

    • Added a CLI option to build the combined dataset from the papertrack data and papertext (from ADS) data and refactored build_dataset.py.

    • Enabled dynamic version control

    • README update: clarified the workflow in Quick Start, including how papers are fetched using the do_evaluation keyword with bibcat classify and bibcat evaluate

  • PR #18

    • Added new click cli for bibcat

  • PR #14

    • Refactored classify_papers.py and created several modules that are called from classify_papers.py. These modules can be run through CLI options once the CLI is in place.

      • fetch_papers.py: fetches papers from the dir_test data directory into the bibcat pipeline. It still needs an update to fetch operational data using the dir_datasets argument.

      • operate_classifier.py: uses a single method to classify the input papers and outputs the classification results as a JSON file for operational use.

      • evaluate_basic_performance.py: uses two performance functions to evaluate test paper classification and produce the relevant files, plus a confusion matrix if an ML method is used.

    • Created fakedata.txt in /bibcat/data/operational_data/ to test operational classification with simple ASCII text

    • Created fake_testdata.json, which contains paper classifications with their associated simple text, for testing and performance evaluation.

    • Added VS Code Ruff settings to pyproject.toml

  • PR #12

    • The second part of refactoring build_model.py includes:

      • Created a new module, model_settings.py, to set up various model-related variables. Other model-related variables in config.py will eventually be relocated to this module.

      • Created streamline_dataset.py, which loads the source dataset and streamlines it into an ML input dataset.

      • Created partition_dataset.py to split the streamlined dataset into the train, validation, and test dataset for DL models.

  • PR #11

    • Created a new script, build_dataset.py, to build the input dataset

    • Added some information about the data folder in README.rst

    • Added __init__.py

  • PR #9

    • test_bibcat.py was refactored into several sub test scripts.

      • tests/test_core/test_base.py

      • tests/test_core/test_grammar.py

      • tests/test_core/test_keyword.py

      • tests/test_core/test_operator.py

      • tests/test_core/test_paper.py

      • tests/test_data/test_dataset.py

  • PR #8

    • Created a new folder named core to store all refactored class scripts

    • Added more description to each class script and other main scripts.

  • PR #7

    • Started from the OpenAstronomy cookiecutter template for bibcat

    • Re-organized the file structure (e.g., bibcat/bibcat/) and modified the file names

    • Refactored classes.py into several individual class scripts under the core directory

    • Created two main scripts

      • create_model.py: this script can be run to create a new training model

      • classify_papers.py: this script fetches input papers, classifies them into the designated paper categories, and produces performance evaluation materials such as a confusion matrix and plots

    • Created CHANGELOG.md

[0.1.0] - 2024-01-29

Initial tag to preserve the code before the refactor