Changelog#
Unreleased#
Added#
Changed#
Fixed#
Deprecated#
Removed#
Security#
[0.2.6] - 2026-05-20#
Added#
-
Docstrings and type checking added to headers where needed.
Changed#
-
All references to do_verbose were replaced with logger.info or logger.debug.
Existing docstrings are compliant with numpydoc standards.
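A minimal sketch of what replacing a do_verbose-style flag with logger calls can look like, using only the standard logging module. The logger name and the count_keywords helper are hypothetical and are not taken from bibcat.

```python
import logging

# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger("bibcat_sketch")


def count_keywords(text: str, keywords: list[str]) -> int:
    """Count case-insensitive keyword occurrences (hypothetical helper)."""
    # Fine-grained detail goes to DEBUG instead of `if do_verbose: print(...)`.
    logger.debug("Searching %d characters for %d keywords", len(text), len(keywords))
    total = sum(text.lower().count(keyword.lower()) for keyword in keywords)
    # Progress messages a user normally wants to see go to INFO.
    logger.info("Found %d keyword matches", total)
    return total


if __name__ == "__main__":
    # The caller chooses the verbosity once, rather than passing a flag around.
    logging.basicConfig(level=logging.INFO)
    count_keywords("Hubble and HST data", ["HST", "Hubble"])
```

With this pattern the verbosity is controlled by the logging level set by the caller, so functions no longer need a verbosity argument.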
Removed#
[0.2.5] - 2026-03-31#
Removed#
-
Removed ML/BERT code, documentation, and tests.
[0.2.4] - 2025-12-15#
Added#
Changed#
-
Refactored check_truematch for readability
Fixed#
Unpinned tensorflow-metal version.
[0.2.2] - 2025-09-29#
Added#
-
A field_validator to ensure that the LLM output classification is one of the allowed keywords.
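As a rough illustration of this kind of check, the sketch below uses the pydantic v2 field_validator decorator. The model name, field names, and allowed keyword set are assumptions made for the example and do not reflect the actual bibcat models.

```python
from pydantic import BaseModel, ValidationError, field_validator

# Hypothetical keyword set; the real allowed keywords are defined in bibcat.
ALLOWED_PAPERTYPES = {"SCIENCE", "MENTION"}


class Classification(BaseModel):
    """Hypothetical structured LLM output for one mission."""

    mission: str
    papertype: str

    @field_validator("papertype")
    @classmethod
    def papertype_must_be_allowed(cls, value: str) -> str:
        # Reject any classification outside the allowed keyword set.
        if value not in ALLOWED_PAPERTYPES:
            raise ValueError(
                f"papertype must be one of {sorted(ALLOWED_PAPERTYPES)}, got {value!r}"
            )
        return value


def is_valid(papertype: str) -> bool:
    """Return True if pydantic accepts the classification keyword."""
    try:
        Classification(mission="HST", papertype=papertype)
    except ValidationError:
        return False
    return True
```

Here is_valid("SCIENCE") passes validation, while an unexpected keyword such as "MAYBE" is rejected with a ValidationError before it can reach downstream output.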
Changed#
-
Updated the text annotation of mission names in the confusion matrix (CM) plot.
Fixed#
-
The bug in
grouped_df["llm_mission"] = mission_df["llm_mission"].str.upper() in prepare_output() was fixed. This bug caused a KeyError: nan error due to a mismatched index between mission_df and grouped_df when both papertypes were present for the same bibcode in grouped_df.
When there is no LLM output for a bibcode but a human classification exists, the human classification is still written to summary_output.
Updated metrics.py and stats.py to account for human classifications but not to count cases where the source paper is not found.
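The failure mode behind the KeyError: nan entry above can be reproduced in isolation. The sketch below only illustrates pandas index alignment with made-up data; it is not the bibcat code and does not show the actual fix.

```python
import pandas as pd

# Made-up frames: mission_df has two rows, grouped_df has three.
mission_df = pd.DataFrame({"llm_mission": ["hst", "jwst"]}, index=[0, 1])
grouped_df = pd.DataFrame({"bibcode": ["A", "A", "B"]}, index=[0, 1, 2])

# Column assignment aligns on index labels, not on position, so any
# grouped_df row whose label is absent from mission_df receives NaN.
grouped_df["llm_mission"] = mission_df["llm_mission"].str.upper()
missing = grouped_df["llm_mission"].isna().tolist()

# Using the NaN value as a dictionary key then raises KeyError: nan.
counts = {"HST": 0, "JWST": 0}
raised_key_error = False
try:
    counts[grouped_df.loc[2, "llm_mission"]] += 1
except KeyError:
    raised_key_error = True
```

After the assignment, the row with index label 2 has no matching label in mission_df, so its llm_mission value is NaN and any lookup keyed on it fails.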
0.2.1 - 2025-09-26#
Changed#
-
Changed Sphinx theme to book and updated the documentation.
0.2.0 - 2025-09-22#
Removed#
Changed#
-
The combined dataset link was updated to use the updated papertrack with the flagship gold sample verdict
-
Reorganizing the bibcat CLI commands
All llm-based commands grouped under the llm sub-command
Batch llm commands grouped under the llm batch sub-command
All _ or - command names shortened, e.g. run-gpt to llm run, or audit_llm to llm audit
Added a new ml sub-command group and moved the NLP CLI commands underneath
-
Expanding the list of keyword objects in parameters.py
Fixed a bug that falsely identified mission names used in kw_mission in user_prompt, in_text, and hallucinated_by_llm in summary_output. The bug was caused by uppercasing mission names; the relevant tests were updated. Missions that we pass into identify_missions_in_text() need to keep their original case so that paper processing correctly handles ambiguous keyword phrases.
A minor update on user_prompt to spell out IUE
Pip installation updates in README.md
PR #68 - Inconsistent_classifications.json was revised and separated from bibcat stats-llms
Updated metrics_summary.json to include confusion matrix metrics
PR #64 Update ROC input and docs
PR #63 Refactored to use newer OpenAI Responses API and remove deprecated Assistants API.
PR #62 Update metrics.py and its pytest
PR #61 Update InfoModel response with enum
PR #56 adds the MAST mission simple keyword text match to the user prompt
PR #54 Sanitizing keywords
PR #53 ROC curve bug fixes, add more evaluation metrics, etc
PR #47 New calculations for evaluation confidence values for multiple GPT runs
-
Grouping the BERT model method into the pretrained folder
Created PRETRAINED_README.md and updated the main README.md
-
Refactored the ML classifier to allow for other tensorflow models, and for adding other libraries, e.g. pytorch, down the line.
-
Setting a new config for the directory of papers for operational classification with a fake JSON file
Refactored fetch_paper.py
Other relevant updates and minor updates
-
The is_keyword method is replaced with the identify_keyword method.
-
evaluate and classify are now separate CLI options.
-
get_config() error fix
_add_word() temporary fix
merger error fix for config parameters
-
Fix ddict type errors
-
Consolidated all config into a single bibcat_config.yaml YAML file.
Moved bibcat output to outside the package directory
Added support for user custom configuration and settings
Migrated code to use new config object, a dottable dictionary to retain the old config syntax
-
fixed various type annotation errors while refactoring classify_papers.py and other related modules such as performance.py or operator.py.
all output results will be saved under a subdirectory of the given model run in the output directory.
classify_papers.py will produce both evaluation results and classification results per method, rather than combined results of both the RB and ML methods. This allows users to choose a classification method using the CLI once the CLI is enabled.
-
Enabling build_model.py to be both a module and a main script.
-
Refactoring of build_model.py has started. The first part includes:
extract generate_direcotry_TVT() from core/classifiers/textdata.py to create a stand-alone module, split_dataset.py
modify to store the training, validation, and test (TVT) data set directories under the data/partitioned_datasets directory
The second part of the refactoring required some relevant changes to implement the new modules and to update build_module.py accordingly.
build_model.py, base.py, operator.py, config.py, etc.
-
Renamed test_core to core
Renamed test_data to data
-
The test global variables are called directly in the script rather than being redundantly reassigned to other variables.
Moved test Keyword-object lookup variables to parameters.py
-
Refactored classes.py into the individual class files below, which were moved to the new folders core and core/classifiers.
core: base.py, grammar.py, keyword.py, operator.py, paper.py, performance.py
core/classifiers:
_Classfier(): ClassifierBase() in textdata.py,
Classifier_ML: MachineLearningClassifier() in ml.py,
Classifier_Rules: RuleBasedClassifier() in rules.py
Continued formatting and styling these class files using Ruff with the line length set to 120
Updated modules according to the refactoring
Updated CHANGELOG.MD and pyproject.toml
-
Updated the main README file
updated formatting and styling
Added#
-
added .readthedoc.yaml for the readthedoc documentation pages
-
Added support for chunk planning/submission for large batches to OpenAI Batch API
-
Adding new batch cli commands for submitting a batch job using OpenAI’s Batch API
Added new bibcat llm batch submit and bibcat llm batch retreive for submitting and retrieving batch jobs
PR #74 Add a bash script to run bibcat on multiple batch input files serially
PR #68 Add a new CLI, audit-llms, to create a JSON file to store failure mode stats and the breakdown information for failed bibcodes.
-
Set up Sphinx autodoc build
-
Updated LLM prompt to include its rationale and reasoning in the output
Switch to OpenAI Structured Response output, using pydantic models to control output
-
pre-commit-hook setup
GitHub CI/CD action pipeline for linting/formatting and pytests
-
Add stats-llm.py to output statistics results from the evaluation summary output and the operational gpt results
pytests (test_stats_llm.py) and llm README.md updated
-
Add option to run gpt-batch multiple times
-
Implement performance evaluation metrics and plots
-
Added a summary output code for evaluation
-
Added unit test for
build_dataset.py
-
Implemented ChatGPT agent prompt engineering approach to classify papers
Added a basic classification output
-
Added a CLI option to build the combined dataset from the papertrack data and papertext (from ADS) data and refactored build_dataset.py.
Enabled dynamic version control
Readme update: clarify the workflow in Quick Start and the use of the do_evaluation keyword for fetching papers when running bibcat classify and bibcat evaluate
-
Added new click CLI for bibcat
-
Refactored classify_papers.py and created a few modules, which are called in classify_papers.py. These modules can be executed based on CLI options once those options are implemented.
fetch_papers.py: fetches papers from the dir_test data directory into the bibcat pipeline. This needs an update to fetch operational data using the dir_datasets argument in this module.
operate_classifier.py: uses a single method to classify the input papers and outputs the classification results as a JSON file for operation.
evaluate_basic_performance.py: employs two performance functions to evaluate test paper classification and produces relevant files, plus a confusion matrix if an ML method is used.
created fakedata.txt in /bibcat/data/operational_data/ to test operational classification with simple ASCII text
created fake_testdata.json, which has paper classification with its associated simple text, for testing and performance evaluation.
included additional VS Code ruff setting in pyproject.toml
-
The second part of refactoring build_model.py includes:
Created a new module, model_settings.py, to set up various model-related variables. Other model-related variables in config.py will eventually be relocated to this module.
Created streamline_dataset.py to load the source dataset and streamline it into an ML input dataset.
Created partition_dataset.py to split the streamlined dataset into the train, validation, and test datasets for DL models.
-
Create a new script to build the input dataset. It’s called build_dataset.py
Added some information about the data folder in README.rst
Added __init__.py
-
test_bibcat.py was refactored into several sub test scripts.
tests/test_core/test_base.py
tests/test_core/test_grammar.py
tests/test_core/test_keyword.py
tests/test_core/test_operator.py
tests/test_core/test_paper.py
tests/test_data/test_dataset.py
-
Created a new folder named core to store all refactored class scripts
Added more description to each class script and other main scripts.
-
Started with open astronomy cookiecutter template for bibcat
Re-organized the file structure (e.g., bibcat/bibcat/) and modified the file names
bibcat_classes.py to classes.py
bibcat_config.py to config.py
bibcat_parameters.py to parameters.py
bibcat_tests.py to test_bibcat.py
Refactor classes.py into several individual class scripts under the core directory
Created two main scripts
create_model.py : this script can be run to create a new training model
classify_papers.py : this script will fetch input papers, classify them into the designated paper categories, and produce performance evaluation materials such as confusion matrix and plots
Created CHANGELOG.md
0.1.0 - 2024-01-29#
Initial tag to preserve code before refactor