You can use papers from https://openreview.net/ as your database! Here's a helper that fetches a list of all papers from a selected conference (like ICLR, ICML, NeurIPS), queries this list to find relevant papers using an LLM, and downloads those relevant papers to a local directory which can be used with paper-qa in the next step. Install openreview-py and get your username and password from the website. You can put them into a .env file under the OPENREVIEW_USERNAME and OPENREVIEW_PASSWORD variables, or pass them in the code directly.
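As a minimal sketch of the .env route, the snippet below reads the two variables using only the standard library. The helper name `load_openreview_credentials` and the simple KEY=VALUE parsing are assumptions made for illustration; they are not part of paper-qa or openreview-py, and a dedicated library such as python-dotenv would work just as well.

```python
import os
from pathlib import Path


def load_openreview_credentials(env_path=".env"):
    """Return (username, password) for OpenReview.

    Real environment variables win; otherwise fall back to KEY=VALUE
    lines in a .env file. Hypothetical helper, not part of paper-qa.
    """
    values = {}
    path = Path(env_path)
    if path.is_file():
        for raw in path.read_text().splitlines():
            line = raw.strip()
            # Skip blanks, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Drop surrounding whitespace and optional quotes.
            values[key.strip()] = value.strip().strip("'\"")

    username = os.environ.get("OPENREVIEW_USERNAME") or values.get("OPENREVIEW_USERNAME")
    password = os.environ.get("OPENREVIEW_PASSWORD") or values.get("OPENREVIEW_PASSWORD")
    if not username or not password:
        raise RuntimeError(
            "Set OPENREVIEW_USERNAME and OPENREVIEW_PASSWORD in the environment or in .env"
        )
    return username, password
```

Failing early with a clear message when either variable is missing tends to be easier to debug than an authentication error raised later by the client.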
It's been a while since we've tested this - so let us know if it runs into issues!
If you use Zotero to organize your personal bibliography, you can use paperqa.contrib.ZoteroDB to query papers from your library. This feature relies on pyzotero, which is installed via the zotero extra.
First, note that PaperQA2 parses the PDFs of papers to store in the database, so all relevant papers should have PDFs stored inside your database. You can get Zotero to automatically do this by highlighting the references you wish to retrieve, right-clicking, and selecting "Find Available PDFs". You can also manually drag-and-drop PDFs onto each reference.
To download papers, you need to get an API key for your account.
Get your library ID, and set it as the environment variable ZOTERO_USER_ID.
For personal libraries, this ID is given here at the part "Your userID for use in API calls is XXXXXX".
For group libraries, go to your group page https://www.zotero.org/groups/groupname, and hover over the settings link. The ID is the integer after /groups/. (h/t pyzotero!)
Create a new API key here and set it as the environment variable ZOTERO_API_KEY.
The key will need read access to the library.
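The group-library rule above ("the integer after /groups/") and the two required variables can be checked with a small sketch like the following. Both function names are hypothetical helpers written for illustration, not part of paper-qa or pyzotero, and the URL in the usage note is a made-up example.

```python
import os
import re


def zotero_group_id(settings_url):
    """Extract the integer that follows /groups/ in a Zotero group URL.

    Hypothetical helper for illustration only.
    """
    match = re.search(r"/groups/(\d+)", settings_url)
    if match is None:
        raise ValueError(f"No numeric group ID found in {settings_url!r}")
    return match.group(1)


def missing_zotero_variables(environ=os.environ):
    """Return the names of required variables that are unset or empty."""
    required = ("ZOTERO_USER_ID", "ZOTERO_API_KEY")
    return [name for name in required if not environ.get(name)]
```

For instance, `zotero_group_id("https://www.zotero.org/groups/1234567/settings")` returns `"1234567"`, which can then be exported as ZOTERO_USER_ID. Calling `missing_zotero_variables()` before connecting gives a quick way to confirm that both values are in place.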
With this, we can download papers from our library and add them to PaperQA2, for example downloading the first 20 papers in your Zotero database and adding them to the Docs object.
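The download step described above might look roughly like the sketch below. It is written against duck-typed objects: `zotero` is assumed to behave like paperqa.contrib.ZoteroDB (an `iterate(limit=...)` method yielding items with `pdf` and `key` attributes) and `docs` like a Docs object with an `add(path, docname=...)` method. Those method and attribute names, and the wrapper `add_zotero_papers` itself, are assumptions to check against your installed version.

```python
def add_zotero_papers(zotero, docs, limit=20):
    """Add up to `limit` Zotero items to `docs` and return the keys added.

    Hypothetical wrapper; `zotero` and `docs` are assumed to expose the
    ZoteroDB-like and Docs-like interfaces described above.
    """
    added = []
    for item in zotero.iterate(limit=limit):
        # item.pdf is assumed to be the local path of the downloaded PDF.
        docs.add(item.pdf, docname=item.key)
        added.append(item.key)
    return added


# Usage, assuming paper-qa is installed with the zotero extra and the
# ZOTERO_USER_ID / ZOTERO_API_KEY variables are set:
#   from paperqa import Docs
#   from paperqa.contrib import ZoteroDB
#   docs = Docs()
#   zotero = ZoteroDB(library_type="user")  # "group" for a group library
#   add_zotero_papers(zotero, docs)
```

Passing the two objects in as arguments keeps the loop easy to try out with stand-ins before pointing it at a real library.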
We can also run specific queries against our Zotero library and iterate over the results. You can read more about the search syntax by typing zotero.iterate? in IPython.
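A query over the library might be sketched as follows. The keyword arguments (`q`, `qmode`, `sort`, `direction`, `limit`) mirror pyzotero's search parameters and are assumed here to be accepted by `zotero.iterate`; the `title`, `pdf`, and `key` attributes on each item are likewise assumptions, and `search_zotero` is a hypothetical wrapper. Confirm the exact signature with `zotero.iterate?` before relying on it.

```python
def search_zotero(zotero, query, limit=100):
    """Yield (title, pdf_path, key) for items matching `query`, newest first.

    Hypothetical wrapper around a ZoteroDB-like object.
    """
    results = zotero.iterate(
        q=query,              # free-text search string
        qmode="everything",   # search all fields, not only title/creator/year
        sort="date",
        direction="desc",
        limit=limit,
    )
    for item in results:
        yield item.title, item.pdf, item.key


# Usage, assuming `zotero` is a ZoteroDB and `docs` a Docs object:
#   for title, pdf, key in search_zotero(zotero, "large language models"):
#       print("Adding", title)
#       docs.add(pdf, docname=key)
```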
If you want to search for papers outside of your own collection, I've found an unrelated project called paper-scraper that looks like it might help. But beware, this project appears to use some scraping tools that may violate publishers' rights or be in a gray area of legality.