
Are you planning to release the relevance judgements? #55

Closed

vitojph opened this issue Oct 4, 2019 · 3 comments

Comments

@vitojph commented Oct 4, 2019

Hi,

Fantastic initiative, thanks a lot :-)

Are you planning to publish the relevance judgements, ie, the 4k expert relevance annotations?

@vitojph vitojph changed the title Are you planning to relase the relevance judgements? Are you planning to release the relevance judgements? Oct 4, 2019
@hamelsmu (Member) commented Oct 7, 2019

I am not sure we are going to release that; I'll let my colleagues chime in:

@mmjb @hohsiangwu @mallamanis

@mallamanis (Collaborator) commented Oct 8, 2019

Hi,
I am against publicly releasing the annotations at this point. By keeping them "hidden" behind the leaderboard evaluation, we are in less danger of overfitting on the dataset (or of someone "cheating" by looking at the test set). The test set is quite small, and sooner or later solutions will start overfitting to it.

Having said that, (a) I think that we should eventually release them (e.g. after a year or so), and/or (b) share them with individuals who have a good reason (e.g. an alternate use case) and who verbally agree not to share the test set further and not to use it for the CodeSearchNet challenge.

Let me know what you think.

@vitojph (Author) commented Oct 8, 2019

Hi @mallamanis! I understand your reasons for keeping the annotations away from curious eyes, especially when the competition has just started. Still, I encourage you folks to release them in the near future in order to foster the evaluation of NLP techniques applied to search engines.

AFAIK, it's quite difficult to find freely available datasets and annotations for fully evaluating information retrieval systems. The TREC collections are one example, but your data collection would definitely add a lot of value for a different domain.
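For context on why such annotations matter: graded relevance judgements like these are what let you score a search system with ranking metrics such as NDCG (which is the kind of metric the CodeSearchNet leaderboard evaluation relies on judgements for). A minimal illustrative sketch, with made-up relevance grades rather than the actual (unreleased) annotations:

```python
import math

def dcg(relevances):
    """Discounted cumulative gain of a ranked list of graded relevance scores."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg(system_relevances):
    """NDCG: DCG of the system's ranking divided by DCG of the ideal ranking."""
    ideal = dcg(sorted(system_relevances, reverse=True))
    return dcg(system_relevances) / ideal if ideal > 0 else 0.0

# Hypothetical expert grades (0 = irrelevant .. 3 = highly relevant) for the
# results a search system returned, in the order it returned them.
system_ranking = [3, 2, 0, 1]
print(round(ndcg(system_ranking), 3))
```

A perfect ranking (results sorted by relevance grade) scores 1.0; any swap that pushes a relevant result down lowers the score, which is why held-out judgements are so valuable for comparing systems.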

Thanks anyway for your effort :-)

@hamelsmu hamelsmu pinned this issue Oct 10, 2019
@hamelsmu hamelsmu closed this Oct 15, 2019
@hamelsmu hamelsmu unpinned this issue Nov 5, 2019