Word Embeddings Class Added #2198

sprintyaf · 2020-07-13T12:00:23Z


        word embeddings added

TravisBuddy · 2020-07-13T12:03:06Z

Hey @sprintyaf,
Something went wrong with the build.

TravisCI finished with status errored, which means the build failed because of something unrelated to the tests, such as a problem with a dependency or the build process itself.

View build log

TravisBuddy Request Identifier: ceec83a0-c500-11ea-af19-3b271fec1f42


        spelling error

TravisBuddy · 2020-07-13T12:07:24Z

Hey @sprintyaf,
Something went wrong with the build.

TravisCI finished with status errored, which means the build failed because of something unrelated to the tests, such as a problem with a dependency or the build process itself.

View build log

TravisBuddy Request Identifier: 66eb3070-c501-11ea-af19-3b271fec1f42


        requirements updated

TravisBuddy · 2020-07-13T12:12:20Z

Hey @sprintyaf,
Something went wrong with the build.

TravisCI finished with status errored, which means the build failed because of something unrelated to the tests, such as a problem with a dependency or the build process itself.

View build log

TravisBuddy Request Identifier: 187097e0-c502-11ea-af19-3b271fec1f42


        small changes

TravisBuddy · 2020-07-13T12:16:47Z

Hey @sprintyaf,
Something went wrong with the build.

TravisCI finished with status errored, which means the build failed because of something unrelated to the tests, such as a problem with a dependency or the build process itself.

View build log

TravisBuddy Request Identifier: b7c021d0-c502-11ea-af19-3b271fec1f42


        small changes vol2

TravisBuddy · 2020-07-13T12:39:33Z

Hey @sprintyaf,
Something went wrong with the build.

TravisCI finished with status errored, which means the build failed because of something unrelated to the tests, such as a problem with a dependency or the build process itself.

View build log

TravisBuddy Request Identifier: e5e2b610-c505-11ea-af19-3b271fec1f42

cclauss

Please read CONTRIBUTING.md.

natural_language_processing/word_embeddings/usage.py

natural_language_processing/word_embeddings/word_embeddings.py


        small changes vol3

natural_language_processing/word_embeddings/word_embeddings.py

TravisBuddy · 2020-07-13T12:52:51Z

Hey @sprintyaf,
Something went wrong with the build.

TravisCI finished with status errored, which means the build failed because of something unrelated to the tests, such as a problem with a dependency or the build process itself.

View build log

TravisBuddy Request Identifier: c19c4b70-c507-11ea-af19-3b271fec1f42

cclauss · 2020-07-13T13:31:13Z

You might run psf/black on this code because Travis CI says:

$ flake8 --ignore=E203,W503 --max-complexity=25 --max-line-length=88 --statistics --count .

./natural_language_processing/word_embeddings/usage.py:36:43: E261 at least two spaces before inline comment
./natural_language_processing/word_embeddings/word_embeddings.py:28:5: E303 too many blank lines (2)
./natural_language_processing/word_embeddings/word_embeddings.py:33:89: E501 line too long (125 > 88 characters)
./natural_language_processing/word_embeddings/word_embeddings.py:38:5: E303 too many blank lines (2)
./natural_language_processing/word_embeddings/word_embeddings.py:44:89: E501 line too long (92 > 88 characters)
./natural_language_processing/word_embeddings/word_embeddings.py:49:5: E303 too many blank lines (2)
./natural_language_processing/word_embeddings/word_embeddings.py:54:89: E501 line too long (100 > 88 characters)
./natural_language_processing/word_embeddings/word_embeddings.py:59:5: E303 too many blank lines (2)
./natural_language_processing/word_embeddings/word_embeddings.py:69:5: E303 too many blank lines (2)
./natural_language_processing/word_embeddings/word_embeddings.py:91:5: E303 too many blank lines (2)
./natural_language_processing/word_embeddings/word_embeddings.py:98:5: E303 too many blank lines (2)
./natural_language_processing/word_embeddings/word_embeddings.py:106:5: E303 too many blank lines (2)
./natural_language_processing/word_embeddings/word_embeddings.py:115:89: E501 line too long (94 > 88 characters)
./natural_language_processing/word_embeddings/word_embeddings.py:119:5: E303 too many blank lines (2)
./natural_language_processing/word_embeddings/word_embeddings.py:122:68: W291 trailing whitespace
./natural_language_processing/word_embeddings/word_embeddings.py:136:72: E251 unexpected spaces around keyword / parameter equals
./natural_language_processing/word_embeddings/word_embeddings.py:136:74: E251 unexpected spaces around keyword / parameter equals
./natural_language_processing/word_embeddings/word_embeddings.py:136:89: E501 line too long (93 > 88 characters)
./natural_language_processing/word_embeddings/word_embeddings.py:144:89: E501 line too long (118 > 88 characters)
./natural_language_processing/word_embeddings/word_embeddings.py:152:5: E303 too many blank lines (2)
2     E251 unexpected spaces around keyword / parameter equals
1     E261 at least two spaces before inline comment
10    E303 too many blank lines (2)
6     E501 line too long (125 > 88 characters)
1     W291 trailing whitespace
20


        flake8 tests passed

cclauss · 2020-07-13T13:40:21Z

Please avoid backslash line termination, as discussed in PEP 8: a whitespace character to the right of the backslash can break the script, and that change is invisible to the user.
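
A minimal sketch of the alternative PEP 8 prefers: wrap the expression in parentheses so the line continues implicitly. The function and argument names are hypothetical and are not taken from this PR.

```python
def total_score(first: float, second: float, third: float) -> float:
    """Sum three scores using implicit line continuation.

    >>> total_score(1.0, 2.0, 3.5)
    6.5
    """
    # Fragile form (avoid): a stray space after the backslash is a SyntaxError.
    #     total = first + \
    #             second + third
    # Robust form: parentheses let the expression span several lines safely.
    total = (
        first
        + second
        + third
    )
    return total
```

Trailing whitespace inside the parentheses has no effect, so this form avoids the failure described above.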

TravisBuddy · 2020-07-13T13:42:58Z

Travis tests have failed

Hey @sprintyaf,
Please read the following log in order to understand the failure reason.
It'll be awesome if you fix what's wrong and commit the changes.

TravisBuddy Request Identifier: c152e230-c50e-11ea-af19-3b271fec1f42


        stopword downnload

TravisBuddy · 2020-07-13T13:57:55Z

Travis tests have failed

Hey @sprintyaf,
Please read the following log in order to understand the failure reason.
It'll be awesome if you fix what's wrong and commit the changes.

TravisBuddy Request Identifier: d8000510-c510-11ea-af19-3b271fec1f42


        backslashes removed

TravisBuddy · 2020-07-13T14:13:15Z

Travis tests have failed

Hey @sprintyaf,
Please read the following log in order to understand the failure reason.
It'll be awesome if you fix what's wrong and commit the changes.

TravisBuddy Request Identifier: fc302bc0-c512-11ea-af19-3b271fec1f42

sprintyaf · 2020-07-13T14:16:54Z

Hi @cclauss,
First of all, thanks for your support.
This algorithm is not deterministic and depends on the data. That's why I can't predict what answer it would return. I don't think that doctests are possible in this case.


        without doctests

cclauss · 2020-07-13T14:27:20Z

assert 70 <= my_function() <= 100, "my_function() should return a result > 70"
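
The same idea can be written as a doctest. This is only a sketch: `noisy_score` is a hypothetical stand-in for a function whose exact result varies between runs, not a function from this PR.

```python
import random


def noisy_score() -> float:
    """Return a score that varies between runs but stays in a known range.

    The exact value cannot be written into a doctest, but a bound can:

    >>> 70 <= noisy_score() <= 100
    True
    """
    # Hypothetical stand-in for a non-deterministic computation.
    return random.uniform(70, 100)


if __name__ == "__main__":
    import doctest

    doctest.testmod()
```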

sprintyaf · 2020-07-13T14:32:12Z

assert 70 <= my_function() <= 100, "my_function() should return a result > 70"

Agreed, but my functions return words, and they can return any word. You can't test them like this.

cclauss · 2020-07-13T14:48:13Z

closest_words() can return different results with the same input data??

natural_language_processing/word_embeddings/word_embeddings.py

cclauss · 2020-07-13T14:57:02Z

Please add tests to the deterministic functions and place a comment explaining why a doctest is impossible on the non-deterministic functions.
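
A rough sketch of what that split could look like. The vocabulary and both functions are hypothetical illustrations, not code from this PR. The deterministic helper gets an exact doctest. The random one documents why an exact word cannot be asserted and checks a property that always holds.

```python
import random

VOCAB = ["king", "queen", "man", "woman"]  # hypothetical toy vocabulary


def normalize(word: str) -> str:
    """Deterministic: the same input always gives the same output.

    >>> normalize("  King ")
    'king'
    """
    return word.strip().lower()


def random_neighbour(word: str) -> str:
    """Return some other word from the vocabulary.

    No doctest with an exact expected word is given because the choice is
    random. Properties that hold for every run can still be tested:

    >>> random_neighbour("king") in VOCAB
    True
    >>> random_neighbour("king") != "king"
    True
    """
    candidates = [w for w in VOCAB if w != normalize(word)]
    return random.choice(candidates)
```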


        about doctests + small changes

ruppysuppy · 2020-07-15T05:06:00Z

natural_language_processing/word_embeddings/word_embeddings.py

@@ -33,8 +37,7 @@ def analogy(self, x1: str, x2: str, y1: str) -> str:
        x1, x2, y1 = x1.lower(), x2.lower(), y1.lower()
        error_msg = 'every word must be in the vocabulary'
        assert all(self.is_in_vocab(x) for x in (x1, x2, y1)), error_msg
-        result = self._wv.most_similar(positive=[y1, x2], negative=[x1])[0][0]
-        return result
+        return self._wv.most_similar(positive=[y1, x2], negative=[x1])[0][0]

    def n_similarity(self, list1: list, list2: list) -> float:


Add proper type hints in the functions (see here)

Please add proper type hints in the functions...
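
As an illustration of what more specific hints might look like, the bare `list` annotations on `n_similarity` could name the element type. The sketch below is a toy stand-in with hand-made vectors, assuming the method compares the mean vectors of two word lists. It is not the PR's implementation, and `VECTORS` and `mean_vector` are hypothetical.

```python
import math
from typing import Dict, List

# Hypothetical two-dimensional vectors for illustration only.
VECTORS: Dict[str, List[float]] = {
    "cat": [1.0, 0.0],
    "dog": [1.0, 0.0],
    "car": [0.0, 1.0],
}


def mean_vector(words: List[str]) -> List[float]:
    """Average the vectors of ``words`` component by component."""
    rows = [VECTORS[word] for word in words]
    return [sum(column) / len(rows) for column in zip(*rows)]


def n_similarity(list1: List[str], list2: List[str]) -> float:
    """Cosine similarity between the mean vectors of two word lists.

    >>> n_similarity(["cat"], ["dog"])
    1.0
    >>> n_similarity(["cat"], ["car"])
    0.0
    """
    a, b = mean_vector(list1), mean_vector(list2)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm
```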

ruppysuppy · 2020-07-15T05:06:00Z

natural_language_processing/word_embeddings/word_embeddings.py

@@ -54,8 +56,7 @@ def similarity(self, w1: str, w2: str) -> float:
        w1, w2 = w1.lower(), w2.lower()
        error_msg = 'both words must be in the vocabulary'
        assert self.is_in_vocab(w1) and self.is_in_vocab(w2), error_msg


If the assertions are for tests, use the doctest format (see here)

These assertions are for runtime checks... If the user gives me garbage then I should complain. These garbage values should also be tested in the doctests.
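
A minimal sketch of testing such a runtime check with a garbage value in a doctest. The vocabulary and `check_pair` are hypothetical. Only the assertion line mirrors the diff above.

```python
VOCAB = {"king", "queen"}  # hypothetical toy vocabulary


def is_in_vocab(word: str) -> bool:
    """Report whether ``word`` is a known word, ignoring case."""
    return word.lower() in VOCAB


def check_pair(w1: str, w2: str) -> bool:
    """Validate that both words are known before doing any work.

    >>> check_pair("King", "queen")
    True
    >>> check_pair("king", "xyzzy")
    Traceback (most recent call last):
        ...
    AssertionError: both words must be in the vocabulary
    """
    w1, w2 = w1.lower(), w2.lower()
    error_msg = "both words must be in the vocabulary"
    assert is_in_vocab(w1) and is_in_vocab(w2), error_msg
    return True
```

Note that `assert` statements are skipped when Python runs with `-O`, so an explicit `raise ValueError(error_msg)` is a sturdier choice for checks that must always run.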


        type hints corrected, doctests and pretrained model added

TravisBuddy · 2020-07-16T16:15:25Z

Travis tests have failed

Hey @sprintyaf,
Please read the following log in order to understand the failure reason.
It'll be awesome if you fix what's wrong and commit the changes.

TravisBuddy Request Identifier: 8bfb7cf0-c77f-11ea-999c-bf5bae978323

sprintyaf · 2020-07-16T16:20:52Z

@cclauss @ruppysuppy
I added type hints and doctests. For doctests I needed to upload a pretrained model. But it looks like it's not visible during testing. How could I fix this?

cclauss · 2020-08-14T05:42:07Z

natural_language_processing/word_embeddings/word_embeddings.py

+                parsed = WordVectors._parse_document(text)
+                documents.append(parsed)
+            except IOError:
+                continue


Please put lines 166 to 173 in a separate function and then do
documents = [get_document_from_file(filename) for filename in glob.glob(full_path)]
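
One way the suggested refactor might look, as a sketch under assumptions. `parse_document` is a hypothetical stand-in for `WordVectors._parse_document`, and `load_documents` is an illustrative wrapper. Because the original loop skips unreadable files with `continue`, the helper here returns `None` on `IOError` and the comprehension filters those entries out.

```python
import glob
from typing import List, Optional


def parse_document(text: str) -> List[str]:
    """Hypothetical stand-in for WordVectors._parse_document."""
    return text.lower().split()


def get_document_from_file(filename: str) -> Optional[List[str]]:
    """Read and parse one file; return None if it cannot be read."""
    try:
        with open(filename, encoding="utf-8") as file:
            return parse_document(file.read())
    except IOError:
        return None


def load_documents(full_path: str) -> List[List[str]]:
    """Parse every file matching the glob pattern, skipping unreadable ones."""
    parsed = (get_document_from_file(name) for name in sorted(glob.glob(full_path)))
    return [doc for doc in parsed if doc is not None]
```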

cclauss · 2020-08-14T05:43:24Z

requirements.txt

@@ -17,3 +17,5 @@ sklearn
 sympy
 tensorflow
 xgboost


Please insert requirements in alphabetical order to reduce the likelihood of duplicate entries.
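
A small sketch of why sorted order helps: once the list is alphabetical, any duplicate sits directly next to its twin, so a single pass over neighbouring entries finds it. The function name is hypothetical and the sample entries in the usage note are illustrative only. Comments and version specifiers are not handled.

```python
from typing import List


def find_requirement_problems(lines: List[str]) -> List[str]:
    """Report duplicate or out-of-order entries in a requirements list."""
    names = [line.strip().lower() for line in lines if line.strip()]
    problems = []
    # Compare each entry with the one before it.
    for previous, current in zip(names, names[1:]):
        if current == previous:
            problems.append(f"duplicate: {current}")
        elif current < previous:
            problems.append(f"out of order: {current} after {previous}")
    return problems
```

For example, `find_requirement_problems(["sympy", "tensorflow", "xgboost"])` returns an empty list, while appending an entry such as `"gensim"` at the end would be reported as out of order.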

Commits and reviews, in order:

f067b0a  word embeddings added
c44bb0b  spelling error
5ad5be4  requirements updated
5630c8a  small changes
84b9b54  small changes vol2
cclauss requested changes Jul 13, 2020
7b93ac4  small changes vol3
cclauss reviewed Jul 13, 2020 (natural_language_processing/word_embeddings/word_embeddings.py)
fb3e148  flake8 tests passed
73f1650  stopword downnload
417b029  backslashes removed
a0cfa7a  without doctests
cclauss reviewed Jul 13, 2020, four times (natural_language_processing/word_embeddings/word_embeddings.py)
bb548ba  about doctests + small changes
ruppysuppy reviewed Jul 15, 2020
8345ce1  type hints corrected, doctests and pretrained model added
cclauss reviewed Aug 14, 2020


TheAlgorithms / Python
