concurrent.futures.process.BrokenProcessPool #6

Closed
ajavadia opened this issue Dec 23, 2018 · 16 comments · Fixed by #24

Comments

@ajavadia
Member

original issue from @tigerjack: Qiskit/qiskit#1590

Information

  • Qiskit Terra version: 0.7.0
  • Python version: 3.7.1
  • Operating system: ArchLinux

What is the current behavior?

I got an exception on two different machines, but I'm not sure if it's related to qiskit-terra or Python itself. The exception arises when I invoke the job.result() method.

  File "/home/simone/LinuxData/virtualenvs/qiskit_env/lib/python3.7/site-packages/qiskit/providers/aer/aerjob.py", line 39, in _wrapper
    return func(self, *args, **kwargs)
  File "/home/simone/LinuxData/virtualenvs/qiskit_env/lib/python3.7/site-packages/qiskit/providers/aer/aerjob.py", line 98, in result
    return self._future.result(timeout=timeout)
  File "/usr/lib64/python3.7/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/lib64/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

I found other similar issues that suggested using the if __name__ == "__main__": guard, but it is already there: the whole program is under this statement.

Steps to reproduce the problem

The problem appears randomly when I run my qiskit program. It's a huge project; sometimes the program works seamlessly, while other times it fails with this message.

What is the expected behavior?

The program should always run without problems.

@chriseclectic
Member

Does this only happen with the Aer simulators, or also with the Python simulators when run on Linux?

@tigerjack

@chriseclectic what do you mean by Python simulators? statevector_simulator and unitary_simulator? The error also appears with those.
I was thinking that maybe the error is related to the number of qubits; at the moment, my algorithm uses 34 qubits, and from the qasm page I read that I need 275 GB of memory. I also tried the online qasm simulator, but even in this case I get the error:

qiskit.providers.exceptions.JobError: 'Invalid job state. The job should be DONE but it is JobStatus.ERROR'

@nonhermitian
Contributor

You would need 256 GiB for 34 qubits, just to store the state vector. If using the state vector simulator, there is also a copy made from the C array that holds the state vector to the nested-list format in the Result object, so this would be doubled. Using the unitary simulator would reduce the number of qubits allowed to 17 for the same memory size, and there is still a copy. Regardless, you also need to run your OS, store the quantum circuit, etc. at the same time.

@tigerjack

@nonhermitian Yes, I'm aware of that, and with some other tricks I reduced the number to 33, i.e. 137 GB. However, do you think that the error message is related to this? Do you know what the maximum number of qubits is for the online qasm simulator?

@nonhermitian
Contributor

The online simulator supports up to 32 qubits. The error could be related to memory, but in that case I would expect your computer to freeze up first as it starts to use swap space. Does this occur?

@tigerjack

@nonhermitian Never, the process is really fast even on the server. I mean, I get the error just a few seconds after the launch of the algorithm on the local backend.

@nonhermitian
Contributor

Then it is likely not memory related.

@tigerjack

@nonhermitian I don't know. Is it possible that there is some kind of mechanism in the Aer provider or in Python itself that prevents execution when there is not enough memory?

@nonhermitian
Contributor

In Python, no. In Aer, probably not. Try running your favorite resource manager, top for example, and see what the memory allocation is doing.

@chriseclectic
Member

By Python simulators I am referring to the BasicAer provider simulators in qiskit-terra. If it happens with them as well, it is likely an issue with Python/ProcessPool or the BaseJob classes themselves.

Note that for the BasicAer provider simulators I added checks that will throw an exception before starting the simulation if the number of qubits is greater than can be stored in physical memory, but that check isn't on the Aer provider simulators at the moment. What is the available RAM on the system you are trying to run on?
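
A minimal sketch of what such a pre-flight check could look like. This is not the actual BasicAer code: the function name, the exception class, and the assumption of 16 bytes per amplitude (double-precision complex) are illustrative only, and the amount of physical memory is passed in by the caller rather than detected.

```python
BYTES_PER_AMPLITUDE = 16  # assumption: one complex128 value per amplitude


class InsufficientMemoryError(RuntimeError):
    """Hypothetical exception type, used only in this sketch."""


def check_statevector_fits(n_qubits, physical_bytes):
    """Raise before simulating if a 2**n_qubits state vector cannot fit in RAM.

    Returns the number of bytes the state vector would need.
    """
    required = (2 ** n_qubits) * BYTES_PER_AMPLITUDE
    if required > physical_bytes:
        raise InsufficientMemoryError(
            f"{n_qubits} qubits need {required / 1024**3:.0f} GiB for the state "
            f"vector, but only {physical_bytes / 1024**3:.0f} GiB of RAM is present"
        )
    return required
```

Failing early like this turns an abrupt worker death into a readable error; the check ignores memory used by the OS, the circuit, and any result copies, so it is only a lower bound.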

@tigerjack

@nonhermitian nothing relevant, it doesn't even seem to allocate memory.

@chriseclectic as you said, because of the check, the error is different. I now have a circuit with 33 qubits, but according to the error message the qasm_simulator of BasicAer has a maximum of 24 qubits.
Btw, the server has 126 GB of RAM and no swap enabled, so it shouldn't be able to run the 33-qubit circuit either.

@nonhermitian
Contributor

So the AerJob issue is not a memory issue. It is breaking before the simulator starts to allocate space.

The memory needed for storing a raw state vector with double precision complex numbers is:

(2**n_qubits)*16/(1024**3)

so with 128 GiB you should be able to do 33, provided that nothing else is using any memory.
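
As a worked example, the expression above can be wrapped in a small function and evaluated for the qubit counts discussed in this thread. The function names are illustrative; the second helper converts the same amount to decimal gigabytes (10**9 bytes), which is the unit some documentation uses.

```python
def statevector_gib(n_qubits):
    """GiB needed for 2**n_qubits double-precision complex amplitudes (16 bytes each)."""
    return (2 ** n_qubits) * 16 / (1024 ** 3)


def statevector_gb(n_qubits):
    """The same amount in decimal gigabytes (10**9 bytes)."""
    return (2 ** n_qubits) * 16 / 1e9


for n in range(29, 35):
    print(f"{n} qubits: {statevector_gib(n):6.1f} GiB = {statevector_gb(n):6.1f} GB")
```

Each extra qubit doubles the requirement: 33 qubits come to 128 GiB, which is about 137 GB in decimal units, and 34 qubits to 256 GiB, about 275 GB.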

@tigerjack

tigerjack commented Jan 3, 2019

@nonhermitian From here I read 137 GB, so maybe it's not possible even in this case. I'll try to optimize the algorithm even more, but I'm not sure that memory is the problem here. As I was saying, the used memory doesn't seem to grow at all: the program just crashes with the error above, as if a thread was killed beforehand.

@chriseclectic
Member

Actually the python BasicAer simulators only support a maximum of 24 qubits regardless of available memory.

I would try testing (on the Aer simulator) with some random circuit that uses fewer qubits (start at 29 or 30) and incrementally increase the number to see when this error first happens.
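
The incremental test could be scripted roughly as follows. This is a sketch only: `run` is a hypothetical callable supplied by the user that builds an n-qubit random circuit, executes it on the Aer simulator, and calls job.result(); none of the Qiskit calls are shown here.

```python
def first_failing_qubit_count(run, start=29, stop=34):
    """Call run(n) for n = start..stop.

    Returns (n, exception) for the first n that raises, or None if every
    run succeeds.
    """
    for n in range(start, stop + 1):
        try:
            run(n)
        except Exception as exc:  # e.g. BrokenProcessPool from job.result()
            return n, exc
    return None


def fake_run(n):
    """Stand-in for a real simulation: pretends anything above 32 qubits fails."""
    if n > 32:
        raise RuntimeError(f"simulated failure at {n} qubits")


print(first_failing_qubit_count(fake_run))
```

Note that this only detects runs that raise an exception; a run that hangs while the machine swaps would need a timeout around `run(n)`.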

@tigerjack

tigerjack commented Jan 3, 2019

@chriseclectic the critical point on the server with 125 GiB of RAM and no swap seems to be 32. With 33 qubits the same error appears again.
On my system with just 4 GiB of RAM and 9 GiB of swap the error starts to appear with 30 qubits, while with 29 the program just freezes after consuming all the memory.
These results seem to be in line with the requirements specified in the qasm documentation, so in the end it seems to be a memory-related problem.

@nonhermitian
Contributor

Indeed it does. Incidentally, a search confirms this:

nipreps/fmriprep#1207

Although many things could trigger this exception, perhaps grabbing it and returning a possible memory warning would help.
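
One way this could look, as a sketch rather than the actual AerJob code: a hypothetical helper wraps future.result(), catches BrokenProcessPool, and re-raises with a hint about the state vector size. The helper name, its `n_qubits` parameter, and the 16-bytes-per-amplitude estimate are assumptions made for illustration.

```python
from concurrent.futures import Future
from concurrent.futures.process import BrokenProcessPool


def result_with_memory_hint(future, n_qubits, timeout=None):
    """Return future.result(), adding a memory hint if the worker died abruptly."""
    try:
        return future.result(timeout=timeout)
    except BrokenProcessPool as exc:
        needed_gib = (2 ** n_qubits) * 16 / (1024 ** 3)
        raise RuntimeError(
            "The worker process died abruptly. One possible cause is running out "
            f"of memory: a {n_qubits}-qubit state vector needs about "
            f"{needed_gib:.0f} GiB."
        ) from exc


# Stand-in future that fails the same way as in the traceback of this issue.
failed = Future()
failed.set_exception(BrokenProcessPool("terminated abruptly"))
try:
    result_with_memory_hint(failed, n_qubits=33)
except RuntimeError as err:
    print(err)
```

The original exception stays attached as `__cause__`, so the hint does not hide other reasons a worker might have been killed.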

eladgoldman pushed a commit to eladgoldman/qiskit-aer that referenced this issue May 14, 2019
 reverse string of matrices for correct expectation value
dcmckayibm added a commit to dcmckayibm/qiskit-aer that referenced this issue Nov 3, 2019
derwind pushed a commit to derwind/qiskit-aer that referenced this issue Nov 23, 2022
Patataman added a commit to Patataman/qiskit-aer that referenced this issue Aug 21, 2023

4 participants