
Google Cloud Dataflow Job Throws Alert After Few Hours

I am running a Dataflow streaming job using the 2.11.0 release. After a few hours I get the following authentication error: File 'streaming_twitter.py', line 188, in File 'st

Solution 1:

Setting os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/tmp/key.json' only works locally with the DirectRunner. Once you deploy to a distributed runner like Dataflow, each worker won't be able to find the local file /tmp/key.json.
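
For reference, this is roughly what the failing approach looks like; the assignment only takes effect in the process that builds and submits the pipeline, not on the Dataflow workers:

    import os

    # This sets the credential path only in the launcher process. With the
    # DirectRunner that same process also executes the transforms, so it works;
    # on Dataflow the remote workers never see this environment variable or the
    # local /tmp/key.json file, so calls from the workers fail to authenticate.
    os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/tmp/key.json'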

If you want each worker to use a specific service account, you can tell Beam which service account to use to identify workers.

First, grant the roles/dataflow.worker role to the service account you want your workers to use. There is no need to download the service account key file :)

Then if you're letting PipelineOptions parse your command line arguments, you can simply use the service_account_email option, and specify it like --service_account_email your-email@your-project.iam.gserviceaccount.com when running your pipeline.
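
As a minimal sketch (assuming Beam's Python SDK and placeholder project, bucket, and service account names), the flag is picked up automatically when you build PipelineOptions from the command line:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Hypothetical invocation, with the usual Dataflow flags for your project:
    #   python streaming_twitter.py \
    #       --runner DataflowRunner \
    #       --project your-project \
    #       --region us-central1 \
    #       --temp_location gs://your-bucket/tmp \
    #       --streaming \
    #       --service_account_email your-email@your-project.iam.gserviceaccount.com
    def run(argv=None):
        # PipelineOptions parses --service_account_email from the command line;
        # Dataflow then runs every worker as that service account.
        options = PipelineOptions(argv)
        with beam.Pipeline(options=options) as p:
            p | beam.Create(['hello']) | beam.Map(print)

    if __name__ == '__main__':
        run()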

The service account pointed to by your GOOGLE_APPLICATION_CREDENTIALS is only used to start the job; each worker uses the service account specified by service_account_email. If service_account_email is not passed, it defaults to the email from your GOOGLE_APPLICATION_CREDENTIALS file.
