OCR and extracting text using AWS Textract

We are developing an app that requires scanning documents and extracting key information; the extracted data is then used to update the relevant user records. We looked at several technologies, including AWS Textract, Azure Computer Vision, Google Lens, and the open-source Tesseract. All are feature-rich and have their own strengths, but in our case the documents to be scanned are multipage and heavy on tabular and form data. Because of the amount of structured data, we decided to go with Textract.

Textract uses OCR to detect printed text, handwriting, and numbers automatically. Each extracted element is returned with its bounding box and polygon coordinates. It can also detect key-value pairs and their context, which makes the extracted data easy to import, and it preserves the structure of data stored in tables. This is helpful for documents made up largely of structured data, such as financial reports and medical records; you can then load the extracted data using a predefined database schema. Textract extracts data with high confidence scores whether the text is free-form or embedded in tables.

Textract also uses ML to understand the context of invoices and receipts and automatically extracts relevant data such as vendor, invoice number, price, total amount, and payment terms. It handles identity documents such as passports and driver's licenses in the same way, without the need for templates or configuration. Every result comes with a confidence score so you can make informed decisions about how to use it, and Textract is directly integrated with Amazon Augmented AI (A2I) so you can implement a human review process when the confidence score is low.
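To make the confidence score concrete, here is a minimal sketch of how low-confidence lines could be separated out for human review. It works on a hand-written sample shaped like the Blocks list in a Textract response (BlockType, Text, Confidence on a 0-100 scale, Geometry). The 90.0 threshold and the function name are assumptions for illustration, not values from our project.

```python
# Hypothetical cutoff; the right value depends on the documents and the cost of errors.
REVIEW_THRESHOLD = 90.0


def split_by_confidence(blocks, threshold=REVIEW_THRESHOLD):
    """Split LINE blocks into (accepted, needs_review) by confidence score."""
    accepted, needs_review = [], []
    for block in blocks:
        # Only look at text lines; PAGE, WORD, TABLE, etc. are skipped here.
        if block.get("BlockType") != "LINE":
            continue
        if block.get("Confidence", 0.0) >= threshold:
            accepted.append(block)
        else:
            needs_review.append(block)
    return accepted, needs_review


# Hand-written sample in the shape of a Textract response, not real output.
sample_blocks = [
    {"BlockType": "PAGE", "Id": "p1"},
    {
        "BlockType": "LINE",
        "Text": "Invoice #1042",
        "Confidence": 99.1,
        "Geometry": {"BoundingBox": {"Left": 0.10, "Top": 0.05, "Width": 0.30, "Height": 0.02}},
    },
    {
        "BlockType": "LINE",
        "Text": "Tota1 due: $18O.00",
        "Confidence": 62.4,
        "Geometry": {"BoundingBox": {"Left": 0.10, "Top": 0.80, "Width": 0.35, "Height": 0.02}},
    },
]

accepted, needs_review = split_by_confidence(sample_blocks)
```

In this sample the second line falls below the threshold, so it would be the one sent to a reviewer (for example through A2I) instead of being written straight to the database.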

The UI/UX flow requires that documents are scanned using the camera. The scanned document is uploaded to an S3 bucket, and a Lambda function is invoked to call the AWS Textract API. Behind the scenes, Textract processes the document and returns a very long JSON document that describes the contents, their location in the document, and metadata. Along with the JSON, Textract also creates a CSV file containing all structured data. Upon completion, Textract notifies our callback function, which stores the extracted structured data. We then invoke another service to run that data against our matching model, extract the data needed, and update the database.
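The first step of that flow, going from the S3 upload event to a Textract job, might look roughly like the sketch below. The function name and the two ARN parameters are hypothetical, and the use of an SNS topic as the completion notification channel is an assumption about how the callback is wired up. The client is passed in as a parameter so the function can be exercised without AWS credentials; in a real Lambda it would be something like boto3.client("textract").

```python
from urllib.parse import unquote_plus


def start_analysis(event, textract_client, sns_topic_arn, role_arn):
    """Start an asynchronous Textract analysis for the object in an S3 event.

    Returns the JobId that Textract assigns to the job.
    """
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    # Object keys in S3 event notifications are URL-encoded.
    key = unquote_plus(record["object"]["key"])

    response = textract_client.start_document_analysis(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}},
        # Ask for tables and key-value (form) data, not just raw text.
        FeatureTypes=["TABLES", "FORMS"],
        # Assumed wiring: Textract publishes the completion status to this topic.
        NotificationChannel={"SNSTopicArn": sns_topic_arn, "RoleArn": role_arn},
    )
    return response["JobId"]
```

The returned JobId is what the callback later uses to fetch the results, so it is worth storing alongside the uploaded document.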

Textract supports both synchronous and asynchronous calls. The synchronous API is designed for small, mostly single-page documents and returns near-real-time responses. However, we had to go with the asynchronous API since most of our documents span multiple pages. The main drawback of asynchronous processing is that it can take several minutes, which hurts the user experience. Breaking each document into single pages and scanning them via synchronous calls is a possibility, but that route adds a lot of overhead.
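For the asynchronous path, the results of a job come back in pages that have to be stitched together. The sketch below shows one way to do that with get_document_analysis and its NextToken. It polls while the job is still in progress, which is a simplification swapped in for illustration (a notification-driven callback would normally only be called once the job has finished). The function name, poll interval, and poll limit are assumptions.

```python
import time


def collect_blocks(textract_client, job_id, poll_seconds=5.0, max_polls=120):
    """Wait for an async Textract job and return all of its blocks."""
    blocks = []
    next_token = None
    polls = 0
    while True:
        kwargs = {"JobId": job_id}
        if next_token:
            kwargs["NextToken"] = next_token
        page = textract_client.get_document_analysis(**kwargs)

        status = page["JobStatus"]
        if status == "IN_PROGRESS":
            polls += 1
            if polls >= max_polls:
                raise TimeoutError(f"Textract job {job_id} did not finish in time")
            time.sleep(poll_seconds)
            continue
        if status == "FAILED":
            raise RuntimeError(page.get("StatusMessage", "Textract job failed"))

        # SUCCEEDED or PARTIAL_SUCCESS: keep this page and follow the token.
        blocks.extend(page.get("Blocks", []))
        next_token = page.get("NextToken")
        if not next_token:
            return blocks
```

The waiting loop is where the multi-minute delay mentioned above shows up, which is why the app has to treat extraction as a background task instead of blocking the user.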

Automating an iOS CI/CD pipeline

We are currently developing an iOS app for a client using React Native. We are using the AWS technology stack as the backend, with Amplify, Aurora Serverless, CodeCommit, and various other AWS services. Our DevOps team attempted to automate a continuous integration (CI) and continuous delivery (CD) process and faced some challenges. I will note what we tried and the issues we faced, both financial and technical, and see if anyone has other suggestions.

The software development and DevOps teams work remotely and are distributed across multiple countries. To create the iOS app build, we needed a virtual Mac server on which to install Xcode and the dependencies required to produce the build file. We looked at an AWS EC2 instance running macOS, but at $2,000/month the cost was too high for our client. So we started looking for third-party solutions that provide an iOS app build service as well as CI/CD pipelines. We looked at Codemagic, CircleCI, and Semaphore CI and concluded that Codemagic was the only choice because the others did not support AWS CodeCommit. Codemagic also provides both build functionality and pipelines to App Store Connect. Additionally, Codemagic is free for a single user with 500 minutes of build time, with Xcode running on a Mac mini, which works fine for our initial setup and development. Since we are below that threshold, it's free for now.

DevOps integrated Codemagic with AWS CodeCommit easily; the process was very simple. But when they then tried to configure a webhook for automated builds, they were unable to connect. Codemagic tech support said that, at the moment, they do not support continuous integration when connected to AWS CodeCommit. However, they added that their team is already working on webhook support and that it will be available shortly.

The idea was to fully automate the process, but we ran into two problems. 1. Codemagic does not fully support AWS CodeCommit, so we have to fetch the code manually. 2. AWS does not support iOS builds and pipelines. So at this point, our DevOps team is triggering the builds manually.

Update 11/28: Codemagic now supports webhooks for AWS CodeCommit. I will report back if this allows us to automate the build process.

Review of AWS Aurora Serverless v1

Is anyone evaluating the AWS Aurora Serverless v2 beta? We've been using v1 for the last few months and have the following concerns.

1. Does not support RDS Proxy, leading to a high number of connections from Lambda and other services
2. Lack of redundancy/failover
3. Only supports MySQL version 5.6, which is very old
4. event_scheduler is not supported
5. A higher number of connections consumes more ACUs
6. Occasional timeouts during scaling
7. Instance warmup takes time

UPDATE: AWS plans to release v2 in 2022, which will hopefully address some of these issues.