Cloud Foundry Advisory Board Meeting, November 2015
Below are technical meeting notes in chronological order. Enjoy!
UAA (Sree Tummidi, Pivotal)
- UAA Release 2.6.2 out. Authorize endpoint now supports token-based authentication in addition to the regular web-based authentication flow. This is used for authenticating third-party SSH clients to Cloud Foundry.
- UAA SAML Integration proposal out. No review comments from the community yet. We will proceed with the integration as planned.
- UAA Release 2.7.0.2 released. Contains the backwards compatibility fix for id_token and the ability to disable internal user management when LDAP/SAML stores are in use.
- Work underway for UAA 2.7.1 release with complete support for the Invitations API.
- Statsd support added for UAA metrics.
- Started the UAA & SAML integration work to handle user claims, expose them in the ID token, and map SAML group memberships to OAuth scopes.
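As a rough illustration of the group-to-scope mapping mentioned in the last item, here is a minimal sketch. It is not UAA's implementation: the group names, the mapping table, and the function `scopes_for_groups` are all hypothetical, and only the scope strings follow familiar Cloud Foundry naming.

```python
# Hypothetical table: SAML group name -> OAuth scopes granted to its members.
# The group names are invented for illustration only.
GROUP_TO_SCOPES = {
    "cf-admins": {"cloud_controller.admin", "scim.read"},
    "cf-developers": {"cloud_controller.read", "cloud_controller.write"},
}


def scopes_for_groups(saml_groups, mapping=GROUP_TO_SCOPES):
    """Return the sorted union of scopes for every mapped SAML group.

    Groups without an entry in the mapping contribute no scopes.
    """
    scopes = set()
    for group in saml_groups:
        scopes |= mapping.get(group, set())
    return sorted(scopes)


if __name__ == "__main__":
    print(scopes_for_groups(["cf-developers", "some-unmapped-group"]))
```

Ignoring unmapped groups keeps the sketch conservative: a user only ever receives scopes that were explicitly mapped.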
MEGA (Amit Gupta, Pivotal)
- Rewriting Consul startup scripts in Go, with unit tests, to allow us to continue to work on robustness testing for BOSH-deployed Consul.
- Finished investigating reducing the size of our default recommended EC2 and RDS instances.
- Did some credential rotation drills, and generated stories for tooling to facilitate quick, safer responses when needed (focused around Concourse).
- Shipped several versions of cf-release, latest is v224.
- Completed work on tool to create CF deployment manifest composed of multiple releases (currently only supports AWS). Will leverage said tool, along with previous work in BOSH, in our deployment pipelines.
- Investigating Consul flakiness on OpenStack.
- Ongoing work on CATS (better debug output, support targeting Diego backend, etc.).
- Will try a third pass at having cf-release consume UAA via the uaa-release.
Loggregator (Jim Campbell, Pivotal)
- Steady progress toward TCP/TLS Metron->Doppler
- Load tests as part of this effort have been valuable at exposing some Metron edge cases.
CLI (Dies Koper, Fujitsu)
- Dies has taken over the PM responsibilities from Greg Oehmen for the CLI team after completing a “PM Dojo” on October 29.
- Two remote engineers have rolled off the team to ease scheduling of team meetings as the new PM is based in Sydney, Australia. This brings us down to two engineers. Fortunately Kris has ramped up quickly.
- Cut v6.13.0, which pulls Diego support into the main tool (i.e., the Diego plugin is no longer needed)
- Expected to cut v6.14.0 on November 18, which would allow Org and Space Managers to manage roles of their users (used to be admin-only).
- Completed a number of stories to improve first impression user experience:
- Simplification of the Download page (making 64 bit installers and binaries more prominent than the edge and 32 bit releases)
- Examples on how to download them with curl
- Improved filenames (added release version in filename, etc.)
- Simplified version returned by `cf -v` and made it semver compliant
- Various bug fixes around help usage, user reported GH issues
- Upcoming are V3 API and Routing stories after a temporary drop in feature development during Thanksgiving, CF Summit Shanghai, and holiday vacations.
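For readers wondering what "semver compliant" means in practice for the `cf -v` item above, the following sketch checks a version string against the Semantic Versioning 2.0.0 grammar. The pattern is adapted from the regular expression suggested on semver.org. The sample version strings and the function `parse_semver` are hypothetical and are not taken from actual CLI output.

```python
import re

# Adapted from the regular expression suggested on semver.org (SemVer 2.0.0):
# MAJOR.MINOR.PATCH, optional "-prerelease", optional "+build" metadata.
SEMVER = re.compile(
    r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
    r"(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?",
    re.ASCII,
)


def parse_semver(version):
    """Return the parsed parts of a semver string, or None if it is not compliant."""
    match = SEMVER.fullmatch(version)
    if match is None:
        return None
    major, minor, patch, prerelease, build = match.groups()
    return {
        "major": int(major),
        "minor": int(minor),
        "patch": int(patch),
        "prerelease": prerelease,
        "build": build,
    }


if __name__ == "__main__":
    # Hypothetical version strings, for illustration only.
    print(parse_semver("6.13.0+abc1234"))
    print(parse_semver("6.13.0-abc1234-2015-10-15T22:53:29+00:00"))
```

The second sample is rejected because colons are not allowed in prerelease or build identifiers, which is the kind of detail a simplified version string avoids.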
Routing, Services Core, BOSH, CAPI
- None of these were reported on in detail, as the presenters weren’t on the call.
- TCP routing came up briefly, in connection with TCP routing implementing multiple container reports.
- There was also a little discussion about BOSH: CPI releases cannot be compiled yet, so it is best to avoid updating them frequently, as this can be a pain.
CAPI (Dieu Cao, Pivotal)
- Completed XTP with Routing on Route Services and now working on TCP Routing
- Removed route registration from CC, now using the route registrar job
- Completed adding app instance limits to space quotas
- Fixed incorrect response codes on deletes for nested end points
- Completed CLI PR to add support for purging a single service instance
- Completed switching v3 to action controller
- easier onboarding for developers used to Rails applications
- now have access to Rails framework features and documentation
- Plan to finish out work on support of private brokers for space developers in the next couple of weeks.
- Plan to work on detailing a set of epics for CAPI to support Elastic Clusters
- Plan to work on an initial proposal for MVP support of Tasks in Cloud Controller
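To make the app instance limit on space quotas (mentioned above) concrete, here is a minimal sketch of the kind of check such a limit implies. It is not Cloud Controller code: the function name, its parameters, and the use of -1 as an "unlimited" sentinel are assumptions made for illustration.

```python
# Assumed sentinel meaning "no instance limit configured" (an assumption
# for this sketch, not confirmed by the notes).
UNLIMITED = -1


def within_instance_limit(other_instances, requested_instances, limit):
    """Check whether a scale request fits under a space's app instance limit.

    other_instances: instances already used by the other apps in the space
    requested_instances: instance count requested for the app being scaled
    limit: the space quota's app instance limit, or UNLIMITED
    """
    if limit == UNLIMITED:
        return True
    return other_instances + requested_instances <= limit


if __name__ == "__main__":
    print(within_instance_limit(8, 2, 10))
    print(within_instance_limit(8, 3, 10))
```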
Diego (Eric Malm, Pivotal)
- Conducted 10K-instance, 100-cell experiment: observed acceptable times for bulk loops, workload scheduling
- Developed benchmark test suite for 200K-app data; investigating issues with etcd v2 at that scale
- Proposed work to bind-mount cached downloads, to use with buildpacks for better staging performance
- Improving cell-state response times by having executor cache Garden data
- Helped validate stability, performance of Garden-Linux transition from btrfs to aufs on loop devices
- Support staging, running images only from v2 Docker registries. Dropped support for v1 registries.
- Can supply optional whitelist of insecure Docker registries for staging. This is BOSH-configurable.
- Consolidating acceptance-test coverage: CATs now backend-configurable, plan to merge DATs into CATs
- Developing automated test suite to verify downtimeless upgrades from Diego 0.1434.0 to latest
- Various improvements to logging, metrics, manifest generation
Garden (Dr. Julz, IBM)
- Garden-Linux. Identified serious container-creation performance issues when using btrfs under load, especially when using disk quotas
- Have switched from btrfs (back) to aufs as the filesystem, using loop devices for quotas
- MVP implementation to support buildpack based apps delivered, now fleshing out features required for Docker image support
- Ran long-running and high-load tests with a full Diego environment to assure ourselves of the stability and performance of the new solution
- Bringing these performance tests into a CI to catch any regressions going forward and to give us more ability to catch these high-load/multi-factor issues early
- Still, some disk-related performance issues with aufs under load, but much better than btrfs. Current best theory is that long running cells run out of guaranteed IOPS.
Guardian
- OCS/runC
- Slow progress due to only one pair working on this part time. Hoping to get back to at least one full time pair next week.
Abacus (Dr. Max, IBM)
- Various bug fixes reported by Bluemix
- Work on common UI and dashboard started
- Update to various NPM modules
- Work on moving pipeline to Concourse started
Lattice (David Wadden, Pivotal Labs)
- Lattice v0.6.0 (Diego 0.1434.0) released
- Built with packer-bosh instead of hand-rolled Upstart scripts
- Temporarily discontinued platform(s) being reviewed for prioritization
- USER directives are honored from docker images (can also specify manually)
- ltc can define HTTP routes to fully-qualified domains and context paths (#104, #217)
- Windows support added for Lattice in latest nightly builds
- vagrant up / terraform apply brings up Linux cluster from Windows
- ltc works from Windows
- Added ltc sync which auto-updates the CLI off the cluster
- Added ltc version which prints dependencies’ versions from the cluster
Flintstone (Simon Moser, IBM)
- jruby work completed and put on hold for now
- A question about memory use/allocation came up, but it is not a focus right now
- Discussion of API performance on Bluemix (memory, response times, etc.): identifying the top 20 API calls for further performance investigation
- Focused on where the API gets hit the most from Bluemix
- Started to set up a SoftLayer environment.
The next monthly CAB meeting is scheduled for Wednesday, December 9 at 8 AM Pacific Time, though the date and time are subject to change.
Don’t forget that the Cloud Foundry Summit is coming to Shanghai (China) on December 2-3.